Getting started with Scala using SBT

One of my biggest gripes with Java (and all the languages that run on the JVM) is getting a project set up and building it. Maven is not my favorite, and Ant... well, I don’t like it either. Fortunately, if you want to start a new project in Scala, there is a great build tool available that takes a lot of the pain out of project management and building: SBT, simple-build-tool.

sbt is a simple build tool for Scala projects that aims to do the basics well. It requires Java 1.5 or later.

Installing SBT

I’m using Mac OS X, but the following instructions should be pretty much the same on any Unix-based OS.

You can find the latest version of SBT here.

cd ~
wget http://simple-build-tool.googlecode.com/files/sbt-launcher-0.5.6.jar
sudo mv sbt-launcher-0.5.6.jar /usr/local/bin/sbt-launcher.jar
echo "java -Xmx512M -jar /usr/local/bin/sbt-launcher.jar \"\$@\"" | sudo tee /usr/local/bin/sbt
sudo chmod +x /usr/local/bin/sbt

This will install the SBT jar and create a script called sbt that will allow you to run the sbt jar.

Just type sbt and press enter, and you now have access to sbt.

$ sbt
Project does not exist, create new project? (y/N/s) : n

Creating a new Scala project

Now we will create a Hello World Scala project with SBT.

mkdir hello_scala
cd hello_scala

Running the sbt command in a directory where there is no project will prompt you to create one.

sbt
Project does not exist, create new project? (y/N/s) : y
Name: Hello, Scala!
Organization []: 
Version [1.0]: 
Scala version [2.7.7]: 
sbt version [0.5.6]: 
:: retrieving :: sbt#boot
	confs: [default]
	2 artifacts copied, 0 already retrieved (9911kB/72ms)
:: retrieving :: sbt#boot
	confs: [default]
	3 artifacts copied, 0 already retrieved (3409kB/15ms)
[success] Successfully initialized directory structure.
[info] Building project Hello, Scala! 1.0 using sbt.DefaultProject
[info]    with sbt 0.5.6 and Scala 2.7.7
[success] Build completed successfully.
[info] 
[info] Total build time: 0 s

Awesome. It handles all the Scala dependencies for us! Now let’s create a file that contains our Hello, Scala example.

Below is the directory structure of an SBT project.

$ ls
lib	project	src

Creating our HelloScala sources and running

Now we are going to create our main file, HelloScala.scala

src/main/scala/HelloScala.scala

object HelloScala {
  def main(args: Array[String]) {
    println("Hello, Scala!")
  }
}

And now we can build and run it by just issuing the following:

sbt run

And the output:

[info] Building project Hello, Scala! 1.0 using sbt.DefaultProject
[info]    with sbt 0.5.6 and Scala 2.7.7
[info] 
[info] == compile ==
[info]   Source analysis: 0 new/modified, 0 indirectly invalidated, 0 removed.
[info] Compiling main sources...
[info] Nothing to compile.
[info]   Post-analysis: 2 classes.
[info] == compile ==
[info] 
[info] == copy-resources ==
[info] == copy-resources ==
[info] 
[info] == run ==
[info] Running HelloScala ...
Hello, Scala!
[info] == run ==
[success] Successful.
[info] 
[info] Total time: 0 s
[success] Build completed successfully.
[info] 
[info] Total build time: 1 s

And that’s it. Setting up a new Scala project with SBT is painless. In the next part I will talk about managing dependencies, which SBT also makes very easy.

You can read a lot more about SBT by checking out their wiki.

AjaxTask – a rails plugin for managing background tasks

SOAP, Background Tasks, and AJAX

Recently in Rails I’ve been interacting with various SOAP services and running them in the background with Workling. I needed to relay the SOAP response to the client’s web browser, so I decided to use AJAX to poll the status of my background tasks.

This is great if you have background tasks that run for less than 30 seconds, but you don't want to block a user (and a request) while they run.

The Solution

I created a Rails plugin, called AjaxTask, that has two components:

  • Methods to use in your controller to define a task handler and create tasks
  • Javascript library to manage the AJAX between the browser and the handler.

GitHub Link: http://github.com/chrismoos/ajaxtask

In a nutshell, the client initiates a task, the handler responds with a task ID, and the client polls at a user-defined interval until the task has finished or failed with an error.
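To make that flow concrete, here is a minimal Ruby sketch of the start-then-poll cycle. It is not AjaxTask's code: FakeTaskHandler and poll_until_finished are hypothetical names, and the handler just pretends the task finishes after a fixed number of polls. In the real plugin the polling happens in the browser via JavaScript.

```ruby
# Hypothetical in-memory stand-in for the handler; not AjaxTask's real code.
class FakeTaskHandler
  def initialize(polls_until_done)
    @polls_until_done = polls_until_done
    @polls = Hash.new(0)
    @next_id = 0
  end

  # The client initiates a task; the handler answers with a task ID.
  def start
    @next_id += 1
    "task-#{@next_id}"
  end

  # The client asks for the status of a task by ID.
  def status(uid)
    @polls[uid] += 1
    if @polls[uid] >= @polls_until_done
      { :done => 'finished' }
    else
      { :pending => "poll #{@polls[uid]}" }
    end
  end
end

# Poll at a fixed interval until the task is done or has failed.
def poll_until_finished(handler, interval, max_polls)
  uid = handler.start
  max_polls.times do
    status = handler.status(uid)
    return status if status.key?(:done) || status.key?(:error)
    sleep interval
  end
  { :error => 'gave up waiting' }
end

result = poll_until_finished(FakeTaskHandler.new(3), 0.01, 10)
puts result.inspect
```

The max_polls cap is my own addition so the loop cannot spin forever if a task never reports back.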

The plugin takes the pain out of implementing the handler, as well as the Javascript. All you have to do is run code for your task, and periodically update the status.

I am using Workling to run my background tasks, as well as maintain the status using Workling’s return store.

Okay, enough with the intro, here is the example.

Example

Controller/Routes

The first thing to do is define the handler. This instructs the AjaxTask plugin to create a handler that will respond to ajax requests, as well as dispatch to your actual tasks. The only parameter to ajaxtask_handler is a symbol, which MUST be identical to a named route. This is how a URL gets from Rails to the plugin.

routes.rb:

map.ajaxtask_demo '/ajaxtask/handler/:task', :controller => :demo, :action => :ajaxtask_demo

demo_controller.rb:

ajaxtask_handler :ajaxtask_demo

Now we will define a task:

demo_controller.rb:

ajaxtask :mytask

This tells the plugin to respond to a task named mytask.

Having declared the task, we must implement two methods in our controller.

mytask_start is called when a browser starts a new task. You should probably fire off your background task in this method.

mytask_start should return a unique ID for the task. With Workling, calling the worker’s async_ method returns one. mytask_status is then called with that ID and should return the task’s current status.

def mytask_start
    return MyWorklingWorker.async_mytask
end
 
def mytask_status(uid)
    Workling.return.get(uid)
end

The Worker

The worker is the meat of our background task. In this we will do something that might take a while, and also update the status.

my_workling_worker.rb:

class MyWorklingWorker < Workling::Base
  def mytask(options)
    Workling.return.set(options[:uid], {:pending => 'i am just starting...wait up!'})
    begin
      # your long running task goes here
      sleep 10
    rescue => e
      Workling.return.set(options[:uid], {:error => e.to_s})
      return
    end
    Workling.return.set(options[:uid], {:done => 'i finished!'})
  end
end

The important things to note here are what we set the return to. AjaxTask recognizes the following:

  • :error
  • :pending
  • :done
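The pending/done/error pattern from the worker can be pulled into a small helper. The sketch below is only an illustration: run_with_status is a hypothetical name, and a plain Hash stands in for Workling's return store so that it runs without Rails or Workling.

```ruby
# A plain Hash stands in for Workling's return store (an assumption for this sketch).
STORE = {}

# Runs the block and records :pending first, then :done or :error, under the uid.
def run_with_status(store, uid)
  store[uid] = { :pending => 'starting' }
  begin
    result = yield
    store[uid] = { :done => result }
  rescue => e
    store[uid] = { :error => e.to_s }
  end
  store[uid]
end

ok  = run_with_status(STORE, 'a') { 'all good' }
bad = run_with_status(STORE, 'b') { raise 'boom' }
puts ok.inspect
puts bad.inspect
```

Whatever ends up stored under the uid is what a status method would hand back to the polling client.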

They should be pretty self-explanatory. Now let’s see what the client side looks like.

The Client

For the client, we will be interacting with the AjaxTask javascript library. Make sure you copy the ajaxtask.js file to your javascripts directory, and include it in your page. The following will copy the javascript for you:

cd vendor/plugins/ajaxtask
rake ajax_task_js

Here is an example of an HTML page that uses AjaxTask:

<html>
<head>
	<%= javascript_include_tag 'ajaxtask.js' %>
</head>
<body>
<script>
function mytaskHandler() {
	$(this).bind('onTaskError', onTaskError);
	$(this).bind('onTaskFinished', onTaskFinished);
	$(this).bind('onTaskPending', onTaskPending);
 
	function onTaskError(event, error) {
		alert('error: ' + error);
	}
 
	function onTaskPending(event, data) {
		alert("pending: " + data);
	}
 
	function onTaskFinished(event, data) {
		alert("finished: " + data);
	}
}
 
$(document).ready(function() {	
	var myTask = new AjaxTask({
		url: "<%= ajaxtask_demo_url :task => :mytask %>",
		handler: new mytaskHandler(),
		taskStatusDiv: $("#taskStatus"),
		taskStatusLoadingMsg: 'Please wait while my task runs...',
		taskStatusLoadingImg: '/images/smallactivity.gif',
		taskStatusErrorMsg: 'Oops...something bad happened.'
	});
	myTask.start();
});
</script>
<div id="taskStatus"></div>
 
</body>
</html>

Looking at the above client code, you can see how easy it is to present a background task’s processing to a user.

That does it for now. I’ll try to document and post more about AjaxTask soon.

MySQL and partitioning tables with millions of rows

The Problem

I’ve been running a mobile GPS tracking service, MoosTrax (formerly BlackBerry Tracker), for a few years and have encountered a large amount of data in the process.

A user’s phone sends its location to the server and it is stored in a MySQL database. Each “location” entry is stored as a single row in a table.

Right now there are approximately 12 million rows in the location table, and things are getting slow: a full table scan can take ~3-4 minutes on my limited hardware. This means that a single user pulling a location from history could block every other user’s access to the site (the table is locked) until the query completes.

Partitioning

Partitioning allows you to store parts of your table in their own logical space. With partitioning, you want to divide up your rows based on how you access them. If you partition your rows and you are still hitting all the partitions, it does you no good. The goal is that when you query, you will only have to look at a subset of the data to get a result, and not the whole table.

There are various ways in MySQL to partition a table, such as:

  • RANGE – rows are partitioned based on the range a column value falls into (e.g. by date: 2006-2007, 2007-2008, etc.)
  • HASH – hashes a column, and the result of the hash determines which partition a row goes into
  • LIST, KEY

Choosing the partition type is important, so I looked at how my application looks up a user’s location.

Getting a user’s current location

Location.find(:all, :conditions => {:device_id => @device.id}, :order => "date_added desc", :limit => 6)

Getting a user’s location history

Location.find(:all, :conditions => {:date_added => startdate.utc..enddate.utc, :device_id => @device.id}, :order => "date_added desc", :limit => 500)

At first, I thought about RANGE partitioning by date, and while I am using the date in my queries, it is very common for a query to have a very large date range, and that means it could easily span all partitions.
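Here is a rough Ruby model of why a wide date range defeats RANGE partitioning. The yearly partitions and their names (p2006 to p2009) are made up for illustration and are not MoosTrax's schema. The function lists which partitions overlap a query's date range.

```ruby
require 'date'

# Hypothetical yearly RANGE partitions: each covers [Jan 1 of year, Jan 1 of next year).
YEARLY_PARTITIONS = (2006..2009).map do |year|
  ["p#{year}", Date.new(year, 1, 1), Date.new(year + 1, 1, 1)]
end

# Returns the names of the partitions whose date range overlaps [from, to].
def range_partitions_hit(from, to)
  YEARLY_PARTITIONS
    .select { |_name, starts, ends| from < ends && to >= starts }
    .map(&:first)
end

# A narrow range touches a single partition...
puts range_partitions_hit(Date.new(2009, 10, 1), Date.new(2009, 10, 10)).inspect
# ...but a wide one touches every partition.
puts range_partitions_hit(Date.new(2006, 3, 1), Date.new(2009, 10, 10)).inspect
```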

After a second look, it seemed that device_id might be the best, using the HASH partitioning type.

This means that all the locations would be spread across the partitions according to their device_id, with every row for a given device landing in the same partition. This is great because MoosTrax is only looking at one device at a time, history or live tracking, and doesn’t aggregate the locations across devices or users.

Preparing to partition

First, to partition a table the column you want to partition by must be part of the primary key. I only had “id” in my primary key, so I modified it to include my partitioning column, device_id.

Drop the Primary Key

ALTER TABLE location DROP PRIMARY KEY

Partition the table

Now we are going to add our new primary key, and tell MySQL to partition, with HASH, by device_id. We also specify the option, partitions, to tell MySQL how many partitions we want it to use. I believe the limit is 1024.

ALTER TABLE location
ADD PRIMARY KEY (id, device_id)
PARTITION BY HASH(device_id)
PARTITIONS 200;

FYI: Running the above may take a while depending on the size of your table.

Does it work?

MySQL has a command that we can run, explain partitions, that will let us specify a query, and MySQL will tell us if and how it is using partitioning to get the result.

Because we partitioned by device_id, let’s try a simple select with device_id in the where clause.

mysql> explain partitions select * from location where device_id = 1;
+----+-------------+----------+------------+------+---------------+-----------+---------+-------+------+-------+
| id | select_type | table    | partitions | type | possible_keys | key       | key_len | ref   | rows | Extra |
+----+-------------+----------+------------+------+---------------+-----------+---------+-------+------+-------+
| 1  | SIMPLE      | location | p1         | ref  | device_id     | device_id | 4       | const | 1    |       |
+----+-------------+----------+------------+------+---------------+-----------+---------+-------+------+-------+
1 rows in set (0.14 sec)
 
mysql>

If you look at the result of the explain, you can see that MySQL only needs to use partition p1 to find our result. This is great! There are far fewer rows in the partition than in the whole table.

Now let’s try another query, one that doesn’t use our partitioning column.

 
mysql> explain partitions select * from location where date_added > '2009-10-10';
+----+-------------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+---------------+------+---------+------+---------+-------------+
| id | select_type | table    | partitions                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
+----+-------------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+---------------+------+---------+------+---------+-------------+
| 1  | SIMPLE      | location | p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,p10,p11,p12,p13,p14,p15,p16,p17,p18,p19,p20,p21,p22,p23,p24,p25,p26,p27,p28,p29,p30,p31,p32,p33,p34,p35,p36,p37,p38,p39,p40,p41,p42,p43,p44,p45,p46,p47,p48,p49,p50,p51,p52,p53,p54,p55,p56,p57,p58,p59,p60,p61,p62,p63,p64,p65,p66,p67,p68,p69,p70,p71,p72,p73,p74,p75,p76,p77,p78,p79,p80,p81,p82,p83,p84,p85,p86,p87,p88,p89,p90,p91,p92,p93,p94,p95,p96,p97,p98,p99,p100,p101,p102,p103,p104,p105,p106,p107,p108,p109,p110,p111,p112,p113,p114,p115,p116,p117,p118,p119,p120,p121,p122,p123,p124,p125,p126,p127,p128,p129,p130,p131,p132,p133,p134,p135,p136,p137,p138,p139,p140,p141,p142,p143,p144,p145,p146,p147,p148,p149,p150,p151,p152,p153,p154,p155,p156,p157,p158,p159,p160,p161,p162,p163,p164,p165,p166,p167,p168,p169,p170,p171,p172,p173,p174,p175,p176,p177,p178,p179,p180,p181,p182,p183,p184,p185,p186,p187,p188,p189,p190,p191,p192,p193,p194,p195,p196,p197,p198,p199 | ALL  | date_added    | NULL | NULL    | NULL | 12641367 | Using where |
+----+-------------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+---------------+------+---------+------+---------+-------------+
1 rows in set (1.81 sec)
 
mysql>

As you can see, MySQL would need to go through all 200 partitions to get the result. Fortunately, MoosTrax doesn’t use a query like that, as the device_id is always available. Therefore, if I am searching by date, I will also specify the device_id, so that MySQL only has to look at one partition.

mysql> explain partitions select * from location where date_added > '2009-10-10' and device_id = 1;
+----+-------------+----------+------------+------+----------------------+-----------+---------+-------+------+-------------+
| id | select_type | table    | partitions | type | possible_keys        | key       | key_len | ref   | rows | Extra       |
+----+-------------+----------+------------+------+----------------------+-----------+---------+-------+------+-------------+
| 1  | SIMPLE      | location | p1         | ref  | device_id,date_added | device_id | 4       | const | 1    | Using where |
+----+-------------+----------+------------+------+----------------------+-----------+---------+-------+------+-------------+
1 rows in set (0.11 sec)
 
mysql>

That’s better. Now it’s using our partitions correctly.

As long as you always use your partitioning column in your query, you will be able to take advantage of the partitioning.
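The three explain results can be imitated with a few lines of Ruby. MySQL's HASH partitioning on an integer column puts a row in partition MOD(value, number of partitions), which is why device_id = 1 maps to p1. The function name partitions_to_scan is my own, and the sketch only handles a simple equality match on device_id, not everything MySQL's optimizer can prune.

```ruby
NUM_PARTITIONS = 200

# Returns the partitions a query must scan, given its WHERE conditions as a hash.
# Only an equality match on :device_id lets us narrow the scan to one partition.
def partitions_to_scan(conditions, num_partitions = NUM_PARTITIONS)
  if conditions.key?(:device_id)
    ["p#{conditions[:device_id] % num_partitions}"]
  else
    (0...num_partitions).map { |n| "p#{n}" }
  end
end

puts partitions_to_scan({ :device_id => 1 }).inspect
puts partitions_to_scan({ :date_added => '2009-10-10' }).length
puts partitions_to_scan({ :date_added => '2009-10-10', :device_id => 1 }).inspect
```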

The Result

After switching to partitioning, many queries are running much much faster than before. I couldn’t be happier.

If you want to read more about MySQL partitioning, check out the manual.

Why I didn’t like Java 5 years ago, and why I don’t like it now

Then

I started out programming in C, which taught me a lot about the fundamentals of computer science. I learned about types, memory management, functions, and logic. As I began to evaluate other programming languages to try out, I of course ended up trying Java. My first impression of it was how heavy it felt. Of course, this was when 4GB of memory wasn’t standard in a desktop, and memory and processing power were still relatively precious.

I remember trying out Swing and that only made me more disgusted with Java, as a Swing application felt horribly slow.

Java users don’t have to worry about memory management (technically), as the garbage collection system takes care of it for the user. I think this was a huge benefit for novice developers, because dealing with memory management definitely isn’t fun — and usually presents issues if not done properly.

The next thing I tried in Java was creating a web application. I bought a book on J2EE and as I began learning the ins and outs, I began to hate it with a passion. The amount of configuration and boilerplate code needed to get something simple up and running was a huge turn-off to me. I was disgusted with the concept of EJBs and all the various patterns in J2EE.

After a short while, J2EE was gone with the wind for me. I moved on to scripting languages, such as the notoriously shitty PHP, which was still, in my opinion, more practical than Java… but I wouldn’t settle on a good web framework and language until Python and Ruby really caught my eye.

Now

Flash forward to today…and I’m still not liking Java.

It is still plagued with lots of configuration, descriptors, assembly, and boilerplate code. And now that memory is relatively cheap and available, Java still eats it like a fat boy eating at McDonalds. PermGen errors, anyone? The JVM has moved forward a lot in the past years, but it’s still a memory hog, and I feel like it abstracts away so much low-level coding that developers tend not to pay attention to the performance of a system and just throw more hardware at the JVM.

My productivity in Java is much lower than in most other languages, even C. When building enterprise software in Java, the complexity of getting it set up and going seems like too much at times.

DAOs, interfaces, implementations, proxies: it’s just boring to me.

Ever used a BlackBerry? It feels slow to me… and I bet if it was coded in C it would be a lot snappier… same goes for Android. I’m currently using an iPhone, and it definitely feels the most responsive of all three.

What do you think about Java? Any recommendations on feeling more productive and not slowed down?

libactor now at google code

libactor is now available on Google Code. Check it out: http://code.google.com/p/libactor/

If you have any problems or ideas, post them there!

Chris
