MySQL and partitioning tables with millions of rows

The Problem

I’ve been running a mobile GPS tracking service, MoosTrax (formerly BlackBerry Tracker), for a few years and have encountered a large amount of data in the process.

A user’s phone sends its location to the server and it is stored in a MySQL database. Each “location” entry is stored as a single row in a table.

Right now there are approximately 12 million rows in the location table, and things are getting slow now, as a full table scan can take ~3-4 minutes on my limited hardware. This means that if a user is pulling a location from history it could potentially block all other users (as the table is locked) access to the site until the query is complete.

Partitioning

Partitioning allows you to store parts of your table in their own logical space. With partitioning, you want to divide up your rows based on how you access them. If you partition your rows and you are still hitting all the partitions, it does you no good. The goal is that when you query, you will only have to look at a subset of the data to get a result, and not the whole table.

There are various ways in MySQL to partition a database, such as:

  • RANGE – rows are partitioned based on the range of a column (i.e date, 2006-2007, 2007-20008, etc,.)
  • HASH – hashes a column and depending on the result of the hash, has a different partition
  • LIST, KEY

Choosing the partition type is important, so I looked at how my application looks up a user’s location.

Getting a user’s current location

Location.find(:all, :conditions => {:device_id => @device.id}, :order => "date_added desc", :limit => 6)

Getting a users’s location history

Location.find(:all, :conditions => {:date_added => startdate.utc..enddate.utc, :device_id => @device.id}, :order => "date_added desc", :limit => 500)

At first, I thought about RANGE partitioning by date, and while I am using the date in my queries, it is very common for a query to have a very large date range, and that means it could easily span all partitions.

After a second look, it seemed that device_id might be the best, using the HASH partitioning type.

This means that all the locations would be partitioned equally by their device_id. This is great because MoosTrax is only looking at one device at a time, history or live tracking, and doesn’t aggregate the locations across devices or users.

Preparing to partition

First, to partition a table the column you want to partition by must be part of the primary key. I only had “id” in my primary key, so I modified it to include my partitioning column, device_id.

Drop the Primary Key

ALTER TABLE location DROP PRIMARY KEY

Partition the table

Now we are going to add our new primary key, and tell MySQL to partition, with HASH, by device_id. We also specify the option, partitions, to tell MySQL how many partitions we want it to use. I believe the limit is 1024.

ALTER TABLE location 
ADD PRIMARY KEY (id, device_id)
partition BY HASH(device_id)
partitions 200

FYI: Running the above may take a while depending on the size of your table.

Does it work?

MySQL has a command that we can run, explain partitions, that will let us specify a query, and MySQL will tell us if and how it is using partitioning to get the result.

Because we partitioned by device_id, let’s try a simple select with device_id in the where clause.

mysql> explain partitions select * from location where device_id = 1;
+----+-------------+----------+------------+------+---------------+-----------+---------+-------+------+-------+
| id | select_type | table    | partitions | type | possible_keys | key       | key_len | ref   | rows | Extra |
+----+-------------+----------+------------+------+---------------+-----------+---------+-------+------+-------+
| 1  | SIMPLE      | location | p1         | ref  | device_id     | device_id | 4       | const | 1    |       |
+----+-------------+----------+------------+------+---------------+-----------+---------+-------+------+-------+
1 rows in set (0.14 sec)
 
mysql>

If you look at the result of the explain, you can see that MySQL only needs to use partition p1 to find our result..this is great! There are way less rows in the partition than in the whole table.

Now let’s try another query, that won’t use our partitioning column.

 
mysql> explain partitions select * from location where date_added > '2009-10-10';
+----+-------------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+---------------+------+---------+------+---------+-------------+
| id | select_type | table    | partitions                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
+----+-------------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+---------------+------+---------+------+---------+-------------+
| 1  | SIMPLE      | location | p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,p10,p11,p12,p13,p14,p15,p16,p17,p18,p19,p20,p21,p22,p23,p24,p25,p26,p27,p28,p29,p30,p31,p32,p33,p34,p35,p36,p37,p38,p39,p40,p41,p42,p43,p44,p45,p46,p47,p48,p49,p50,p51,p52,p53,p54,p55,p56,p57,p58,p59,p60,p61,p62,p63,p64,p65,p66,p67,p68,p69,p70,p71,p72,p73,p74,p75,p76,p77,p78,p79,p80,p81,p82,p83,p84,p85,p86,p87,p88,p89,p90,p91,p92,p93,p94,p95,p96,p97,p98,p99,p100,p101,p102,p103,p104,p105,p106,p107,p108,p109,p110,p111,p112,p113,p114,p115,p116,p117,p118,p119,p120,p121,p122,p123,p124,p125,p126,p127,p128,p129,p130,p131,p132,p133,p134,p135,p136,p137,p138,p139,p140,p141,p142,p143,p144,p145,p146,p147,p148,p149,p150,p151,p152,p153,p154,p155,p156,p157,p158,p159,p160,p161,p162,p163,p164,p165,p166,p167,p168,p169,p170,p171,p172,p173,p174,p175,p176,p177,p178,p179,p180,p181,p182,p183,p184,p185,p186,p187,p188,p189,p190,p191,p192,p193,p194,p195,p196,p197,p198,p199 | ALL  | date_added    | NULL | NULL    | NULL | 12641367 | Using where |
+----+-------------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+---------------+------+---------+------+---------+-------------+
1 rows in set (1.81 sec)
 
mysql>

As you can see, MySQL would need to go through all 200 partitions to get the result. Fortunately, MoosTrax doesn’t use a query like that, as the device_id is always available. Therefore, if I am searching by date, I will also specify the device_id as well, so that MySQL will use the partition.

mysql> explain partitions select * from location where date_added > '2009-10-10' and device_id = 1;
+----+-------------+----------+------------+------+----------------------+-----------+---------+-------+------+-------------+
| id | select_type | table    | partitions | type | possible_keys        | key       | key_len | ref   | rows | Extra       |
+----+-------------+----------+------------+------+----------------------+-----------+---------+-------+------+-------------+
| 1  | SIMPLE      | location | p1         | ref  | device_id,date_added | device_id | 4       | const | 1    | Using where |
+----+-------------+----------+------------+------+----------------------+-----------+---------+-------+------+-------------+
1 rows in set (0.11 sec)
 
mysql>

That’s better. Now its using our partitions correctly.

As long as you always use your partitioning column in your query, you will be able to take advantage of the partitioning.

The Result

After switching to partitioning, many queries are running much much faster than before. I couldn’t be happier.

If you want to read more about MySQL partitioning, check out the manual.

Share this post:
  • Digg
  • del.icio.us
  • Facebook
  • Reddit

Why I didn’t like Java 5 years ago, and why I don’t like it now

Then

I started out programming in C, which taught me a lot about the fundamentals of computer science. I learned about types, memory management, functions, and logic. As I began to evaluate other programming languages to try out, I of course ended up trying Java. My first impression of it was how heavy it felt. Of course, this was when 4GB of memory wasn’t standard in a desktop, and memory allocations and processing power — were still relatively precious.

I remember trying out Swing and that only made me more disgusted with Java, as a Swing application felt horribly slow.

Java users don’t have to worry about memory management (technically), as the garbage collection system takes care of it for the user. I think this was a huge benefit for novice developers, because dealing with memory management definitely isn’t fun — and usually presents issues if not done properly.

The next thing I tried in Java was creating a web application. I bought a book on J2EE and as I began learning the ins and outs, I began to hate it with a passion. The amount of configuration and boiler plate code to get something simple up and running, was a huge turn off to me. I was disgusted with the concept of EJBs and all the various patterns in J2EE.

After a short while, J2EE was gone with the wind for me. I moved on to scripting languages, such as the notoriously shitty PHP, which was still in my opinion, more practical than Java…but I wouldn’t settle on a good web framework and language until Python and Ruby really caught my eye.

Now

Flash forward to today…and I’m still not liking Java.

It is still plagued with lots of configuration, descriptors, assembly, and boiler plate code. And now that memory is relatively cheap and available, Java still eats it like a fat boy eating at McDonalds. PermGen errors, anyone? The JVM has moved forward a lot in the past years, but its still a memory hog, and I feel like it abstracts so much low level coding that developers tend to not pay attention to the performance of a system — just throw more hardware at the JVM.

My productivity in Java is much lower than most other languages — even C. When building enterprise software in Java the complexity of getting it setup and going seems like too much at times.

DAO’s, interfaces, implementations, proxies — its just boring to me.

Ever used a BlackBerry? It feels slow to me…and I bet if it was coded in C it would be a lot snappier…same goes for Android. I’m currently using an iPhone now and it definitely feels the most repsonsive out of all 3.

What do you think about Java? Any recommendations on feeling more productive and not slowed down?

Share this post:
  • Digg
  • del.icio.us
  • Facebook
  • Reddit

libactor now at google code

libactor is now available on google code. Check it out: http://code.google.com/p/libactor/

If you have any problems or ideas, post them there!

Chris

Share this post:
  • Digg
  • del.icio.us
  • Facebook
  • Reddit

libactor – a C library based on the Actor model

Recently I’ve been really interested in functional, concurrent programming languages, such as Erlang. This prompted me to have some fun in C, and try to implement a simple library that is based on the Actor Model.

Right now it is usable, although it may not be ready for production. It uses pthreads and the library handles all of the threading issues, so you don’t have to worry about any of it at all.

In a future version I plan to add more sandboxing to the actors, so that when one actor crashes, they don’t all go down!

Right now it supports the following:

  • Spawn actor
  • Sending messages
  • Broadcast message
  • Actor memory management convenience functions(when an actor dies, the memory is freed!)

Here is what a simple ping/pong example looks like:

#include <stdio.h>
#include <libactor/actor.h>
 
void *pong_func(void *args) {
        actor_msg_t *msg;
 
        while(1) {
                msg = actor_receive();
                if(msg->type == PING_MSG) {
                        printf("PING! ");
                        actor_reply_msg(msg, PONG_MSG, NULL, 0);
                }
                arelease(msg);
        }
}
 
void *ping_func(void *args) {
        actor_msg_t *msg;
        actor_id aid = spawn_actor(pong_func, NULL);
        while(1) {
                actor_send_msg(aid, PING_MSG, NULL, 0);
                msg = actor_receive();
                if(msg->type == PONG_MSG) printf("PONG!\n");
                arelease(msg);
                sleep(5);
        }
}
 
 
void *main_func(void *args) {
        struct actor_main *main = (struct actor_main*)args;
        int x;
 
        /* Accessing the arguments passed to the application */
        printf("Number of arguments: %d\n", main->argc);
        for(x = 0; x < main->argc; x++) printf("Argument: %s\n", main->argv[x]);
 
        /* PING/PONG example */
        spawn_actor(ping_func, NULL);
}
 
DECLARE_ACTOR_MAIN(main_func)

There is a more detailed example in the source distribution, located in the examples/ directory.

You can get the source here. Please, let me know what you think about it, it was really fun to write!

Documentation is available here: http://chrismoos.com/libactordocs/

Share this post:
  • Digg
  • del.icio.us
  • Facebook
  • Reddit

Colloquy for iPhone — Push Notification Server

Colloquy is a great IRC client for Mac, as well as the iPhone. Currently, to run Colloquy on the iPhone you must purchase the mobile version from the App Store. The other option is to download the source and build it yourself, but you must be a registered iPhone developer to run it on an actual iPhone. Also, if you intend to use push notifications (via the Colloquy ZNC plugin), you must have a push certificate as well.

Getting Started

I am going to assume that you have znc installed, and you know how to use it. Next, we need to install the Colloquy ZNC module, so that Colloquy can communicate with ZNC (and allow us to have push functionality). See this page for complete instructions.

The module for ZNC accepts commands from the Colloquy client, notably information on push notifications(device token, etc,.).

Modifying the Colloquy Mobile Source

Next, we need to download and build the source. Once you have downloaded the source and loaded up the Mobile Colloquy project in Xcode, we need to make a few modifications.

Certificates

To run any application on an actual iPhone device, it must be signed. Login to your Developer portal on Apple’s site and generate a certificate (and make sure to enable push). Download the certificate and install it. Next, assign that certificate to the Colloquy Mobile Project. Project…Edit Active Target. Make sure that under the Properties tab, you have the same Identifier name that you used when generating the certificate. Go to Build, and then Code Signing Identity and select the certificate.

Modifying the source

Next, find the CQConnectionsController.m file (Controllers/Connections) and change the following line from:

[self sendRawMessage:@"PUSH service colloquy.mobi 7906"];

and change it to:

[self sendRawMessage:@"PUSH service 127.0.0.1 7906"];

This will make the client tell the Colloquy ZNC module to send push notifications to our custom server.

Build the project and put it onto your device.

Push Server

I wrote a simple python script that waits on port 7906, receives a connection from the Colloquy ZNC module, and then pushes a notification to your phone (via Apple’s Push Service).

The script requires your push certificate and private key file so it can authenticate to Apple’s push service.

Export the push certificate and private key as a .p12 from Keychain Access. After that, we have to convert it to PEM format:

openssl pkcs12 -in key.p12 -out certs.pem -nodes

where key.p12 is the file exported from Keychain access.

After that, you may need to put that certs.pem file on your server that you are running ZNC on.

Installing the Colloquy Push Server

Run the following to install the Colloquy Push Server:

sudo easy_install http://www.tech9computers.com/ColloquyPushServer-0.1.tar.gz

Next, run the command colloquypushsrv.py to make sure installation was successful.

The usage looks like this:

usage: /usr/local/bin/colloquypushsrv.py [-p listen_port] [--sandbox] [-b] certs_file
	Command Summary:
		certs_file	        PEM encoded Apple Push certificate and private key file(absolute path)
		-p listen_port	Default Listen Port: 7906	
		--sandbox 	Connect to Sandbox Service
		-b		        Run in background

Now, let’s start the push server in the foreground. Make sure to specify the full path to the pem file we created.

colloquypushsrv.py ~/certs.pem

Conclusion

Finally, connect to your ZNC server with the custom built Colloquy Mobile. Your iPhone should register with the ZNC module. After, exit the Colloquy Mobile application and pull up Colloquy (or any other IRC client) on your desktop. Send yourself a message, and see if you get the push notification!

I realize this process may not be completely clear, so if you have any questions, I would be happy to walk you through the process.

Also, I just wrote this up today, so it may not be production ready :) .

Enjoy!

Share this post:
  • Digg
  • del.icio.us
  • Facebook
  • Reddit