Archive for the ‘moostrax’ Category.

MySQL and partitioning tables with millions of rows

The Problem

I’ve been running a mobile GPS tracking service, MoosTrax (formerly BlackBerry Tracker), for a few years and have encountered a large amount of data in the process.

A user’s phone sends its location to the server and it is stored in a MySQL database. Each “location” entry is stored as a single row in a table.

Right now there are approximately 12 million rows in the location table, and things are getting slow now, as a full table scan can take ~3-4 minutes on my limited hardware. This means that if a user is pulling a location from history it could potentially block all other users (as the table is locked) access to the site until the query is complete.

Partitioning

Partitioning allows you to store parts of your table in their own logical space. With partitioning, you want to divide up your rows based on how you access them. If you partition your rows and you are still hitting all the partitions, it does you no good. The goal is that when you query, you will only have to look at a subset of the data to get a result, and not the whole table.

There are various ways in MySQL to partition a database, such as:

  • RANGE – rows are partitioned based on the range of a column (i.e date, 2006-2007, 2007-20008, etc,.)
  • HASH – hashes a column and depending on the result of the hash, has a different partition
  • LIST, KEY

Choosing the partition type is important, so I looked at how my application looks up a user’s location.

Getting a user’s current location

Location.find(:all, :conditions => {:device_id => @device.id}, :order => "date_added desc", :limit => 6)

Getting a users’s location history

Location.find(:all, :conditions => {:date_added => startdate.utc..enddate.utc, :device_id => @device.id}, :order => "date_added desc", :limit => 500)

At first, I thought about RANGE partitioning by date, and while I am using the date in my queries, it is very common for a query to have a very large date range, and that means it could easily span all partitions.

After a second look, it seemed that device_id might be the best, using the HASH partitioning type.

This means that all the locations would be partitioned equally by their device_id. This is great because MoosTrax is only looking at one device at a time, history or live tracking, and doesn’t aggregate the locations across devices or users.

Preparing to partition

First, to partition a table the column you want to partition by must be part of the primary key. I only had “id” in my primary key, so I modified it to include my partitioning column, device_id.

Drop the Primary Key

ALTER TABLE location DROP PRIMARY KEY

Partition the table

Now we are going to add our new primary key, and tell MySQL to partition, with HASH, by device_id. We also specify the option, partitions, to tell MySQL how many partitions we want it to use. I believe the limit is 1024.

ALTER TABLE location 
ADD PRIMARY KEY (id, device_id)
partition BY HASH(device_id)
partitions 200

FYI: Running the above may take a while depending on the size of your table.

Does it work?

MySQL has a command that we can run, explain partitions, that will let us specify a query, and MySQL will tell us if and how it is using partitioning to get the result.

Because we partitioned by device_id, let’s try a simple select with device_id in the where clause.

mysql> explain partitions select * from location where device_id = 1;
+----+-------------+----------+------------+------+---------------+-----------+---------+-------+------+-------+
| id | select_type | table    | partitions | type | possible_keys | key       | key_len | ref   | rows | Extra |
+----+-------------+----------+------------+------+---------------+-----------+---------+-------+------+-------+
| 1  | SIMPLE      | location | p1         | ref  | device_id     | device_id | 4       | const | 1    |       |
+----+-------------+----------+------------+------+---------------+-----------+---------+-------+------+-------+
1 rows in set (0.14 sec)
 
mysql>

If you look at the result of the explain, you can see that MySQL only needs to use partition p1 to find our result..this is great! There are way less rows in the partition than in the whole table.

Now let’s try another query, that won’t use our partitioning column.

 
mysql> explain partitions select * from location where date_added > '2009-10-10';
+----+-------------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+---------------+------+---------+------+---------+-------------+
| id | select_type | table    | partitions                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
+----+-------------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+---------------+------+---------+------+---------+-------------+
| 1  | SIMPLE      | location | p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,p10,p11,p12,p13,p14,p15,p16,p17,p18,p19,p20,p21,p22,p23,p24,p25,p26,p27,p28,p29,p30,p31,p32,p33,p34,p35,p36,p37,p38,p39,p40,p41,p42,p43,p44,p45,p46,p47,p48,p49,p50,p51,p52,p53,p54,p55,p56,p57,p58,p59,p60,p61,p62,p63,p64,p65,p66,p67,p68,p69,p70,p71,p72,p73,p74,p75,p76,p77,p78,p79,p80,p81,p82,p83,p84,p85,p86,p87,p88,p89,p90,p91,p92,p93,p94,p95,p96,p97,p98,p99,p100,p101,p102,p103,p104,p105,p106,p107,p108,p109,p110,p111,p112,p113,p114,p115,p116,p117,p118,p119,p120,p121,p122,p123,p124,p125,p126,p127,p128,p129,p130,p131,p132,p133,p134,p135,p136,p137,p138,p139,p140,p141,p142,p143,p144,p145,p146,p147,p148,p149,p150,p151,p152,p153,p154,p155,p156,p157,p158,p159,p160,p161,p162,p163,p164,p165,p166,p167,p168,p169,p170,p171,p172,p173,p174,p175,p176,p177,p178,p179,p180,p181,p182,p183,p184,p185,p186,p187,p188,p189,p190,p191,p192,p193,p194,p195,p196,p197,p198,p199 | ALL  | date_added    | NULL | NULL    | NULL | 12641367 | Using where |
+----+-------------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+---------------+------+---------+------+---------+-------------+
1 rows in set (1.81 sec)
 
mysql>

As you can see, MySQL would need to go through all 200 partitions to get the result. Fortunately, MoosTrax doesn’t use a query like that, as the device_id is always available. Therefore, if I am searching by date, I will also specify the device_id as well, so that MySQL will use the partition.

mysql> explain partitions select * from location where date_added > '2009-10-10' and device_id = 1;
+----+-------------+----------+------------+------+----------------------+-----------+---------+-------+------+-------------+
| id | select_type | table    | partitions | type | possible_keys        | key       | key_len | ref   | rows | Extra       |
+----+-------------+----------+------------+------+----------------------+-----------+---------+-------+------+-------------+
| 1  | SIMPLE      | location | p1         | ref  | device_id,date_added | device_id | 4       | const | 1    | Using where |
+----+-------------+----------+------------+------+----------------------+-----------+---------+-------+------+-------------+
1 rows in set (0.11 sec)
 
mysql>

That’s better. Now its using our partitions correctly.

As long as you always use your partitioning column in your query, you will be able to take advantage of the partitioning.

The Result

After switching to partitioning, many queries are running much much faster than before. I couldn’t be happier.

If you want to read more about MySQL partitioning, check out the manual.

MoosTrax for iPhone – Looking for testers

I just made a very basic MoosTrax location updater for the iPhone. Currently it just sends your location to MoosTrax when you press the Update Now button.

I will be adding push notifications soon so that when someone pulls up the Live Tracking page on the MoosTrax website, you will receive a notification on your iPhone requesting you to update your location. That is really all I can do for now, as background applications are still not permitted.

If anyone wants to help me test this first version, please email moostraxsupport@gmail.com.

Screenshots:


Palm Pre Support for MoosTrax

I am currently working on adding MoosTrax for the Palm Pre. Is there anyone out there that is interested in this? Let me know!

Sziasztok Magyarok!

Három hét és ottleszek Magyorarszágon. Munkát keresek szamitástechnikai területen (szoftverfejlesztö, programazó).

Email címem: chris@tech9computers.com

MoosTrax, iPhone, and Push Notifications

I recently traded my T-Mobile G1, with Android, for an iPhone 3G. I am hoping to get MoosTrax running on the iPhone, but unfortunately, even with push notifications, it might not be the same as the BlackBerry and Android versions. Here are some reasons why.

No automatic location updates

The iPhone OS does not allow applications to run in the background, which means MoosTrax cannot check your location and send it to the server at a fixed interval.

Push Notifications aren’t the whole solution

MoosTrax could send a push notification to your phone, telling it to send a location update to the server, but unfortunately it requires user interaction. It wouldn’t be too convenient to have your phone vibrating every few minutes asking you to update your location.

Manual updates

Initially, MoosTrax for iPhone will have support for sending location updates to the server manually when you open the application.

Stay tuned

Hopefully, within time, Apple will make the push notification functionality more advanced so that MoosTrax can be as powerful as it is on other platforms, such as Android and BlackBerry. For now, stay tuned for the iPhone application. I am waiting to be approved in the iPhone developer program and then you should see the application in the iTunes store soon.