Posts tagged ‘mysql’

MySQL and partitioning tables with millions of rows

The Problem

I’ve been running a mobile GPS tracking service, MoosTrax (formerly BlackBerry Tracker), for a few years and have encountered a large amount of data in the process.

A user’s phone sends its location to the server and it is stored in a MySQL database. Each “location” entry is stored as a single row in a table.

Right now there are approximately 12 million rows in the location table, and things are getting slow now, as a full table scan can take ~3-4 minutes on my limited hardware. This means that if a user is pulling a location from history it could potentially block all other users (as the table is locked) access to the site until the query is complete.

Partitioning

Partitioning allows you to store parts of your table in their own logical space. With partitioning, you want to divide up your rows based on how you access them. If you partition your rows and you are still hitting all the partitions, it does you no good. The goal is that when you query, you will only have to look at a subset of the data to get a result, and not the whole table.

There are various ways in MySQL to partition a database, such as:

  • RANGE – rows are partitioned based on the range of a column (i.e date, 2006-2007, 2007-20008, etc,.)
  • HASH – hashes a column and depending on the result of the hash, has a different partition
  • LIST, KEY

Choosing the partition type is important, so I looked at how my application looks up a user’s location.

Getting a user’s current location

Location.find(:all, :conditions => {:device_id => @device.id}, :order => "date_added desc", :limit => 6)

Getting a users’s location history

Location.find(:all, :conditions => {:date_added => startdate.utc..enddate.utc, :device_id => @device.id}, :order => "date_added desc", :limit => 500)

At first, I thought about RANGE partitioning by date, and while I am using the date in my queries, it is very common for a query to have a very large date range, and that means it could easily span all partitions.

After a second look, it seemed that device_id might be the best, using the HASH partitioning type.

This means that all the locations would be partitioned equally by their device_id. This is great because MoosTrax is only looking at one device at a time, history or live tracking, and doesn’t aggregate the locations across devices or users.

Preparing to partition

First, to partition a table the column you want to partition by must be part of the primary key. I only had “id” in my primary key, so I modified it to include my partitioning column, device_id.

Drop the Primary Key

ALTER TABLE location DROP PRIMARY KEY

Partition the table

Now we are going to add our new primary key, and tell MySQL to partition, with HASH, by device_id. We also specify the option, partitions, to tell MySQL how many partitions we want it to use. I believe the limit is 1024.

ALTER TABLE location 
ADD PRIMARY KEY (id, device_id)
partition BY HASH(device_id)
partitions 200

FYI: Running the above may take a while depending on the size of your table.

Does it work?

MySQL has a command that we can run, explain partitions, that will let us specify a query, and MySQL will tell us if and how it is using partitioning to get the result.

Because we partitioned by device_id, let’s try a simple select with device_id in the where clause.

mysql> explain partitions select * from location where device_id = 1;
+----+-------------+----------+------------+------+---------------+-----------+---------+-------+------+-------+
| id | select_type | table    | partitions | type | possible_keys | key       | key_len | ref   | rows | Extra |
+----+-------------+----------+------------+------+---------------+-----------+---------+-------+------+-------+
| 1  | SIMPLE      | location | p1         | ref  | device_id     | device_id | 4       | const | 1    |       |
+----+-------------+----------+------------+------+---------------+-----------+---------+-------+------+-------+
1 rows in set (0.14 sec)
 
mysql>

If you look at the result of the explain, you can see that MySQL only needs to use partition p1 to find our result..this is great! There are way less rows in the partition than in the whole table.

Now let’s try another query, that won’t use our partitioning column.

 
mysql> explain partitions select * from location where date_added > '2009-10-10';
+----+-------------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+---------------+------+---------+------+---------+-------------+
| id | select_type | table    | partitions                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
+----+-------------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+---------------+------+---------+------+---------+-------------+
| 1  | SIMPLE      | location | p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,p10,p11,p12,p13,p14,p15,p16,p17,p18,p19,p20,p21,p22,p23,p24,p25,p26,p27,p28,p29,p30,p31,p32,p33,p34,p35,p36,p37,p38,p39,p40,p41,p42,p43,p44,p45,p46,p47,p48,p49,p50,p51,p52,p53,p54,p55,p56,p57,p58,p59,p60,p61,p62,p63,p64,p65,p66,p67,p68,p69,p70,p71,p72,p73,p74,p75,p76,p77,p78,p79,p80,p81,p82,p83,p84,p85,p86,p87,p88,p89,p90,p91,p92,p93,p94,p95,p96,p97,p98,p99,p100,p101,p102,p103,p104,p105,p106,p107,p108,p109,p110,p111,p112,p113,p114,p115,p116,p117,p118,p119,p120,p121,p122,p123,p124,p125,p126,p127,p128,p129,p130,p131,p132,p133,p134,p135,p136,p137,p138,p139,p140,p141,p142,p143,p144,p145,p146,p147,p148,p149,p150,p151,p152,p153,p154,p155,p156,p157,p158,p159,p160,p161,p162,p163,p164,p165,p166,p167,p168,p169,p170,p171,p172,p173,p174,p175,p176,p177,p178,p179,p180,p181,p182,p183,p184,p185,p186,p187,p188,p189,p190,p191,p192,p193,p194,p195,p196,p197,p198,p199 | ALL  | date_added    | NULL | NULL    | NULL | 12641367 | Using where |
+----+-------------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+---------------+------+---------+------+---------+-------------+
1 rows in set (1.81 sec)
 
mysql>

As you can see, MySQL would need to go through all 200 partitions to get the result. Fortunately, MoosTrax doesn’t use a query like that, as the device_id is always available. Therefore, if I am searching by date, I will also specify the device_id as well, so that MySQL will use the partition.

mysql> explain partitions select * from location where date_added > '2009-10-10' and device_id = 1;
+----+-------------+----------+------------+------+----------------------+-----------+---------+-------+------+-------------+
| id | select_type | table    | partitions | type | possible_keys        | key       | key_len | ref   | rows | Extra       |
+----+-------------+----------+------------+------+----------------------+-----------+---------+-------+------+-------------+
| 1  | SIMPLE      | location | p1         | ref  | device_id,date_added | device_id | 4       | const | 1    | Using where |
+----+-------------+----------+------------+------+----------------------+-----------+---------+-------+------+-------------+
1 rows in set (0.11 sec)
 
mysql>

That’s better. Now its using our partitions correctly.

As long as you always use your partitioning column in your query, you will be able to take advantage of the partitioning.

The Result

After switching to partitioning, many queries are running much much faster than before. I couldn’t be happier.

If you want to read more about MySQL partitioning, check out the manual.

MoosTrax will return.

Yes, I recently made a post saying that MoosTrax will be going away forever. Due to all the comments and feedback, it is obvious to me that there are many people that enjoyed the service — so I’m going to bring it back.

I’m moving my database backend from CouchDB to MySQL, so my issues with CouchDB won’t prevent me from providing MoosTrax. I have been working a lot on migrating the site and fixing it up…so I really hope to have it up in the next week or so.

In the mean time, I’m also developing an Android client, so that those of you with a T-Mobile G1(myself included) will be able to use MoosTrax. Screenshot below.

After the site is up and running and the Android client is stable, I want to make an iPhone client the next priority. Also, if there are enough people that want a Windows Mobile client, I would consider that as well. Please e-mail moostraxsupport@gmail.com if you are interested in a Windows Mobile client.