MySQL and partitioning tables with millions of rows

The Problem

I've been running a mobile GPS tracking service, MoosTrax (formerly BlackBerry Tracker), for a few years and have encountered a large amount of data in the process.

A user's phone sends its location to the server and it is stored in a MySQL database. Each "location" entry is stored as a single row in a table.

Right now there are approximately 12 million rows in the location table, and things are getting slow now, as a full table scan can take ~3-4 minutes on my limited hardware. This means that if a user is pulling a location from history it could potentially block all other users (as the table is locked) access to the site until the query is complete.

Partitioning

Partitioning allows you to store parts of your table in their own logical space. With partitioning, you want to divide up your rows based on how you access them. If you partition your rows and you are still hitting all the partitions, it does you no good. The goal is that when you query, you will only have to look at a subset of the data to get a result, and not the whole table.

There are various ways in MySQL to partition a database, such as:

RANGE - rows are partitioned based on the range of a column (i.e date, 2006-2007, 2007-20008, etc,.)
HASH - hashes a column and depending on the result of the hash, has a different partition
LIST, KEY

Choosing the partition type is important, so I looked at how my application looks up a user's location.

Getting a user’s current location

Location.find(:all, :conditions => {:device_id => @device.id}, :order => "date_added desc", :limit => 6)

Getting a users’s location history

Location.find(:all, :conditions => {:date_added => startdate.utc..enddate.utc, :device_id => @device.id}, :order => "date_added desc", :limit => 500)

At first, I thought about RANGE partitioning by date, and while I am using the date in my queries, it is very common for a query to have a very large date range, and that means it could easily span all partitions.

After a second look, it seemed that device_id might be the best, using the HASH partitioning type.

This means that all the locations would be partitioned equally by their device_id. This is great because MoosTrax is only looking at one device at a time, history or live tracking, and doesn't aggregate the locations across devices or users.

Preparing to partition

First, to partition a table the column you want to partition by must be part of the primary key. I only had "id" in my primary key, so I modified it to include my partitioning column, device_id.

Drop the Primary Key

alter table location drop primary key

Partition the table

Now we are going to add our new primary key, and tell MySQL to partition, with HASH, by device_id. We also specify the option, partitions, to tell MySQL how many partitions we want it to use. I believe the limit is 1024.

alter table location 
add primary key (id, device_id)
partition by HASH(device_id)
partitions 200

FYI: Running the above may take a while depending on the size of your table.

Does it work?

MySQL has a command that we can run, explain partitions, that will let us specify a query, and MySQL will tell us if and how it is using partitioning to get the result.

Because we partitioned by device_id, let's try a simple select with device_id in the where clause.

mysql> explain partitions select * from location where device_id = 1;
+----|-------------|----------|------------|------|---------------|-----------|---------|-------|------|-------+
| id | select_type | table    | partitions | type | possible_keys | key       | key_len | ref   | rows | Extra |
+----|-------------|----------|------------|------|---------------|-----------|---------|-------|------|-------+
| 1  | SIMPLE      | location | p1         | ref  | device_id     | device_id | 4       | const | 1    |       |
+----|-------------|----------|------------|------|---------------|-----------|---------|-------|------|-------+
1 rows in set (0.14 sec)

mysql>

If you look at the result of the explain, you can see that MySQL only needs to use partition p1 to find our result..this is great! There are way less rows in the partition than in the whole table.

Now let's try another query, that won't use our partitioning column.

mysql> explain partitions select * from location where date_added > '2009-10-10';
+----|-------------|----------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------|---------------|------|---------|------|---------|-------------+
| id | select_type | table    | partitions                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
+----|-------------|----------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------|---------------|------|---------|------|---------|-------------+
| 1  | SIMPLE      | location | p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,p10,p11,p12,p13,p14,p15,p16,p17,p18,p19,p20,p21,p22,p23,p24,p25,p26,p27,p28,p29,p30,p31,p32,p33,p34,p35,p36,p37,p38,p39,p40,p41,p42,p43,p44,p45,p46,p47,p48,p49,p50,p51,p52,p53,p54,p55,p56,p57,p58,p59,p60,p61,p62,p63,p64,p65,p66,p67,p68,p69,p70,p71,p72,p73,p74,p75,p76,p77,p78,p79,p80,p81,p82,p83,p84,p85,p86,p87,p88,p89,p90,p91,p92,p93,p94,p95,p96,p97,p98,p99,p100,p101,p102,p103,p104,p105,p106,p107,p108,p109,p110,p111,p112,p113,p114,p115,p116,p117,p118,p119,p120,p121,p122,p123,p124,p125,p126,p127,p128,p129,p130,p131,p132,p133,p134,p135,p136,p137,p138,p139,p140,p141,p142,p143,p144,p145,p146,p147,p148,p149,p150,p151,p152,p153,p154,p155,p156,p157,p158,p159,p160,p161,p162,p163,p164,p165,p166,p167,p168,p169,p170,p171,p172,p173,p174,p175,p176,p177,p178,p179,p180,p181,p182,p183,p184,p185,p186,p187,p188,p189,p190,p191,p192,p193,p194,p195,p196,p197,p198,p199 | ALL  | date_added    | NULL | NULL    | NULL | 12641367 | Using where |
+----|-------------|----------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------|---------------|------|---------|------|---------|-------------+
1 rows in set (1.81 sec)

mysql>

As you can see, MySQL would need to go through all 200 partitions to get the result. Fortunately, MoosTrax doesn't use a query like that, as the device_id is always available. Therefore, if I am searching by date, I will also specify the device_id as well, so that MySQL will use the partition.

mysql> explain partitions select * from location where date_added > '2009-10-10' and device_id = 1;
+----|-------------|----------|------------|------|----------------------|-----------|---------|-------|------|-------------+
| id | select_type | table    | partitions | type | possible_keys        | key       | key_len | ref   | rows | Extra       |
+----|-------------|----------|------------|------|----------------------|-----------|---------|-------|------|-------------+
| 1  | SIMPLE      | location | p1         | ref  | device_id,date_added | device_id | 4       | const | 1    | Using where |
+----|-------------|----------|------------|------|----------------------|-----------|---------|-------|------|-------------+
1 rows in set (0.11 sec)

mysql>

That's better. Now its using our partitions correctly.

As long as you always use your partitioning column in your query, you will be able to take advantage of the partitioning.

The Result

After switching to partitioning, many queries are running much much faster than before. I couldn't be happier.

If you want to read more about MySQL partitioning, check out the manual.