MySQL and partitioning tables with millions of rows

The Problem

I’ve been running a mobile GPS tracking service, MoosTrax (formerly BlackBerry Tracker), for a few years and have encountered a large amount of data in the process.

A user’s phone sends its location to the server and it is stored in a MySQL database. Each “location” entry is stored as a single row in a table.

Right now there are approximately 12 million rows in the location table, and things are getting slow now, as a full table scan can take ~3-4 minutes on my limited hardware. This means that if a user is pulling a location from history it could potentially block all other users (as the table is locked) access to the site until the query is complete.

Partitioning

Partitioning allows you to store parts of your table in their own logical space. With partitioning, you want to divide up your rows based on how you access them. If you partition your rows and you are still hitting all the partitions, it does you no good. The goal is that when you query, you will only have to look at a subset of the data to get a result, and not the whole table.

There are various ways in MySQL to partition a database, such as:

  • RANGE – rows are partitioned based on the range of a column (i.e date, 2006-2007, 2007-20008, etc,.)
  • HASH – hashes a column and depending on the result of the hash, has a different partition
  • LIST, KEY

Choosing the partition type is important, so I looked at how my application looks up a user’s location.

Getting a user’s current location

1
Location.find(:all, :conditions => {:device_id => @device.id}, :order => "date_added desc", :limit => 6)

Getting a users’s location history

1
Location.find(:all, :conditions => {:date_added => startdate.utc..enddate.utc, :device_id => @device.id}, :order => "date_added desc", :limit => 500)

At first, I thought about RANGE partitioning by date, and while I am using the date in my queries, it is very common for a query to have a very large date range, and that means it could easily span all partitions.

After a second look, it seemed that device_id might be the best, using the HASH partitioning type.

This means that all the locations would be partitioned equally by their device_id. This is great because MoosTrax is only looking at one device at a time, history or live tracking, and doesn’t aggregate the locations across devices or users.

Preparing to partition

First, to partition a table the column you want to partition by must be part of the primary key. I only had “id” in my primary key, so I modified it to include my partitioning column, device_id.

Drop the Primary Key

1
ALTER TABLE location DROP PRIMARY KEY

Partition the table

Now we are going to add our new primary key, and tell MySQL to partition, with HASH, by device_id. We also specify the option, partitions, to tell MySQL how many partitions we want it to use. I believe the limit is 1024.

1
2
3
4
ALTER TABLE location
ADD PRIMARY KEY (id, device_id)
partition BY HASH(device_id)
partitions 200

FYI: Running the above may take a while depending on the size of your table.

Does it work?

MySQL has a command that we can run, explain partitions, that will let us specify a query, and MySQL will tell us if and how it is using partitioning to get the result.

Because we partitioned by device_id, let’s try a simple select with device_id in the where clause.

1
2
3
4
5
6
7
8
9
mysql> explain partitions select * from location where device_id = 1;
+----+-------------+----------+------------+------+---------------+-----------+---------+-------+------+-------+
| id | select_type | table    | partitions | type | possible_keys | key       | key_len | ref   | rows | Extra |
+----+-------------+----------+------------+------+---------------+-----------+---------+-------+------+-------+
| 1  | SIMPLE      | location | p1         | ref  | device_id     | device_id | 4       | const | 1    |       |
+----+-------------+----------+------------+------+---------------+-----------+---------+-------+------+-------+
1 rows in set (0.14 sec)

mysql>

If you look at the result of the explain, you can see that MySQL only needs to use partition p1 to find our result..this is great! There are way less rows in the partition than in the whole table.

Now let’s try another query, that won’t use our partitioning column.

1
2
3
4
5
6
7
8
9
mysql> explain partitions select * from location where date_added > '2009-10-10';
+----+-------------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+---------------+------+---------+------+---------+-------------+
| id | select_type | table    | partitions                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
+----+-------------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+---------------+------+---------+------+---------+-------------+
| 1  | SIMPLE      | location | p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,p10,p11,p12,p13,p14,p15,p16,p17,p18,p19,p20,p21,p22,p23,p24,p25,p26,p27,p28,p29,p30,p31,p32,p33,p34,p35,p36,p37,p38,p39,p40,p41,p42,p43,p44,p45,p46,p47,p48,p49,p50,p51,p52,p53,p54,p55,p56,p57,p58,p59,p60,p61,p62,p63,p64,p65,p66,p67,p68,p69,p70,p71,p72,p73,p74,p75,p76,p77,p78,p79,p80,p81,p82,p83,p84,p85,p86,p87,p88,p89,p90,p91,p92,p93,p94,p95,p96,p97,p98,p99,p100,p101,p102,p103,p104,p105,p106,p107,p108,p109,p110,p111,p112,p113,p114,p115,p116,p117,p118,p119,p120,p121,p122,p123,p124,p125,p126,p127,p128,p129,p130,p131,p132,p133,p134,p135,p136,p137,p138,p139,p140,p141,p142,p143,p144,p145,p146,p147,p148,p149,p150,p151,p152,p153,p154,p155,p156,p157,p158,p159,p160,p161,p162,p163,p164,p165,p166,p167,p168,p169,p170,p171,p172,p173,p174,p175,p176,p177,p178,p179,p180,p181,p182,p183,p184,p185,p186,p187,p188,p189,p190,p191,p192,p193,p194,p195,p196,p197,p198,p199 | ALL  | date_added    | NULL | NULL    | NULL | 12641367 | Using where |
+----+-------------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+---------------+------+---------+------+---------+-------------+
1 rows in set (1.81 sec)

mysql>

As you can see, MySQL would need to go through all 200 partitions to get the result. Fortunately, MoosTrax doesn’t use a query like that, as the device_id is always available. Therefore, if I am searching by date, I will also specify the device_id as well, so that MySQL will use the partition.

1
2
3
4
5
6
7
8
9
mysql> explain partitions select * from location where date_added > '2009-10-10' and device_id = 1;
+----+-------------+----------+------------+------+----------------------+-----------+---------+-------+------+-------------+
| id | select_type | table    | partitions | type | possible_keys        | key       | key_len | ref   | rows | Extra       |
+----+-------------+----------+------------+------+----------------------+-----------+---------+-------+------+-------------+
| 1  | SIMPLE      | location | p1         | ref  | device_id,date_added | device_id | 4       | const | 1    | Using where |
+----+-------------+----------+------------+------+----------------------+-----------+---------+-------+------+-------------+
1 rows in set (0.11 sec)

mysql>

That’s better. Now its using our partitions correctly.

As long as you always use your partitioning column in your query, you will be able to take advantage of the partitioning.

The Result

After switching to partitioning, many queries are running much much faster than before. I couldn’t be happier.

If you want to read more about MySQL partitioning, check out the manual.

Posted Sunday, January 31st, 2010 under moostrax, Technology.

Similar Posts

  • Adam

    Why would a select need an exclusive lock on the table? I don’t know much about MySQL in particular, but you should make sure you have MVCC turned on. With MVCC you can write to that table while doing a table scan. Other users might see a slowdown, but they wouldn’t block until the query finished. Newer versions of PostgreSQL (which does MVCC) even have a feature that allow concurrent large table scans share disk reads.

  • Chris Moos

    Adam:

    It looks like that is only supported if you are using the InnoDB engine…unfortunately I’m using MyISAM…but I’m sure I would benefit from InnoDB (and MVCC) + partitioning.

  • http://www.saiyine.com Saiyine

    dr_strangelove: with mysql, only if the index is fits in memory. Also, he doesn’t “use” a table lock, his database engine does.

  • http://www.saiyine.com Saiyine

    Uhm the dates of the comments don’t seem to work very well. You should order your comments by id, not by date. Or what it’s going on here?

  • http://www.ubervu.com/conversations/chrismoos.com/2010/01/31/mysql-partitioning-tables-with-millions-of-rows/ uberVU – social comments

    Social comments and analytics for this post…

    This post was mentioned on Twitter by xoclipse: #mysql #partitioning http://chrismoos.com/2010/01/31/mysql-partitioning-tables-with-millions-of-rows/...

  • Greg

    Let me share with you a great example of Mysql partitoning with huge data flows architecture: http://www.tritux.com/blog/2010/11/19/partitioning-mysql-database-with-high-load-solutions/11/1

  • Greg

    Let me share with you this great post about time/date based partitioning and subpartitioning widh huge data flows:
    http://www.tritux.com/blog/2010/11/19/partitioning-mysql-database-with-high-load-solutions/11/1

    It demonstrate how partitioning can be a good solution for scalable architectures.