Nov 04 2006

Postgres for the win!

Published by Brian at 11:23 am under Programming and Software Development

Had something of a victory this week, and I’ve really got to give kudos to PostgreSQL for it. There’s something magical about taking a peice of technology, pushing it well beyond what you would rationally expect it to be able to handle, and having not just work, but work well. These sorts of experiences tend to make one a rapid fan of the technology in question. Well, postgres just did that for me.

Here’s the set up. I got the following problem dumped on me tuesday. See, there’s this table we wanted to have in the database. Now, each row isn’t that large- we needed a file date the record was created, a symbol (4-8 characters), an expiration date, and two volatility quotients (floats). So we’re only looking at 40-50 bytes per record. But here’s the problem: we’re already generating something like a million records a day, and this is expected to increase, and they’re wanting to put six months worth of data, probably more, into the database. So I’m looking at a single table that’ll start with over 100 million rows, and could easily grow to half a billion rows. Worse yet, I need to access the table by at least two different ways- by file date and by some combination of symbol and expiration date, with the latter being time critical (it’s an interactive application- a human is waiting on this data).

Now here’s the punchline- the machine this monstrosity is going on. It’s an HP DL-145 2.2GHz Dual Core Opteron with 4G of memory, a nice machine- but only a single 70G 7200RPM SATA drive as storage. Doing this on $300,000 worth of hardware I’d expect- but doing it $3,000 worth of hardware? On a machine we just had kicking around? What do they think I am, a miracle worker?

Um, yeah. Fortunately for me, I was using Postgresql. One miracle, to order, comming up. Follow me for the details of how I did the (seemingly) impossible.

First, and most importantly, remember that the solution to this sort of problem is to redirect the warp drive plasma conduits through the shield generator and into the transporter. No wait, that’s star trek. The most important thing is to partition your table. This is a neat trick Postgres allows to make a host of smaller tables look like one big table. In my case, I can make the single 120 million+ row table look like 20 6 million row tables. Better yet, as the table grows towards half a billion records, I can keep subdividing it to keep the per table size sane.

I have to say the nicest feature of Postgres is it’s excellent documentation. The online manual is well written, well organized, helpfull, and pretty much complete- four attributes I don’t expect of technical manuals, wether open or closed source. In the case of Postgres, it really is the friendly manual. Thirty seconds of googling brought up this page, which includes detailed step by step instructions on how to partition a table in postgres. With complete examples- they didn’t forget or leave out steps. And they explain things well enough that you understand what you’re doing. For the documentation alone I could love this database.

After a little thought and some pointed questions to the future users, I decide to partition the table on the symbol. This is easy, makes for an obvious naming scheme, and actually speeds up the user’s queries. Of the two, I’d rather make the data inserting slower to speed up the user queries than the other way around (although it turns out that in this case I get to have my cake and eat it too, which I’ll get to a little later). Specifically, I’m paritioning based on the first character of the symbol. But here’s the cool part- if the user asks for the data for IBM, say, postgres knows it only needs to look in the table of the I’s. It’s skipping 95% of the data right from the start.

The other thing I took advantage of to help speed up queries is Postgres’ advanced query optimizations, specifically, bitmask joins. See, what Postgres can do is convert an index into a bitmap of what records satisfy a condition. It can then combine these bitmaps with the standard bit operations, like and and or. This means if you have an index on foo and a seperate index on bar, Postgres can combine both indexes to optimize queries where foo=x or bar=y, which can’t be done on a single index. In addition, single-column indexes are cheaper to maintain than multi-column indexes.

Jumping to the end of the story here for a moment, the combination of partitioning the tables and bitmap joins on the indexes means that the common query done by the user (”give me all the samples for IBM with an expiration date of 15 Nov”) completes in well under a second at this point. Better yet, I can keep the table sizes limited by increasing the granularity of the partitioning as the data grows. For 100 million rows, I currently have 20 partitions- as the table grows to half a billion rows, I can increase this (as needed) to 100 partitions and keep the partition size the same.

So with the user query speed well in hand, I turn my attention to the data import speed. This looks to be a problem due to the nature of the input file I have to deal with. It’s not just that it’s a million records- it’s that it’s a million unsorted records. Or rather, they’re sorted by when trades occurred, which isn’t at all helpfull. So MSFT records are followed by IBM records, expiration dates jump all over the place, and so on. If I try to just insert them directly into the main table, what’ll happen is that I’ll insert the IBM records, which means I need to read in the I table and it’s indexes. Then I insert a MSFT record, which means I need to read in the M table and it’s indexes, which will basically crowd the I table and it’s indexes out of memory. So the next time I try to insert another IBM record, I’ll be reading that table back into memory all over again. Inserts will take frimping forever. This is a problem not only because I need to insert a million new records each day, but also because I’ve got six months of back data I need to insert. And bosses going “how soon can we get this up and running?”

I toy briefly with the idea of dropping and recreating the indexes, but a) this is pretty slow even still, b) it encodes the structure of the table and the number and names of indexes into the load script, which is bad style and a maintainance nightmare, and c) it turns out not be necessary. Returning once again to the friendly manual, I devise the following approaching for importing the data:

1) Create a temporary table with the same columns as the main table. I don’t even need to encode what these columns are (a maintainance headache) because I can use the Postgres LIKE argument to the CREATE TABLE command. My create table command is just “CREATE TEMP TABLE temptable (LIKE maintable)”.

2) I use the COPY command to blast data into this temporary table at tens of thousands of rows a second. This is especially nice as the data file I’m reading from is effectively already in CSV format, so very little modification of the data needs to be done- I’m mainly just reading data from the disk and blasting it into the database.

3) I then do an INSERT from a SELECT, with an ORDER BY on the select, to copy the contents of the temp table into the main table. This is just “INSERT INTO maintable SELECT * FROM temptable ORDER BY symbol”. The ORDER BY causes postgres to sort the inserts for me. So first it inserts all the symbols starting with ‘A’, reading in the A table and it’s indexes. Then it inserts all the symbols start with ‘B’, which reads in the B table and flushes out the A table- but that’s OK, as we’re done with the A table anyways. I’m making maximal use of the disk cache this way.

4) When the insert is done, I drop the temptable and exit.

Doing the inserts this way is reasonably simple, maintainance friendly, and fast- inserting a million new rows only takes 5-10 minutes. Which means inserting the six months of backlog took me like 10-20 hours. The merely difficult we do immediately, the impossible you’ll have by the time you get in tommorow.

At this point, the only complaint I have is that Postgres is doing so much with so little that I’ll never get cool hardware to play with. I’ll be stuck with cheap low-end boxes forever. Sigh.

The next person who claims “Postgres is slow” is going to get an earfull from me. Maybe Postgres 7.0 was slow, I don’t know- but I know for damned sure that Postgres 8.1 isn’t. There are, I think, three things that make Postgres seem slow:

1) The default configuration. I joke that the default configuration of Postgres was designed for a 386 with 16 meg of memory. This isn’t exactly true (it’s more like a 486 with 32M of memory), but it makes my point. The default configuration of postgres is seriously minimal- it’s good for getting the database up and making sure it’s running, but it’s not going to provide decent performance at all. For that, you need to go in and tune it a little bit, telling it how many resources to use. Which is good- you may be running the database on a machine also running other things, and may want to limit the resource utilization of the database. But if you’re reading a performance comparison between postgres and some other database, and it starts with “we tested both databases with their default configuration…” you can stop reading at that point.

2) Lack of transactional awareness. Transactions change things. Especially with Postgres, where every command that isn’t part of another transaction is it’s own transaction, and transactions are ACID. So, in Postgres, when you do a single INSERT statement, it is it’s own transaction, and doesn’t complete until it’s written out to disk. So you can only do as many INSERTs per second as you can do seeks per second. This means you’re only going to get a few hundred INSERTs per second if you’re lucky, if you’re not aware of transactions. Which could easily give you the impression that Postgres is dog-slow. However, once you’re aware of transactions, you can speed things up enormously by doing more per transaction. Grouping 100 inserts into a single transaction makes inserts about 100x times faster, as the cost of the transaction is spread over all 100 transactions- and this still keeps you ANSI SQL compliant. If you’re willing to ditch ANSI SQL and be Postgres specific, you can use things like the COPY command, which is signifigantly faster yet.

3) Most performance tests don’t really push the scalability of a database. OK, your database is really fast on tables with 10,000 rows, so what? On modern hardware, just about any database will be fast enough on small tables. Performance becomes important when you’re dealing with large things- either really large tables or lots and lots of connections. If the latter, transactions stop being optional- the database has to be consistent despite the asynchronous actions of the user. And if it’s the former, well- doing well on small tables does not necessarily mean that you do well on large tables. A table with 100 million rows in it is a radically different beast than a table with 10 thousand rows.

I’ve pushed postgres on performance. And rather than finding it slow, I’ve found it shockingly fast and scalable. The traditional reward for a job well done is usually another, harder, job. But I’m not worried. I’m using Postgres. The worst that’ll happen is that I’ll get an excuse to actually get new hardware.

Popularity: 84% [?]

36 Responses to “Postgres for the win!”

  1. Candideon 04 Nov 2006 at 3:47 pm

    Wow.

    This goes to show one of the things about the open source world which is so wonderful — since your users are the ones doing the coding, they tend to only code in stuff that’s actually useful to them.

  2. Alleagraon 25 Dec 2006 at 7:23 am

    Sorry to be pedantic and seemingly trivial & irrelevant but as a programmer you know the importance of getting details right down to the last period or semicolon or whatever. I hope you’ll agree that precise communication is also important in writing English.

    The phrase “nicest feature of Postgres is its excellent documentation ” taken literally is meaningless because it can only mean “…..is it is excellent documentation” which actually sounds almost correct but isn’t because I haven’t made a typo.

    Even very smart people seem to find this so difficult to appreciate or at least to carry out in practice. Why?

    “It’s” means “it is” and never indicates possession as in ‘its cooling fan’ or ‘its documentation or ‘the old VW had its engine at the back’.

  3. Social Content Headline Newson 25 Dec 2006 at 7:33 am

    Postgres for the win!…

    [link] [more]…

  4. Olegon 25 Dec 2006 at 8:18 am

    “it is its own transaction” ->
    “it is its own transaction”

  5. Jonathan Hseuon 25 Dec 2006 at 9:54 am

    Doing the inserts this way is reasonably simple, maintainance friendly, and fast- inserting a million new rows only takes 5-10 minutes. Which means inserting the six months of backlog took me like 10-20 hours. The merely difficult we do immediately, the impossible youll have by the time you get in tommorow.

    I’m sorry, but that’s REALLY, REALLY slow and actually too slow for the stuff we do. On Microsoft SQL server 2k5 with a similar machine to yours, loading a million records from a CSV takes at most 30 seconds, and I can expect 100 million to be done in an hour. PostgreSQL is very slow in my experience and your article didn’t change my mind :)

  6. Volker Grabschon 25 Dec 2006 at 10:05 am

    That splitting into multiple tables and the pre-sorting of input data is simply not necessary. Neither for speed nor for saving memory. You should learn how indexes work. I bet your setup would become even faster if you didn’t split your table.

    You have nothing reached but complicating your code. And you’re doing worse than the Database indexes would do. Generally, don’t ever make pre-mature optimizations.

    Also I don’t see the reason for grouping 100 INSERTs into one transaction. Why not just putting *all* your INSERTs into one transaction? That would reduce the transactional overhad even more. Isn’t PostgreSQL able to handle such big transactions?

  7. KBKon 25 Dec 2006 at 10:29 am

    it’s == it is
    its ==
    Thanks for the informative article!

  8. Glennon 25 Dec 2006 at 10:48 am

    Oracle DBA here.

    Are your disks mirrored? If not, you should think about it. A strait mirror would not cost too much (does your motherboard have it already?). You will burn that disk out in a year or so if you are lucky.

    I have read through most of the postgresql documentation and would love to spend time playing with it - but no time. If I read and remember correctly, by partitioning your tables your are also partitioning the index. So instead of scanning the whole index for your match, you are only scanning that partition of the index. Postgresql’s bitmap index - not sure if that changes with a partition or even if it is partitioned.

    How is your backup done? What is the size of your DB and estimated size after 500 million records?

    Thanks for sharing,

    Glenn

  9. Juan Joson 25 Dec 2006 at 1:48 pm

    EXCELLENT article. Really good use of PostgreSQL features :)

    In response to #1: That means that Open Source Software is only meant for developers? Becuase if it only has what is useful to developers, then normal users have not what they want (detailed explanations of what’s going, etc…) :D

  10. gongoferon 25 Dec 2006 at 2:36 pm

    Want to try VERY small SQL engine - sqlite?
    It’s small, typeless (I like it!), VERY powerful (nested SELECTs with LIMITs; MySQL, hello?) and FAST.

    And it’s free.

    I use it for my VoIP projects (billing, CDR storage). About 0.5mln records a day, on 40+ columns.

    It’s comply to ACID too. And it loves transactions.

    And Richard Hipp, the creator of SQLite, is very friendly and responsive (see mailinglist archives on their site).

    In my examples SQLite from 4x to 9x faster than Mysql (on my data).

  11. gongoferon 25 Dec 2006 at 2:37 pm

    Sorry, the homepage of SQLite is http://www.sqlite.ORG/ (not .COM, sorry!)

  12. Buragon 25 Dec 2006 at 5:37 pm

    Grouping 100 inserts into a single transaction makes inserts about 100x times faster

    This is not what transactions are designed for. Transactions are supposed to complete multiple operations in ACID. What if the 50th row fails? Do you rollback ? Or just keep going..

    Also, I don’t see anything fascinating about the fact that you can partition the table using a function. Concept has been around for ages. Oh wait, I forgot we are talking open source and they have quite a bit of catching up to do :-)

  13. Nifty Stuffon 25 Dec 2006 at 5:59 pm

    Nice article, I enjoyed your writing style and how you actually explained things in detail.
    Thanks for sharing with the rest of us.

  14. Super Mikeon 25 Dec 2006 at 7:49 pm

    Postgres is giving Oracle a run for its money. I’m glad the only database I prefer on mid to large scale projects is postgres. Now if I could just figure out why my employer wants to keep wasting money on Oracle.

    And get this. The other day some managers above me wanted to build a system on Linux. When it came time to discuss the database, the manager asked, “what do we have paid licences available for?” The thought never came into his mind, unfortunately, on what was FREE!

  15. Robinson 25 Dec 2006 at 8:02 pm

    Well, what it also shows is this:

    1. Postgres (specifically) is designed well and that they’d rather get things going correctly (first!) than fast.

    2. The author had a problem, looked for a good solution, read the documentation well, thought on it again, and implemented it sensibly.

    We need far-far more detailed articles like this one to compensate the tons of crap out there trying to populate the (incorrect) idea that mysql’s of the world are in for world domination.

    They all have their niches, and (after using postgres 8.2) I believe that postgres is beginning to enlargen its own.

  16. maxon 26 Dec 2006 at 1:27 am

    is its excellent documentation

    it’s = it is

    its = belonging to it

  17. anonon 26 Dec 2006 at 10:04 am

    No no no, that’s “Postgres for Great Victory

  18. Rahul Pathakon 26 Dec 2006 at 6:23 pm

    Hi there,

    Great post, thanks for sharing. A potentially stupid question - would this strategy of “blast into temptable, select into maintable” work on any other DB? Or is it the table partitioning in POSTGRES that makes this work?

    Very cool strategy btw.

    Cheers,

    Rahul

  19. Christopher Smithon 27 Dec 2006 at 3:28 am

    Have you tried without the partitioning? Technically it shouldn’t help you and if anything it should slow you down unless you regularly drop records by what the first letter of the key is. I’m a bit worried you’ve done some premature optimization here.

  20. Candideon 31 Dec 2006 at 10:21 am

    Everyone who posted here recently should take a look at this post — thanks, and welcome aboard.

    ~~ Your Friend Blog Admin.

  21. Enfranchised Mind » What the heck…?on 31 Dec 2006 at 2:02 pm

    [...] Apparently my blog got onto some mailing list, because there’s suddenly 40 new, good posts from people I don’t know, mainly griping at BHurt’s Postgres for the Win! post. [...]

  22. Candideon 31 Dec 2006 at 3:05 pm

    Oh, hey. A comment intended for me.

    While the software created by open source may not be intended just for developers (although see BHurt’s Clued People Wanted), the open source process itself really is based on having user-developers. You don’t have to actually write code and submit it to the project to be classes as a “developer” here — but you need to be the kind of person who would be able to do so if they felt like it.

    In the case of DBs, this is even moreso. After all, who are your “users” of a DB exc€ept DBAs, infrastructure gurus, and developers?

    Now, that all said, I object to the basis of the argument: namely, that there are things a “user” would want that a “developer” wouldn’t. Developers are users, too, and they want the same things: I’ve been turned off from more than one potentially good open source project by a lack of usability and support (including being a reasonably late-adopter of Linux). Particularly with the introduction of Wiki-/comment- based documentation and the increasing competition among open source projects, we’re seeing better and better documentation.

  23. Matton 02 Jan 2007 at 2:45 pm

    Would have preferred it if you’d posted performance data before and after partitioning. I use Postgres in several systems and have only had to partition one. Partitioning is probably best reserved for situations where the references are concentrated particular ranges - ie: dates - this month, last week, etc. so that a mixture of fast/expensive/small and slow/cheap/large storage solutions can be used.

  24. [...] Came across a great blog post on performance in postgres on the Enfranchised Mind Blog. Great post - thanks for sharing it. [...]

  25. [...] just bumped into Brian’s post on Postgres’s performance while researching running JIRA on [...]

  26. Elseon 21 Jun 2007 at 4:35 pm

    Nice article. I’m looking at optimising a postgresql DB now and it’s answered some questions for me.

    Thanks.

  27. [...] who has authored the popular posts on Postgres, Nazis, and Lazy Lists, [...]

  28. Daveon 08 Oct 2007 at 3:35 pm

    Thanks, this is exactly what I was looking for. I recently partitioned a 1.5 billion row table into
    175 tables. Insert performance dropped through the floor.
    Before partitioning I could load 15000 rows in 1 minute, after partitioning it was 14 minutes,
    individual inserts, not bulk loads. During the transfer I used bulk loads of 2 million rows at
    a time and was getting about 100,000 rows per minute. I hadn’t thought of using tmp tables
    in production and didn’t now about the CREATE LIKE syntax of table create.

  29. Arun Prasannanon 10 Dec 2007 at 7:09 pm

    Very informative, thanks.

  30. Rudd-Oon 11 Dec 2007 at 2:34 am

    What do you get by partitioning a table that you aren’t already getting with the default index? Isn’t partitioning used to, you know, spread the load between two disks by putting each table on a different disk?

    Now, I would understand that sorting stuff previous to inserting (which yields an approximate sort on disk layout) would speed locality of access if your queries usually fetch data with spatial locality. Is that your case? Because if it isn’t, there’s really no point in sorting stuff.

    If you feed a million insert statements within a transaction begin/end block, you don’t need to worry about index update performance because the index will only be updated WHEN the transaction ends, not in the interim.

  31. Funny_Falconon 13 Dec 2007 at 1:11 am

    Partitioning matters: it really increased select speed. But I use it for archiving not actual data.
    Problem with single table approach is growing index: when index, which ordered differently relative insertion order, is larger then shared_mem, insertion becomes slow.
    So that, I do insertions into main table in hierarchy, and then archive not so actual data into partitions archive tables: data are visible every moment, cause moving done in transaction, and insertion still really fast.

  32. maton 20 Feb 2008 at 7:20 pm

    >>“It’s” means “it is” and never indicates possession as in ‘its cooling fan’ or ‘its documentation or ‘the old VW had its engine at the back’.
    It can also mean it has. They can be singular. And we’ve split the infinitive.

  33. Scott Nelsonon 11 Apr 2008 at 11:01 pm

    Very good article. Thanks for taking the time to explain your issue.

    From what I’ve read, can I assume partitioning large tables is good for increasing select query speed but not so good for update or insert query speed?

  34. Brian Hurton 13 Apr 2008 at 2:17 pm

    Scott: basically, yes. Although it can help with insert and update as well, especially if the inserts or updates only touch a small number of tables.

    I will comment- if I had to do it over again, I wouldn’t have partitioned this table. I have many (non-partitioned) tables with tens of millions of rows, and a couple of tables with hundreds of millions of rows, and Postgresql performance on these tables is just fine without partitioning. Basically, I was prematurely optimizing.

  35. Georgeon 24 May 2008 at 12:01 am

    Brian, thanks for the great post.

    I dumped about 18 million records to the file using COPY TO and it took only about 10 minutes to do so, may be less - I did not measure precisely. Now I’m inserting the same 18 million records into a different database using COPY FROM, and it’s already been taking more than 12 hours. In fairness, I neither removed indexes and foreign key constraints nor increased check point segments; there is not partitioning and I’m in the default PostgreSQL configuration. But I am using a single transaction, TRUNCATE + COPY FROM instead of INSERT, and archive mode is disabled. Obviously I’m doing something wrong. What would be the best starting point for performance investigation: tweak configuration, remove indexes/foreign keys, increase check point segments?

    As a side note, using backup and restore (pg_dump -Fc + pg_restore) on the same 18 million record DB takes about 10 minutes. (Again, I did not measure precisely, just a rough estimation)

  36. Georgeon 24 May 2008 at 12:17 am

    I gave up on my stupid straightforward approach and canceled the job. Then I dropped indexes and foreign key constraints and repeated the exercise, except instead of dumping to a file first I just pushed data through the pipe, like below:

    psql old_db -c “copy my_table to stdout” | psql new_db -c “begin; truncate my_table; copy my_table from stdout; commit;”

    I could not believe my eyes when 18 million records were copied in less than 4 minutes!

    Somebody mentioned earlier that loading a million records from a CSV takes at most 30 seconds using Microsoft SQL server 2k5. How about 18 million records in less than 4 minutes using PostgreSQL 8.3? And this was not a simple import, this was export + import combined, and it did not require any extra disk space because of pipes.

Trackback URI | Comments RSS

Leave a Reply

Green Web Hosting! This site hosted by DreamHost.