UTOSC 2010 Day 3: Dumb Simple Postgresql Performance

The last day of the conference started with a fabulous presentation by Joshua Drake from Command Prompt. From my former days as a Postgres admin, I recall he was quite the presence on the Postgres mailing list so I was quite excited. He went from bottom to top (could have been top to bottom; hard to tell) on how to make Postgres fast. Among the key concepts were that RAID5 is always to be avoided (use RAID10), put your transaction log on a separate spindle (RAID1 please), and take time to learn EXPLAIN ANALYZE. The details take much longer to learn but if I ever get the chance to run a Postgres database again, these will come in very handy.

Miscellaneous Notes:

  • hard drives are the slowest
  • raid1 great for pg_xlog
  • split transaction log
  • raid10 for data, separate raid1 for transaction log
  • transcation log on ext2 because it *is* a journal
  • don't use raid5
  • always always always use a battery backup unit (bbu)
  • many controllers will disable cache if no bbu
  • max out ram. 98% of data sets fit in 4GB of ram.
  • sata is great. get a bbu and get twice as many spindles.
  • pg is process-based. more cores, better performance.
  • autovacuum is not optional. do it.
  • default autovacuum works 99% of the time
  • shared_buffers - pre-allocated working cache of tuples
  • rule of thumb is 20% of available ram
  • check kernel.shmmax
  • work_mem - amount a sort, etc can use before it spills to disk
  • set work_mem to a larger value if you know a query needs it
  • explain analyze shows the sort method and the amount of mem
  • maintenance_work_mem - maintenance tasks - analyze, vacuum, create index, reindex
  • set maint as high as possible. ceiling, not allocation.
  • effective_cache_size - very misunderstood. hint.
  • % of cached + shared_buffers = effective_cache_size
  • generally 40% to 70%
  • log_checkpoints - off by default. correlate between checkpoints and spikes in %IOWait from sar
  • checkpoint_timeout - force timeout. 15-20 minutes. default 5.
  • checkpoint_completion_target - do not change this
  • checkpoint_segments - default 3. set to 10. how many transactions logs before a checkpoint is forced.
  • use checkpoint_warning to see if you need more
  • each segment is 16MB
  • recovery means going through each segment
  • wal_sync_method - leave default
  • synchronous_commit - wait for wal to be written before client completes
  • turn off for faster commits. generally turn off.
  • default_statistics_target - arbitrary value to determine stats collected by vacuum
  • analyze will take longer, but plan is better
  • can be set per-column
  • know to increase it with explain analyze. if plan == actual, good.
  • sqlite - great unless you need concurrent writes
  • seq_page_cost - related to rand_page_cost. cost of fetching from disk. with raid10 set the same.
  • cpu_operator_cost - default is 0.0025. 0.5 is better
  • cpu_tuple_cost
  • connection pooling - use pgbouncer
  • skype uses postgresql
  • plproxy
  • prepared queries hang onto plan for the life of the connection. pooled connections can be problems.
  • functions are great for ORM
  • execution_cost - higher means function costs more
  • no function hints to the planner
  • 9 has replication. not ready for production data yet.
  • londiste - some kind of skype-made replication
  • drbd


Subscribe to RSS - postgresql Subscribe to - All comments