UTOSC 2010 Day 3: Dumb Simple Postgresql Performance

The last day of the conference started with a fabulous presentation by Joshua Drake from Command Prompt. From my former days as a Postgres admin, I recall he was quite the presence on the Postgres mailing list so I was quite excited. He went from bottom to top (could have been top to bottom; hard to tell) on how to make Postgres fast. Among the key concepts were that RAID5 is always to be avoided (use RAID10), put your transaction log on a separate spindle (RAID1 please), and take time to learn EXPLAIN ANALYZE. The details take much longer to learn but if I ever get the chance to run a Postgres database again, these will come in very handy.

Miscellaneous Notes:

  • hard drives are the slowest
  • raid1 great for pg_xlog
  • split transaction log
  • raid10 for data, separate raid1 for transaction log
  • transcation log on ext2 because it *is* a journal
  • don't use raid5
  • always always always use a battery backup unit (bbu)
  • many controllers will disable cache if no bbu
  • max out ram. 98% of data sets fit in 4GB of ram.
  • sata is great. get a bbu and get twice as many spindles.
  • pg is process-based. more cores, better performance.
  • autovacuum is not optional. do it.
  • default autovacuum works 99% of the time
  • shared_buffers - pre-allocated working cache of tuples
  • rule of thumb is 20% of available ram
  • check kernel.shmmax
  • work_mem - amount a sort, etc can use before it spills to disk
  • set work_mem to a larger value if you know a query needs it
  • explain analyze shows the sort method and the amount of mem
  • maintenance_work_mem - maintenance tasks - analyze, vacuum, create index, reindex
  • set maint as high as possible. ceiling, not allocation.
  • effective_cache_size - very misunderstood. hint.
  • % of cached + shared_buffers = effective_cache_size
  • generally 40% to 70%
  • log_checkpoints - off by default. correlate between checkpoints and spikes in %IOWait from sar
  • checkpoint_timeout - force timeout. 15-20 minutes. default 5.
  • checkpoint_completion_target - do not change this
  • checkpoint_segments - default 3. set to 10. how many transactions logs before a checkpoint is forced.
  • use checkpoint_warning to see if you need more
  • each segment is 16MB
  • recovery means going through each segment
  • wal_sync_method - leave default
  • synchronous_commit - wait for wal to be written before client completes
  • turn off for faster commits. generally turn off.
  • default_statistics_target - arbitrary value to determine stats collected by vacuum
  • analyze will take longer, but plan is better
  • can be set per-column
  • know to increase it with explain analyze. if plan == actual, good.
  • sqlite - great unless you need concurrent writes
  • seq_page_cost - related to rand_page_cost. cost of fetching from disk. with raid10 set the same.
  • cpu_operator_cost - default is 0.0025. 0.5 is better
  • cpu_tuple_cost
  • connection pooling - use pgbouncer
  • skype uses postgresql
  • plproxy
  • prepared queries hang onto plan for the life of the connection. pooled connections can be problems.
  • functions are great for ORM
  • execution_cost - higher means function costs more
  • no function hints to the planner
  • 9 has replication. not ready for production data yet.
  • londiste - some kind of skype-made replication
  • drbd


apt-get update Fails Due To Gzip Error

This morning I was greeted by a collection of apt-get failures. I use cron-apt on all my servers so I can quickly apply patches. A few of my servers reported the following error when they ran last night:

gzip: stdin: not in gzip format
Failed to fetch  Sub-process gzip returned an error code (1)
E: Some index files failed to download, they have been ignored, or old ones used instead.

A little googling revealed that it's some sort of bug in apt. The solution was pretty easy, simply remove all the files in /var/lib/apt/lists/partial.

The oddest thing about this issue is that it struck multiple servers of mine on the same day. Seems there must have been something on the Debian servers that triggered it. Maybe the web server crumbled and reset connections, leaving a partial file? That would be my guess but I'll probably never know for sure.


Kamailio Hashtables

A few weeks ago I completed an upgrade from OpenSER 1.2 to Kamailio1.5. Overall it's been working quite well and actually was not nearly as hard as I feared it might be. Now that I've got this new version with a ton of new bells and whistles, I thought I'd try out a few of them. One of the most exciting is shared memory hashtables.

First a quick primer on hashtables. Currently whenever the proxy needs a value from the database it simply queries the database and gets the value. Straightforward, right? But as the system grows, the database can be greatly put under strain. In my Kamailio cluster the single biggest CPU user is in fact the MySQL database by a large margin. And since the database changes pretty infrequently it would be nice if we could cache those values.

The problem is that Kamailio is a multi-process daemon, so what's cached in one isn't going to be cached in another. Enter shared memory hashtables. The same cache is shared among all the processes. Usage is pretty simple. Here's an example:

# IP based authentication
if ($sht(ht1800=>$var(fU)::ipauth) == null){
	if (is_user_in("From", "ipauth")){
		$sht(ht1800=>$var(fU)::ipauth) = "yes";
		$sht(ht1800=>$var(fU)::ipauth) = "no";
	$var(ipauth) = $sht(ht1800=>$var(fU)::ipauth);
	$var(ipauth) = $sht(ht1800=>$var(fU)::ipauth);

if ($var(ipauth) == "yes"){
	# do whatever...

Don't worry too much about the nitty-gritty. First we check to see if the value is cached (($sht(ht1800=>$var(fU)::ipauth) == null)). If it's set, we just use the cached value and continue. If not, we have to ask the database (is_user_in("From", "ipauth")) and then save the result in the cache. It actually is that easy.

But the $64,000 question, has it helped? Luckily I thought ahead and got some data before I made the change. I grabbed the query log for a day before and after the change. I made sure to select two days where the call usage was the similar (1% difference). Overall there was a 20% reduction in SQL queries.

I only put in the one change as a proof of concept. Since it sure seems to have panned out I'll continue wrapping all my database calls with hashtable caches. One thing I still need to work out is how to invalidate the cache when I make a change through my provisioning system. And once I'm able to upgrade to Kamailio 3.0 I'll switch over to memcached, which has all the benefits of a shared memory hashtable but it's also clustered. Ooh, I'm so excited!


dpkg --configure -a

So, just a word to the wise: dpkg --configure -a is not the same as dpkg-reconfigure -a. The former will continue configuring packages where an interrupted session left off. That's a good thing. The latter will reconfigure every package on your system. That, I think I can safely say, is a bad thing.

And no, I don't know anyone who's made that mistake recently. Nope, nobody.


OpenWRT and WPA

For whatever reason last night my OpenWRT access point decided to stop working. My laptop would connect for a second and then disconnect. I tried another wifi card and it did the same thing, which eliminated my laptop as the culprit. I turned off WEP and was able to connect. The weirdest thing was when I would try to turn WEP back on, I got this error (in bold):

[root@alberto ~]# iwconfig wl0
wl0 IEEE 802.11-DS ESSID:""
Mode:Master Frequency:2.437 GHz Access Point: 00:0F:66:4A:DF:08
Tx-Power:19 dBm
RTS thr:2347 B Fragment thr:2346 B
Encryption key:<too big>

I'm not comfortable without encryption (although honestly I would like to run a captive portal alongside my encrypted network), so after a few reboots and unsuccessful attempts to get WEP working I gave WPA a shot. It's really what I should be running anyway but I've had WEP set up for so long I just didn't feel like dealing with it.

But luckily, it worked out quite flawlessly. Following a guide I found here I got it set up quite simply. I did have to reboot to get the changes to take, but not a big deal. The only remaining problem is reliably getting my laptop to use WPA. That's the real reason I've delayed. I've gotten it to work before, but it's sometimes been a pain. All in the name of progress, I suppose.


A Rose By Any Other Name

Stephen Dubner, the famed economist behind Freakonomics (an excellent book, btw), posed the question, is it time to rename 'Digital Piracy'?" The answer is an unequivocal "yes". Despite what Shakespeare said, sometimes a name can mean everything. I refuse to accept the word "piracy" as anything other than high-seas pillaging. I suppose one positive outcome of the recent surge in Somali piracy is that people are realizing that copyright infringement hardly warrants such a strong word, especially when we have a perfectly adequate one. Let's not let the geniouses at the RIAA, who've had a wonderfully successful program of suing their customers, dictate our terminology on the matter.


VoIP Routing Loop

The other day my boss and I managed to create a call routing loop between two of our phone systems. It was kinda fun, in one of those painful ways. Here's how it went down.

Phone system A has extensions in the range of 2XXX. Phone system B has extensions in the rage of 1XXX. The PBXes have an IAX2 trunk between them to allow direct dialing. The phones themselves have a dialplan which recognizes 1XXX and 2XXX as patterns, which allows for faster dialing. Users must dial a 9 to reach an outside line.

There was just a slight problem. Due to a misconfiguration on my part, B was set up to route any unknown 1XXX extensions back over to A. For a real extension, say 1000, the rule wouldn't match because B would know to send that call to the local user. But let's say that somebody's boss forgot to dial a 9 when dialing a long distance number. So the phone sees the first 4 digits, 1 + area code, and assumes it's an extension. A sends the call to B. Well, that extension didn't exist so B sent the call back to A. That would be the point when the fun began.

I would have thought that the PBXes would create a whirlwind of calls until the original call eventually timed out and broke the chain, thus bringing all the calls to a halt. But somehow that isn't what happened. The system stabilized at 400 or so active channels. Sat that way for a few hours I believe. What finally did stop it was my removing the rule to allow 1XXX over the IAX2 trunk from B to A. Within just a few seconds all the calls came to a grinding halt (in a good way).

I've decided then that having local extensions that start with a 1 is a less than desirable thing. It's not the end of the world, but avoiding it can relieve some potential headaches so I intend to eschew it where possible.



My first day at the Utah Open Source Conference is just about done. I really need to get to bed so I don't fall asleep during my presentation tomorrow. That would be embarrassing. I only made it down at 6:30pm for the dinner and keynotes, which I'm glad I attended. Mac's talk was great and so was Paul's, although it did seem to drag on. Maybe that's because I needed to go to the bathroom. I also had a chance to visit with a few Linux newbies at my table, which is always fun. But the highlight of my day had to be meeting Harleypig. Thankfully, no speedos were involved.


VoIP QoS With Wondershaper

Hans and I were discussing QoS the other day, specifically regarding using Wondershaper from the LARTC. I had managed to mess mine up and I subsequently noticed a horrible turn for the worse in my VoIP calls. Wondershaper has to be adapted for use by OpenWRT and in the process I misspelled sch_ingress.o as sch_insmod.o. Too much insmodding that day, I think. The net effect was that download speeds were not shaped at all.

Once I got it corrected, I decided to do a few tests just to confirm that using Wondershaper actually made a difference. I'll cut to the chase for the lazy: it did. I made 45 second calls to music on hold from my softphone, Twinkle. In the background I had Wireshark running. I used the RTP analyzer in Wireshark to look at the statistics after all was said and done. I used both versions of Wondershaper, the CBQ and HTB. I had a single download running the whole time eating up all spare bandwidth.

With no shaping: 4.4% loss (95 packets), 60ms jitter
With CBQ Wondershaper: 0.2% loss (5 packets), 35ms jitter
With HTB Wondershaper: 0.3% loss (6 packets), 28ms jitter

So my unscientific conclusion is that both versions of Wondershaper work about the same and they both make a huge difference. I could easily hear the packet loss on the first call, but not so much on the other two calls.



Subscribe to RSS - linux Subscribe to - All comments