UTOSC 2010 Day 3: Dumb Simple Postgresql Performance

The last day of the conference started with a fabulous presentation by Joshua Drake from Command Prompt. From my former days as a Postgres admin, I recall he was quite the presence on the Postgres mailing list so I was quite excited. He went from bottom to top (could have been top to bottom; hard to tell) on how to make Postgres fast. Among the key concepts were that RAID5 is always to be avoided (use RAID10), put your transaction log on a separate spindle (RAID1 please), and take time to learn EXPLAIN ANALYZE. The details take much longer to learn but if I ever get the chance to run a Postgres database again, these will come in very handy.

Miscellaneous Notes:

  • hard drives are the slowest
  • raid1 great for pg_xlog
  • split transaction log
  • raid10 for data, separate raid1 for transaction log
  • transaction log on ext2 because it *is* a journal
  • don't use raid5
  • always always always use a battery backup unit (bbu)
  • many controllers will disable cache if no bbu
  • max out ram. 98% of data sets fit in 4GB of ram.
  • sata is great. get a bbu and get twice as many spindles.
  • pg is process-based. more cores, better performance.
  • autovacuum is not optional. do it.
  • default autovacuum works 99% of the time
  • shared_buffers - pre-allocated working cache of tuples
  • rule of thumb is 20% of available ram
  • check kernel.shmmax
  • work_mem - amount a sort, etc can use before it spills to disk
  • set work_mem to a larger value if you know a query needs it
  • explain analyze shows the sort method and the amount of mem
  • maintenance_work_mem - maintenance tasks - analyze, vacuum, create index, reindex
  • set maint as high as possible. ceiling, not allocation.
  • effective_cache_size - very misunderstood. hint.
  • % of cached + shared_buffers = effective_cache_size
  • generally 40% to 70%
  • log_checkpoints - off by default. correlate between checkpoints and spikes in %IOWait from sar
  • checkpoint_timeout - force timeout. 15-20 minutes. default 5.
  • checkpoint_completion_target - do not change this
  • checkpoint_segments - default 3. set to 10. how many transaction log segments before a checkpoint is forced.
  • use checkpoint_warning to see if you need more
  • each segment is 16MB
  • recovery means going through each segment
  • wal_sync_method - leave default
  • synchronous_commit - wait for wal to be written before client completes
  • turn off for faster commits. generally turn off.
  • default_statistics_target - arbitrary value to determine stats collected by vacuum
  • analyze will take longer, but plan is better
  • can be set per-column
  • know to increase it with explain analyze. if plan == actual, good.
  • sqlite - great unless you need concurrent writes
  • seq_page_cost - related to random_page_cost. cost of fetching from disk. with raid10 set the same.
  • cpu_operator_cost - default is 0.0025. 0.5 is better
  • cpu_tuple_cost
  • connection pooling - use pgbouncer
  • skype uses postgresql
  • plproxy
  • prepared queries hang onto plan for the life of the connection. pooled connections can be problems.
  • functions are great for ORM
  • execution_cost - higher means function costs more
  • no function hints to the planner
  • 9 has replication. not ready for production data yet.
  • londiste - some kind of skype-made replication
  • drbd
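
As a rough illustration of the rules of thumb in the notes above, the sketch below turns a machine's RAM into starting values for a few settings. The function name `suggest_settings`, the 50% midpoint for effective_cache_size, and the output format are assumptions made for illustration, not something from the talk. Treat the numbers as a starting point to test, not a recommendation.

```python
def suggest_settings(ram_mb, checkpoint_segments=10, cache_fraction=0.5):
    """Return starting-point values based on the rules of thumb in the notes.

    ram_mb: available RAM in megabytes.
    checkpoint_segments: number of WAL segments (the notes suggest 10).
    cache_fraction: share of RAM to advertise as effective_cache_size.
        The notes give 40% to 70%; 0.5 is an arbitrary midpoint (assumption).
    """
    if not 0.4 <= cache_fraction <= 0.7:
        raise ValueError("cache_fraction should be between 0.4 and 0.7")
    segment_mb = 16  # each WAL segment is 16MB
    return {
        "shared_buffers": f"{ram_mb * 20 // 100}MB",  # 20% of RAM
        "effective_cache_size": f"{int(ram_mb * cache_fraction)}MB",
        "checkpoint_segments": checkpoint_segments,
        # amount of WAL written before a checkpoint is forced
        "wal_mb_between_checkpoints": checkpoint_segments * segment_mb,
    }


if __name__ == "__main__":
    for name, value in suggest_settings(8192).items():
        print(f"{name} = {value}")
```

effective_cache_size is only a hint to the planner, so nothing is allocated by raising it. shared_buffers is pre-allocated, which is why kernel.shmmax needs checking.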

UTOSC 2010 Day 2: Business Models for Open Source

Rounding out the day, I attended David Richards' presentation about business models. The class was unfortunately a little small. I think the discussion could have been much better with a few more people to drum up conversation. He did a good job of covering the business perspective from start to finish. It's easy for a hacker like myself to look at a project that is fun, exciting, and important and overlook all the nitty gritty like how to acquire customers and make money. Obviously the latter is important in achieving the former. It's also never quite so cut and dried as many Free Software advocates presume. The case study we attempted made it clear that working out the fine points is indeed quite a challenge.

Miscellaneous Notes:

  • confluence of having fun, doing it well, getting paid
  • too cheap to meter (marginal cost) - bandwidth, storage, processing power
  • scarcity - money, time, respect
  • often times it's a one-time shot
  • moving target
  • from $1 to $0 is a huge gap
  • "open source is the greatest endeavor" - linus (really?)
  • wikipedia vs traveling encyclopedia salesman
  • buying benefits, not features
  • The Four Steps to the Epiphany - book
  • customer discovery - want evangelists
  • customer validation - why are customers satisfied
  • customer acquisition - scale to huge numbers
  • company development - dotcom bubble did this first
  • getting started - high margin, low investment, foothold position
  • segment market into a matrix. who wants what.
  • customer lifetime value
  • avoid knee jerk reactions
  • why, when, what to open
  • google mapreduce
  • case study - Informant - document management system

UTOSC 2010 Day 2: Exploring The Radio Frequency Spectrum

I attended Robert Bolton's presentation on amateur (aka ham) radio. I was excited about this one for two reasons. First, radio is radio and I hope that in learning amateur radio I'll better understand the digital radios I use at work. Second, I've had a desire to be a ham ever since I was exposed to it as a wee little Boy Scout. I'll be taking my test in the next couple weeks, if everything goes according to plan (doesn't it always?). The one thing that came out of this presentation and excited me more than anything was that Asterisk has a radio interface (app_rpt) which you can use to create a phone patch system. I realize it's not exceptionally useful, but it does sound like a great project.

Miscellaneous Notes:

  • modulation - analog: AM, SSB, FM. digital: PSK, FSK, ASK, QAM. spread spectrum: DSSS, FHSS
  • ke7zea
  • packet radio built into linux kernel
  • very slow speeds
  • gordon west book on radio theory
  • gpredict - satellite tracker
  • aprs - gps-based location reporting
  • software radio - gnu radio, universal software radio peripheral
  • asterisk app_rpt
  • dd-wrt - can use more power w/ amateur license. call sign as ssid. no encryption.
  • echolink - windows based software
  • irlp - internet repeater link project

UTOSC 2010 Day 2: Custom Puppet Facts

After a delicious trip to Pirate O's for lunch, it was back for a presentation by Joseph Hall on custom Puppet facts. Even ignoring the fact that I came in late, a lot of this was way over my head. It reminded me that I would do well to invest some more time into my Puppet system as there are many many more things I could do with it. If anybody has some free time I could borrow, please let me know.

Miscellaneous Notes:

  • manage fstab
  • hosts
  • require "facter"
  • Facter.add
  • external node classifier. use any language as long as output is yaml
  • classifier runs on puppetmaster
  • custom facts run on client
  • http://docs.puppetlabs.com/guides/external_nodes.html
  • subscribe/notify
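
The external node classifier note above can be sketched in Python. Puppet calls the script with the node name as its only argument and reads YAML from standard output. The node names, class names, and the `NODES` table here are hypothetical, and the YAML is written by hand to avoid depending on a YAML library, which only works for simple string values like these.

```python
import sys

# Hypothetical inventory; a real classifier might query a database instead.
NODES = {
    "web01.example.com": {"classes": ["ntp", "apache"], "parameters": {"role": "web"}},
    "db01.example.com": {"classes": ["ntp", "postgresql"], "parameters": {"role": "db"}},
}
# Fallback for nodes that are not in the inventory (assumption).
DEFAULT = {"classes": ["ntp"], "parameters": {}}


def classify(node_name):
    """Render the YAML document for one node."""
    node = NODES.get(node_name, DEFAULT)
    lines = ["---", "classes:"]
    lines += [f"  - {cls}" for cls in node["classes"]]
    if node["parameters"]:
        lines.append("parameters:")
        lines += [f"  {key}: {value}" for key, value in sorted(node["parameters"].items())]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: classifier.py <node name>")
    sys.stdout.write(classify(sys.argv[1]))
```

Because the classifier runs on the puppetmaster, it can only use what the master knows about a node. Anything that has to be measured on the machine itself belongs in a custom fact instead.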

UTOSC 2010 Day 2: Advanced Git

Git has become my favorite version control system of late, so I was very excited about Tim Harper's presentation on Advanced Git. We got started 30 minutes late, he had some technical issues with his content, and we ended up with only 15 minutes of class time. That was a real disappointment. I did learn about gitk, a repository visualizer. I think I have a somewhat better idea on how to browse around my branches.

Miscellaneous Notes:

  • rebase
  • clean commits
  • bisect
  • always review before you commit
  • git add -p - yes/no before it adds
  • gitk - repository visualizer. gitk --all
  • tig - console visualizer
  • git log branch1...branch2 - show commits that are in one branch or the other, but not in both
  • git reflog
  • lose your head
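
The two-dot and three-dot range forms of git log are easy to mix up. As a toy model (not how git is implemented), treat each branch as the set of commits reachable from its tip. The commit names and history below are made up.

```python
# Commits reachable from each branch tip (hypothetical history).
branch1 = {"a", "b", "c", "d"}  # a-b-c-d
branch2 = {"a", "b", "x", "y"}  # a-b-x-y, forked after b


def two_dot(left, right):
    """git log left..right: commits on right that are not on left."""
    return right - left


def three_dot(left, right):
    """git log left...right: commits on either side but not on both."""
    return left ^ right


print(sorted(two_dot(branch1, branch2)))    # ['x', 'y']
print(sorted(three_dot(branch1, branch2)))  # ['c', 'd', 'x', 'y']
```

The shared commits a and b drop out of both results, which is why the three-dot form is handy for seeing how far two branches have diverged.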

UTOSC 2010 Day 1: Twisted

Last presentation for the day was from Gabe Gunderson. We were all a little concerned that Gabe wouldn't show up. Turned out due to a miscommunication he thought the presentation started half an hour later. Oh, well. Life goes on.

Gabe gave us an introduction to Twisted, which is an event-driven networking framework. Sounds like a nice way to abstract away the details of a network application. I was particularly interested in this since I'm using Zenoss, which is Python based. So far I can't see quite how this will fit into anything I'm doing.

Miscellaneous Notes:

  • event driven network framework for python
  • MIT license
  • protocols and transports separated
  • network and gui share event paradigm
  • event driven means debugging is a pain
  • events means you have to save context
  • "deferreds" are returned by functions that run asynchronously
  • common classes: reactor, protocol
  • protocol.Protocol - wire protocol (pop3, etc or write your own)
  • protocol.ClientFactory - creates an instance of the protocol class for each connection
  • perspective broker
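
The idea behind "deferreds" can be shown without Twisted itself. The class below is a toy written for this illustration, not Twisted's implementation. Its names (`ToyDeferred`, `add_callback`, `fetch_banner`) are made up, and it ignores error handling, which the real thing covers.

```python
class ToyDeferred:
    """A promise of a future result; callbacks run once the result arrives."""

    def __init__(self):
        self._callbacks = []
        self._fired = False
        self._result = None

    def add_callback(self, func):
        """Chain func; each callback receives the previous return value."""
        if self._fired:
            self._result = func(self._result)
        else:
            self._callbacks.append(func)
        return self

    def callback(self, result):
        """Deliver the result and run the chain in order."""
        if self._fired:
            raise RuntimeError("already fired")
        self._fired = True
        self._result = result
        for func in self._callbacks:
            self._result = func(self._result)
        self._callbacks.clear()


def fetch_banner():
    """Pretend to start a network request; return at once with a deferred."""
    return ToyDeferred()


seen = []
d = fetch_banner()
d.add_callback(str.upper).add_callback(seen.append)
# ... later, when the event loop sees data on the socket:
d.callback("+ok pop3 ready")
print(seen)  # ['+OK POP3 READY']
```

The caller gets the deferred back immediately and attaches what should happen next, which is the "save context" cost of event-driven code mentioned in the notes.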

UTOSC 2010 Day 1: TORQUE

I unfortunately came in a little late to the second class, so I missed a few things (including a seat!), which is sad because it was a good presentation. Scott McQuay showed us TORQUE. It's a cluster resource manager, which is a fancy way to say that it handles distributing jobs to a cluster of computers. Overall I was impressed with it. The only weak point appears to be the scheduler: the default one is naive and the more advanced one is proprietary. And unfortunately I can't see anywhere that I could actually use this software, but it's a good one to keep in mind.

Miscellaneous Notes:

  • resource manager
  • pbs_mom - per-node daemon
  • pbs_server - master
  • qstat
  • pbs_sched - basic fifo scheduler
  • moab - commercial scheduler with more brains
  • set number of cpus per back-end
  • jobs can be wall time limited
  • dancer's shell (dsh) w/ public keys

UTOSC 2010 Day 1: VMLB IP Load Balancer

First session of the day was about VMLB IP Load Balancer, presented by Clint Tinsley of Inductys. It's a formerly proprietary application whose development stopped in 2006. They're now looking to relaunch development and build it as Open Source. Overall sounds like an interesting project which could turn into something if the project works out. At this point it's not something I could actually use, unfortunately. I'll have to keep an eye on it.

Miscellaneous Notes:

  • project dormant since 2006
  • persistent connections to real server
  • designed for 2 interfaces
  • supports 4 interfaces
  • no vlan support
  • no snat support
  • mgmt ip separate from lb ip
  • layer 7 content type awareness
  • redundant controllers
  • support for ssl processor cards
  • http/https/ftp/imap/pop/ldap/smtp
  • roundrobin/weighted/least conn/src dst hash/expected delay/never queue
  • service checks
  • redhat/centos 3.0
  • uses ultra monkey - stopped dev in 2005 - or not?
  • snort built-in
  • built-in help system
  • und.cgi - management web app
  • ssl proxy for back-end http
  • real-time monitoring is missing
  • code release next week
  • source going in github (ossvmlb)
  • release on sourceforge
  • custom health checks? does basic alive (ping?). adding new ones might be hard.
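
Three of the scheduling methods listed in the notes (roundrobin, weighted, least conn) are simple enough to sketch. This is a generic illustration, not VMLB's code. The function names and the server names are hypothetical, and the weighted version is the naive form that repeats each server rather than interleaving them.

```python
import itertools


def round_robin(servers):
    """Yield servers in a fixed rotation, ignoring their load."""
    return itertools.cycle(servers)


def least_conn(active):
    """Pick the server with the fewest active connections.

    active maps server name -> current connection count; ties go to the
    first server in the mapping's order.
    """
    return min(active, key=active.get)


def weighted_round_robin(weights):
    """Yield servers in rotation, repeating each one `weight` times per cycle."""
    expanded = [name for name, weight in weights.items() for _ in range(weight)]
    return itertools.cycle(expanded)


if __name__ == "__main__":
    rr = round_robin(["real1", "real2", "real3"])
    print([next(rr) for _ in range(4)])  # ['real1', 'real2', 'real3', 'real1']
    print(least_conn({"real1": 12, "real2": 3, "real3": 7}))  # real2
    wrr = weighted_round_robin({"real1": 2, "real2": 1})
    print([next(wrr) for _ in range(4)])  # ['real1', 'real1', 'real2', 'real1']
```

Round robin needs no state beyond its position in the rotation, while least conn depends on the balancer tracking connections per real server.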

Utah Open Source Conference 2010

In just under two weeks I'll be presenting at the annual Utah Open Source Conference again. This will be my third year attending and presenting and as always I'm excited. The speakers are top notch and the presentations are not to be missed. This year I'll be covering Zenoss, which my company has recently rolled out for our internal monitoring system. If you've ever complained about your monitoring system, or the lack thereof, you will want to check this out.
