PostgreSQL Monitoring and Performance Tuning

As our PostgreSQL database needs grow, I needed to adjust how it uses memory to make the most of the hardware we are paying for. Tuning Your PostgreSQL Server was really helpful in understanding which parameters to adjust, what they affect, and their relative importance. shared_buffers, work_mem, and effective_cache_size are the parameters I focused on to get memory use right.
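As a rough illustration, the relevant postgresql.conf lines look like this. The values below are hypothetical starting points for a 16 GB machine (in the spirit of the Tuning Your PostgreSQL Server guidance), not our production settings:

```ini
# postgresql.conf -- illustrative values for a machine with 16 GB of RAM
shared_buffers = 4GB            # PostgreSQL's own page cache (~25% of RAM is a common starting point)
work_mem = 64MB                 # per-sort/hash memory before spilling to disk
effective_cache_size = 12GB     # planner hint: how much RAM the OS disk cache can use
```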


In order to get a good picture and know if my changes were effective I needed to monitor how we were using memory. I set up our servers to record metrics to Graphite and configured a Grafana dashboard to show usage over time.

Tracking Memory Usage

My first question was, “How much memory is PostgreSQL using?” It’s really hard to know for sure, but this article describes how traditional tools are likely to report incorrectly, because resident memory (RSS) counts shared memory once per process, so memory shared between processes gets counted multiple times. A better tool to use is smem, which can report proportional set size (PSS), where shared memory is divided evenly among the processes using it.

I downloaded the latest smem release and copied the executable to /usr/local/bin. Then I used a small script to pull USS (unique set size, the memory that would be freed if the processes exited), PSS, and RSS from the smem results for the postgres user and send them to Graphite, with lines like this (the full script is available in this Gist):

/usr/local/bin/smem -u | grep postgres | awk '{print "'$API_KEY'.postgres.'$HOST'.pss " $5 " " strftime("%s")}' | nc $SERVER $PORT
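The full script (see the Gist) repeats that pattern for each metric. A simplified sketch, with placeholder values standing in for the real $API_KEY, $HOST, $SERVER, and $PORT, might look like:

```shell
#!/bin/sh
# Sketch of the collection script; these values are placeholders, not our
# real configuration.
API_KEY="my-api-key"
HOST="db1"
SERVER="graphite.example.com"
PORT=2003

# smem -u prints one row per user: User  Count  Swap  USS  PSS  RSS (in KB).
# Turn one such row into a Graphite line: <prefix>.<metric> <value> <epoch>.
format_metric() {  # usage: format_metric <metric-name> <field-number>
  awk -v k="$API_KEY" -v h="$HOST" -v m="$1" -v c="$2" -v t="$(date +%s)" \
    '{print k ".postgres." h "." m " " $c " " t}'
}

# USS, PSS, and RSS are fields 4, 5, and 6 of the per-user smem output.
if command -v smem >/dev/null 2>&1; then
  for pair in "uss 4" "pss 5" "rss 6"; do
    set -- $pair
    /usr/local/bin/smem -u | grep postgres | format_metric "$1" "$2" \
      | nc "$SERVER" "$PORT"
  done
fi
```

Splitting the formatting into a small function keeps the awk quoting in one place instead of repeating it for each metric.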

Capturing Temporary File Size

The documentation for work_mem states that when a query’s sorts or hashes are too large to be handled in the allotted memory, PostgreSQL spills blocks to disk as temporary files in the pgsql_tmp folder. I want to reduce this spilling as much as possible, so I added a chart to track the amount of disk space consumed in that folder:

du -sk /data/pgsql/9.5/data/base/pgsql_tmp/ | awk '{print "'$API_KEY'.postgres.'$HOST'.tmp " $1 " " strftime("%s")}' | nc $SERVER $PORT
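Wrapped up for cron, with the same placeholder variables as before (the data directory path is the one from the command above), that collection step might look like:

```shell
#!/bin/sh
# Sketch of the temp-file collector; these values are placeholders.
API_KEY="my-api-key"
HOST="db1"
SERVER="graphite.example.com"
PORT=2003
PGSQL_TMP=/data/pgsql/9.5/data/base/pgsql_tmp/

# Build the Graphite line: <prefix>.tmp <kilobytes> <epoch>.
# du -sk prints "<kilobytes><TAB><path>"; report 0 if the folder is missing
# (PostgreSQL only creates pgsql_tmp when something actually spills).
tmp_metric_line() {
  kb=$(du -sk "$PGSQL_TMP" 2>/dev/null | awk '{print $1}')
  echo "${API_KEY}.postgres.${HOST}.tmp ${kb:-0} $(date +%s)"
}

# Uncomment on the database host to actually ship the metric:
# tmp_metric_line | nc "$SERVER" "$PORT"
```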

Configuring the Dashboard

The next step was to set up a Grafana dashboard. This was pretty straightforward. For each metric (tmp, uss, pss, rss) I set up a chart to pull that metric from all sources. There were two tricks:

  1. Set the left Y axis to use “kilobytes” as the unit (which auto adjusts to MB, GB, etc.)
  2. Add the consolidateBy(max) function to each metric so that when you zoom out it shows the largest value for the period (although I think this isn’t quite right because Graphite is still aggregating by average)
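For example, the Graphite target for the PSS chart, using the metric path from the script above (with my-api-key standing in for the real prefix), ends up looking something like:

```
consolidateBy(my-api-key.postgres.*.pss, 'max')
```

The wildcard pulls in the series from every host so each server shows up as its own line on the chart.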

This has been a real boon and helped me better understand how memory is being used. Looking at the proportional memory chart, I can see that we’re using about 10% of our memory for postgres processes on average, peaking at about 20%. The balance of our memory is consumed by the OS disk cache (which PostgreSQL depends on for caching table and index data). Because we front our database with significant caching, and we tend to run lots of different, large, complex queries, I can accept a smaller disk cache and increase the memory available to worker processes (i.e. work_mem) so that we’re not writing temporary files to disk as much.

Update

I posted a follow up to this with additional charts in PostgreSQL Monitoring and Performance Tuning: Phase 2.
