Breaking Up a Monolithic Rails App with an Engine and a Shared Database

When we first built the app that became Keylime Toolbox, we used Ruby on Rails because it was what the team was familiar with and it gave us lots of tools to get started quickly. Along the way, as we grew, we knew we would run into the dreaded “monolithic Rails app” problem if we didn’t separate out our code. So early on we created our own gems to segregate common, “low churn” code, and we also spun up a second Rails app for a service interface as we added more data. We learned a whole lot from that first service, including that data should not be segregated along a dimension, but that asynchronous, long-running tasks (such as batch data loading) make a great seam to split on.

Fast-forward a year or two: as we brought that same asynchronous data loading to bear on a new set of data, we decided to take another stab at breaking up the app. With the previous service we had put data loading and analytics into a single service that “owned” the data source. This time around I wanted to split that up. Following the command-query separation pattern, I wanted a service responsible for data loading and everything associated with it (the “command”, or “mutate”, side), while keeping the analytics (the “query” side) in the main app. To do that I needed a database that was shared between the service and the main app.

(Diagram: Keylime Command-Query Service Model)
Continue reading

An SEO Research Project on CTR and Rank

I did a fun research project at the day job last week. We analyzed nearly five million Google search queries to see how click through rates are affected by ranking and how averages apply across industry segments.

We determined that those whole-web or industry-wide CTR-by-rank charts that many marketers use to predict performance have little bearing on their specific site or topic.

Bottom line? We found that averages, even when segmented by query type, didn’t provide much actionable data for a specific site. When we compared those averages with site-specific data, they rarely matched.

However, we did find that average click through rates within a site tended to hold fairly steady, and so using actual averaged click through rates for your own site can be very useful for things like calculating market opportunity of new content investments, estimating impact of rankings changes, and pinpointing issues with the site’s listings on the search results page.

Why we gave up on continuous integration

Four years and three companies ago we (I’ve worked with the same core team across these transitions) ditched our continuous integration server, and we haven’t gone back. We spent too much time dealing with the impedance mismatch between the CI environment and development/production. Instead we just keep our test suite short enough (it runs in less than 2 minutes) that developers run it often and “continuously”, and of course with every merge and deploy.

Continue reading

Rendering thousands of items in a browser list

Over at Keylime Toolbox, we have a feature that lets you test filter patterns against your query data. To make it “fast” we limit it to the most recent day of data, but that can still be 50,000 or more queries. Rendering all of those into a list (with some styling, of course) would make the browser unresponsive for a while and sometimes crash it.


After hours of debugging and investigating options, I finally fixed this by limiting the number we render to start with and then adding “infinite scroll” to the lists to add more items as you scroll.

Continue reading

Pork Tenderloin and Roasted Vegetables

This is a simple, rustic pork-and-vegetables dish that’s pretty easy to make. It’s done in about an hour but takes only 20 minutes or so of prep/work time. I made this up based on what I thought was a recipe in Pork and Sons, but it’s not there. To me, though, it’s kind of the quintessential recipe for that book. The first time I made this I declared it “the best pork I’ve ever cooked,” which is not something I say lightly.

Rustic Pork and Vegetables

One pork tenderloin (about 1-⅓ to 1-½ pounds)
6″ sprig of rosemary
4-6 sage leaves
12 juniper berries
Salt and pepper
One small onion, cut into eight wedges
Four small, red potatoes, quartered
Four roma tomatoes, halved
Four cloves garlic, skins on
3 Tbsp olive oil

Pre-heat your oven to 400 degrees F.

Take a third of the rosemary leaves, four juniper berries, and a pinch of salt and pepper and grind them in a mortar and pestle. Rub this on all sides of the tenderloin and set aside to rest.

Prep the vegetables and put them into a 9×13 baking dish. Sprinkle the remaining rosemary leaves and juniper berries, the sage leaves, and some salt and pepper over them. Drizzle with the olive oil. When the oven is hot, place the pan of vegetables in the oven to roast.

After about 40 minutes, put a cast iron pan on high heat on the stove top. When it’s good and hot sear the tenderloin for about 2 minutes on each side. Place the tenderloin in the pan of vegetables in the oven to finish. It should be done in 10-20 minutes, depending on thickness and how done you want it.

I’d serve this with a Burgundy or Côtes du Rhône.

Upgrade PostgreSQL on Scientific Linux

We upgraded two database servers this weekend, one from PostgreSQL 9.1 and the other from 9.2, bringing both of them to 9.3. What follows are my combined process notes, in the hope that they will help you.

Preparation

To do this, you must have enough free disk space on your data drive to make a duplicate of the existing cluster (that is, all databases hosted on the server). For example, the data drive on one of our servers was at 55% usage and I had to clear it to 50% (the drive is dedicated to database storage); on the other server it was 66% consumed. In both cases I removed files that were unrelated to the cluster (backups and WAL archives) and moved them off-server. On the second server this wasn’t enough. If you can easily install or mount a new drive, that’s much easier than any of this, but we didn’t have that luxury.
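
To get a rough sense of whether you have room, compare the size of the existing cluster with the free space on its drive. The paths below are the packaged defaults; substitute your own data directory and mount point.

    # How big is the existing cluster?
    sudo du -sh /var/lib/pgsql/9.2/data
    # How much free space is left on the drive it lives on?
    df -h /var/lib/pgsql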

You can free up disk space by re-indexing the databases, running vacuum full, or dumping and restoring. Re-indexing can be done without taking the database offline. The other two require taking down the database and may take hours or days for a multi-gigabyte cluster. Restoring a database from a backup file took 18 hours for a 250GB database (a 13GB gzipped pg_dump backup file) and 39 hours for our 450GB cluster (25GB backup file). From everything I’ve read, for databases in the hundreds of gigabytes and larger, vacuum full will basically take forever; it’s faster to dump and rebuild the database.
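
For reference, a dump-and-rebuild looks roughly like the following. This is only a sketch: it reuses the example database name from the re-index commands below, assumes a /backups directory the postgres user can write to, and assumes you have a verified copy somewhere safe before dropping anything.

    # Dump one database to a gzipped SQL file (this is the slow part)
    sudo su - postgres -c 'pg_dump app-database-production | gzip > /backups/app-database-production.sql.gz'
    # Drop and recreate the database, then reload it from the dump
    sudo su - postgres -c 'dropdb app-database-production'
    sudo su - postgres -c 'createdb app-database-production'
    sudo su - postgres -c 'gunzip -c /backups/app-database-production.sql.gz | psql app-database-production'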

However, you can recover a significant amount of space by re-indexing. We recovered 100GB of our 600GB cluster by running a re-index on each database. Note that this took 3 hours for one 260GB database and 4.5 hours for a different 250GB database; the major difference between the two was that the latter had older data, so its indexes were more fragmented.

    # Open a psql session on the database as the postgres user
    sudo su - postgres -c 'psql app-database-production'
    -- At the psql prompt, rebuild all indexes in the database
    REINDEX DATABASE "app-database-production";

Instructions

We’re using Scientific Linux. The PostgreSQL Global Development Group has made a repository of builds available for binary distributions. You install the repository by installing the RPM for the repository (this was weirdly meta for me but it works). This creates a pgdg93 repository. See the repository packages page for more links.

    sudo rpm -ivh http://yum.postgresql.org/9.3/redhat/rhel-6-x86_64/pgdg-redhat93-9.3-1.noarch.rpm

You can then install the new PostgreSQL 9.3 releases:

    sudo yum install postgresql93 postgresql93-devel postgresql93-libs postgresql93-server postgresql93-contrib

postgresql93-contrib is only needed for the pg_upgrade tool we’re going to use. You can remove it after the upgrade if you want.
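
For example, once the upgrade is done and verified:

    # Optional: drop the contrib package after the upgrade is complete
    sudo yum remove postgresql93-contrib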

Make a database folder

Create the new database data folder as the postgres user. The Scientific Linux distro puts postgresql into /usr/pgsql-VERSION. The default data file location is /var/lib/pgsql/VERSION/data, although ours is mounted on a separate drive.

    sudo su postgres -c '/usr/pgsql-9.3/bin/initdb -D /var/lib/pgsql/9.3/data'

Disable database access

Stop all connections to the database and disable your web application. This next phase can take several hours so you’ll want to make sure you have time. Our 845GB cluster took a little over 2 hours of server downtime.

In our case, closing connections meant stopping the Resque workers that we manage with monit and disabling the web applications with capistrano maintenance mode. We also stop monitoring the database postmaster process so that monit doesn’t restart it while we’re doing the upgrade. Obviously these commands are meant to jog your thoughts; your own infrastructure will look different.

    worker-server$ sudo monit -g resque-workers stop

    database-server$ sudo monit unmonitor postmaster
    database-server$ sudo /etc/init.d/postgresql-9.1 stop

    dev$ cap production deploy:web:disable REASON="a scheduled system upgrade" UNTIL="at 11pm Pacific Time"

Run the upgrade

Run the new pg_upgrade to migrate from the old version (-b,-d) to the new version (-B,-D). This is the part that takes a couple hours per server.

    sudo su postgres -c '/usr/pgsql-9.3/bin/pg_upgrade -B /usr/pgsql-9.3/bin -b /usr/pgsql-9.2/bin -D /var/lib/pgsql/9.3/data -d /var/lib/pgsql/9.2/data'
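
If you want to catch incompatibilities before committing to the long run, pg_upgrade also accepts a --check flag, which inspects the clusters and reports problems without changing any data:

    # Dry run: report problems without modifying either cluster
    sudo su postgres -c '/usr/pgsql-9.3/bin/pg_upgrade --check -B /usr/pgsql-9.3/bin -b /usr/pgsql-9.2/bin -D /var/lib/pgsql/9.3/data -d /var/lib/pgsql/9.2/data'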

Verify the new cluster

Manually inspect the differences between the startup scripts:

    diff /etc/init.d/postgresql-9.?

Transfer any important settings to the 9.3 script and remove the 9.2 one. In our case we have a custom PGDATA setting.
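
For reference, that override is just a PGDATA assignment carried over from the old script; the path here is made up to illustrate a dedicated data drive (the packaged default is /var/lib/pgsql/9.3/data).

    # In /etc/init.d/postgresql-9.3 (example path for a separate data mount)
    PGDATA=/data/pgsql/9.3/data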

Similarly, compare the pg_hba.conf and postgresql.conf files in the old data directory with the new ones. The postgresql.conf comparison can be tedious if you’ve done a lot of tuning. (p.s. Anyone know of a good diff tool for configuration files that can compare uncommented lines in either version with their commented pairs in the other? One rough workaround is sketched after the diffs below.)

    diff /var/lib/pgsql/9.2/data/postgresql.conf /var/lib/pgsql/9.3/data/postgresql.conf
    diff /var/lib/pgsql/9.2/data/pg_hba.conf /var/lib/pgsql/9.3/data/pg_hba.conf
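
That workaround, short of a real config-aware diff tool, is to strip comments and blank lines before comparing. It won’t match an uncommented line against its commented default in the other file, but it cuts the noise considerably:

    # Compare only the active (uncommented, non-blank) settings in each file
    diff <(grep -vE '^[[:space:]]*(#|$)' /var/lib/pgsql/9.2/data/postgresql.conf) \
         <(grep -vE '^[[:space:]]*(#|$)' /var/lib/pgsql/9.3/data/postgresql.conf)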

Start the new postgresql and analyze the new cluster to optimize your database for the new version. (Note that analyze_cluster.sh is written to the working directory that pg_upgrade was run from.) The analyze script has three phases. The minimal one will get the database up and running in a couple of minutes, so you can bring things back online at that point or wait until the script completes.

    sudo /etc/init.d/postgresql-9.3 start
    sudo su postgres -c ./analyze_cluster.sh

Bring things back online

If you’re running monit (or god or something similar) to manage your postgresql server, you’ll need to update its configuration to reference the new 9.3 paths.
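
As a sketch, assuming the default 9.3 data directory and init script (adjust both paths and the config location to match your setup), the updated monit stanza might look something like:

    # e.g. in /etc/monit.d/postmaster (example location)
    check process postmaster with pidfile /var/lib/pgsql/9.3/data/postmaster.pid
      start program = "/etc/init.d/postgresql-9.3 start"
      stop  program = "/etc/init.d/postgresql-9.3 stop"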

Now, bring everything back up online that you disabled earlier.

    database-server$ sudo monit reload

    worker-server$ sudo monit start all

    dev$ cap production deploy:web:enable

With everything back up and running, test that things are working.
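
A quick smoke test, beyond exercising the applications themselves, is to confirm that the server reports the new version and answers queries:

    # The version string should now report PostgreSQL 9.3.x
    sudo su - postgres -c 'psql -c "SELECT version();"'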

Clean up

If you’re satisfied with the new system, you can delete the old cluster. This script is installed in the working directory that you ran pg_upgrade from.

    sudo su postgres -c ./delete_old_cluster.sh

If you’d rather be a little more careful (after all, pg_upgrade only copied the database files over), you can delete just the old data/base folder, which is the bulk of the storage, and keep the other configuration files around in case you need to recover them.
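
For example (double-check the path against your own layout before running anything destructive):

    # Remove only the bulky table and index files from the old cluster,
    # leaving postgresql.conf, pg_hba.conf, and friends in place
    sudo rm -rf /var/lib/pgsql/9.2/data/base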


References:

1. How to install PostgreSQL 9.2 on RHEL/CentOS/Scientific Linux 5 and 6
2. pg_upgrade
3. REINDEX
4. How to optimize PostgreSQL database size