Breaking Up a Monolithic Rails App with an Engine and a Shared Database

When we first built the app that became Keylime Toolbox, we used Ruby on Rails because it was what the team was familiar with and it gave us lots of tools to get started quickly. Along the way, as we grew, we knew we were going to run into the dreaded “monolithic Rails app” problem if we didn’t separate out our code. So early on we created our own gems to segregate common, “low churn” code, and also spun up a second Rails app for a service interface as we added more data. We learned a whole lot from that first service, including that data should not be segregated along a dimension, but that asynchronous, long-running tasks (batch data loading) make a great seam to split on.

Fast-forward a year or two: as we brought that same asynchronous data loading to bear on a new set of data, we decided to take another stab at breaking up the app. With the previous service we had put data loading and analytics into a single service that “owned” the data source. This time around I wanted to split that up. Following the command-query separation pattern, I wanted one service responsible for data loading (and everything associated with it), the “command” (or “mutate”) side, while keeping the analytics part, the “query” side, in the main analytics app. To do that I needed a database shared between the service and the main app.

Keylime Command-Query Service Model

Many companies (e.g. Pivotal Labs, TaskRabbit) have broken up their Rails apps, and a common method is to use Rails engines. Since I had a shared database (the “analytics” database in the diagram above), it made sense to share the ActiveRecord models between the main app and the processing service. A Rails engine is just a Rails app, packaged as a gem, that runs inside another app, so it provided the structure to share models.

The first question people ask when adding models to an engine is, “What do I do about migrations?” The Rails default behavior is to copy the migrations into the app that hosts the engine and run them as part of the app deployment. This assumes that the engine’s database is the same as the app’s database and that no apps share a database. This is a general Rails assumption (each app has a single dedicated database) and it makes sense as a rule of thumb. For example, if you were to tack a forum onto your app, you would want all of those tables installed in your app’s database. That wasn’t going to work for me.
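For reference, that default copy step is the standard Rails workflow for a conventional engine:

```shell
# Rails default for an ordinary engine: copy its migrations
# into the host app's db/migrate, then run them against the
# host app's one-and-only database.
$ rake railties:install:migrations
$ rake db:migrate
```

It is exactly this copy-into-the-host behavior that breaks down once two apps need to share one database.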

Pivotal Labs, as part of one of their “break up the monolith” projects, declared instead that you should “leave your migrations in your Rails engines.” This sounded great, except that they were still running the migrations in the main app against a single database. Not the solution for me.

What I needed was an engine that exposed migrations that could be run from one of the two apps (we chose the processing service, because data loading precedes data reading) and would run those migrations against a database that was not the app’s primary database. In the diagram above, you can see that each app has a primary database that holds its own transactional data. In the main analytics app, that’s configuration-type data. In the processing service, that’s audit trails of data loading and details of data state.

So I wanted to keep the migrations in the engine gem and run them when we deployed. Following the ideas in this StackOverflow post, I realized that the migrations needed to be separate from the app’s migrations, and that the schema.rb (or, in our case, the structure.sql) needed to be a different file (because it describes a completely different database). As covered in the linked Rails issue, I decided that we really needed to separate the migration management completely, which actually made things a lot simpler. If the engine exposes Rake tasks to run its own migrations (against a database named by convention), then it can own the migrations, the schema file, and everything else, and we can decide when and where to run those migrations. A clean, simple solution.

The Goal

Here’s how this works when it is all done. In each app, we add a reference to the shared analytics database.

# config/database.yml

analytics-development:
  <<: *default
  database: analytics-development

analytics-test:
  <<: *default
  database: analytics-test

Then to run the migrations (in development or as part of our deployment scripts) we just have a namespaced Rake command:

$ rake analytics:db:migrate

Building the Engine

To get this all working, I needed the shared models to connect to the shared database. So they all inherit from a common base class that establishes the connection (and they are namespaced, following engine best practice).

module Analytics
  class AnalyticsModel < ActiveRecord::Base
    self.abstract_class = true
    establish_connection :"analytics-#{Rails.env}"
  end
end
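Concrete shared models then just subclass this base instead of ActiveRecord::Base. Here is a minimal sketch using the LogStat model that appears in the factory example later:

```ruby
module Analytics
  # A shared model. It inherits the analytics database connection
  # from AnalyticsModel rather than using the host app's primary
  # database, so both apps read and write the same tables.
  class LogStat < AnalyticsModel
    # Table name defaults to "log_stats" in the analytics database.
  end
end
```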

I added a db/migrate folder to the gem and created migrations there as you normally would (e.g. rails generate migration ...).
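A migration in the engine looks like any other Rails migration; the only difference is where it lives and which database it will be run against. This is a hypothetical one for the log_stats table used in the testing examples below:

```ruby
# db/migrate/20150101000000_create_log_stats.rb (hypothetical example)
class CreateLogStats < ActiveRecord::Migration
  def change
    # Created in the shared analytics database, not the host
    # app's primary database, via the custom Rake task below.
    create_table :log_stats do |t|
      t.date :date
      t.timestamps
    end
  end
end
```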

Finally, I added the custom Rake tasks to handle migration management.

namespace :analytics do
  # Custom migration tasks to manage migrating the engine's dedicated database.
  namespace :db do
    desc 'Migrates the analytics-* database'
    task :migrate => :environment do
      with_engine_connection do
        ActiveRecord::Migrator.migrate(File.expand_path("../../../db/migrate", __FILE__), ENV['VERSION'].try(:to_i))
      end
      Rake::Task['analytics:db:schema:dump'].invoke
    end

    task :'schema:dump' => :environment do
      require 'active_record/schema_dumper'

      with_engine_connection do
        File.open(File.join(Rails.root, 'db', 'analytics_schema.rb'), 'w') do |file|
          ActiveRecord::SchemaDumper.dump ActiveRecord::Base.connection, file
        end
      end
    end

    task :'schema:load' => :environment do
      with_engine_connection do
        load File.join(Rails.root, 'db', 'analytics_schema.rb')
      end
    end
  end
end

# Hack to temporarily connect AR::Base to your engine.
def with_engine_connection
  original = ActiveRecord::Base.remove_connection
  ActiveRecord::Base.establish_connection "analytics-#{Rails.env}".to_sym
  yield
ensure
  ActiveRecord::Base.establish_connection original
end

That’s all there is to it. Sometimes, it is about finding a simple solution and avoiding making things “clever”.

Testing

I wrote unit tests for the models, but these require connecting to the database. The gem only has the shared database, so I just needed to create the test database on my dev box and run the migrations (with the task I created). I ran these commands (and added them to the README file for the gem):

$ psql -c 'CREATE DATABASE "analytics-test" with owner <your-username> encoding='"'"UTF8"'"';' postgres
$ rake analytics:db:migrate

That made the appropriate database available for testing and then my models could be tested.

It also generated a db/schema.rb file, which I didn’t want checked in, so I listed it in my .gitignore to exclude it from the project.

But how do I test that an application that incorporates the gem can actually talk to the database? Easy. The scaffolded gem includes a complete “dummy” Rails app in the test folder. So, mirroring what I will have to do when I incorporate this into our two apps, I added the database references to spec/dummy/config/database.yml. Then I wrote a couple of controllers in the dummy app just to exercise reads and writes against the shared models. In the specs, I added integration tests (using Capybara) that verify that I can read and write through the models.
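As a sketch of what such an integration test can look like (the route, page content, and file name here are hypothetical, not the actual specs from the gem):

```ruby
# spec/features/log_stats_spec.rb (hypothetical names throughout)
require 'rails_helper'

RSpec.describe 'shared analytics models', type: :feature do
  it 'round-trips a record through the dummy app' do
    # Write through the engine model, which connects to the
    # shared analytics database...
    Analytics::LogStat.create!(date: Date.today)

    # ...then read it back through a dummy-app controller that
    # renders the shared records, proving the host app can talk
    # to the shared database through the engine.
    visit log_stats_path
    expect(page).to have_content(Date.today.to_s)
  end
end
```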

One final note. We use factory_girl to create fixtures for testing. Because we namespaced the model, though, factory_girl can’t infer the class from the factory name alone, so I had to add the class attribute to the factory to help it out.

FactoryGirl.define do
  factory :log_stat, class: 'Analytics::LogStat' do
    date { Date.today }
  end
end

Incidentally, these factories are only available to the gem for its own testing. I haven’t investigated it yet, but it would be great to expose them to the containing app so that it could use them for integration tests as well. That might just have to be a generator, because I want them exposed to the containing app’s test framework, but not loaded into the app itself when deployed.
