When we first built the search analytics app that became Keylime Toolbox we knew we wanted to use Resque for background jobs. Because Resque is based on Redis, we decided to use Redis for the Rails cache as well. But as things grew we realized pretty quickly that these two uses call for very different configurations.
Cached data is ephemeral. We keep it in memory so it’s easily accessible, but if the Redis instance fails it’s OK if we lose some of the data (we can always rebuild it).
Resque worker jobs, on the other hand, are not ephemeral. When we queue a job we expect it to be run and if the Redis instance crashes we want to make sure we can recover where we left off.
While we continued with Redis for both, we spun up two distinct Redis instances with different configurations.
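Sketched as Rails configuration (the hostnames, ports, and file locations here are illustrative, not our actual setup), the split looks something like this: the cache instance is allowed to evict data under memory pressure, while the Resque instance persists its queue to disk so jobs survive a restart.

```ruby
# config/environments/production.rb -- cache Redis: ephemeral.
# On that instance's redis.conf: set maxmemory with
# maxmemory-policy allkeys-lru, and turn persistence off.
config.cache_store = :redis_cache_store, { url: "redis://cache-redis:6379/0" }

# config/initializers/resque.rb -- queue Redis: durable.
# On that instance's redis.conf: appendonly yes (and/or RDB snapshots),
# no maxmemory eviction -- a dropped key here is a lost job.
Resque.redis = Redis.new(url: "redis://queue-redis:6379/0")
```

The important part isn't the Rails wiring; it's that each `redis.conf` matches its role: eviction and no persistence for the cache, persistence and no eviction for the queue.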
When we first built the app that became Keylime Toolbox, we used Ruby on Rails because it was what the team was familiar with and it gave us lots of tools to get started quickly. Along the way, as we grew, we knew we were going to run into the dreaded “monolithic Rails app” problem if we didn’t separate out our code. So early on we created our own gems to segregate common, “low churn” code, and spun up a second Rails app as a service interface as we added more data. We learned a whole lot from that first service, including that data should not be segregated along a dimension, but that asynchronous and long-running tasks (batch data loading) make a great seam to split on.
Fast-forward a year or two: as we brought that same asynchronous data loading to bear on a new set of data, we decided to take another stab at breaking up the app. With the previous service we had put data loading and analytics into a single service that “owned” the data source. This time around I wanted to split that up. Following the command-query separation pattern, I wanted a service responsible for data loading and everything associated with it (the “command”, or “mutate”, side), while keeping the analytics part (the “query” side) in the main analytics app. To do that, I needed a database shared between the service and the main app.
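The shared-database wiring can be sketched like this (the model name and configuration key are hypothetical, not from our app). In Rails, an ActiveRecord model can point at a second database with `establish_connection`, so the query side can read tables the command-side service writes.

```ruby
# config/database.yml (excerpt) -- a second database both apps can reach:
#
#   shared_metrics:
#     adapter: postgresql
#     database: shared_metrics
#     host: db.internal
#
# In the query-side Rails app, a model backed by the shared database
# (LoadedMetric is an illustrative name):
class LoadedMetric < ActiveRecord::Base
  establish_connection :shared_metrics
end
```

With this split, only the loading service ever writes to these tables; the analytics app treats them as read-only.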
I did a fun research project at the day job last week. We analyzed nearly five million Google search queries to see how click through rates are affected by ranking and how averages apply across industry segments.
We determined that those whole-web or industry-wide CTR-by-rank charts that many marketers use to predict performance have little bearing on their specific site or topic.
Bottom line? We found that averages, even when segmented by query type, didn’t provide much actionable data for a specific site. When we compared averages to site-specific data, we didn’t find much that was similar.
However, we did find that average click through rates within a site tended to hold fairly steady, and so using actual averaged click through rates for your own site can be very useful for things like calculating market opportunity of new content investments, estimating impact of rankings changes, and pinpointing issues with the site’s listings on the search results page.
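As a rough sketch of the kind of calculation this enables (the CTR numbers, function, and query volumes below are illustrative, not figures from the study): once you have your own site’s averaged CTR by rank, estimating the click impact of a ranking change is a straightforward multiplication.

```ruby
# Illustrative site-specific average CTR by rank (hypothetical numbers,
# not from the research described above).
SITE_CTR_BY_RANK = {
  1 => 0.28, 2 => 0.14, 3 => 0.09, 4 => 0.06, 5 => 0.04
}.freeze

# Estimated clicks for a query with the given impressions at a given rank.
def estimated_clicks(impressions, rank, ctr_by_rank = SITE_CTR_BY_RANK)
  (impressions * ctr_by_rank.fetch(rank, 0.0)).round
end

# Impact of moving a 10,000-impression query from rank 3 to rank 1:
delta = estimated_clicks(10_000, 1) - estimated_clicks(10_000, 3)
# 2800 - 900 = 1900 additional clicks per period
```

The same arithmetic works in reverse for spotting problems: if a page ranks #1 but its actual CTR is far below your site’s rank-1 average, something about its listing on the results page is worth a look.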
Four years and three companies ago we (I’ve worked with the same core team across these transitions) ditched our continuous integration server and we haven’t gone back. We spent too much time dealing with impedance mismatch between the CI environment and development/production. Instead we just keep our test suite short enough (runs in less than 2 minutes) so that developers run it often and “continuously”. And of course, with every merge and deploy.
This was a big year for house projects although I didn’t post many. But this is what we accomplished:
Over at Keylime Toolbox, we have a feature that lets you test filter patterns against your query data. To make it “fast” we limit the data to the most recent day of data. But this can still be 50,000 or more queries, and rendering all of those into a list (with some styling, of course) would make the browser unresponsive for a while and sometimes crash it.
After hours of debugging and investigating options, I finally fixed this by limiting the number we render to start with and then adding “infinite scroll” to the lists to add more items as you scroll.
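The server side of that fix can be sketched in a few lines (the page size and method name are illustrative, not the actual Keylime Toolbox code): render only the first page of matching queries up front, then have the scroll handler fetch subsequent pages and append them.

```ruby
# Illustrative page size -- small enough that the browser renders it
# without stalling, large enough to keep round trips reasonable.
PAGE_SIZE = 500

# Returns one page of query rows; the infinite-scroll handler requests
# page 0 on load, then page 1, 2, ... as the user scrolls.
def page_of_queries(all_queries, page)
  offset = page * PAGE_SIZE
  all_queries[offset, PAGE_SIZE] || []
end

queries = ["keylime pie", "key lime recipe"] * 25_000  # ~50,000 rows
first_batch = page_of_queries(queries, 0)  # rendered immediately
next_batch  = page_of_queries(queries, 1)  # fetched as the user scrolls
```

The browser only ever holds the rows the user has scrolled past, so the DOM stays small and the page stays responsive.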
This is a simple, rustic pork-and-vegetables dish that’s pretty easy to make. It’s done in about an hour but takes only 20 minutes or so of prep/work time. I made this up based on what I thought was a recipe in Pork and Sons, but it’s not there. To me, though, it’s kind of the quintessential recipe for that book. The first time I made this I declared it “the best pork I’ve ever cooked,” which is not something I say lightly.
One pork tenderloin (about 1-⅓ to 1-½ pounds)
6″ sprig of rosemary
4-6 sage leaves
12 juniper berries
Salt and pepper
One small onion, cut into eight wedges
Four small, red potatoes, quartered
Four roma tomatoes, halved
Four cloves garlic, skins on
3 Tbsp olive oil
Pre-heat your oven to 400 degrees F.
Take a third of the rosemary leaves, four juniper berries, and a pinch of salt and pepper and grind them in a mortar and pestle. Rub this on all sides of the tenderloin and set aside to rest.
Prep the vegetables and put them into a 9×13 baking dish. Sprinkle the remaining rosemary leaves and juniper berries, the sage leaves, and some salt and pepper over them. Drizzle with the olive oil. When the oven is hot, place the pan of vegetables in the oven to roast.
After about 40 minutes, put a cast iron pan on high heat on the stove top. When it’s good and hot sear the tenderloin for about 2 minutes on each side. Place the tenderloin in the pan of vegetables in the oven to finish. It should be done in 10-20 minutes, depending on thickness and how done you want it.
I’d serve this with a Burgundy or Côtes du Rhône.