How I Hacked a Reddit-powered Valentine’s Day Gift (and how to re-use it)

This year I decided to do something a little different for Valentine’s Day. In addition to the usual flowers and sweets, I wanted to do something that leveraged my “nerd skills” for good and would spread some cheer automatically. My company, Klaviyo, had just released an API for rendering and sending emails, and I knew my girlfriend was a big fan of corgis and /r/corgi. So I decided I’d put our new API to the test: I’d create a daily email of the top images from the Corgi subreddit that she’d get every morning. Here’s what the end result looks like:

Email Screenshot

If you’re interested in doing this or something similar, you need a few things to get started (all free):

  1. A free account with a service that sends emails. I’d recommend Mandrill or SendGrid (note: I’m not affiliated with either of those services).
  2. A free Klaviyo account. You need this to create an email template and to get an API key to make calls to render and send emails. You’ll also need to link your email service with Klaviyo so it can send emails.
  3. A script (mine is Python) to fetch the data and make the API call.

Then, conceptually, here’s how it works. I created an email template in Klaviyo with placeholders for the JSON data I’d be sending. I wrote a short Python script to fetch the top five Corgi images and captions from the Reddit API, did a little data manipulation, and made a request to the Quick Mail API to render my template and send the email. Once I tested it out, I set it up as a cron task so the script would run each morning, and that was it.
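Here’s a minimal sketch of that flow (it assumes the public Reddit JSON listing for /r/corgi; the email endpoint, key, template and recipient are placeholders you’d replace with the real Quick Mail details from Klaviyo’s docs and your own template):

import json
import requests

# Placeholders -- substitute the real endpoint, API key, template and recipient.
QUICK_MAIL_URL = 'https://<quick-mail-endpoint-from-klaviyo-docs>'
API_KEY = 'YOUR_KLAVIYO_API_KEY'

def top_corgi_posts(limit=5):
    # Public Reddit JSON listing for the top posts of the day in /r/corgi.
    resp = requests.get('https://www.reddit.com/r/corgi/top.json',
                        params={'t': 'day', 'limit': limit},
                        headers={'User-Agent': 'corgi-valentine-script'})
    children = resp.json()['data']['children']
    return [{'caption': c['data']['title'], 'image_url': c['data']['url']}
            for c in children]

def send_daily_email(posts):
    # Render the email template with the post data and send it.
    requests.post(QUICK_MAIL_URL, data={
        'api_key': API_KEY,
        'to': 'recipient@example.com',
        'template': 'daily-corgis',
        'context': json.dumps({'posts': posts}),
    })

if __name__ == '__main__':
    send_daily_email(top_corgi_posts())

Schedule it with cron (for example, 0 7 * * * /path/to/python /path/to/corgi_email.py) and it runs every morning.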

Valentine’s Day is over, but if you’ve got an hour and want to make yourself or someone else smile, give it a try. Happy hacking.

Hacking stubborn APIs: getting paginated results when there’s no paging option

First off, APIs are awesome. I remember back in 2006, when I was interning for a company in Redmond, WA, I didn’t even know what an API was or what it stood for. Well, the situation is much different today. Making services and technologies talk to each other enables some amazing things. And it’s really great when APIs are well designed, so they make the most common use cases easy and are extensible enough to handle more nuanced situations as well.

Well, this is a short story about an API that was probably designed years ago and doesn’t give you a way to handle a very common use case: paginating results. In this case, we were doing some work to integrate Klaviyo, a great new approach to CRM, with Magento, an extremely popular open-source e-commerce platform. Magento has a SOAP API and here’s the documentation for retrieving a list of orders. (Side note: Magento just released a REST API in the past two months, so it’s too new for most Magento users, and it also doesn’t support pagination.)

Start easy: try the docs

If you take a look at the docs, you’ll notice there’s no obvious way to get, say, the first 100 orders. This is highly problematic if you’re dealing with a large e-commerce site. Iterating over all orders will, at best, be extremely slow, and, depending on the number of orders, might even cause server issues given the amount of memory required to construct and consume the response.

But if you look closely, there is a filters argument. Maybe I can do something to get pagination with a little “less than” and “greater than” magic. Let’s give it a go.

After setting up a Magento instance to play with, I tried making the following request to test it out (note I’m using Python and SUDS, but the ideas are the same in PHP, etc.):

from suds.client import Client
client = Client('http://mymagentoinstance/api/v2_soap?wsdl=1')
session = client.service.login('username', 'password')
orders = client.service.salesOrderList(session, {
  'complex_filter' : [
    {
      'key' : 'order_id',
      'value' : {
        'key' : 'gteq',
        'value': 100,
      },
    },
    {
      'key' : 'order_id',
      'value' : {
        'key' : 'lt',
        'value': 200,
      },
    },
  ],
})

So what orders did I get? Well, the first thing I noticed was that I got 200 orders back. Hmmm, not good. Looking at the Magento source code, here’s why:

// parse complex filter
if (isset($filters->complex_filter) && is_array($filters->complex_filter)) {
  foreach ($filters->complex_filter as $value) {
    if (is_object($value) && isset($value->key) && isset($value->value)) {
      $fieldName = $value->key;
      $condition = $value->value;
      if (is_object($condition) && isset($condition->key) && isset($condition->value)) {
        $this->formatFilterConditionValue($condition->key, $condition->value);
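        // Note: filters are keyed by field name, so a second filter on the same field simply overwrites the first.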
        $parsedFilters[$fieldName] = array($condition->key => $condition->value);
      }
    }
  }
}

Damn! Magento only allows one filter per field, so there goes any chance to do some less than/greater than trickery.

Next step: try the Google

Alright, well now I feel like I’ve given it a pretty serious go, let’s try the Google. Well, there are some results, but nothing that helps. As an aside, I have to give this solution an A for effort for trying to chunk things by the first character. I think (I hope) I can do better.

Roll up sleeves: read the source and find a way

Digging through the Magento source code for the list of available filters, I found this:

$conditionKeyMap = array(
  'eq' => "{{fieldName}} = ?",
  'neq' => "{{fieldName}} != ?",
  'like' => "{{fieldName}} LIKE ?",
  'nlike' => "{{fieldName}} NOT LIKE ?",
  'in' => "{{fieldName}} IN(?)",
  'nin' => "{{fieldName}} NOT IN(?)",
  'is' => "{{fieldName}} IS ?",
  'notnull' => "{{fieldName}} IS NOT NULL",
  'null' => "{{fieldName}} IS NULL",
  'gt' => "{{fieldName}} > ?",
  'lt' => "{{fieldName}} < ?",
  'gteq' => "{{fieldName}} >= ?",
  'lteq' => "{{fieldName}} <= ?",
  'finset' => "FIND_IN_SET(?, {{fieldName}})",
  'regexp' => "{{fieldName}} REGEXP ?",
  'from' => "{{fieldName}} >= ?",
  'to' => "{{fieldName}} <= ?",
  'seq' => null,
  'sneq' => null
);

Interesting, there’s a regexp filter. Okay, this isn’t going to be pretty, but it might just do the trick. What if I just created a regexp that would match only numbers in a certain range? Here’s the Python code to generate that regular expression:

def regexp_for_pagination(from_, to):
  return '^(%s)$' % '|'.join(map(str, xrange(from_, to)))

so regexp_for_pagination(0, 10) would produce "^(0|1|2|3|4|5|6|7|8|9)$". And then a test call:

orders = client.service.salesOrderList(session, {
  'complex_filter' : [
    {
      'key' : 'order_id',
      'value' : {
        'key' : 'regexp',
        'value': regexp_for_pagination(0, 10),
      },
    },
  ],
})

and voila, pagination when they said I couldn’t have it!
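If you want to walk every order, a paging loop along these lines does the trick (a sketch reusing the client, session and regexp_for_pagination from above; max_order_id is something you’d have to look up separately):

def iter_orders(client, session, max_order_id, page_size=100):
  # Fetch orders in chunks by matching order_id against a per-page regexp.
  for start in xrange(0, max_order_id, page_size):
    page = client.service.salesOrderList(session, {
      'complex_filter' : [
        {
          'key' : 'order_id',
          'value' : {
            'key' : 'regexp',
            'value': regexp_for_pagination(start, start + page_size),
          },
        },
      ],
    })
    for order in page:
      yield order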

Is this a hack? Yep. Am I proud of it? You better believe it.

What APIs have you hacked to do things they “weren’t supposed to do”?

Using Fabric to deploy on AWS

Fabric is pretty awesome, and last time I discussed how I use Fabric to make one-command deployments easy. One thing I didn’t cover was how I use boto, a great Python package for working with AWS, to make sure I don’t bring down my site while I’m deploying.

The basic problem is this: if you have a few web servers in an ELB or other load balancer and you start deploying to each one in turn, when you restart the service responsible for your app, there will be a few seconds where that server is unavailable. And if you have your load balancer sending traffic to it, you’re going to end up with people getting 502s and the like. Not good.

Conceptually, what we want to do is remove each server from the load balancer, push code to it, restart the service and, once the service is back up, add that server back to the load balancer. In fact, you can imagine a number of tasks that might require us to remove a server from a load balancer, complete that task, and then add the server back. So let’s create Fabric tasks that’ll add and remove servers from ELBs, as well as a decorator we can use to wrap other tasks so they are automatically “managed.”

Here’s the code I added to the servers.py file we created in the first post:

from boto.ec2.elb import ELBConnection
 
def elb_operation(operation, instance_id, lbs):
    conn = ELBConnection(env.aws_key, env.aws_secret)
    for lb in conn.get_all_load_balancers():
        if lb.name in lbs:
            getattr(lb, operation)(instance_id)
 
def remove_from_elbs():
    host = db.get_hosts_by('host', env.host)
    instance_id, elbs = host.instance_id, host.elbs
    elb_operation('deregister_instances', instance_id, elbs)
 
def add_to_elbs():
    host = db.get_hosts_by('host', env.host)
    instance_id, elbs = host.instance_id, host.elbs
    elb_operation('register_instances', instance_id, elbs)

The code is pretty simple. When we deploy to each server, the env.host variable is set to that server’s host name, so we can use the HostManager object we set up before to look up which ELBs the host belongs to. Then we iterate over those ELBs and pull the server out of each one. I also have my AWS keys in a Fabric settings file, so those are available as well.
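One way to make env.aws_key and env.aws_secret available (a sketch; the exact settings file isn’t reproduced here) is Fabric’s ~/.fabricrc file, which Fabric reads at startup and turns into attributes on env:

# ~/.fabricrc: each key/value pair below becomes an attribute on env
aws_key = YOUR_AWS_ACCESS_KEY_ID
aws_secret = YOUR_AWS_SECRET_ACCESS_KEY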

Here’s a decorator that we can wrap tasks with to automatically manage the ELBs:

from functools import wraps
 
def elb_managed(func):
    @wraps(func)
    def decorated(*args, **kwargs):
        remove_from_elbs()
        func(*args, **kwargs)
        add_to_elbs()
    return decorated

and here’s an example of using the decorator:

from deploy.servers import elb_managed
 
@elb_managed
def deploy():
    git_pull()
    buildout(False)
    restart()

and now we don’t have to worry about our ELBs sending traffic to servers that shouldn’t be available. This also works well in the case that the deployment fails for some reason. In that case, the server is taken out of the ELB, the deployment fails, but the server isn’t added back to the ELB, so I don’t have to worry about anyone hitting that server. Instead, I can figure out what the problem is, and just redeploy, which will automatically add that server back to that ELB once the deployment is successful.

 

Extending Backbone views to configure non-DOM events

Quick post today. One of the things I like about Backbone views is how you configure events. It’s very easy and avoids a lot of boilerplate code. For example, if I have a table and want to listen for clicks on the cells, all I need to do is:

var MyView = Backbone.View.extend({
    events: {
        'click td': '_onCellClick'
    },
 
    _onCellClick: function (e) {
        // do stuff...
    }
});

There are lots of nice things about the way Backbone view events work. However, one limitation is that they only apply to DOM events. Often I’ll use Backbone events to abstract away what should happen between views or models. Compare the two examples below. In the first, when I update the data, everything is tightly coupled together. In the second, they’re decoupled:

var TightlyCoupledView = Backbone.View.extend({
    updateData: function (data) {
        this.model.set('data', data);
        this.updateGraph();
    },
 
    updateGraph: function () {}
});
var LooselyCoupledView = Backbone.View.extend({
    initialize: function (options) {
        this.model.on('change:data', this.updateGraph, this);
    },
 
    updateData: function (data) {
        this.model.set('data', data);
    },
 
    updateGraph: function () {}
});

It’s a bit more code, but what’s nice is that if I need to add more things to do when the data changes or have more than one point where I update data, it’s not hard to do:

var MoreComplicatedView = Backbone.View.extend({
    initialize: function (options) {
        this.model.on('change:data', this.updateGraph, this);
        this.model.on('change:data', this.updateTable, this);
    },
 
    getDataFromAnotherObject: function (obj) {
        this.model.set('data', obj.getData());
    },
 
    refreshDataFromServer: function () {
        var self = this;
 
        $.get('/data/latest', function (data) {
            self.model.set('data', data);
        });
    },
 
    updateGraph: function () {},
 
    updateTable: function () {}
});

That’s all good, but you can see what ends up happening. I have a lot of boilerplate hooking things together — the same way you would with DOM events if it weren’t for the way Backbone allows you to configure them. So what I do is create an extended view that all my views inherit from, which gives them this functionality:

// Cached regex to split keys for `delegate`.
var eventSplitter = /^(\S+)\s*(.*)$/;
 
var ExtendedView = Backbone.View.extend({
    delegateViewEvents : function (events) {
        if (!(events || (events = this.viewEvents))) return;
 
        if (_.isFunction(events)) events = events.call(this);
 
        this.undelegateViewEvents();
 
        for (var key in events) {
            var method = this[events[key]];
 
            if (!method) {
                throw new Error('Event "' + events[key] + '" does not exist');
            }
 
            var match = key.match(eventSplitter);
            var eventName = match[1], selector = match[2];
 
            if (selector === '') {
                this.bind(eventName, method, this);
            } else {
                this[selector].bind(eventName, method, this);
            }
        }
    },
 
    // Clears all callbacks previously bound to the view with `delegateEvents`.
    undelegateViewEvents: function(eventName) {
        this.unbind(eventName);
    }
});

It looks very similar to how Backbone sets up and manages DOM events. And here’s how it’s used in practice:

var MoreComplicatedView = ExtendedView.extend({
    viewEvents: {
        'change:data model': '_onModelDataChange'
    },
 
    initialize: function (options) {
        this.delegateViewEvents();
    },
 
    updateGraph: function () {},
 
    updateTable: function () {},
 
    _onModelDataChange: function () {
        this.updateGraph();
        this.updateTable();
    }
});

It’s simple and mirrors the way you’re used to doing things with DOM events. What do you think? How do you handle this issue?

Surprising JavaScript rendering benchmark

I recently wrote about how I like to structure and render views in Backbone. Two of the most important criteria when I write code are readability and performance. In JavaScript, performance can matter even more because the range of devices your code runs on can vary dramatically.

I asserted that the single biggest performance hit in JavaScript rendering is accessing the DOM too much. I also said that worrying about optimizing things down to joining an array vs. creating elements via jQuery and document.createElement wasn’t worth it unless you’ve tested to see that’s the case.

I decided it’d be easy enough to write a few test cases to see what actually is the fastest. The use case tested was rendering a table with rows that contain different data. I created a quick benchmark with JsPerf. You can see the code and run it yourself here. There are four test cases:

  1. Baseline – a baseline case based on how I typically write Backbone views from the previous post.
  2. Add all elements to the DOM before rendering contents – a version to demonstrate the penalty of accessing the DOM too much. I was expecting this to be much slower.
  3. Render joining strings – A version where I don’t use jQuery to create elements, but instead just join strings together to create the HTML.
  4. Optimized render joining strings – This version is like #3, but I don’t use a child view, so I avoid creating view objects and calling additional methods. I was expecting this to be the fastest.

All test cases use Backbone models and views, but I set up the views and models beforehand, so only the different render implementations are benchmarked. For each test case, a <table> is rendered with a certain number of rows outputting names from Backbone models. The results are in iterations/s, so higher is better.

In total I did four runs: two in Chrome 19 on my Mac and two in IE 7 on XP (I couldn’t get an IE 6 box). For each browser, I did a run with 100 rows and another with 1000 rows. The results are below.

Results: Surprisingly, the baseline version was the fastest in all cases. Being even faster than joining strings and setting the HTML is quite surprising. If anyone has ideas why, I’m very curious. Definitely will keep me from worrying about joining strings in the future.

Other interesting bits:

  • In IE, the other three test cases were all within the margin of error of each other. This makes sense for cases #3 and #4 (creating a few more objects and calling additional methods isn’t a huge performance hit), but it was surprising that the performance was similar to adding table rows and cells to the DOM before modifying them.
  • In Chrome, it looks like there is something interesting going on between cases #3 and #4. Interestingly, creating the TableRowView objects and using them to render rows is faster than doing it all in the TableView. Maybe it’s some optimization the JS compiler is doing on the render method?

Rendering 100 rows:

IE 7 / 100 rows

Rendering 1000 rows:

Chrome 19 / 1000 rows

IE 7 / 1000 rows

A different approach to rendering Backbone sub-views

I read some good discussion today on Hacker News about how to render subviews in Backbone. There’s no one way to skin this cat, but since I’ve done a fair amount of JavaScript development and have been using Backbone a lot recently, I figured it’d be helpful to put another approach out there, since I’ve solved this problem a number of times.

To take the exact problem posited in the blog post, say you have a table that contains a list of names and you want to render the table. You’re planning on having a lot of detail in the individual rows, so it seems like a good idea to split the table view apart from the table row view. Makes perfect sense. Now you want to render the table and you want to delegate to each row to render itself so you’re not interleaving logic. How do you do it?

My code for this is kind of long, so rather than paste it all inline, here’s a link to a jsFiddle where you can see it and play with it: http://jsfiddle.net/N5rUQ/.

Rendering strategy

My strategy for rendering is based on two ideas.

First, it should be intuitive. Event-based programming is great and can dramatically simplify things, but only if it’s a conceptual fit. For instance, broadcasting an event saying data has changed and allowing different parts of the UI to react is a great way to decouple code. But, personally, I think of a table as having rows, and when a table needs to be rendered, I expect to see the code creating the rows and rendering them in the render method of the table (or not far from it).

Second, JavaScript performance is most impacted by touching the DOM. So you’ll notice I create elements with jQuery, but I never attach them to the DOM. I only touch the DOM at the very end when I do this.$el.append(tableEl.children());. That’s a huge performance win. The difference between joining strings and creating elements with jQuery (basically calling document.createElement) is not nearly as significant as the performance hit you get from accessing the DOM repeatedly. I need to find some benchmarks here (or create my own), but I know they exist.

So, when you look at the code in the jsFiddle, I hope you find it intuitive. The table creates the sub-views for each row, renders them, and then appends each row’s element to a wrapper element that I use to create the final element structure. This is how I’d envision rendering happening in my head, so it’s nice that my mental model matches how the code works.

What do you think? What would you do differently? If you like this post, give it a vote on Hacker News.

A few additional comments:

  • I’m not using a JS templating engine; instead I’m building HTML with jQuery. If you’d prefer to use something to do the templating, you can see where it would fit in.
  • Notice that I’m also opting not to set an id attribute on the table rows. I really dislike storing information in the id and then having to parse/re-render it constantly. jQuery.data is great for solving this problem and so are data-* attributes. Curious if anyone has a reason to not do this? The only thing I can think of is it’s hard to find a row quickly from the TableView. That’s why I store the this.rowViews map. It’d be great if browsers could add data-* selectors to their selector engines. You can do $('tr[data-id="123"]'), but jQuery does most of the work and might have to access a lot of elements to filter it down, so it could be quite slow.

 

How I deploy with Fabric

In the last post I covered how I structure projects with Django, virtualenv and Buildout. Now I’m going to talk about how I deploy code via Fabric. If you’re not familiar with Fabric, it’s a Python package and set of command line tools you use to deploy or run tasks on systems accessible via SSH. Fabric is great because you can script it in Python and it has an active community. Note: I started using Fabric in 2010 and newer versions of Fabric might have made what I’m about to share easier. If that’s the case, definitely let me know in the comments.

One more note. I have to give a lot of credit to my friend Elias Torres, who was my CTO at Performable and is now making amazing things happen at Hubspot. A lot of the ideas in this post are from working together managing deployments at Performable and he gets all the credit for introducing me to Fabric.

Alright, so let’s dive in.

What is our goal?

The reason I want to use Fabric is simple. I want to take the setup I have in my development environment and move it to staging, production, wherever I want my code to be live. Ideally, the production setup will look very similar to my development setup so I don’t have to worry about issues arising from differences in environments.

So, you can imagine, if I had some command I could run from Terminal that would simply deploy code to all my servers, life would be great. So let’s build that.

Getting Fabric

The first thing you’ll need is Fabric. Easy enough, just go edit the setup.py file we created in the last blog post and add Fabric to the install_requires section:

install_requires=[
  ...
  'Fabric == 1.4.2',
  ...
]

While you have your setup.py file open, we’re going to do one more thing. We want to create a way to run Fabric easily for our project. To do that, we’ll add an entry point in our setup.py file. Add the following as an argument to the setup call in your setup.py file:

entry_points="""
  [console_scripts]
  fab=fabric.main:main
"""

What’s going on here? Without going into detail, when Buildout runs, this will create a python script in the bin directory that will run the specified function when you execute it. So in this case, we’ll end up with a file that contains:

import fabric.main
 
if __name__ == '__main__':
  fabric.main.main()

So when you run bin/fab, you’ll be able to run Fabric. Cool? Okay, run bin/buildout to get the Fabric package and create the fab script in the bin directory.

The fabfile.py

By default, Fabric automatically looks for a file called fabfile.py to find tasks it can run. You could just stick all of the tasks you’re going to use in that one file, but we’re going to do it a bit differently. Instead, we’re going to split our tasks across four files. The first is fabfile.py and the other three go in a deploy directory: an __init__.py file and then two more files, app.py and servers.py.
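That is, the layout looks like this:

fabfile.py
deploy/
  __init__.py
  app.py
  servers.py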

Here’s how I’ll organize the code. app.py will contain tasks with logic specific to our project (so things like how to deploy). servers.py will contain reusable tasks that deal with figuring out which servers we want to deploy to. Finally, the __init__.py and fabfile.py tie it all together. Why do it this way? To run tasks, Fabric needs to know which servers to target and then what to do. With Fabric, you can specify the target servers via the command line or you can hardcode them with your tasks, but I prefer to separate them into their own file and also autogenerate some nice server groupings.

Okay, so let’s look at the fabfile.py:

from fabric.api import *
from deploy import *
from deploy import app
 
env.user = 'ubuntu'
env.hosts = []
 
setup_hosts(globals())
 
for m in [app]:
  load_module(m, globals())

Okay, I know I’m doing a few things here that’ll probably upset a lot of people. I’m definitely being a little “magical.” The import *‘s could list out what they’re importing, but because I rarely edit this file and I might want to add additional tasks in other files, this makes that process simpler. You’ll also notice I’m referencing globals(). I need to do this because Fabric expects all the tasks you’ll run to be in the fabfile.py. I’m going to autogenerate some tasks, and I’ve found the best way to do that is to pass a reference to globals() and add the autogenerated ones to that dictionary.

There are a few other things to note here:

  • env.user = 'ubuntu' is my hard coded user because I’m deploying to Ubuntu servers where I’m using the default user. If you need to parameterize the user, you can look at the Fabric docs to see how to do that.
  • If you’re wondering where setup_hosts and load_module come from, they’re being imported from the servers.py and __init__.py files, respectively.

Before I dive into what’s in the other files, the general flow is that setup_hosts will create Fabric tasks to assign servers to the env.hosts variable and load_module will load tasks from app.py and namespace them. The namespacing is there in case I want to add more tasks in a separate file later.

The __init__.py file

This is fairly simple, so I’ll cover what’s in here first. Remember, this file is located at deploy/__init__.py. Here’s what’s in there:

I define one function which takes the names of the functions listed in a module’s __all__ property and creates a Fabric task for each, prefixed with the module name. So, for example, in our app.py we’ll have a task called deploy. When we run load_module() on the app module, we’ll end up with a Fabric task called app_deploy.
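A minimal sketch of a deploy/__init__.py consistent with that description (the from servers import * re-export is an assumption, so that the fabfile’s from deploy import * also picks up setup_hosts):

from servers import *  # re-export setup_hosts, db, print_hosts for the fabfile

def load_module(module, g):
  # For each task named in the module's __all__, register a copy in the
  # fabfile's globals() prefixed with the module name, so app.deploy
  # becomes the Fabric task app_deploy.
  prefix = module.__name__.split('.')[-1]
  for name in getattr(module, '__all__', []):
    g['%s_%s' % (prefix, name)] = getattr(module, name)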

The servers.py file

The servers.py file holds the configuration of our servers (this should probably be split out; I just haven’t gotten to it yet) and the logic to autogenerate the tasks that will set up env.hosts. Here’s what it looks like:

from fabric.api import *
 
__all__ = ['setup_hosts', 'db', 'print_hosts',]
 
class Host(object):
  def __init__(self, host, name, instance_id, elbs):
    self.host = host
    self.name = name
    self.instance_id = instance_id
    self.elbs = elbs
 
  def __str__(self):
    return self.__repr__()
 
  def __repr__(self):
    return '<fabfile.Host host="%s", name="%s", instance_id="%s", elbs="%s">' % (self.host, self.name, self.instance_id, self.elbs)
 
class HostManager(object):
 
  def __init__(self, hosts=None):
    self.hosts = set()
    self.host_lookup = dict()
    if hosts:
      for h in hosts:
        self.add_host(h)
 
  def add_host(self, host):
    if isinstance(host, dict):
      host = Host(**host)
    self.hosts.add(host)
    self.host_lookup['host:' + host.host] = host
    self.host_lookup['name:' + host.name] = host
    self.host_lookup['instance_id:' + host.instance_id] = host
    for elb in host.elbs:
      key = 'elb:' + elb
      if key not in self.host_lookup.keys():
        self.host_lookup[key] = set()
      self.host_lookup[key].add(host)
 
  def get_all_hosts(self):
    return self.hosts
 
  def get_hosts_by(self, method, key):
    return self.host_lookup['%s:%s' % (method, key)]
 
db = HostManager([
  # app
  Host(host='ec2-12-34-56-78.compute-1.amazonaws.com', name='production-1', instance_id='i-abcdefgh', elbs=['production']),
  Host(host='ec2-12-34-56-79.compute-1.amazonaws.com', name='production-2', instance_id='i-ijklmnop', elbs=['production']),
  Host(host='ec2-12-34-56-80.compute-1.amazonaws.com', name='staging-1', instance_id='i-qrstuvwx', elbs=['staging']),
  Host(host='ec2-12-34-56-81.compute-1.amazonaws.com', name='staging-2', instance_id='i-yz123456', elbs=['staging']),
])
 
########################################################
# M E T H O D S   T O   S E T   U P   H O S T S
########################################################
 
def create_host_setter(_filter):
  def wrapper():
    env.hosts = list(set([h.host for h in db.get_all_hosts() if _filter(h)] + env.hosts))
  return wrapper
 
# By ELB
def _filter_by_elb(elb):
  def _filter(host): return elb in host.elbs
  return _filter
 
# By instance
def _filter_by_name(name):
  def _filter(host): return host.name == name
  return _filter
 
def setup_hosts(g):
  for elb in ['staging', 'production']:
    g[elb] = create_host_setter(_filter_by_elb(elb))

  for host in db.get_all_hosts():
    g[host.name] = create_host_setter(_filter_by_name(host.name))
 
def print_hosts():
  print env.host

At the top of the file, I define a wrapper Host object that represents a server/host I’m deploying to and a HostManager which holds all of the Host instances and has lookups to find servers by instance ID, name and ELB. Instance ID and ELB are both logical EC2 attributes if you’re deploying to EC2s on AWS. If you’re not deploying to AWS, you can remove those attributes, but the same logic still applies. The idea is that I want to use Fabric to deploy to a specific machine or set of machines based on attributes that are convenient. If you have other metadata you want to incorporate, it’s straightforward to do so.

After the HostManager class, you’ll see my server configuration hardcoded. Whether it’s in this file or elsewhere, this is super convenient because it’s easy to modify what servers you want managed and it’s okay to check this file into source control because it doesn’t contain any sensitive information. (Remember Fabric deploys via SSH, so if your key isn’t in the authorized_keys for that server, you can’t access it).

Finally, the block below the comment is what it claims to be — methods to set up hosts. Specifically, methods that generate the tasks that set the env.hosts variable Fabric uses. I’m using Python closures that take a filtering function, run it over all the Host objects, and assign the matches to env.hosts. This is really great because now I can deploy to a specific set of servers by a canonical name.

At this point, if you have this file setup, you can go to the command line and run:

bin/fab staging print_hosts

and it’ll output which servers are in that group. Also, because of how we’re setting env.hosts, you can use multiple hosting groups at once. For instance:

bin/fab staging production print_hosts

will print all four servers in our configuration.

One more thing to point out. The pattern I use for deploying is: bin/fab [task to setup hosts] [task to run]. Just as the tasks to set hosts can be chained, you can chain the tasks to run, but it’s important that all the host setting tasks precede the tasks that actually “do stuff.”

The app.py file

Okay, so now I have Fabric setup to do everything except actually deploy my code. Here’s what the app.py file looks like:

from fabric.api import *
 
__all__ = [ 'deploy', ]
 
def git_pull():
  with cd('app'):
    run('git reset --hard')
    run('git pull')
 
def buildout(fetch=True):
  if fetch:
    git_pull()
  with cd('app'):
    run('python bootstrap.py')
    run('bin/buildout')
 
def restart():
  sudo('service app restart', pty=False)
 
def deploy():
  git_pull()
  buildout(False)
  restart()

One of my favorite parts of Fabric is how readable it is. If you look at the deploy function and you’re wondering what it does, well, it’s pretty easy to see. Our deployment process involves doing a git pull, running Buildout and then restarting the service. There are many more tasks in my actual app.py file that do all sorts of things, like set up a server from scratch, install apt-get packages, restart other services and much more (let me know if there are specific use cases you’d like to see and I’ll cover them in a future post). For this post, I’m just going to focus on deploying code and assume you’ve already manually SSHed to each server and git clone‘d your app to a directory called “app.”

So let’s do a quick walkthrough of what this does. First, git_pull changes directory to app and does a git reset followed by a git pull. I do the reset in case I happen to have changes floating around on a server from debugging or something like that. Notice how Fabric uses the incredibly pythonic with cd('app') to execute commands in a specific directory. Love that.

The buildout function just runs bin/buildout like we do in our development environment. Then the restart command restarts the service running our app. The restart command I use in production has more logic to wait for the restart to complete and do a few other checks to make sure life is good before we say we’re done; I’ve simplified it here just to illustrate the idea. (Side note: I run my apps in production with gunicorn. If anyone’s interested, I can do a post on how I do that.)

And that’s it! If you have everything setup with a configuration pointing to your hosts, all you need to deploy is:

bin/fab staging deploy

My development process boils down to building features/fixing bugs/running tests, git add ., git ci -m "Helpful message", git push and then bin/fab staging deploy. This makes deployment a single command and allows deployments to occur as soon as the code is ready.

If you followed this far, thanks for reading. If you have any questions, you can post in the comments and I’ll respond. If you liked this, I’d appreciate it if you voted for this over on Hacker News. Thanks!

How I use Django, Virtualenv and Buildout together

I’ve been doing web development in Python for a little over two years now and thought I’d share how I typically set up my development environment for a new project. I’m going to talk about three Python technologies that each do a different job and together make it easy to create and deploy projects. First, let me cover what they are:

Django – If you’ve done Python, you know Django. It’s the most popular Python web framework. Now, that said I’ve used Tornado, Google’s App Engine framework (is it called webapp?) and Flask for other projects. They’re all good in their own ways, but I prefer Django because I can get it up and running quickly and, with the community’s support and my past experiences, I know when and where Django will eventually cause trouble. Hint: it’s not until you have serious usage on your site.

Virtualenv – I can’t describe virtualenv better than the site itself, so I’ll just quote it. “virtualenv is a tool to create isolated Python environments.” There are all sorts of reasons you’d want to work on a project in isolation from other projects; the first and most obvious is, what do you do if different projects rely on different versions of the same package? Well, I’ll tell you: you’re screwed unless you use virtualenv. It’s so easy to use, there’s really no reason not to. The docs are great, so start there and ask questions.

Buildout – Buildout is a build system in Python. There are a lot of great “recipes” prepackaged with it which will cover almost all use cases. Buildout solves the problem of “how do I package up the dependencies of my project so it’s easy to deploy?” Between the buildout configuration file and a setup.py file, you’re pretty much good to go.

First step: virtualenv

Alright, so here’s the setup: you’re about to start work on a new Django app. I’m going to walk through exactly how I get things setup and it starts with virtualenv. I’m working on a Mac and I typically stick all my projects in a subfolder of a “dev” directory in my home directory. So let’s say our app will be code named “fenway.” In Terminal, I’ll run the following:

sudo pip install virtualenv
cd ~/dev
virtualenv fenway --no-site-packages

What’s this doing? Well, first we make sure we have virtualenv installed. If you don’t know what pip is, it’s used for installing and managing packages in Python. If you don’t have it installed, go take care of that first. The second line takes me to where I put my projects and the last line creates a new virtualenv in a directory called “fenway” with no site packages. What does no site packages mean? Just that I don’t want to link to any globally installed site packages. This is good because it prevents us from having accidental dependencies.

Okay, so our virtualenv is ready to go, but we need to “activate” it so python in Terminal will point to the python in our virtualenv, not the system one. So now I run:

cd fenway
source bin/activate

Now we’re good to go. You might notice that the prompt changes (this’ll depend on your settings) and is now prefixed with the virtualenv you’re currently in. This is helpful for remembering where you are in case, you know, you get lost.

Setup Buildout

The next step is to get Buildout setup. In doing so, we’ll also create the basic directory structure for our project. To setup Buildout you might think, pip install buildout, but that’s not how we’ll do it. Instead, we’re going to fetch a bootstrap file and run that to get us setup:

wget http://svn.zope.org/*checkout*/zc.buildout/trunk/bootstrap/bootstrap.py

If you try to run python bootstrap.py to bootstrap Buildout, it’ll be looking for the configuration file: buildout.cfg. So let’s create that in this directory. I’m putting my template buildout.cfg below:

[buildout]
parts = django scripts
develop = .
eggs = fenway
eggs-directory = /opt/fenway/buildout/cache/eggs
download-cache = /opt/fenway/buildout/cache/download
download-directory = /opt/fenway/buildout/cache/download
unzip = true
 
[versions]
django = 1.4
 
[django]
recipe = djangorecipe
project = fenway
projectegg = fenway
settings = settings
test = app
wsgi = true
eggs = ${buildout:eggs}
 
# We add this extra path so the settings and urls files can be imported
# Maybe these belong somewhere else? Not sure of the best layout.
extra-paths = ${buildout:directory}/src
 
[scripts]
recipe = zc.recipe.egg:scripts
eggs = ${buildout:eggs}
extra-paths = ${buildout:directory}/src

Now let me pause to make a few comments on this:

  • The eggs-directory, download-cache and download-directory settings are locations for Buildout to cache packages it fetches. You don’t need those settings, but I strongly suggest having them because they speed things up when you re-run Buildout. Knowing those locations is also really useful if you ever want to look at the source for those packages and even add debugging. I can’t tell you the number of times I’ve run mate /opt/fenway/buildout/cache/eggs/some-package.1.2.3/ to have a look at the code. One of the big advantages of working with Python (or any scripted language) is you can take a look at the source of any dependency whenever you want.
  • The unzip = true is also important because it’ll make sure to unpack eggs. This is really useful for the debugging I mentioned above.
  • Everywhere I have fenway, you’ll want to replace it with whatever you’re code-naming your project.
  • You’ll notice I have test = app. This is part of the djangorecipe for Buildout that’ll automatically create a test runner for the Django app inside our project. I tend to name my Django app “app” so I know which app is the main one I’m working on. This might be a bad idea if you know you’ll be splitting your project into multiple Django apps, but I’ve found that splitting a project into multiple Django apps early on causes more headaches than it solves.
  • The last thing to point out is the comment where I add extra-paths=${buildout:directory}/src. This is actually a bit of a hack I’ve been unable to find a better solution for. The problem results from how I structure my projects and that the settings.py file can’t be found in the PYTHONPATH without this. If anyone has suggestions on a better way to do this, I’m all ears. Adding that directory to the path doesn’t cause any problems, it just “feels wrong.”

Okay so now we’re ready. Go ahead and run:

python bootstrap.py

Once that’s done, it’ll create a script called buildout in the bin directory. The last step is to scaffold some of the files and directories we’ll need.

Your setup.py file

If you’re not familiar with setup.py files, they are how Python packages declare themselves and their dependencies. Your project is no different from any other Python package, so let’s create a basic setup.py file in this directory. Here’s an example:

#!/usr/bin/env python
 
from setuptools import setup
 
VERSION = '0.0.1'
 
setup(
  name='fenway',
  version=VERSION,
  description='Rebuilding Fenway Park in code',
  author='Andrew Bialecki',
  author_email='andrew.bialecki@example.com',
 
  # I'm not sure what's ideal, but I think we'd like to move these apps down a directory
  # so instead of "src/fenway/app," we'd have "src/app."
  # Anyway, for now the value in this is that you don't have to write "import fenway.app,"
  # you can write "import app."
  packages=['app',],
  package_dir={ '' : 'src/fenway' },
 
  install_requires=[
    'django == 1.4',
  ]
)

Short and sweet. Remember to change fenway and app to match your project and Django app names, respectively. You’ll notice again I need to do a little work defining packages and package_dir to match the structure I’m going to use. Also note the only requirement is django == 1.4, which is the current version at the time of this writing.

Scaffolding our app

Last thing before we run Buildout: we need to create the Django project and app. This part follows the Django tutorial closely, so you can look there for more information. To do this, we need Django installed directly in our virtualenv:

pip install django==1.4

Once that’s done, we just need to be careful to run commands in the right place. Here’s what you should do:

cd src
../bin/django-admin.py startproject fenway
mv fenway/fenway/* fenway
rmdir fenway/fenway
cd fenway
python manage.py startapp app

You’ll notice I moved some files around. With Django 1.4, they modified the default project structure. Personally, I liked the old way better (it’s tighter), so I move things around to compensate. If you’d prefer to stick with Django’s defaults, that’s okay; just make sure you go back and modify the paths we set in earlier files to reflect the new locations.

Running Buildout

Okay, no more waiting, we’re good to go. Run:

cd ../..
bin/buildout

Once that finishes, you’ll notice there is now a django script in the bin directory. That’s how we’re going to execute django commands from now on. So instead of python manage.py command we’ll use bin/django command.
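For example, instead of python manage.py syncdb, you’d run:

bin/django syncdb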

Directory structure

Let’s pause for a second and take a look at the structure of our project. It’s going to look like this:

bin/
  django
  buildout
bootstrap.py
buildout.cfg
setup.py
src/
  fenway/
    __init__.py
    manage.py
    settings.py
    urls.py
    app/
      __init__.py
      models.py
      urls.py
      tests.py
      views.py

If you’re familiar with Django, you should notice that what’s inside the src directory is just a regular Django project. I’m not going to cover what’s in there because the Django folks have already done a nice job of that. Everything else is files we’ve created or Buildout autogenerated for us.

You might notice that Buildout created a number of other directories like develop-eggs, build and include. You don’t have to worry about those; they’re automatically created, and we’ll exclude them from our project repo when we create a git repo later.
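When that time comes, a starting .gitignore along these lines (a sketch; trim to taste) keeps the generated pieces out of the repo:

# virtualenv and Buildout output -- all of this is regenerated by bin/buildout
bin/
include/
lib/
develop-eggs/
eggs/
parts/
build/
.installed.cfg
*.pyc
*.egg-info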

Running the test server

Okay, let’s test this out. Try:

bin/django runserver

The test server should start and if you go to http://localhost:8000/ in your browser you’ll see the “It worked!” message. Congratulations! We’re all done.

Going further

Okay, that seemed like a lot of work. Why bother? There are lots of reasons, but let’s start with one. Say you now start working on your Django app and you decide you’d like to use South to manage database migrations. Okay, we need to install South, how do we do that?

Super easy. Edit your setup.py file and update the install_requires section like so:

install_requires=[
  'django == 1.4',
  'South == 0.7.5',
]

Save that file, run bin/buildout again and you’re done. You can now run bin/django schemamigration app --initial and it’ll generate a migration. You can then run that migration with bin/django migrate. Pretty simple, huh?

It gets even better when integrating this into your deployment process. I’ll cover how I typically do that next time. I hope that was helpful. If you have any questions, let me know in the comments.

Introducing YUI Effects, Data Storage and Lightbox

Phew! Just finished building out documentation and examples for my first ever YUI 3 gallery contributions.  The motivation for all three is roughly the same — take existing, popular, awesome JavaScript packages built on other JavaScript libraries and bring them to YUI.  So the three I decided on include two that I’ve always wished YUI had and one more that I wanted to play with (Lightbox).

And I’m happy to say all three are ready to go.  Here’s a quick break down of what’s being released.  You can go to http://projects.sophomoredev.com for information about all three.

Effects

http://projects.sophomoredev.com/yui-gallery-effects/

I’ve always loved how you can quickly add slick animations in jQuery and other JavaScript libraries with a simple $("#elephant").fade() and other similar commands.  Now, YUI 3’s Anim module can definitely handle a simple fade and a bunch more, but the code you have to write is a little verbose.  The “gallery-effects” module is meant to provide a simple framework and API for the YUI 3 Anim utility that will allow you to quickly and easily animate nodes.  You’ll see that the framework and API are derived from Scriptaculous (heck, even the examples are taken from their documentation). The goal is to make animation with YUI as easy as, if not easier than, with other JavaScript libraries.

In addition, I’ve tested this thoroughly on all Grade-A browsers, so it will “just work.”

Lightbox

http://projects.sophomoredev.com/yui-gallery-lightbox/

Lightbox and its numerous spin-offs are in use all over the Web.  YUI 3 has great support for overlays and the widget framework is great, but no Lightbox yet.  So I’ve taken care of that with a literal port of Lightbox to YUI and plans to make it more flexible going forward (think non-image content, slideshow support, etc.). Although if you’re looking for immediate slideshow support, there are some great modules already available in the YUI 3 Gallery.

Data Storage

http://projects.sophomoredev.com/yui-gallery-data-storage/

Finally, although you never feel like you’ll need it, I’ve constantly wished YUI had support for data storage on individual nodes à la jQuery.  It feels like every jQuery plugin I look at uses it in some way and it’s just an awesome convenience.  Well, no more: I’ve ported it to YUI so it can be used natively with Y.Node instances and any old JavaScript object.  Also, the YUI 3 Node class has begun to introduce support for this already, so all I really had to do was build on top of that.

Anyway, that’s all for today.  Definitely let me know if you have questions, issues, whatever.  I’ll document any progress on these modules as I go.  Also, if anyone has any suggestions for the best way to document code with examples, I’m all ears.  I felt it was a pretty painful exercise for this project, but I could just be complaining since my bracket was busted while I was documenting. :P

Creating my first complete Django project on Snow Leopard

Surely there were going to be some issues getting Django working on my Mac (I mean, aren’t there always some problems?), so I decided to crack open a text document and keep track of the things that happened so that the next time I do this, I don’t run into any problems.

Just as a bit of context, I’ve been playing with Google App Engine recently and it’s been my first attempt at writing Python code. Python and Django? Love ‘em. App Engine? Eh, good at what it does, but I don’t really like working in Google’s sandbox. It just feels so restrictive. So for my next project I decided to skip the App Engine bit and try to build a site still using Django, but with a MySQL backend. Because of my work with App Engine I had already installed Django, and Python 2.6 comes with a standard Snow Leopard install, so I had those as well.

So from there I set off. The first thing I knew I would need was MySQL. I’ve used MySQL before, but since doing a clean install of Snow Leopard a few months ago I hadn’t put it back on, so that was the first thing to handle.

Finding the binary installer for Snow Leopard was easy, but here I made a mistake.  I figured I’d install the 32-bit version which ended up causing me some headaches.  What I didn’t realize is that Snow Leopard installs most programs as 64-bit if your machine is 64-bit.  How do you find out if your Mac is 64-bit or not?  See this short post on how to tell from simply looking at your processor.  So I mistakenly installed the 32-bit version and was on my way.

The next thing you need to do is install a MySQL Python adapter.  The source for that is here.  I chose to sudo python setup.py build/install, but you could use easy_install if you want.  Once I installed, I tried to import the MySQLdb package and got the following error:

File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/django/db/backends/mysql/base.py", line 13, in
    raise ImproperlyConfigured("Error loading MySQLdb module: %s" % e)

This post and others indicated it was related to the build architecture of Python and MySQL. Bah, so this is when I realized I needed to go back and install the 64-bit version of MySQL.  Okay, so I downloaded the 64-bit installer and ran that.  (See further below how that worked, but not completely and I eventually did a clean install.)

Then I ran the following to try to re-install the MySQLdb adapter stuff:

cd MySQL-python-1.2.3c1
ARCHFLAGS='-arch x86_64' python setup.py build
ARCHFLAGS='-arch x86_64' python setup.py install

No dice, still not working. Poking through the comments of the post above, I came across this tip indicating that re-building doesn’t actually do it from scratch; you need to go into the build directory and manually rm *.  Once I did that, I was in business: the import MySQLdb statement now worked.
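Putting it together, the rebuild that finally worked is roughly:

cd MySQL-python-1.2.3c1
rm -rf build/*
ARCHFLAGS='-arch x86_64' python setup.py build
ARCHFLAGS='-arch x86_64' python setup.py install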

Following the Django tutorial, I created my app, code-named “sanddollar.” So far so good until I got to:

python manage.py syncdb

and crossed my fingers. I had forgotten to create the database, but even after that I still had a problem: an error related to MySQL not being able to create/write to a file:

File "build/bdist.macosx-10.6-universal/egg/MySQLdb/connections.py", line 36, in defaulterrorhandler
_mysql_exceptions.InternalError: (1, "Can't create/write to file '/usr/local/mysql/data/sanddollar/auth_permission.MYI' (Errcode: 2)")

This was a tough one.  It spawned this Stack Overflow question, but after some poking around I noticed that, despite the fact that I was able to create databases through MySQL Workbench, new database directories weren’t being created under the mysql/data folder.  My suspicion was that multiple installs of MySQL had gotten it into a messed-up state.  Solution: remove all traces of MySQL and then do a clean install.

This sounds pretty straightforward, but it wasn’t.  In the end, it took two sources (here and here) to get all the instructions for wiping MySQL where the second set of instructions is specific to Snow Leopard.  Alright, once I had done that, I used the same .pkg installer to install the 64-bit version of MySQL one more time.

Then I tried to connect to MySQL, and I get the following error:

ERROR 2002 (HY000): Can’t connect to local MySQL server through socket ‘/tmp/mysql.sock’ (2)

Fairly straightforward fix.  No idea where the actual mysql.sock file is, but because I could connect via MySQL Workbench over TCP, someone in the IRC channel tipped me off to run:

SELECT @@socket

which outputted the location of the mysql.sock file.  In my case, it was located at /var/lib/mysql/mysql.sock.  So then I ran the following from a bash shell:

sudo ln -s /var/lib/mysql/mysql.sock /tmp/mysql.sock

and voila, now I can connect from Terminal.  Almost there.  I created a new user for the “sanddollar” database named “sanddollar” and granted the user privileges for basic CRUD operations at host 127.0.0.1 (for some reason localhost wouldn’t work).  Tried to connect through Terminal and it worked great. Then tried the syncdb command again and this time, paydirt.
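For reference, the database settings this corresponds to in a settings.py of that era (pre-Django 1.2 style; the values are placeholders for this particular setup) look roughly like:

# settings.py -- database settings; note 127.0.0.1 rather than localhost
DATABASE_ENGINE = 'mysql'
DATABASE_NAME = 'sanddollar'
DATABASE_USER = 'sanddollar'
DATABASE_PASSWORD = 'your-password'
DATABASE_HOST = '127.0.0.1'
DATABASE_PORT = ''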

Final thoughts: 32-bit/64-bit is complicated, more complicated than it feels like it should be.  This makes working with Snow Leopard as a development environment a pain.  However, I learned about the “file [filename]” unix command for determining the architecture of a binary.  I don’t fully understand its output yet, but I opened it up on Stack Overflow with this question. In the end, not too bad for getting things started.  Feels like it could be simpler, but less than a day isn’t that bad, so onwards!