Setting up a Bastion host on AWS

When setting up my latest development environment at AWS, I wanted to deploy all EC2 services in a private VPC and simply route all traffic to them through a Bastion host (aka jump host). AWS maintains an excellent CloudFormation quickstart guide & template here and it’s what I used to get started. After completing that guide I put a new entry for the Bastion host in my ssh config like so:

Host bastion
Hostname ec2-XX-XXX-XXX-XXX.compute-1.amazonaws.com
User ec2-user
IdentityFile ~/path/to/your/ssh-key.pem
AddKeysToAgent yes
UseKeychain yes

Those last two lines are optional - but I like to leverage the Keychain on my MacBook Pro for storing keys and their respective passphrases. Accepting the Bastion’s host key into your known_hosts file can avoid some pain going forward, so I recommend connecting directly to the bastion once before continuing.
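If you’d rather skip that manual first connection, newer OpenSSH clients (7.6 and later) offer a middle ground between blind trust and strict checking. A sketch of the extra option, added to the bastion entry above:

```
Host bastion
  StrictHostKeyChecking accept-new
```

With accept-new, ssh records a previously unseen host key automatically but still refuses to connect if a known key changes.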

To test my setup, I launched an EC2 instance in my private VPC and configured it in my ssh config like so:

Host 10.100.*.*
ProxyCommand ssh ec2-user@bastion -W %h:%p
User ec2-user
IdentityFile ~/path/to/your/ssh-key.pem
AddKeysToAgent yes
UseKeychain yes

The magic happens on the second line: the ProxyCommand instructs SSH to jump through the Bastion to reach any of the private IP addresses used in my development environment VPC. Unfortunately, my first attempt to connect failed. I poked around in the Bastion’s sshd config and realized that the CloudFormation template was so hardened that it wasn’t going to support the transparent sessions I wanted. I needed to modify /etc/ssh/sshd_config and add the following just below the PAM section:

PermitOpen any
AllowTcpForwarding yes

After making the change, be certain to restart the sshd service on the Bastion (on Amazon Linux: sudo service sshd restart).

Part of the beauty of the CloudFormation design is that it uses an Auto Scaling group to automatically tear down unresponsive Bastions and spin up replacements - even adapting to availability-zone issues. When that happens, the replacement Bastion will be configured with the template defaults, so to ensure my sshd_config change would survive a failover, I copied the LaunchConfig and edited its user data to include the following:

perl -pi -0 -w -e 's/UsePAM yes/UsePAM yes\n\n# Added by ets to support transparent pass through\nPermitOpen any\nAllowTcpForwarding yes/' sshd_config

Then I replaced the currently configured LaunchConfig on my AutoScaling group with my copy.
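Before baking a substitution like that into user data, it’s worth sanity-checking it locally. A quick sketch against a scratch file standing in for sshd_config (the file path and comment text here are just placeholders):

```shell
# Create a scratch file that mimics the relevant portion of sshd_config
printf 'X11Forwarding no\nUsePAM yes\nPrintMotd no\n' > /tmp/sshd_config.test

# The same style of substitution as the user data: append the two options
# immediately after the PAM line
perl -pi -0 -w -e 's/UsePAM yes/UsePAM yes\n\n# Added to support transparent pass through\nPermitOpen any\nAllowTcpForwarding yes/' /tmp/sshd_config.test

# Confirm the options landed where expected
grep -A4 'UsePAM yes' /tmp/sshd_config.test
```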

Now simply asking SSH to connect to my new services in the private VPC worked transparently!
To make sessions more efficient through this type of proxy, I added the following to the Bastion host entry in my ssh config:

ControlPath ~/.ssh/cm-%r@%h:%p
ControlMaster auto
ControlPersist 10m

If you’re an Ansible fan like me, you can take advantage of that same technique with a change to the [ssh_connection] section of ansible.cfg:

ssh_args = -o ControlMaster=auto -o ControlPersist=30m
control_path = ~/.ssh/ansible-%%r@%%h:%%p

If you’re curious how to further protect your Bastion host with a whitelist for port 22 while also supporting users who are behind dynamic IPs then you’ll want to check out my aws-lambda-firewall project.


Knock first firewall for AWS Security Groups

I recently set up a Bastion host to secure a development environment on AWS. The Bastion exposes only port 22 for SSH, and I wanted to restrict access to a whitelist of authorized IP addresses rather than leave port 22 open to the internet. Further - I wanted to restrict inbound 443 and 80 to the development environment so that only authorized users/developers could access the pre-release builds deployed there.

But I and others on my team switch IP addresses often and don’t want to hassle with manually manipulating AWS Security Groups each time we move networks in meatspace. So I threw together a “knock for access” firewall using API Gateway & Lambda to conveniently add and expire authorized IP addresses on the applicable Security Groups. Enjoy!
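The heart of that trick can be sketched with the AWS CLI. This is a simplified, hypothetical stand-in for the Lambda (SG_ID is a placeholder, and the real project also handles expiring entries):

```shell
# Build a /32 CIDR from a single IPv4 address
to_cidr() {
  echo "$1/32"
}

# Authorize the caller's current public IP on port 22 of the Bastion's
# security group. SG_ID is a placeholder; the real project drives this
# from API Gateway + Lambda and expires the rule later.
knock() {
  ip=$(curl -s https://checkip.amazonaws.com)
  aws ec2 authorize-security-group-ingress \
    --group-id "$SG_ID" \
    --protocol tcp --port 22 \
    --cidr "$(to_cidr "$ip")"
}
```

Calling knock from an authorized context opens port 22 to wherever you happen to be sitting.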


Social Tables Retrospective

My last post was over a year and a half ago - just after I joined an interesting Bessemer Ventures-backed firm in the hospitality sector called Social Tables.
My role there has been 95% people / 5% tech and always intense. I’m retroactively posting some of the internal content I circulated as the technology leader at Social Tables - mostly high-level explanations of what my team was up to at the time.

It’s been a fun ride with unique challenges and as I move on to my next post, I wanted to capture a few takeaways for posterity:

  • Avoid greenfield rewrites of established applications. Specifics may warrant a true rewrite - but recognize the extreme risk to all involved if that’s the selected path. While I certainly agree with what Joel says about this here, I’ll add one more point of support - that a rewrite is a fight on two fronts. It’s up against the formidable foes of customer adoption of a new thing and the golden project phenomenon - where a golden (too big to fail) project’s scope continually increases without bound.
  • Hire for squads and fire quickly on non-fit. My worst HR calls at Social Tables involved attempts to reassign an employee to a different team in an effort to establish fit. This is especially difficult when the employee in question was inherited rather than hired under my watch. There’s a strong desire to give the inherited employee multiple chances to demonstrate fit. Don’t do it - if the employee is not a good fit for the first team she joins (or is assigned to), then do her and the rest of the team a favor - cut her loose with dignity. Such an evaluation might rightly be considered unfair - c’est la vie.
  • Adopt 2-3 cutting-edge technologies, but no more. This worked really well for us - especially on the recruiting tip. Our best hires came through our broadcast of niche (at the time) tech such as WebGL, React, and RethinkDB. But it’s a double-edged sword. Our worst tech adoption failures came from naive pushes into Falcor, Dynamo, and Horizon.

Onward & Upward!


Explaining Docker to suits

I posted the following on an internal blog at Social Tables in an attempt to explain a bit about what our Software Engineering group was up to at the time.


What’s Docker you say? Glad you asked…

What’s Docker?

Docker’s a technology that allows us to wrap our applications & services as self-contained deliverables that are isolated from the host operating system upon which they are deployed.

What’s Docker … in English (preferably with pictures)?

So…visualize a long-haul shipping container.

Appreciate how much easier it makes moving cargo between ship, train, and truck.



In a similar way, Docker helps us package our software into shipping containers and we can then run that software unchanged on a laptop, a dedicated server, or a cloud platform.

So that’s a good thing?

Yes. Plus Docker’s mascot is a cute whale.
Who doesn't like a cute whale?


Explaining Query Tuning to suits

I posted the following on an internal blog at Social Tables in an attempt to explain a bit about what our Software Engineering group was up to at the time.


Today we’re going to talk briefly about query tuning - a black art in the craft of software engineering that predates the Internet but is still incredibly relevant. I don’t mean relevant in the “you should really pay attention to national politics” sense. I mean relevant as in “this can (when done poorly) wreck your whole day.”

What’s query tuning?

A query is a request for information from a database. It can be as simple as “find the email address of the Social Tables user named Inigo Montoya”, or more complex like “find the ASP of all Social Tables opportunities closed on Fridays that followed a Thursday evening Happy Hour.”

Since database structures are complex, the data needed to answer a query can usually be collected in several different ways: through different access paths, different data structures, and in different orders. Each way typically requires a different amount of processing time, and processing times for the same query can vary enormously, from a fraction of a second to hours, depending on the approach selected. The purpose of query tuning is to find the way to process a given query in minimum time.

What’s query tuning … in English (preferably with pictures) and how can it wreck my whole day?

Query tuning is an effort to minimize the work performed while answering a query. If done poorly, database queries execute slowly and your application responds slowly. When your application responds slowly bad things happen - like Mary posting a screenshot to #devops (aka your whole day is wrecked).

Last week, Michael Dumont spent some time tuning several queries in our platform and his visuals of that effort are worth > 1,000 words:

Orange = a measure of platform activity
Blue = a measure of how hard the database is working

Note: unlike Social Tables’ Engineers, our database refuses to work harder than 100% … so those blue peaks are signs of unhappy Social Tables users.

Here are the same measurements after tuning:

Happier database === Happier users


Explaining SDF to suits

I posted the following on an internal blog at Social Tables in an attempt to explain a bit about what our Software Engineering group was up to at the time.


We’ve got a special subject this issue to take the edge off that geek-withdrawal: “Drawing Text with Signed Distance Fields”, or using SDF text rendering for those in the know.

What’s SDF?

Signed distance field (SDF) rendering is a technique for drawing bitmap fonts without jagged edges, even at high magnifications. It was first introduced in 2007 with this bit of light reading by Valve, and it couples a bitmap of glyphs with a GLSL shader that can sample and sharpen those glyphs.

What’s SDF … in English (preferably with pictures)?

So…you know what sounds easy but is really really hard? Drawing legible letters in Venue Mapper. Wait - what? Why’s that hard you ask?

Here’s a screenshot of “Social Tables” rendered in 12pt font by Microsoft Word:

Completely legible. But here’s that exact same image after I zoom in a few times:

Not so great. Possibly legible but if we rendered text like this as users zoom in and out in Venue Mapper we’d have to bundle subscriptions with aspirin.

SDF is a technique we’ve just begun using to more efficiently render text in Venue Mapper 3 - it performs better (app is faster) and looks prettier than the techniques we’ve used previously.

Here’s a movie of SDF in action in VM3 courtesy of our own Van Drunen.


MySQL Identifying Blocking Transactions

Recording this incredibly helpful query from Bill Karwin’s post to SO.
You can find the source query that’s blocking your DELETE by using the INFORMATION_SCHEMA INNODB_LOCK_WAITS and INNODB_TRX tables.

SELECT r.trx_id waiting_trx_id,
       r.trx_mysql_thread_id waiting_thread,
       r.trx_query waiting_query,
       b.trx_id blocking_trx_id,
       b.trx_mysql_thread_id blocking_thread,
       b.trx_query blocking_query
FROM information_schema.innodb_lock_waits w
INNER JOIN information_schema.innodb_trx b ON b.trx_id = w.blocking_trx_id
INNER JOIN information_schema.innodb_trx r ON r.trx_id = w.requesting_trx_id;

See more information at http://dev.mysql.com/doc/refman/5.5/en/innodb-information-schema.html#innodb-information-schema-examples, under “Example 14.2 Identifying Blocking Transactions”. (Note that these INFORMATION_SCHEMA tables apply to MySQL 5.x; in MySQL 8.0 the lock-wait information moved to performance_schema.data_lock_waits.)


Fix YAML parsing errors in WebGen

After a clean install of Mac OS X Yosemite I’ve had to reinstall much of my development environment. This process was mostly beneficial as I moved to the latest and greatest tooling but several of my static websites leverage the prior major version of Thomas Leitner’s Webgen and getting version 0.5.x up and running has been a challenge.

After working through several Ruby & Gem issues I ran into a particularly opaque error when executing webgen on a project that previously generated without issue: “(): found character that cannot start any token while scanning for the next token”

This error is usually a sign that your content contains tab characters, but I’d already ensured that wasn’t the case. Instead of diving deep into the workings of the libraries at play, I was able to use this post from Thomas to resolve the issue by reverting to the syck YAML implementation instead of its replacement. Problem solved!


Resolving H2 table lock during JUnitRunner test

After adding a new test to a project that uses Play Framework 1.2.x and thus leverages H2 for JUnit functional testing, I began encountering a table lock during setup() when the framework attempted to delete the contents of an entity table.

I initially blamed the lock on bad model design and wasted a significant amount of time tweaking JPA entity relationships and corresponding CRUD operations in an attempt to resolve it. I ultimately resolved the issue after realizing it was in fact caused by a badly designed test.

I was directly modifying entities within the transaction that wrapped my functional test and then emulating API calls that relied upon the state of those same entities. Long story short, my test was attempting something similar to this:

class FunctionalTest { // Runs the entire test class in the same transaction

    setup() {
        // reset the H2 datastore in a distinct transaction
    }

    atest() {
        // modifies the User entity directly - claims an insert lock on User
        User u = User.findById(1);
        u.address = new Address();
        u.save();

        // test continues
    }

    btest() {
        // emulated API call that reads the same entity
        Request req = GET("/user/1");
    }
}

In my contrived example above, after JUnit runs atest() and just before running btest(), it will call setup() and attempt to reset the database in a separate transaction. But atest() claimed an insert lock on User, and in H2 that locks the entire table by default, so setup() will be unable to reset the User table and will fail with a lock timeout.

To resolve this, you’ll either need to redesign your test to avoid manipulating entities directly in the testcase or run each test in a distinct transaction.


Moved again ... this time to github.io

Blogger was just too ugly & clumsy, so I used a fork of juniorz’s import.rb to migrate from Blogger to Octopress. Here’s my fork:

I found MeetDom’s gist helpful while installing all the dependencies required by the import script.
