We Won! PSBJ Top 100 Fastest Growing Companies!

We are delighted to announce that we rank #42 on Washington’s 100 Fastest-Growing Private Companies list by Puget Sound Business Journal!

Co-partners Matt Munson (our Chief Operating Officer) and Will Iverson (our Chief Technology Officer) accepted the award on behalf of our team at a sold-out awards banquet in Seattle earlier this month.

On winning the award, Will said, “It’s a really, really nice marker for us. We don’t wake up in the morning thinking about what awards we are going to win, but it really helps us benchmark how we are doing in a fun way!”

This award is meaningful to us because we believe it’s a testament to the dedication and passion that our team and clients contribute to the Dev9 experience.

“We started with an idea – providing best in class software development, using Continuous Delivery as a roadmap. That resonates with people as a vision, and from there it’s just a matter of treating people well,” said Will.

Rapid growth is great, but it only matters if there’s a way to sustain it, and we recognize what that entails. “We will keep focusing on our strengths, which include core software development and automation,” Will noted. “New platforms drive new opportunities for revenue and growth for our clients, and so we are always investing.”

Cheers to growing!

Java Release Process with Continuous Delivery

Note: A lot of the release specifics were pioneered by Axel Fontaine.

One of the most interesting things we deal with is releases. Not a deployment -- which is actually running the new software. A release, in our parlance, is creating a binary artifact at a specific and immutable version. In the Java world, most of us use Maven for releases. More pointedly, we use the maven-release-plugin. I am going to show you why you should stop using that plugin.

Why Change?

This is a question I field a lot. There are several reasons, but the primary one is this: In a continuous delivery world, any commit could theoretically go to production. This means that you should be performing a maven release every time you build the software. So, let's revisit what happens inside your CI server when you use the maven-release-plugin properly:

  • CI checks out the latest revision from SCM
  • Maven compiles the sources and runs the tests
  • Release Plugin transforms the POMs with the new non-SNAPSHOT version number
  • Maven compiles the sources and runs the tests
  • Release Plugin commits the new POMs into SCM
  • Release Plugin tags the new SCM revision with the version number
  • Release Plugin transforms the POMs to version n+1-SNAPSHOT
  • Release Plugin commits the new POMs into SCM
  • Release Plugin checks out the new tag from SCM
  • Maven compiles the sources and runs the tests
  • Maven publishes the binaries into the Artifact Repository

Did you get all of that? It's 3 full checkout/test cycles, 2 POM manipulations, and 3 SCM revisions. Not to mention, what happens when somebody commits a change to the pom.xml (say, to add a new dependency) in the middle of all this? It's not pretty.

The method we're going to propose has 1 checkout/test cycle, 1 POM manipulation, and 1 SCM interaction. I don't know about you, but this seems significantly safer.

Versioning

Before we get into the details, let's talk about versioning. Most organizations follow the versioning convention they see most frequently (often called Semantic Versioning, or SemVer), but don't follow its actual principles. The main idea behind this convention is that you have 3 version numbers in dotted notation X.Y.Z, where:

  1. X is the major version. Changes here may break backwards compatibility.
  2. Y is the minor version. Changes here add functionality in a backwards-compatible way.
  3. Z is the patch (incremental) version. Changes here are backwards-compatible bug fixes.

However, most organizations do not use these numbers correctly. How many apps have you seen that sit at 1.0.x despite drastic breaking changes, feature addition/removal, and more? This scheme provides little value, especially when most artifacts are used in-house only. So, what makes a good version number?

  • Natural order: it should be possible to determine at a glance which of two versions is newer
  • Build tool support: Maven should be able to deal with the format of the version number to enforce the natural order
  • Machine incrementable: so you don't have to specify it explicitly every time

While Subversion offers a great candidate (the repository revision number), Git has no equivalent. However, CI servers, including both Bamboo and Jenkins, expose an environment variable containing the current build number. This is a perfect candidate that satisfies all three criteria, and has the added benefit that any artifact can be tied back to its specific build through convention.

What about Snapshots?

Snapshots are an anti-pattern in continuous delivery. Snapshots are, by definition, ephemeral. However, we're making one exception, and that's in the POM file itself. The rule we're following is that the pom.xml always has the version 0-SNAPSHOT. From here on out, no more snapshots!

The New Way

So, we're going to use the build number as the version number, and not have snapshots (except as described above). Our POM file is going to look a little something like this:

<project ...>
  ...
  <version>0-SNAPSHOT</version>
</project>

This is the only time we will use -SNAPSHOT identifiers. Everything else will be explicitly versioned. I am assuming your distributionManagement and scm blocks are filled in correctly. Next, we need to add 2 plugins to our POM file:

<build>
    ...
    <plugins>
    ...
        <plugin>
            <groupId>org.codehaus.mojo</groupId>
            <artifactId>versions-maven-plugin</artifactId>
            <version>2.1</version>
        </plugin>
        <plugin>
            <artifactId>maven-scm-plugin</artifactId>
            <version>1.8.1</version>
            <configuration>
                <tag>${project.artifactId}-${project.version}</tag>
            </configuration>
        </plugin>
    </plugins>
</build>

The devil is in the details, of course, so let's see what should happen now during your release process. Note that I am using Bamboo in this example. You should make sure to modify it for your CI server's variables. The process is:

  • CI checks out the latest revision from SCM
  • CI runs mvn versions:set -DnewVersion=${bamboo.buildNumber}
  • Maven compiles the sources and runs the tests
  • Maven publishes the binaries into the Artifact Repository
  • Maven tags the version

    Steps 3, 4, and 5 are run with one command: mvn deploy scm:tag.

That's it. We have one specific revision being tagged for a release. Our history is cleaner, we can see exactly which revision/refs were used for a release, and it's immune to pom.xml changes being committed during the process. Much better!

Gotcha!

Ok, this all works great, unless you have a bad setup. The primary culprit of a bad setup is distinct modules having snapshot dependencies. Remember how I told you snapshots are an anti-pattern? Here's the general rule: if the modules are part of the same build/release lifecycle, they should live together in one source repository and be built/versioned/tagged/released as one unit. If the modules are completely separate, they should live in separate source repositories, with fixed-version dependencies between them to provide a consistent interface. If you depend on snapshot versions, you are creating non-repeatable builds, because the time of day you run the build/release determines exactly which dependency you fetch.

Dev Environments with Vagrant

If you work with a number of clients, one issue pops up over and over: setting up a new machine. Sometimes, you're lucky and a client will let you use your own machine. More often than not, though, you're forced to use their hardware. This usually involves reading a bunch of out-of-date wiki documents, asking people around you, and maybe contributing back to the wiki for the next person. If you're lucky, you'll get this done in a day or two. More typically, it can take a week or so.

If you're a manager, this should also worry you. You're making these developers, whom you likely spent a good amount of money recruiting and compensating, spend a week or so of downtime just setting up their computer. Even taking a conservative estimate of $65/hr, that means you're spending $2600 for somebody to get up and running. Now imagine you're paying prevailing market rate for consultants, and that figure rises dramatically.

At Dev9, we like to automate. Typical payback times for automation projects may be in the months or even years, but imagine you could shave 2-3 days off of new machine setup time for each developer you onboard. This kind of tool could pay for itself with your first new developer, with better returns for each additional developer. So, what do we do?

Code

This article is going to involve some code. If you want to play along at home, you can view our repo at https://github.com/dev9com/vagrant-dev-env.

Enter Vagrant

Vagrant is a tool perfectly designed for our use case. It utilizes virtual machines (I use Oracle VirtualBox). VMs used to be clunky, and kind of slow. But we're living in an age where laptops come with 16GB RAM and 500+GB SSD drives, along with 8-core processors. We are living in an age of abundance here, and it would be a shame to let it go to waste :).

The Build

What we are going to build is a development machine image. While companies can benefit from creating this and handing it to new hires, it's just as valuable if you have multiple clients. I can transition between provided hardware with ease, because I'm just using each machine as a host for my VM. In addition, I can make a change to the provisioning of one VM and propagate it quickly to the others.

This VM is going to be a headless VM. That means there is no UI. We will interact with it over SSH. This helps keep it fast and portable. I have no problem using IntelliJ IDEA on Windows or Mac or Linux, but what I always want is my terminal and build tools. So, that's the machine we're going to build.

Initial Setup

First, get Vagrant and VirtualBox installed. Maybe clone our git repo if you want to follow along. That should be all for now!

Choosing a base image is something that only comes with research; ours is going to be phusion/ubuntu-14.04-amd64. This is the foundation of all of our images. This one was chosen because it plays really nicely with Docker. Full disclosure, we are Docker's PNW partner, so this is actually important to me :).

Step 1: A Basic Box

The first step in anything software related seems to be hello world. So, to create a Vagrant instance, we create a Vagrantfile. Clever, right? And even better, your Vagrantfile is just Ruby code -- like a Rakefile. The simplest possible Vagrantfile for what we're doing:

box      = 'phusion/ubuntu-14.04-amd64'
version  = 2

Vagrant.configure(version) do |config|
    config.vm.box = box
end

Let's go through this. As I mentioned above, our base box is that Ubuntu distro. You can just as easily choose CentOS, SUSE, CoreOS, or any number of other images. People even have entire dev stacks as one image! The version identifier is just signalling to Vagrant which configuration API to use. I've personally never seen anything except 2, but given the concept of versioned APIs in the REST world, it's not difficult to see how they plan to use it in the future.

So, to run this, we just type vagrant up:

[10:50:48 /ws/dev9/vagrant-dev-env/step1]$ vagrant up
Bringing machine 'default' up with 'virtualbox' provider...
==> default: Importing base box 'phusion/ubuntu-14.04-amd64'...
==> default: Matching MAC address for NAT networking...
==> default: Checking if box 'phusion/ubuntu-14.04-amd64' is up to date...
==> default: Setting the name of the VM: step1_default_1409766665528_9289
==> default: Clearing any previously set forwarded ports...
==> default: Fixed port collision for 22 => 2222. Now on port 2200.
==> default: Clearing any previously set network interfaces...
==> default: Preparing network interfaces based on configuration...
    default: Adapter 1: nat
==> default: Forwarding ports...
    default: 22 => 2200 (adapter 1)
==> default: Booting VM...
==> default: Waiting for machine to boot. This may take a few minutes...
    default: SSH address: 127.0.0.1:2200
    default: SSH username: vagrant
    default: SSH auth method: private key
    default: Warning: Connection timeout. Retrying...
==> default: Machine booted and ready!
==> default: Checking for guest additions in VM...
==> default: Mounting shared folders...
    default: /vagrant => /ws/dev9/vagrant-dev-env/step1

[10:51:23 /ws/dev9/vagrant-dev-env/step1]$

Notice that this took all of about 35 seconds. Most of the output is rather self-explanatory. So, this box is "up" -- how do we use it?

[10:51:23 /ws/dev9/vagrant-dev-env/step1]$ vagrant ssh
Welcome to Ubuntu 14.04 LTS (GNU/Linux 3.13.0-24-generic x86_64)

 * Documentation:  https://help.ubuntu.com/
Last login: Tue Apr 22 19:47:09 2014 from 10.0.2.2
vagrant@ubuntu-14:~$

That's it. There's your Ubuntu VM! Let's say we want to take it down, delete it, and bring it back up:

vagrant@ubuntu-14:~$ exit
Connection to 127.0.0.1 closed.

[10:55:23 /ws/dev9/vagrant-dev-env/step1]$ vagrant destroy -f
==> default: Forcing shutdown of VM...
==> default: Destroying VM and associated drives...

[10:55:30 /ws/dev9/vagrant-dev-env/step1]$ vagrant up
Bringing machine 'default' up with 'virtualbox' provider...
==> default: Importing base box 'phusion/ubuntu-14.04-amd64'...
==> default: Matching MAC address for NAT networking...
==> default: Checking if box 'phusion/ubuntu-14.04-amd64' is up to date...
==> default: Setting the name of the VM: step1_default_1409766945197_31521
==> default: Clearing any previously set forwarded ports...
==> default: Fixed port collision for 22 => 2222. Now on port 2200.
==> default: Clearing any previously set network interfaces...
==> default: Preparing network interfaces based on configuration...
    default: Adapter 1: nat
==> default: Forwarding ports...
    default: 22 => 2200 (adapter 1)
==> default: Booting VM...
==> default: Waiting for machine to boot. This may take a few minutes...
    default: SSH address: 127.0.0.1:2200
    default: SSH username: vagrant
    default: SSH auth method: private key
    default: Warning: Connection timeout. Retrying...
==> default: Machine booted and ready!
==> default: Checking for guest additions in VM...
==> default: Mounting shared folders...
    default: /vagrant => /ws/dev9/vagrant-dev-env/step1

[10:56:02 /ws/dev9/vagrant-dev-env/step1]$

So under a minute to destroy a VM and bring up an identical one. Not bad, Future. Not bad. A box like this is fine and dandy, but we probably want to do more with it.

Step 2: Basic Provisioning

Even at a base level, let's say we want Java. So, let's tweak our Vagrantfile a bit:

box      = 'phusion/ubuntu-14.04-amd64'
version  = 2

Vagrant.configure(version) do |config|
    config.vm.box = box

    config.vm.provision :shell, :inline => "apt-get -qy update"
    config.vm.provision :shell, :inline => "apt-get -qy install openjdk-7-jdk"
end

If you now run vagrant up, you'll get a machine with Java installed:

[11:27:33 /ws/dev9/vagrant-dev-env/step2](git:master+?)
$ vagrant up
Bringing machine 'default' up with 'virtualbox' provider...
==> default: Importing base box 'phusion/ubuntu-14.04-amd64'...
==> default: Matching MAC address for NAT networking...
==> default: Checking if box 'phusion/ubuntu-14.04-amd64' is up to date...
==> default: Setting the name of the VM: step2_default_1409768866354_7342
==> default: Clearing any previously set forwarded ports...
==> default: Fixed port collision for 22 => 2222. Now on port 2201.
==> default: Clearing any previously set network interfaces...
==> default: Preparing network interfaces based on configuration...
    default: Adapter 1: nat
==> default: Forwarding ports...
    default: 22 => 2201 (adapter 1)
==> default: Booting VM...
==> default: Waiting for machine to boot. This may take a few minutes...
    default: SSH address: 127.0.0.1:2201
    default: SSH username: vagrant
    default: SSH auth method: private key
    default: Warning: Connection timeout. Retrying...
==> default: Machine booted and ready!
==> default: Checking for guest additions in VM...
==> default: Mounting shared folders...
    default: /vagrant => /ws/dev9/vagrant-dev-env/step2
==> default: Running provisioner: shell...
    default: Running: inline script

[ clipping a bunch of useless stuff -- you know how it is. ]

==> default: 1 upgraded, 182 newly installed, 0 to remove and 109 not upgraded.
==> default: Need to get 99.4 MB of archives.
==> default: After this operation, 281 MB of additional disk space will be used.
[ ... ]
==> default: done.
==> default: done.

[11:30:15 /ws/dev9/vagrant-dev-env/step2]$ vagrant ssh
Welcome to Ubuntu 14.04 LTS (GNU/Linux 3.13.0-24-generic x86_64)

 * Documentation:  https://help.ubuntu.com/
Last login: Tue Apr 22 19:47:09 2014 from 10.0.2.2

vagrant@ubuntu-14:~$ java -version
java version "1.7.0_65"
OpenJDK Runtime Environment (IcedTea 2.5.1) (7u65-2.5.1-4ubuntu1~0.14.04.2)
OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode)

vagrant@ubuntu-14:~$

And there we go. A scripted buildout of a base Ubuntu box with Java. Of course, shell scripts can and do go wrong. They get progressively more complex, especially as you start having components that mix and match. Additionally, since all developers should be getting familiar with Continuous Delivery concepts, let's take this opportunity to explore a little tool called Puppet.

Step 3: Buildout with Puppet

Puppet is pretty awesome -- and so are Chef and Ansible. I chose Puppet initially because I could get it working quicker. I'm not making a value judgement on which one works best.

The idea with Puppet is that you use Puppet files to describe the state you want the machine to be in, and Puppet manages getting it there. Vagrant also has first-class support for Puppet. Remember above, how we're provisioning with inline shell scripts? Well, Vagrant also has a Puppet provisioner. If you've never used Puppet before, that's OK; the examples should give you a basic overview of its usage.

To set up a basic Puppet provisioner, let's do something like this in our Vagrantfile:

box      = 'phusion/ubuntu-14.04-amd64'

Vagrant.configure(2) do |config|
    config.vm.box = box

    # Now let puppet do its thing.
    config.vm.provision :puppet do |puppet|
      puppet.manifests_path = 'puppet/manifests'
      puppet.manifest_file = 'devenv.pp'
      puppet.module_path = 'puppet/modules'
      puppet.options = "--verbose"
    end
end

This also seems pretty straightforward. Again, don't worry too much if you don't know Puppet. Those paths are relative to the Vagrantfile, so your directory structure (initially) will look like this:

[12:43:47 /ws/dev9/vagrant-dev-env/step3]$ tree
.
├── Vagrantfile
└── puppet
    ├── manifests
    │   └── devenv.pp
    └── modules

In the provisioner, we're giving it two paths. The manifests path is where Puppet will look for manifest files. A manifest is the basic unit of execution in Puppet, and is made up of one or more resource declarations -- the desired state of a resource. These resource declarations are the basic building blocks. So, to start, let's just get our previous example working in Puppet. Modify your devenv.pp to look like this:

group { 'puppet': ensure => 'present' }

exec { "apt-get update":
  command => "apt-get -yq update",
  path    => ["/bin","/sbin","/usr/bin","/usr/sbin"]
}

exec { "install java":
  command => "apt-get install -yq openjdk-7-jdk",
  require => Exec["apt-get update"],
  path    => ["/bin","/sbin","/usr/bin","/usr/sbin"]
}

This is pretty self-explanatory, with one caveat: order doesn't matter. Puppet tries to optimize the running and management of dependencies, so the steps will not necessarily be executed in the order you wrote them. This is why the require declaration exists on the install java exec -- we are telling Puppet to execute the apt-get update before this step. Notice also that the resource type is capitalized (Exec) inside a require; that's how Puppet references an existing resource.

So, let's bring this box up:

[12:56:35 /ws/dev9/vagrant-dev-env/step3]$ vagrant up
Bringing machine 'default' up with 'virtualbox' provider...
==> default: Importing base box 'phusion/ubuntu-14.04-amd64'...
==> default: Matching MAC address for NAT networking...
==> default: Checking if box 'phusion/ubuntu-14.04-amd64' is up to date...
==> default: Setting the name of the VM: step3_default_1409774249245_48069
==> default: Clearing any previously set forwarded ports...
==> default: Fixed port collision for 22 => 2222. Now on port 2202.
==> default: Clearing any previously set network interfaces...
==> default: Preparing network interfaces based on configuration...
    default: Adapter 1: nat
==> default: Forwarding ports...
    default: 22 => 2202 (adapter 1)
==> default: Booting VM...
==> default: Waiting for machine to boot. This may take a few minutes...
    default: SSH address: 127.0.0.1:2202
    default: SSH username: vagrant
    default: SSH auth method: private key
    default: Warning: Connection timeout. Retrying...
==> default: Machine booted and ready!
==> default: Checking for guest additions in VM...
==> default: Mounting shared folders...
    default: /vagrant => /ws/dev9/vagrant-dev-env/step3
    default: /tmp/vagrant-puppet-3/manifests => /ws/dev9/vagrant-dev-env/step3/puppet/manifests
    default: /tmp/vagrant-puppet-3/modules-0 => /ws/dev9/vagrant-dev-env/step3/puppet/modules
==> default: Running provisioner: puppet...
==> default: Running Puppet with devenv.pp...
==> default: stdin: is not a tty
==> default: Notice: Compiled catalog for ubuntu-14.04-amd64-vbox in environment production in 0.07 seconds
==> default: Info: Applying configuration version '1409774267'
==> default: Notice: /Stage[main]/Main/Exec[apt-get update]/returns: executed successfully
==> default: Notice: /Stage[main]/Main/Exec[install java]/returns: executed successfully
==> default: Info: Creating state file /var/lib/puppet/state/state.yaml
==> default: Notice: Finished catalog run in 117.84 seconds

[12:59:48 /ws/dev9/vagrant-dev-env/step3](git:master+?)
$ vagrant ssh
Welcome to Ubuntu 14.04 LTS (GNU/Linux 3.13.0-24-generic x86_64)

Last login: Tue Apr 22 19:47:09 2014 from 10.0.2.2

vagrant@ubuntu-14:~$ java -version
java version "1.7.0_65"
OpenJDK Runtime Environment (IcedTea 2.5.1) (7u65-2.5.1-4ubuntu1~0.14.04.2)
OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode)
vagrant@ubuntu-14:~$

And now we have puppet provisioning our system! The output is also much nicer, and you can get some hint of how Puppet works -- there are stages, it gives us return values, it saves a state file, and there is a concept of environments. Any wonder why Puppet is so popular in the DevOps world? When you hear DevOps folks talking about a VM as a unit of deployment, they're not kidding. It's just a file.

Of course, this is basically cheating. The Puppet way is to describe the state of a system, and this manifest isn't describing state -- it's describing commands to run. While some of you may like that style, there are other frameworks for it. Puppet is a declarative, stateful framework, so let's not try to turn it into glorified shell scripting. So, we can change that up a bit...

Part 4: Actually Using Puppet

For this step, the Vagrantfile doesn't change. We're just changing the Puppet files. Check this out:

group { 'puppet': ensure => 'present' }

exec { "apt-get update":
  command => "apt-get -yq update",
  path    => ["/bin","/sbin","/usr/bin","/usr/sbin"]
}

package { "openjdk-7-jdk":
  ensure  => installed,
  require => Exec["apt-get update"],
}

Now we're declaring state. We're just telling puppet to make sure openjdk-7-jdk is installed, and run an apt-get update beforehand. Since apt-get update is idempotent on its own, this whole definition is now idempotent. That means we can run it multiple times without issue!

Let's bring the box up:

[13:36:30 /ws/dev9/vagrant-dev-env/step4](git:master+!?)
$ vagrant up
Bringing machine 'default' up with 'virtualbox' provider...
==> default: Importing base box 'phusion/ubuntu-14.04-amd64'...
==> default: Matching MAC address for NAT networking...
==> default: Checking if box 'phusion/ubuntu-14.04-amd64' is up to date...
==> default: Setting the name of the VM: step4_default_1409776604916_69804
==> default: Clearing any previously set forwarded ports...
==> default: Fixed port collision for 22 => 2222. Now on port 2202.
==> default: Clearing any previously set network interfaces...
==> default: Preparing network interfaces based on configuration...
    default: Adapter 1: nat
==> default: Forwarding ports...
    default: 22 => 2202 (adapter 1)
==> default: Booting VM...
==> default: Waiting for machine to boot. This may take a few minutes...
    default: SSH address: 127.0.0.1:2202
    default: SSH username: vagrant
    default: SSH auth method: private key
    default: Warning: Connection timeout. Retrying...
==> default: Machine booted and ready!
==> default: Checking for guest additions in VM...
==> default: Mounting shared folders...
    default: /vagrant => /ws/dev9/vagrant-dev-env/step4
    default: /tmp/vagrant-puppet-3/manifests => /ws/dev9/vagrant-dev-env/step4/puppet/manifests
    default: /tmp/vagrant-puppet-3/modules-0 => /ws/dev9/vagrant-dev-env/step4/puppet/modules
==> default: Running provisioner: puppet...
==> default: Running Puppet with devenv.pp...
==> default: stdin: is not a tty
==> default: Notice: Compiled catalog for ubuntu-14.04-amd64-vbox in environment production in 0.17 seconds
==> default: Info: Applying configuration version '1409776705'
==> default: Notice: /Stage[main]/Main/Exec[apt-get update]/returns: executed successfully
==> default: Notice: /Stage[main]/Main/Package[openjdk-7-jdk]/ensure: ensure changed 'purged' to 'present'
==> default: Info: Creating state file /var/lib/puppet/state/state.yaml
==> default: Notice: Finished catalog run in 134.04 seconds

There we go! We've declared the state of our machine, and Puppet does its magic. Of course, Puppet can do a whole lot more -- file templating, adding and removing users, setting up configuration, making sure some packages are NOT present, etc. This is YOUR machine -- install git, maven, oh-my-zsh, etc.

Also, keep in mind that Puppet is a really in-demand skill. You might find yourself with a valuable new tool.

This Month in Continuous Delivery: Dev9's Top 5

As August comes to a close, we'd like to share some of the helpful articles and posts we've come across this month. Here are our top 5 Continuous Delivery and DevOps picks for August:

Is Your Organizational Culture Ready for DevOps?: DevOpsGuys Blog

Testing in a Continuous Delivery world: SDTimes

Balancing Quality and Velocity in Agile: InfoQ

A/B Testing + Continuous Delivery = Everyday Product Launches: InfoQ

Continuous Delivery Vs. Continuous Deployment: What's the Diff?: Puppet Labs

Enjoy, and see you next month!

Smarter Acceptance Testing with Personas

A while back, I gave a talk on the combination of Cucumber, PhantomJS, and WebDriver. There is a project on GitHub that contains the sample code. This is a small follow-up to that talk with the idea of how to manage your Cucumber scripts.

Writing BDD

To make the best use of BDD, you should write very rich steps. This is entirely the point of using behavior frameworks. In fact, if your BDD scenarios can't be used as documentation, you've done something wrong. I'm going to assume a knowledge of the Given/When/Then structure. Let's take a look at a bad example:

Scenario: User Signup
  Given A user is signing up for our site,
  When he enters his First Name,
   And enters his Last Name,
   And enters his Email,
   And re-enters his Email,
   And enters a new Password,
   And specifies his Gender,
   And enters his Birthday,
   And submits his request,
  Then an account is created,
   And account name is set as the email address.
   And a confirmation email is sent to Fred.

Now, this would be fine, but we're basically writing a bunch of steps that are quite fine-grained. Compare with something like this:

Scenario: User Signup
  Given A user is signing up for our site
    And the form is filled out correctly 
  When he submits his request
  Then an account is created
   And a confirmation email is sent.

Remember, the point of BDD is to keep scenarios close to how you would describe the behavior when talking to somebody. You won't list out every single field when you're talking about it, so make your code-behind more powerful!

Test Data

The biggest chronic problem with testing of any sort is test data. That is, to fully exercise a complex system, you need your data in a known state, and you need it to be consistent. Creating this data is difficult enough. A tool like Flyway can assist with setup/teardown/reset of data. But there's another problem: how do we know which pre-canned test data set to use? With enough test cases and enough scenarios to exercise, it can be difficult to remember the exact configuration of the test data. Or, if you are writing a test that uses a data set that is 99% correct, even changing that 1% can break other tests that were relying on that data staying fixed. So how do we deal with this?

In the BDD world, we have a fascinating option. Remember way back when we were all getting on board the Agile/Scrum train? The proper procedure is to write user stories in a manner like so:

As a non-administrative user, I want to modify my own schedules but not the schedules of other users.

This could be translated directly into gherkin:

Scenario: Schedule Management - User
  Given a non-administrative user
  When I attempt to modify my calendar
  Then the modification is successful

Scenario: Schedule Management - Admin
  Given a non-administrative user
  When I attempt to modify another user's calendar
  Then the modification is not successful

This looks fine, but "non-administrative user" is quite vague. And what happens when other people want to use code like that? How can we make this more extensible without sacrificing readability or maintainability? Enter personas.

Personas

The user story from above is the classic way most of us were taught to write user stories. However, when Agile was still young, the idea of personas was quite popular. From this site, we can see:

A persona [...] defines an archetypical user of a system, an example of the kind of person who would interact with it. [P]ersonas represent fictitious people which are based on your knowledge of real users.

Personas are different [from actors] because they describe an archetypical instance of an actor. In a use case model we would have a Customer actor, yet with personas we would instead describe several different types of customers to help bring the idea to life.

It is quite common to see a page or two of documentation written for each persona. The goal is to bring your users to life by developing personas with real names, personalities, motivations, and often even a photo. In other words, a good persona is highly personalized.

This is actually a very powerful construct in our BDD world. Instead of generically referring to "a user" or "an admin," we can refer to personas. Let's take that signup example again, but apply a persona:

Scenario: User Signup
  Given Fred Jacobsen is signing up for our site
    And the form is filled out correctly 
  When he submits his request
  Then an account is created
    And a confirmation email is sent.

Not very different, right? But behind the scenes can be a different story. Let's look at what it might have looked like before:

Given /^A user is signing up for our site$/ do
  user = find_user(admin: false, valid: true, ...)
end 

Compare to:

Given /^(\w+) is signing up for our site$/ do |name|
  user = user_lookup(name)
end

In the first case, you need to search for a user. If the data changes underneath you, this function may not return the same result every time. In the second case, however, a user lookup by name will return you a consistent entry. Let's take a look at the Schedule Management example again, but with Personas:

Scenario: John can change his own schedule
  Given John Doe is using our app
  When John attempts to modify his calendar
  Then the modification is successful

Scenario: John cannot change Jane's schedule
  Given John Doe is using our app
  When John attempts to modify Jane Smith's schedule
  Then the modification is not successful

Again, we can look the users up by name here. But we can also have some documents behind the scenes explaining the personas. For this example, I'm going to use markdown:

# John Doe #
John Doe is a paying, non-administrative user on our site. He is a 40 year old dad of 2 boys and 1 girl, and uses our product to manage the children's activities between him and his wife, Heather Doe. He checks his schedule each morning and each evening, but does not check it throughout the day. 

## Payment Information ##
John has a monthly subscription, and is up-to-date on payments. 

## Account Setup ##
John has an avatar image and a verified email address. He has not entered his phone number for SMS updates.

This gives us a wealth of information. We know that John is an active member, with up-to-date payments, has an avatar, has a verified email, and doesn't have a phone number. So the test data behind this -- what if we were smart about the way we used Flyway to manage it as well?

file: flyway/personas/001-John_Doe.sql

insert into users(name, email) values ('John Doe', 'loverboy85@hotmail.com');
SET @id = LAST_INSERT_ID(); /* or whatever your database supports */
insert into accounts(user_id, status) values (@id, 'active');
insert into profile(user_id, avatar) values (@id, '/images/avatars/johndoe.png');

Now we can have a 1:1 correspondence between a persona and the data that powers it. Say somebody comes along and basically wants everything John Doe has, except he wants somebody with a phone number entered? Instead of modifying John or trying to figure out if other tests will break, we just create a new user and persona:

file: flyway/personas/002-Jenny_Smith.sql

insert into users(name, email) values ('Jenny Smith', 'jsmith17@compuserv.net');
SET @id = LAST_INSERT_ID(); /* or whatever your database supports */
insert into accounts(user_id, status) values (@id, 'active');
insert into profile(user_id, avatar, phone) values (@id, '/images/avatars/jennysmith.png', '555-867-5309');

Moving Forward

So what have we done, really? In an abstract sense, we've created named collections of test data. They should be immutable; any change that developers want to make results in a new persona. If we decide a persona is no longer useful, it's also an easy search through the code to find all usages. Given this setup, it's a small leap to creating very complex setups:

file: resources/personas/Nevill_Wadsworth_III.md

# Nevill Wadsworth III #
Nevill comes from an old-money family with assets in the tens of millions of dollars. He manages 5 family trusts, and uses our trading system to manage all of their assets. 

## Payment Information ##
Nevill is up-to-date on payments. Payments are deducted automatically via ACH. 

## Account Info ##
The 5 trusts that Nevill manages: 

### Trust 1: Wadsworth Unlimited ###
Wadsworth Unlimited is a small trust with the stocks of 2 companies. This is the dividend account for him. The stocks in this account are: 

  stock |    qty  | purchase price | purchase date
  ------------------------------------------------
  MSFT  | 150,000 |         $27.45 | 10/23/1997
  BRK.A |   1,000 |      $3,544.18 | 09/16/2001

### Trust 2: Wadsworth International ###
Wadsworth International is the trust that manages all of the family's assets outside of the US. For tax purposes, they have not repatriated the money, so it can only be spent outside of the US. The assets in this account are: 

You see where I'm going. Personas don't have to be limited to humans or clients, either. They could be companies, external agents like regulatory auditors, or even a pet (if you're running a vet, for example).

The promise of BDD is executable documentation. It's not hard to imagine taking these scenario files, combining them with the persona markdown, combining those with the persona SQL, and creating a fully cross-referenced site of the test cases, the personas, and the test data generation. That's left as an exercise for the reader.

Summer of Seminars

Please join us for our June and July seminars in Kirkland, WA!

Introduction to Docker - 6/19

Heard about Docker, but not sure what it is? Interested in learning about how Docker can improve your development speed? How it can help you standardize and improve your environment management? Or how Docker can help you save money and get more bang for your buck with both on-premise environments and cloud-based services?

This hour-long presentation will walk you through Docker in clear, straight-forward language. We will walk you through how to approach Docker PoCs and evaluations, and how to integrate Docker into your existing environment.

Escape The Matrix - 6/26

In this presentation, we'll look at a common industry organizational design pattern - the cross-functional matrix - and learn how it works, what the alternatives are, and strategies for both mitigation and transformation.

Intro: Cucumber, PhantomJS, Selenium - 7/10

Cucumber is a popular platform for building test suites that are easy for business stakeholders to understand. In this session, we'll introduce and compare traditional JUnit-style tests with Cucumber-style BDD testing. We'll show how to add in Selenium-based testing to automate UI/browser testing, and then we'll show how to use PhantomJS to run those Selenium tests quickly. We'll also talk about how this integrates into a continuous integration environment, and walk through an example of how all of this is hooked together on GitHub.

Introduction to Kanban - 7/31

Are you trying to introduce change into your organization, and feel like you’ve hit a brick wall? Do you want to move forward with Lean initiatives, but are not sure how to apply them “in the trenches”? Have you adopted Agile principles and practice Scrum, but find it limiting when dealing with multiple teams and/or cross-functional organizations?

Kanban is an increasingly popular system for introducing incremental, evolutionary process into an organization. Based on Lean principles, it offers a way to move beyond basic Scrum and improve process in a consistent, manageable fashion. Dev9 has helped clients transition to Kanban, and we would like to share our engagement experiences.

Please join us for a discussion of Kanban and how it supports more effective technology management!

6/19: Introduction to Docker

Introduction to Docker

Fast, Standardized, Managed Environments

Heard about Docker, but not sure what it is? Interested in learning about how Docker can improve your development speed? How it can help you standardize and improve your environment management? Or how Docker can help you save money and get more bang for your buck with both on-premise environments and cloud-based services?

This hour-long presentation will walk you through Docker in clear, straight-forward language. We will walk you through how to approach Docker PoCs and evaluations, and how to integrate Docker into your existing environment.

Performance 101: Causes of Bad Performance

In the previous articles, we talked about terminology and concepts, and then we got into tools and some more math. That allowed us to identify the places where we should spend our time.

Why?

The first question we ask ourselves when we're tasked with performance enhancements is: why? There are a lot of factors to consider; however, there are some basic rules you can start with.

The first breakdown is identifying whether the problem is a contention issue or an algorithm performance issue. This maps almost directly to the throughput vs. latency discussion from the first article, and between them these two categories cover the vast majority of performance issues you might encounter. Contention is what prevents us from getting the throughput that we want.

Algorithm performance affects latency. Everything works as expected, with little contention, but it's still not fast enough. These problems, while they're more fun to fix, can also mean huge changes. Making a sorting algorithm or a queue implementation or a face detection algorithm faster is an example of algorithmic performance.

NOTE: While this article is mostly about identifying performance issues in your application, do not overlook the importance of identifying and fixing performance issues in software that your app depends on. Database servers are a common source of issues, for example. You should be running the same contention checks on those services if your testing shows slowness in that area.

Algorithm Performance

Since we are going to spend the majority of our time talking about contention, let's spend a few minutes talking about algorithm performance. I consider an algorithm performance problem one where increasing the speed of the computer is the only way to improve performance. All contentions are minimized, but the code is running into fundamental limits from the hardware.

Big-O

What we are talking about is the basis for Big-O notation. When you see O(n), O(1), or O(n^2), those are Big-O examples. Big-O describes how an algorithm's cost grows as its input grows, independent of the hardware it runs on. A merge sort (O(n log n)) is faster than a bubble sort (O(n^2)) because it performs far fewer comparisons as the input gets large.
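
To make this concrete, here's a deliberately naive Java comparison (single run, no JIT warmup, so treat the numbers as illustrative only) between a hand-rolled O(n^2) bubble sort and the JDK's built-in sort:

import java.util.Arrays;
import java.util.Random;

public class SortComparison {
    // O(n^2): the nested loops are exactly what Big-O is describing.
    static void bubbleSort(int[] a) {
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < a.length - 1 - i; j++) {
                if (a[j] > a[j + 1]) {
                    int tmp = a[j];
                    a[j] = a[j + 1];
                    a[j + 1] = tmp;
                }
            }
        }
    }

    public static void main(String[] args) {
        int[] data = new Random(42).ints(50_000).toArray();
        int[] copy = data.clone();

        long start = System.nanoTime();
        bubbleSort(data);
        System.out.printf("bubble sort: %d ms%n", (System.nanoTime() - start) / 1_000_000);

        start = System.nanoTime();
        Arrays.sort(copy); // O(n log n) on average
        System.out.printf("Arrays.sort: %d ms%n", (System.nanoTime() - start) / 1_000_000);
    }
}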

Data Structures

This type of performance issue can also arise when you use improper data structures. This is why it's important for programmers to at least be aware of the differences between arrays, linked lists, trees, and maps.
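
As a rough illustration of why that matters, here's a hypothetical Java snippet (names and sizes invented) comparing a linear scan through a list with a hash map lookup:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LookupComparison {
    public static void main(String[] args) {
        int n = 1_000_000;
        List<String> list = new ArrayList<>();
        Map<String, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            String key = "user-" + i;
            list.add(key);
            map.put(key, i);
        }

        long start = System.nanoTime();
        boolean inList = list.contains("user-999999"); // O(n): scans the whole list
        long listNanos = System.nanoTime() - start;

        start = System.nanoTime();
        boolean inMap = map.containsKey("user-999999"); // O(1) on average: one hash lookup
        long mapNanos = System.nanoTime() - start;

        System.out.printf("list scan: %d us, map lookup: %d us (%b/%b)%n",
                listNanos / 1_000, mapNanos / 1_000, inList, inMap);
    }
}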

Caching

The last item of note here is one we have probably all used: caching. If you've ever used ehcache or terracotta or memcached or even Hibernate, you've taken advantage of caching. The ability to skip a lengthy computation when the value doesn't change too frequently can mean the difference between 2 and 100 servers to handle the load you want. This can often be used for a quick performance gain if it's your first pass through the application.
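
Here's a minimal sketch of that idea in Java, using ConcurrentHashMap.computeIfAbsent as a bare-bones in-memory cache. The class and method names are invented for illustration, and a real cache (like the libraries above) would also need an eviction and expiry policy:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ReportCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    // Returns the cached value if present; otherwise runs the expensive
    // computation once and remembers the result for subsequent callers.
    public String reportFor(String customerId) {
        return cache.computeIfAbsent(customerId, this::buildReportSlowly);
    }

    private String buildReportSlowly(String customerId) {
        // Placeholder for a lengthy computation or remote call.
        try { Thread.sleep(500); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        return "report for " + customerId;
    }

    public static void main(String[] args) {
        ReportCache reports = new ReportCache();
        long start = System.nanoTime();
        reports.reportFor("acme"); // slow: computed
        reports.reportFor("acme"); // fast: served from the cache
        System.out.println((System.nanoTime() - start) / 1_000_000 + " ms total");
    }
}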

Contention

A computer has a finite amount of resources. Contention is when one of these resources is utilized to its maximum capacity, and there is a queue of work waiting to be done. That resource simply can't handle any more work. Common sources of contention are discussed below, as well as some ways to mitigate the effects.

Disk Contention

Disk contention is when we spend too much time reading and writing data on disk. Because a spinning disk needs to seek to the point on the platter where your data lives, it takes some amount of time to get data back. If you have a lot of read and write tasks, your request may queue up behind others that haven't been served yet. This will result in long I/O wait times.

To investigate this issue on *nix-like systems, use the tool iostat. An example of a healthy disk setup:

[Screenshot: iostat output from a healthy disk setup]

You can see all the different devices in this example (thanks to Roger for putting it on the internet). The interesting things are the avgqu-sz and await. avgqu-sz is the average size of the request queue. In a mostly-idle system, this will be somewhere around zero, which means all requests are served quickly. In a system with disk contention, this number will grow rapidly. Await is the average time (in milliseconds) that a request took to process -- including queue time and serving time. Let's look at an unhealthy iostat:

[Screenshot: iostat output showing disk contention]

There are a lot more devices on this machine, but you can see certain devices have a large queue size, and you can see the await times. If you want to serve a webpage in 100ms, but your disk is taking 105ms to service a request, you're going to have a bad time.

Blocking vs. Non-Blocking IO

You've surely heard about node.js by now. It's famous for its non-blocking IO. Whenever your application does something that needs to talk to disk (or something that would cause a wait), it yields to another task, and lets that original task finish later when the data is available.

Java also has some of this. There's a java.io package for blocking IO, and it's probably what most of us use. There's also the java.nio package for non-blocking IO. If you find that waiting for disk becomes an issue, see if you can use non-blocking IO calls to speed up your server's ability to handle more.
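
As a rough sketch (the file name is a placeholder, and this is not a recommendation for any particular server design), here's what an asynchronous read looks like with NIO.2's AsynchronousFileChannel (Java 7+):

import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;
import java.nio.charset.StandardCharsets;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.CountDownLatch;

public class AsyncReadExample {
    public static void main(String[] args) throws Exception {
        CountDownLatch done = new CountDownLatch(1);
        AsynchronousFileChannel channel =
                AsynchronousFileChannel.open(Paths.get("data.txt"), StandardOpenOption.READ);
        ByteBuffer buffer = ByteBuffer.allocate(8192);

        // The calling thread is free to do other work while the read completes.
        channel.read(buffer, 0, buffer, new CompletionHandler<Integer, ByteBuffer>() {
            @Override
            public void completed(Integer bytesRead, ByteBuffer buf) {
                buf.flip();
                System.out.println(bytesRead + " bytes: " + StandardCharsets.UTF_8.decode(buf));
                done.countDown();
            }

            @Override
            public void failed(Throwable exc, ByteBuffer buf) {
                exc.printStackTrace();
                done.countDown();
            }
        });

        System.out.println("read submitted; doing other work...");
        done.await();
        channel.close();
    }
}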

Reducing Contention

To reduce disk contention, there are four common strategies. First, you can buy your way out. Solid-state drives (SSD) are significantly faster than spinning disks for random seeks. If your access pattern is a lot of small reads and writes, this will get you very significant benefits in performance. If, instead, your performance problem is one of reading and writing very large chunks of data, an investment in an enterprise storage system may be warranted. A RAID setup combines multiple disks into one, providing (usually) both fault tolerance and increased throughput, since multiple drives contribute to the end result.

The second common strategy is to simply reduce the amount of data that is on the disk. If you can get by with smaller files, that may increase speeds. Compression can be used on large files, as expanding it in-memory can be faster than reading uncompressed from the disk. A different serialization format can result in significant disk savings (JSON, for example, is quite verbose compared to CSV).
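
As a small, hypothetical Java example of the compression idea (file names invented), the JDK's GZIP streams let you trade a little CPU for a lot less disk I/O:

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipExample {
    public static void main(String[] args) throws Exception {
        // Write the data compressed: far fewer bytes hit the disk.
        try (Writer out = new OutputStreamWriter(
                new GZIPOutputStream(new FileOutputStream("events.log.gz")),
                StandardCharsets.UTF_8)) {
            for (int i = 0; i < 100_000; i++) {
                out.write("event " + i + ": something happened\n");
            }
        }

        // Read it back, decompressing in memory as we go.
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new FileInputStream("events.log.gz")),
                StandardCharsets.UTF_8))) {
            System.out.println("first line: " + in.readLine());
        }
    }
}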

The third common strategy is to do batch reads/writes. This is what your hard drive controller does, which is why it's so dangerous to just unplug your computer. There is a disk cache that waits for some amount of data, then writes it all to disk at once. This is how some high-performance NoSQL engines work. They keep as much working data in memory as possible, and flush to disk periodically.
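
Here's a minimal Java sketch of the same batching idea at the application level (file names invented). The buffered stream accumulates writes and hands them to the OS in large chunks instead of one record at a time; note that durability still requires deliberate flushing:

import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class BatchedWrites {
    public static void main(String[] args) throws Exception {
        byte[] record = "one small record\n".getBytes(StandardCharsets.UTF_8);

        // Unbuffered: every write() is a separate trip to the OS and, eventually, the disk.
        try (OutputStream raw = new FileOutputStream("raw.log")) {
            for (int i = 0; i < 100_000; i++) {
                raw.write(record);
            }
        }

        // Buffered: writes accumulate in a 64KB buffer and are flushed in large batches.
        try (OutputStream batched = new BufferedOutputStream(
                new FileOutputStream("batched.log"), 64 * 1024)) {
            for (int i = 0; i < 100_000; i++) {
                batched.write(record);
            }
        }
    }
}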

The fourth common strategy is to do more caching. Your operating system likely does some of this for you, by caching frequently-used files in memory. Your application can do the same. If the data on disk changes infrequently, you might want to read it to memory when your app starts up. If it's very large, and you do a lot of searching, see if you can index the file, or sort the file, so you can use faster algorithms like a binary search to get to your data.
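
A small, hypothetical Java sketch of that last point -- load the file once at startup, sort it, and use an O(log n) binary search for every lookup afterwards (the file name is a placeholder):

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;

public class InMemoryIndex {
    public static void main(String[] args) throws Exception {
        // Load the file once at startup instead of re-reading it on every request.
        List<String> keys = Files.readAllLines(Paths.get("valid-keys.txt"), StandardCharsets.UTF_8);

        // Sorting once up front lets every later lookup use a binary search.
        Collections.sort(keys);

        String wanted = "customer-42";
        boolean found = Collections.binarySearch(keys, wanted) >= 0;
        System.out.println(wanted + (found ? " found" : " not found"));
    }
}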

This problem can be magnified if you use disk stores backed by networks. NFS, NAS, Samba, SAN -- these are all disks backed by network. While they may offer unparalleled data security and storage capacity and data mobility, you incur some overhead since it needs to communicate over a network. That leads us to our next step...

Network Contention

Networks have a lot of the same issues we have been discussing (latency, throughput, queueing). There's also the issue of network card capacity and network connection capacity. Most servers these days should have a gigabit ethernet card, and high-end servers should be using 10gb ethernet. But if your switches/routers are only capable of 10mbit, you will have a problem.

To look for potential network issues, there are two tools that are useful. First, there's the venerable netstat. This tool is useful to inspect network connections on the server. This can help you diagnose if you need additional threads to handle connections, if you're leaking connections, or if your system is being overwhelmed by connections. Second, there's a little utility called iftop. It periodically tracks the throughput of all network devices. It can drill down to a single connection, aggregate all connections, and track peaks and means. If you have a 100mbit network card and see ~10MB/sec in iftop, there's a good chance you're maxing out your network. Here's a little screenshot of it running with -t (to just print text to the console instead of an interactive app):

[Screenshot: iftop text output]

Solving network issues can be tricky business. Lucky for us, the network is also probably not the source of your problems; I have only seen it be the culprit once or twice. In those cases, if you took a thread dump of the process under load, you would see multiple threads blocked in socketRead0 or socketWrite0 -- the native code that actually does the socket communication. Sadly, thread dumps are the most reliable way I've found to differentiate between network issues and other forms of contention.

CPU Contention

CPU contention is fun, in a way. We all know the command 'top' to display system load and running processes. In the modern world, with lots of threads and processes, it can be helpful to get a bit more detailed in our analysis. The tool htop breaks down each core's load, and provides a lot more than the default top:

[Screenshot: htop showing per-core load]

Notice how all 4 cores are showing a high level of usage in this screenshot. I highly recommend this tool to verify that your CPU is overloaded.

While this is one of the easiest metrics to inspect, it can be very difficult to fix. The system just has too much to compute to do much more. This is where algorithmic fixes and better data structures come into play. If you notice the system bogging down but the CPUs are not maxed, you have some other contention slowing it down.

System Load

The load average from the above screenshot shows the 1-, 5-, and 15-minute averages of load. The load average is an exponentially dampened average of the number of running and runnable processes, including those waiting in the queue. With multi-core machines, it gets even trickier. If it were truly just CPU, a 4-core machine with a load average of 1.00 would be ambiguous: is that 1 core fully maxed (and thus ripe for parallelization), or all 4 cores running at 25%? Maybe 2 cores at 50%? With multiple cores, a load average of 4 doesn't necessarily mean the system is under any stress. This is where tools like htop help diagnose the issue.

Memory Contention

Memory contention is when there is more memory being used than available on your system. Thanks to swap partitions, this shouldn't crash a machine. However, it's likely to destroy your performance. The more tuned your app is, the more detrimental swapping will be. Memory contention can also come when you run into out-of-memory (OOM) problems, or issues with garbage collection in managed apps.

Swap-based issues are easy to spot, as top or htop will tell you the amount of swap they're using. For a production system, you ideally want no swap used. Putting 128GB or more on one machine is perfectly doable these days.

Out-of-memory usually surfaces as a process that dies unexpectedly with no messages. The only way to fix this is to consume fewer resources. This, again, is where better data structures or more compact object representations may help. You may also have memory leaks.

Garbage Collection

Garbage collection tuning, especially in Java, is almost a job unto itself. There are a lot of tunable parameters. There is the permgen vs. the heap. There's the issue of heap resizing vs. fixed-size heap. When you profile your Java app, you really want to see the classic saw-tooth pattern:

[Chart: healthy GC saw-tooth heap pattern]

This is a generally healthy system. The garbage collector kicks in and returns the heap to about the same size. A more unstable or growing-memory system looks like this:

[Chart: heap usage climbing despite garbage collection]

Notice how the "free heap" blue line climbs, then eventually drops, despite garbage collection happening on the green line. When the free heap is near zero, Java will spend a lot of its time trying to free up the heap, including multiple stop-the-world pauses. These can range from a second or so to multiple minutes, depending on heap size and object counts. If you let it run like this long enough, it will probably become unresponsive and eventually crash with an out-of-memory error.

Tuning the GC is beyond the scope of this article, but it can dramatically improve performance.

Memory Leaks

The last thing to look out for is memory leaks. Technically, you can't leak memory in Java in the classic C sense unless you're using unsafe or native libraries, but in practice leaks are still a problem. The biggest issue is dangling references to objects that you no longer need: those references keep the objects around in perpetuity, and that can eventually exhaust your heap. Using tools described in the previous article (VisualVM, JProfiler, YourKit), you can inspect where these objects are created, which ones take the most space, and what types of objects they are. This can be very helpful in tracking down excessive memory usage.
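
As a deliberately pathological Java sketch of what such a leak looks like (names invented), here's a collection that only ever grows, keeping every payload reachable forever:

import java.util.ArrayList;
import java.util.List;

public class RequestAuditor {
    // A static collection that is only ever added to: every request handled
    // stays reachable forever, so the garbage collector can never reclaim it.
    private static final List<byte[]> AUDIT_TRAIL = new ArrayList<>();

    public void handleRequest(byte[] payload) {
        AUDIT_TRAIL.add(payload); // the dangling reference we no longer need
        // ... real request handling ...
    }

    public static void main(String[] args) {
        RequestAuditor auditor = new RequestAuditor();
        while (true) {
            auditor.handleRequest(new byte[1024 * 1024]); // eventually: OutOfMemoryError
        }
    }
}

Run something like this while watching the heap in VisualVM and you'll see the climbing "free heap" problem from the previous section in miniature.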

There is a fascinating issue in Java that has cropped up occasionally. If you have a very large string (say, a JSON or XML document, or a large text file you've put into a string), then take a substring of it, that substring is just a windowed view of the larger text string. This is good when your strings are small, as it prevents a lot of reallocation. However, when the source is very large, your substring holds a reference to that large document, meaning your memory usage is far larger than normal. This "leak" was fixed in OpenJDK 7u6. If you're still on JDK 5 or 6, you're probably being affected by this.
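
A quick sketch of the classic workaround on those older JDKs: copy the substring into a new String so it no longer pins the huge backing array.

public class SubstringExample {
    public static void main(String[] args) {
        // Imagine this is a multi-megabyte JSON or XML document.
        String hugeDocument = buildHugeDocument();

        // On JDK 6 and early JDK 7, this kept the entire backing char[] of
        // hugeDocument alive, even though we only want 10 characters.
        String leakyId = hugeDocument.substring(0, 10);

        // Copying into a new String detaches it from the large backing array.
        String safeId = new String(hugeDocument.substring(0, 10));

        System.out.println(leakyId.equals(safeId));
    }

    private static String buildHugeDocument() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1_000_000; i++) {
            sb.append("some repeated content ");
        }
        return sb.toString();
    }
}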

Lock Contention

This is where things get really tricky. I've also found that it's the source of a lot of issues, so it's good to get well versed in this kind of contention. Because multiple threads accessing the same resource (a variable, an array, a database connection, etc.) can stomp on each other, there needs to be a way to control access to these sensitive bits. In Java, we often put synchronized on the method definition, or wrap the critical section in a synchronized block. Behind the scenes, this acquires a lock so that only one thread can execute that code at a time.
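
Here's a minimal Java sketch of that (names invented): eight threads funneling every increment through one synchronized block. The final count is correct, but every update is serialized, which is exactly where contention shows up under load:

public class HitCounter {
    private final Object lock = new Object();
    private long hits = 0;

    // Only one thread at a time can execute the synchronized block; everyone
    // else queues up waiting for the lock.
    public void recordHit() {
        synchronized (lock) {
            hits++;
        }
    }

    public long total() {
        synchronized (lock) {
            return hits;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        HitCounter counter = new HitCounter();
        Thread[] workers = new Thread[8];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < 1_000_000; j++) {
                    counter.recordHit();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        System.out.println(counter.total()); // 8,000,000 -- correct, but serialized
    }
}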

When there are a lot of threads vying for the same lock, you are slowing them all down. If your locks are implemented as spin locks, the process may exhibit 100% CPU usage while it waits. If you take a stack trace, you will see "Waiting on 0xXXXXXXX" for threads that are waiting for a lock. A useful tool here is TDA (Thread Dump Analyzer), and this is how it presents them:

[Screenshot: thread dump in TDA showing threads waiting on a lock]

Not all lock contention involves deadlock. A deadlock is two threads each waiting for a lock the other holds, with no way for either to make progress. Most profilers offer tools to automatically detect deadlocks, which will make your life significantly easier.

Lock contention is so expensive that multiple languages/frameworks/patterns are designed to avoid it. Functional programming languages often avoid this issue because they do little-to-no variable mutation. Immutable data structures simplify your life significantly. Because they can't change, it's safe for multiple threads to access the data. Lock-free queues are popular in some circles, but they can be nasty to code up correctly if you're not extremely well-versed in this specialty.
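
As a tiny sketch of the immutability idea in Java (invented for illustration), here's a value object that any number of threads can read without locks; "changing" it just produces a new object:

public final class Price {
    private final String symbol;
    private final long cents;

    public Price(String symbol, long cents) {
        this.symbol = symbol;
        this.cents = cents;
    }

    public String symbol() { return symbol; }
    public long cents()    { return cents; }

    // No mutation: updating a price creates a new instance, so existing
    // readers keep a consistent view and never need to synchronize.
    public Price withCents(long newCents) {
        return new Price(symbol, newCents);
    }
}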

Mechanical Empathy

If you're going the ultra-high-performance route, there's a concept you should be aware of called mechanical empathy. This describes a way of organizing your parallelism to minimize the burden on the CPU, especially context switches. When you're trying to deliver sub-millisecond responses in a managed language, that can be difficult to achieve with traditional methods.

Mechanical empathy was popularized in the software world by the team behind the LMAX Disruptor. Their whitepaper includes a graphic illustrating the cost of locks:

[Chart: cost of lock arbitration vs. throughput, from the Disruptor whitepaper]

You can see that as contention (and thus arbitration) increases, your throughput tanks. The LMAX Disruptor is an attempt to design a system that minimizes locking and expensive context switches.

This is not a simple implementation, and it's not a drop-in replacement for, say, ConcurrentHashMap. Read the Disruptor whitepaper to get a lot more information and see if it's the right approach for you.

Conclusion

This wraps up our tour of the most common causes of poor performance and some things to look out for. By becoming proficient at analyzing and tuning both the JVM and your own applications, you will gain a much deeper understanding of your code and of how the JVM works, which should make for better code. Go forth and profile!