Dev9 Coffee Talk: the Psychological Benefits of Continuous Delivery

Welcome back to Dev9 Coffee Talks! This interview was with Jason Marshall and Keith Bloomfield, both of whom are developers at Dev9. They will be sitting down with us a few times over the next few weeks to talk about a number of different topics.

I started off this time by asking what the deployment process is like in a traditional environment, and how the Continuous Delivery methodology is different. Jason jumped in right away. “Traditionally, developer teams write the code, QA checks it off, and Ops deploys it. It is a slow process, even when it is done well,” Jason said. “More often than not it’s painful too. Particularly if you are dealing with preexisting customers and live deployments.”

“There is just so much that can go wrong with deployments,” he continued. “When you are changing the shape of the data, changing schema, and trying to preserve all of the old data there are a lot of points where a product can fail. That’s why deployment time is so stressful for developers. Deployment day means pagers, all-nighters, and scheduled downtime. That’s where the image of developers hunching over glowing monitors in the dark with soda and cold pizza came from.”

Keith was the first one to stop laughing. “When large deployments are done all at once, it is extremely taxing,” he said. “The late nights mean you and your team get sleep deprived, and when you are sleep deprived you make mistakes. Your attention to detail gets worse, and you want to get home so badly you just focus on the obvious problems. Then all the non-obvious problems slip through the cracks and crop up later.”

“Once all the red lights go off, victory is declared,” Jason chimed in. “Then, just as you pull into your driveway at home, the pager goes off and someone tells you everything is broken.”

“Continuous Delivery is about getting rid of all of the pain. Really, you could say that the goal is to make the deployment as boring as possible,” Jason went on. “If you are fresh when you hit the deploy button, you can figure out any problem. The best part is that if you have set things up so that you are fresh when you hit that button, so many other things need to have been done right that you probably don’t even need to be fresh.”

Stay tuned, we'll have more from Jason and Keith soon.

Intro to Go for Java Developers

Unless you've been living under a rock, or deep in crunch mode for several years, you've likely heard of Go (AKA golang), Google's new-ish language. It was designed as an alternative to the growing complexity of C++, especially around concurrency. It's also attracting droves of Python developers, as it offers dramatically better performance, all the fun of type safety, and a syntax that's more comfortable than Java or C#.

But I like Java just fine

However, for us Java (and C#) developers, we're told every new language is the one that will save us from ourselves. Let's take a quick tour of Go and see what it offers.

To this end, I won't bore you with explaining the basics of programming. I will show you the key differences with Java, and why you might consider Go for your next project.

Playground

For all of the examples listed in this article, you'll see a link next to 'Play this' -- this refers to the Golang Playground. This is a quick and easy way to test out the language without installing anything.

Hello World

Of course, before we get started, here is the canonical 'Hello World' for Go:

package main

import "fmt"

func main() {
    fmt.Println("Hello, world!")
}

Play this

This syntax is familiar to most developers in C-style languages.

Is it Object-Oriented? Functional? Procedural?

Go has constructs from all of these schools of thought, but with some modern best practices built in. For example, we've all heard these mantras before:

For this reason, Go has made some interesting choices. First off, it has no concept of "Objects" -- a single abstraction that represents both state and behavior. It just has the idea of Types -- in C-like structs:

type Address struct {
    Number string
    Street string
    City   string
    State  string
    Zip    string
}

Notice also that the type follows the field name in each declaration, and that the identifiers start with an upper-case letter, which is how Go marks a name as exported (visible outside its package).

So, this would almost seem like a purely procedural language. If you've used Scala or C#, however, you're probably familiar with the idea of Extension Methods. This is also possible in JavaScript (by modifying the object prototype), Groovy (by manipulating the metaclass), and Ruby (monkey-patching). Instead of having those as a separate concept, Go makes those the only way to define behavior for a type:

package main

import "fmt"

type Address struct {
    Number string
    Street string
    City   string
    State  string
    Zip    string
}

func (a Address) Location() {
    fmt.Println("I’m at", a.Number, a.Street, a.City, a.State, a.Zip)
}

func main() {
    address := Address{Number: "137", Street: "Park Lane", City: "Kirkland", State: "WA", Zip: "98033"}
    address.Location()
}

Play this

Notice some more neat things here. We have named constructor parameters. We did not provide a type to the variable 'address'. The := operator tells the Go compiler to declare the variable and infer its type. And the Location() function was automatically bound as a method on the Address type.

So, what would inheritance look like in this world? Let's create a MultiFamilyAddress:

type MultiFamilyAddress struct {
    Address Address
    Unit string
}

This is a perfect example of composition-over-inheritance but in Go. Now if we want to call the Location method, we have to do it like so:

func main() {
    address := Address {Number: "137", Street: "Park Lane", City: "Kirkland", State: "WA", Zip: "98033"}
    multi := MultiFamilyAddress {Address: address, Unit: "200"}
    multi.Address.Location()
}

Play this

Of course, we can always define a method with the signature func (m MultiFamilyAddress) Location() if we wanted to avoid this indirection. This isn't really inheritance the way we think of it. To do field-based inheritance, we use a construct Go calls anonymous fields:

type MultiFamilyAddress struct {
    Address
    Unit string
}

Not much different, right? This is Go's way of including all the fields of Address as though they were local fields on MultiFamilyAddress. This means the instantiation of MultiFamilyAddress will now look like this:

multi := MultiFamilyAddress{Address{Number: "137", Street: "Park Lane", City: "Kirkland", State: "WA", Zip: "98033"}, "200"}
multi.Location()

Play this
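
To make the effect of the anonymous field a little more concrete, here is a minimal sketch. The Describe method is a hypothetical stand-in for Location that returns a string instead of printing, so the result is easier to inspect. It shows a promoted field, the embedded type's method, and what happens when the outer type defines a method with the same name:

```go
package main

import "fmt"

type Address struct {
	Number string
	Street string
	City   string
	State  string
	Zip    string
}

// Describe is a hypothetical helper (not from the examples above)
// that returns the address as a single line.
func (a Address) Describe() string {
	return a.Number + " " + a.Street + ", " + a.City + ", " + a.State + " " + a.Zip
}

type MultiFamilyAddress struct {
	Address // anonymous (embedded) field
	Unit    string
}

// Describe on the outer type shadows the promoted Address.Describe.
func (m MultiFamilyAddress) Describe() string {
	return m.Address.Describe() + ", Unit " + m.Unit
}

func main() {
	multi := MultiFamilyAddress{
		Address: Address{Number: "137", Street: "Park Lane", City: "Kirkland", State: "WA", Zip: "98033"},
		Unit:    "200",
	}

	fmt.Println(multi.City)               // promoted field, same as multi.Address.City
	fmt.Println(multi.Address.Describe()) // the embedded type's method is still reachable
	fmt.Println(multi.Describe())         // the outer type's method wins
}
```

The embedded value is still addressable by its type name, so nothing is lost when the outer type provides its own version of a method.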

Go also offers interfaces, but they are a bit different than your normal OO interfaces. We'll cover those in another article.

So we've seen the procedural and object-oriented methodologies, but what about functional? A key component of functional programming is Higher-order Functions. In Java, as of version 8, we can do something like this:

List<String> strings = Arrays.asList("Hello", "World");
strings.forEach(n -> System.out.println(n));

Of course, in Java 7 or before, it would be more like this:

List<String> strings = Arrays.asList("Hello", "World");
for ( String str : strings )
    System.out.println(str);

In Go, it would look something like this:

func main() {
    strings := [...]string{"Hello", "World"}
    for _, item := range strings {
        fmt.Println(item)
    }
}

Play this

Some interesting things here. First, to declare an array, we put the size and element type at the beginning of the variable definition, before the values. We used [...] to indicate the compiler should figure out the actual size. We could just as easily have written [2]string{"Hello", "World"}.

The for loop is where it gets interesting. First, you see we are taking 2 parameters back, one indicated with an _ character. This is a convention in Go (and some other languages) for a parameter we don't care about. In this case, it's the index position of the element. The range operator takes a []T type, and executes the code inside the curly braces on each item.
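
If you do care about the index, you simply give it a name instead of _. Here is a small sketch; the labels function is a made-up helper, and it takes a []string (an array type without a fixed size) so it can accept any number of items:

```go
package main

import "fmt"

// labels is a hypothetical helper that pairs each item with its index,
// using both values that range hands back.
func labels(items []string) []string {
	out := make([]string, 0, len(items))
	for i, item := range items {
		out = append(out, fmt.Sprintf("%d: %s", i, item))
	}
	return out
}

func main() {
	for _, line := range labels([]string{"Hello", "World"}) {
		fmt.Println(line)
	}
}
```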

Of course, this wasn't clearly a higher-order function, nor did it involve closures. Let's take a look at a simple example that does this:

func main() {
    x := 5
    fn := func() {
        fmt.Println("x is", x)
    }
    fn()
    x++
    fn()
}

Play this

This prints, as you might expect:

x is 5
x is 6

So we have functions as data types. This lets us do some interesting things:

package main

import (
    "fmt"
    "math/rand"
    "time"
)

type calcOp func(int, int) int

func main() {
    // You seed your RNGs, right?
    rand.Seed(time.Now().Unix())

    fns := []calcOp{
        func(x, y int) int { return x + y },
        func(x, y int) int { return x - y },
        func(x, y int) int { return x * y },
        func(x, y int) int { return x / y },
        func(x, y int) int { return x % y },
    }

    fn := fns[rand.Intn(len(fns))]

    x, y := 171, 35
    fmt.Println(fn(x, y))
}

Play this

So what's going on here? First, we've defined a type called calcOp -- a calculator operation. It is a function that takes 2 integers, and returns an integer. This is now a defined type we can use in argument lists and objects.

In the main method, we create a collection of these objects. However, since we have omitted a size, it's not an array. In Go parlance, this is called a Slice.

We instantiate this collection of calcOp functions. We pick one at random. We initialize x and y with 171 and 35 respectively (that multi-assign syntax is also a feature of Go), then execute the function with those values. Neat!
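
Since calcOp can also appear in an argument list, here is a minimal sketch of that. The apply function is a made-up name for illustration, not something from the standard library:

```go
package main

import "fmt"

type calcOp func(int, int) int

// apply is a hypothetical helper that accepts any calcOp as its
// first argument and runs it on x and y.
func apply(op calcOp, x, y int) int {
	return op(x, y)
}

func main() {
	add := func(x, y int) int { return x + y }
	mod := func(x, y int) int { return x % y }

	fmt.Println(apply(add, 171, 35)) // 206
	fmt.Println(apply(mod, 171, 35)) // 31
}
```

Note that add and mod are plain function literals. Go lets you pass them where a calcOp is expected because the signatures match.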

Concurrency Constructs

So now we've seen that Go encapsulates many existing programming schools, but if you're a fan of one of those in particular, there is almost certainly a better language for it. Haskell and OCaml for functional, Clojure and Ruby for OO, and C and Rust for procedural. One of the key selling points, and I cringe while typing this out, is that Go is meant for the cloud. Not only do we parallelize and distribute our applications, we need to parallelize our code as well. This has been a major source of both performance issues, and correctness issues.

To that end, Go has two constructs that are going to help us: goroutines and channels. Goroutines are a lot like actors (in the Akka Actor sense) -- basically multiple threads without necessarily having a 1-to-1 correlation to system threads. When one blocks, another takes over. Channels are a way to separate computation and provide a clean interface to talk between them. Let's take a look at what they do:

package main

import (
    "fmt"
    "math/rand"
    "time"
    "strconv"
)

func Announce(message string, delay time.Duration) {
    go func() {
        time.Sleep(delay)
        fmt.Println(message)
    }()
}

func main() {
    for i := 0; i < 20; i++ {
        dur := time.Duration(rand.Int31n(10)) * time.Millisecond
        Announce("Item " + strconv.Itoa(i), dur)
    }

    fmt.Println("Done!")
}

Play this

The main method is just a bunch of setup -- defining dur to be a small duration of time (up to 10 milliseconds), and printing a value to the console 20 times. If you ran this program as-is, what would you expect to see? A bunch of random-ordered "Item X" messages, followed by a 'Done!' message? Here's what you actually get:

Done!
Program exited.

Wait, what? Let's look at that Announce function again. It is called with go func() -- this is how you invoke a goroutine. I am oversimplifying, but think of goroutines as backgrounded processes on the shell. Or, if you really know your threading model in Java, they are daemon threads. That is, they do not hold up program execution. When the main thread dies, they die as well. In Go, a goroutine will execute if the program is still running. We didn't get anything on the console because the program didn't run long enough. Let's add this line right before the 'Done!' line in the main function:

time.Sleep(5 * time.Second)

Play this

This tells our main thread to pause for 5 seconds, then we can continue and finish. With this model, we get our expected output:

Item 18
Item 15
Item 9
Item 5
Item 6
Item 17
...

So, that's goroutines. They're like background processes. The obvious question here is -- how do I make sure they execute? That is, you want to (potentially) offload the work to another thread or process, but it's important that it finishes. This is where Channels come in.

In Go, Channels -- blatantly taken from the link -- are "the pipes that connect concurrent goroutines. You can send values into channels from one goroutine and receive those values into another goroutine."

Call this IPC or eventing or what have you. It is a basic construct of communicating between goroutines. So, what does a channel look like? To make a channel, we use the Go builtin make. It makes a variable for you, and it's how you make channels:

mychan := make(chan string)

chan is the identifier for a channel. The string identifier says it's a channel of strings. That is, it takes and emits strings. The simplest way to emit and receive messages is this:

go func() { mychan <- "ping" }()
msg := <-mychan
fmt.Println(msg)

Play this

We are using a goroutine lambda to emit a message to the channel mychan, and then receiving it into msg.
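
A channel can carry more than one message, of course. The sketch below goes slightly beyond what this article covers: it uses the builtin close to signal that no more values are coming, and range to receive until that happens. The produce function is a made-up name for illustration:

```go
package main

import "fmt"

// produce is a hypothetical helper: it sends each word into a new
// channel from a goroutine, then closes the channel.
func produce(words []string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out) // tell the receiver there is nothing more to read
		for _, w := range words {
			out <- w
		}
	}()
	return out
}

func main() {
	// range over a channel receives values until the channel is closed.
	for msg := range produce([]string{"ping", "pong", "done"}) {
		fmt.Println(msg)
	}
}
```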

So, how would we apply this to the example above? We know we can send a message to a channel, and we know we can receive messages. Additionally, receiving a message is a blocking operation -- the execution stops until a message is available. We could go really naive with it:

func Announce(message string, delay time.Duration) {
    mychan := make(chan bool)

    go func() {
        time.Sleep(delay)
        fmt.Println(message)
        mychan <- true
    }()

    <-mychan
}

Play this

In this example, we receive from mychan after the execution of func finishes. This has one rather predictable side effect: all lines are printed in order. Because receiving a message is a blocking operation, we don't return control to the for loop until we have received a message. Now, what if we want to keep the parallelism? Here's how I solved this one:

package main

import (
    "fmt"
    "math/rand"
    "strconv"
    "time"
)

func Announce(message string, delay time.Duration, done chan bool) {
    go func() {
        time.Sleep(delay)
        fmt.Println(message)
        done <- true
    }()
}

func main() {
    numMessages := 20

    channels := make([]chan bool, numMessages)

    for i := 0; i < numMessages; i++ {
        channels[i] = make(chan bool)
        dur := time.Duration(rand.Int31n(10)) * time.Millisecond
        Announce("Item "+strconv.Itoa(i), dur, channels[i])
    }

    for i := 0; i < numMessages; i++ {
        <-channels[i]
    }

    fmt.Println("Done!")
}

Play this

Here, we use the make function again to create a slice of channels, one for each message. Then, inside the loop, we create a channel and stick it in the slice. We then pass that channel to the Announce function. The goroutine inside that function signals the channel when it has executed. Because we don't query the channels until afterwards, this allows the random-order execution we're looking for. To finish it up, we drain the slice of channels.

There are other problems with this solution -- what if we don't know the number of channels we want, what if the number is too large to reasonably store in memory? These will be left as an exercise for the reader.
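
If you want a head start on that exercise, one possible approach (not necessarily the only or best one) is to drop the per-message channels and use sync.WaitGroup from the standard library, which counts outstanding goroutines for you. In this sketch, announceAll is a made-up function name, and the mutex-guarded counter exists only so the function can report how many messages were printed:

```go
package main

import (
	"fmt"
	"math/rand"
	"strconv"
	"sync"
	"time"
)

// announceAll is a hypothetical helper: it starts n goroutines, waits
// for all of them, and returns how many messages were printed.
func announceAll(n int) int {
	var (
		wg      sync.WaitGroup
		mu      sync.Mutex
		printed int
	)

	for i := 0; i < n; i++ {
		msg := "Item " + strconv.Itoa(i)
		dur := time.Duration(rand.Int31n(10)) * time.Millisecond

		wg.Add(1) // register the goroutine before starting it
		go func() {
			defer wg.Done() // mark this goroutine as finished
			time.Sleep(dur)
			fmt.Println(msg)

			mu.Lock() // the counter is shared, so guard it
			printed++
			mu.Unlock()
		}()
	}

	wg.Wait() // block until every goroutine has called Done
	return printed
}

func main() {
	count := announceAll(20)
	fmt.Println("Done!", count, "messages announced")
}
```

This needs no collection that grows with the number of messages, which addresses both concerns raised above.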

Last Little Bits

So we've seen some neat concurrency concepts, as well as how to structure types and methods.

First, if you don't want to use the := syntax, you can declare a variable with a type:

var myint int = 5

This is not too useful for our examples. You can also declare constants:

const foo = "This is a constant"

We saw above that you can return multiple values from a function. You can do that yourself:

func multireturn() (int, string) {
    return 42, "foo"
}
var x, str = multireturn()
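
A very common use of multiple return values in Go is returning a result together with an error. Here is a small sketch using strconv.Atoi from the standard library, which itself returns (int, error). The parsePort function is a made-up example:

```go
package main

import (
	"fmt"
	"strconv"
)

// parsePort is a hypothetical helper that converts text to a port
// number, returning an error for input that is not a number.
func parsePort(text string) (int, error) {
	port, err := strconv.Atoi(text)
	if err != nil {
		return 0, fmt.Errorf("invalid port %q: %w", text, err)
	}
	return port, nil
}

func main() {
	if port, err := parsePort("8080"); err == nil {
		fmt.Println("port:", port)
	}
	if _, err := parsePort("eighty"); err != nil {
		fmt.Println("error:", err)
	}
}
```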

We didn't show a pure example of higher-order functions in the functional section, so here's two of those:

func adder() func(int) int {
    sum := 0
    return func(x int) int {
        sum += x // sum is declared outside, but still visible
        return sum
    }
}

func sum(i int) func(int) int {
    sum := i
    return func(x int) int {
        sum += x
        return sum
    }
}

func main() {
    add := adder()
    fmt.Println(add(3))
    fmt.Println(add(5))

    add2 := sum(2)
    fmt.Println(add2(0))
    fmt.Println(add2(3))
}

Play this

This gives us the output:

3
8
2
5

And one last bit. Go has a defined structure to the code. There is only one correct way to format your Go programs. It's so important that there is a go fmt command (backed by the gofmt tool) to put your code in the correct style, and the style is not configurable. Holy wars have been started over the correct way to align braces, spaces, and brackets in C-style languages. Go picked one and built it in. When you have one less thing to worry about, you can focus on more important concerns.
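
The same formatter is also available as a library, in the standard go/format package. As a rough sketch of what "one correct style" means in practice, the tidy function below (a made-up name) takes some deliberately messy source text and returns it in the standard layout:

```go
package main

import (
	"fmt"
	"go/format"
)

// tidy is a hypothetical helper that returns src laid out in the
// standard Go style, or an error if src is not valid Go.
func tidy(src string) (string, error) {
	out, err := format.Source([]byte(src))
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	messy := "package main\nfunc main(){\nx:=1\n_=x\n}\n"

	clean, err := tidy(messy)
	if err != nil {
		fmt.Println("could not format:", err)
		return
	}
	fmt.Print(clean)
}
```

Running already-formatted code through the formatter again leaves it unchanged, which is what makes it safe to run on every save or commit.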

Final Thoughts

Go is quite a fun language to work with. It has a lot of the power of C/C++ (including pointers), but cuts out a lot of cruft. It can be run either as a pre-compiled unit, or you can run a single file on the command line with go run myprogram.go. This lets it serve the dual purpose of compiled and interpreted software, which makes it just as appropriate for high-performance, long-running software as it is for advanced shell scripting. Happy programming!

Continuous Delivery Tool Recommendation for the Java Stack

There are eight essential components of a Continuous Delivery setup.


1.    Source Control
2.    Build Tool
3.    Automated Tests
4.    Continuous Integration (CI) Server
5.    Binary Repository
6.    Configuration Management
7.    Automated Deployment
8.    Monitoring and Analytics


An issue management system could also be argued for, but it is more of a project management concern.


Source Control:


For this, I recommend Git unequivocally. Stash, which is like GitHub but behind a firewall, is also an effective tool. Both allow for pull-based workflows to enforce code reviews and knowledge sharing. Git is a fantastic tool.


Build Tool


I have to recommend Maven for this. While some may object to its verbose XML syntax, it’s very well supported by all the major Java IDEs. In addition, nearly every CI tool offers native Maven support. Maven also deals with dependency management, which could be its own category if they were not bundled so easily here. Gradle is another great alternative, but the ability to put code into your build scripts is a bit scary. It can be great if you have a disciplined team, but could lead to non-repeatable builds. Additionally, the heavier the customization you put in, the less your tooling chain can help you.


Automated Tests


For commit tests, there are really two good choices: JUnit and TestNG. Either of them works. Nearly every Java developer should be familiar with JUnit. TestNG offers some more advanced tooling and arguably better runtime behavior. Nobody will get fired for using JUnit, but TestNG is a bit better if you are starting a greenfield project.
For mocking/stubbing, I like to use Mockito. It is pretty unrivaled in ease of use.


For fluent assertions, I like AssertJ. It supersedes Hamcrest and FEST.


Acceptance testing can often be done with JUnit and TestNG as well. I like to use the RestAssured framework for testing REST endpoints. I also do a bit of Selenium and other browser-based testing. PhantomJS is a great tool for a first pass. I like acceptance testing in a framework called Cucumber, because the test specifications follow an almost English-language structure.


For performance testing, I like Gatling locally and Neustar for cloud-based testing.


CI Server


The industry standard here is Jenkins, and it works fine. It has great community support and all that comes with it. However, I prefer TeamCity. It offers a lot of powerful features like extracting templates from a build, easy automatic job creation for new branches, and many more. I also like the way it manages VCS roots a lot better. It is a commercial product past a certain size, but I think it is worth it. To get the same features out of Jenkins, you must do a bunch of configuration on a bunch of plugins from many different sources.


Binary Repository


There are only two reasonable choices here: Nexus or Artifactory. People can get into religious wars over these, but I prefer Artifactory. It can act as an NPM repository and an RPM repository. However, there is a more contentious issue. Artifactory will rewrite POM files to remove <repository> information so that you don’t leak requests. Nexus does not. That means that if somebody specifies a custom repository in a POM file, you will end up searching that one as well.


Configuration Management


There is no single tool here that stands out. I like using Typesafe Config for configuration. You still need a way to deploy it, though that is more a component of automated deployment. There is a lot of talk about distributed configuration management and configuration discovery. For that, etcd is the popular choice.


Automated Deployment


This can be a contentious issue, and I don’t have a solid opinion on it. The two primary packages are Chef and Puppet. I think either is a reasonable choice. They both work to automatically bring a system to a known state, but they take different tacks. Puppet is more declarative, and Chef is more scripted. I have worked more with Puppet, so I am more comfortable with it.


Monitoring and Analytics


For analytics, it is still hard to beat Dropwizard Metrics. A few annotations and you are on your way.

For monitoring, Zabbix seems to be a rather common tool – that everyone has some problem with. ZenOSS is nice, but is usually used in very large organizations and therefore tends to be cumbersome. It is only really appropriate if you are managing 100 or more servers. Nagios is pretty popular, but seems like it has stagnated in terms of advancements. I remember it being purely plugin-driven as well, meaning you need to know the ecosystem just to get it running.

Altogether, I still have to recommend Zabbix for most circumstances.

 

October Retrospective

We’re reminiscing over what a great October it has been for us. We started the month with a group of developers moving in-house to work on an exciting project for a client.

A bunch of us got together on a Saturday for a great cause. We were recognized as one of the fastest-growing private companies in the state. We had great times over food & drinks with our entire team at our quarterly All Hands... and finally, we hosted two great seminars.

Justin Graham gave his first seminar on Developing a Test Strategy earlier in October, and our CTO Will Iverson led the discussion on Managing an Agile Portfolio last week.

As we look forward to November, we’d like to share the next two seminars we've lined up.

Our first seminar for November will feature Faith Cooley, who will present Organizational Design for Effective Software Development on November 6th.

According to Faith, “It is often relatively easy to solve technical problems, [but] it is harder to solve organizational problems.”

Scenarios could include teams that are functioning in less than an optimal manner – in turn, this consumes budgets, impacts a lead or manager’s ability to deliver, and leaves everyone exhausted.

She will share easily executable ideas on how to improve cross training on teams, how leads can create well-rounded actionable reviews for their employees, and will give tips on how to have corrective conversations with team members.

Finally, later in the month, Gabe Hicks will cover Continuous Delivery Maturity. Be on the lookout for more info as we get closer to that event.

We can’t wait to see how November will shape up for us. More importantly we’re hoping you will take part in sharing some of those moments with us!

Dev9 Solutions Architect Coffee Talk: on Continuous Delivery & Automation

Once in a while, we like to sit down with our SAs and pick their brains about the development space in which they operate. We decided that those conversations are much more effective with the addition of coffee, so grab a cup and enjoy this entry in our series of "Dev9 SA Coffee Talks".

The Solutions Architect chosen for this first series of coffee interviews is Gabe Hicks, a solutions architect at Dev9. Gabe has been with Dev9 since the company’s inception. Currently he is working on a project at our corporate office.

We sat down at Starbucks. Gabe ordered a cappuccino and I went with an Americano. We opened our conversation by simply discussing, rather broadly, how companies benefit from continuous delivery. He paused thoughtfully for a few moments, and took a sip of his coffee. “Continuous delivery reduces the number of obstacles that surface during the development process,” Gabe said. “It’s about automation and breaking down traditional barriers. It’s about making deployment the most important piece of the development lifecycle.”

"Removing obstacles is really the core concept behind continuous delivery," he continued. "During development every obstacle must be dealt with or circumvented. The longer this process takes, the more expensive and frustrating a project becomes." Continuous Delivery establishes processes and practices that help to prevent some problems from occurring at all, and allows for quick identification and resolution of those that do occur.

As you would expect, our discussion inevitably shifted to the topic of automation.

“Developers love automation,” Gabe said matter-of-factly. “It removes their fear of deployments. They know their code has been tested, and that if something isn’t right, they have the ability to react and redeploy quickly. Products don’t fail in the eleventh hour, and you get to produce good work in a manner that lets you go home and not be a ball of unhappiness.” We laughed.

Automated testing allows for developers to produce more code, with better testing coverage, than manual testing could ever allow for. This means that there are fewer bugs that make it into the build, and everyone loves that. Coupling automated deployment processes with automated testing allows for rapid development and deployment while minimizing downtime.

“Automation has not always been encouraged,” said Gabe as we headed out of 'our' Starbucks. “When I first started (developing), no one asked you to do any automation. Continuous delivery says to automate at every level, all the way through. It produces much higher quality (code).”

Keep an eye out here on our blog for further CD Interview transcripts!

We Won! PSBJ Top 100 Fastest Growing Companies!

We are delighted to announce that we rank #42 on Washington’s 100 Fastest-Growing Private Companies list by Puget Sound Business Journal!

Co-partners Matt Munson (our Chief Operating Officer) and Will Iverson (our Chief Technology Officer) accepted the award on behalf of our team at a sold-out awards banquet in Seattle earlier this month.

On winning the award, Will said, “It’s a really, really nice marker for us. We don’t wake up in the morning thinking about what awards we are going to win, but it really helps us benchmark how we are doing in a fun way!”

This award is meaningful to us because we believe it’s a testament to the dedication and passion that our team and clients contribute to the Dev9 experience.

“We started with an idea – providing best in class software development, using Continuous Delivery as a roadmap. That resonates with people as a vision, and from there it’s just a matter of treating people well,” said Will.

While rapid growth is great, it remains just that unless there’s a means to sustaining that growth and we recognize what that entails. “We will keep focusing on our strengths, which include core software development and automation,” Will noted. “New platforms drive new opportunities for revenue and growth for our clients, and so we are always investing.”

Cheers to growing!

Java Release Process with Continuous Delivery

Note: A lot of the release specifics were pioneered by Axel Fontaine.

One of the most interesting things we deal with is releases. Not a deployment -- which is actually running the new software. A release, in our parlance, is creating a binary artifact at a specific and immutable version. In the Java world, most of us use Maven for releases. More pointedly, we use the maven-release-plugin. I am going to show you why you should stop using that plugin.

Why Change?

This is a question I field a lot. There are several reasons, but the primary one is this: In a continuous delivery world, any commit could theoretically go to production. This means that you should be performing a maven release every time you build the software. So, let's revisit what happens inside your CI server when you use the maven-release-plugin properly:

  • CI checks out the latest revision from SCM
  • Maven compiles the sources and runs the tests
  • Release Plugin transforms the POMs with the new non-SNAPSHOT version number
  • Maven compiles the sources and runs the tests
  • Release Plugin commits the new POMs into SCM
  • Release Plugin tags the new SCM revision with the version number
  • Release Plugin transforms the POMs to version n+1 -SNAPSHOT
  • Release Plugin commits the new -SNAPSHOT POMs into SCM
  • Release Plugin checks out the new tag from SCM
  • Maven compiles the sources and runs the tests
  • Maven publishes the binaries into the Artifact Repository

Did you get all of that? It's 3 full checkout/test cycles, 2 POM manipulations, and 3 SCM revisions. Not to mention, what happens when somebody commits a change to the pom.xml (say, to add a new dependency) in the middle of all this? It's not pretty.

The method we're going to propose has 1 checkout/test cycle, 1 POM manipulation, and 1 SCM interaction. I don't know about you, but this seems significantly safer.

Versioning

Before we get into the details, let's talk about versioning. Most organizations follow the versioning convention they see most frequently (often called Semantic Versioning or SEMVER), but don't follow the actual principles. The main idea behind this convention is that you have 3 version numbers in dotted notation X.Y.Z, where:

  1. X is the major version. Any changes here are backwards-incompatible.
  2. Y is the minor version. Any changes here are backwards-compatible, but there may be bug fixes or new features.
  3. Z is the incremental version. All changes here are backwards-compatible.

However, most organizations do not use these numbers correctly. How many apps have you seen that sit at 1.0.x despite drastic breaking changes, feature addition/removal, and more? This scheme provides little value, especially when most artifacts are used in-house only. So, what makes a good version number?

  • Natural order: it should be possible to determine at a glance between two versions which one is newer
  • Build tool support: Maven should be able to deal with the format of the version number to enforce the natural order
  • Machine incrementable: so you don't have to specify it explicitly every time

While Subversion offers a great candidate (the repository commit number), Git does not have an equivalent. However, all build systems, including both Bamboo and Jenkins, expose an environment variable that is the current build number. This is a perfect candidate that satisfies all three criteria, and has the added benefit that any artifact can be tied back to its specific build through convention.

What about Snapshots?

Snapshots are an anti-pattern in continuous delivery. Snapshots are, by definition, ephemeral. However, we're making one exception, and that's in the POM file itself. The rule we're following is that the pom.xml always has the version 0-SNAPSHOT. From here on out, no more snapshots!

The New Way

So, we're going to use the build number as the version number, and not have snapshots (except as described above). Our POM file is going to look a little something like this:

<project ...>
  ...
  <version>0-SNAPSHOT</version>
</project>

This is the only time we will use -SNAPSHOT identifiers. Everything else will be explicitly versioned. I am assuming your distributionManagement and scm blocks are filled in correctly. Next, we need to add 2 plugins to our POM file:

<build>
    ...
    <plugins>
    ...
        <plugin>
            <groupId>org.codehaus.mojo</groupId>
            <artifactId>versions-maven-plugin</artifactId>
            <version>2.1</version>
        </plugin>
        <plugin>
            <artifactId>maven-scm-plugin</artifactId>
            <version>1.8.1</version>
            <configuration>
                <tag>${project.artifactId}-${project.version}</tag>
            </configuration>
        </plugin>
    </plugins>
</build>

The devil is in the details, of course, so let's see what should happen now during your release process. Note that I am using Bamboo in this example. You should make sure to modify it for your CI server's variables. The process is:

  1. CI checks out the latest revision from SCM
  2. CI runs mvn versions:set -DnewVersion=${bamboo.buildNumber}
  3. Maven compiles the sources and runs the tests
  4. Maven publishes the binaries into the Artifact Repository
  5. Maven tags the version

    Steps 3, 4, and 5 are run with one command: mvn deploy scm:tag.
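If your CI server runs a script rather than individual Maven tasks, the process above can be sketched in a few lines of Ruby. The environment variable names (bamboo_buildNumber and BUILD_NUMBER) are the ones I believe Bamboo and Jenkins expose to script tasks; treat them as assumptions and check your own server. The helper names are made up.

```ruby
# Sketch of the CI-side release step. The environment variable names are
# assumptions about Bamboo and Jenkins; verify them for your CI server.
def build_number(env)
  raw = env['bamboo_buildNumber'] || env['BUILD_NUMBER']
  raise ArgumentError, 'no build number in environment' if raw.nil?

  Integer(raw, 10).to_s # fail fast on anything that is not a plain integer
end

# Returns the commands to run, in order, without executing them.
def release_commands(env)
  version = build_number(env)
  [
    "mvn versions:set -DnewVersion=#{version}", # set the version from the build number
    'mvn deploy scm:tag'                        # compile, test, publish, tag
  ]
end

puts release_commands({ 'bamboo_buildNumber' => '42' })
```

Returning the commands instead of running them keeps the sketch side-effect free; in a real job you would pass each one to system and stop on the first failure.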

That's it. We have one specific revision being tagged for a release. Our history is cleaner, we can see exactly which revision/refs were used for a release, and it's immune to pom.xml changes being committed during the process. Much better!

Gotcha!

OK, this all works great, unless you have a bad setup. The primary culprit of a bad setup is distinct modules having snapshot dependencies. Remember how I told you snapshots are an anti-pattern? Here's the general rule: if the modules are part of the same build/release lifecycle, they should be put together in one source repository, and should be built/versioned/tagged/released as one unit. If the modules are completely separate, then they should be in separate source repositories, and you should have fixed-version dependencies between them to provide a consistent interface. If you are depending on snapshot versions, you are creating non-repeatable builds, as the time of day you run the build/release will determine which exact dependency you fetch.
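A quick way to find out whether you have this problem is to scan a POM for snapshot dependencies. The following is a rough Ruby sketch, not a replacement for a proper build-time check: it uses a regex scan over a made-up sample POM, so versions hidden behind properties or parent POMs would slip past it.

```ruby
# Sketch: flag <dependency> entries whose version ends in -SNAPSHOT.
# A regex scan is enough for an illustration; a real check should inspect the
# effective POM, since properties and parent POMs can hide a snapshot version.
def snapshot_dependencies(pom_xml)
  pom_xml.scan(%r{<dependency>(.*?)</dependency>}m).flatten.filter_map do |body|
    artifact = body[%r{<artifactId>\s*(.*?)\s*</artifactId>}m, 1]
    version  = body[%r{<version>\s*(.*?)\s*</version>}m, 1]
    "#{artifact}:#{version}" if version.to_s.end_with?('-SNAPSHOT')
  end
end

# Hypothetical POM: the project's own 0-SNAPSHOT is allowed, the dependency is not.
SAMPLE_POM = <<~XML
  <project>
    <version>0-SNAPSHOT</version>
    <dependencies>
      <dependency>
        <groupId>com.example</groupId>
        <artifactId>billing-api</artifactId>
        <version>128</version>
      </dependency>
      <dependency>
        <groupId>com.example</groupId>
        <artifactId>billing-core</artifactId>
        <version>1.0-SNAPSHOT</version>
      </dependency>
    </dependencies>
  </project>
XML

puts snapshot_dependencies(SAMPLE_POM) # => billing-core:1.0-SNAPSHOT
```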

Dev Environments with Vagrant

If you work with a number of clients, one issue pops up over and over: setting up a new machine. Sometimes, you're lucky and a client will let you use your own machine. More often than not, though, you're forced to use their hardware. This usually involves reading a bunch of out-of-date wiki documents, asking people around you, and maybe contributing back to the wiki for the next person. If you're lucky, you'll get this done in a day or two. More typically, it can take a week or so.

If you're a manager, this should also worry you. You're making these developers, whom you likely spent a good amount of money recruiting and compensating, spend a week or so of downtime just setting up their computers. Even taking a conservative estimate of $65/hr, a 40-hour week means you're spending $2600 for somebody to get up and running. Now imagine you're paying prevailing market rate for consultants, and that figure rises dramatically.

At Dev9, we like to automate. Typical payback times for automation projects may be in the months or even years, but imagine you could shave 2-3 days off of new machine setup time for each developer you onboard. This kind of tool could pay for itself with your first new developer, with better returns for each additional developer. So, what do we do?

Code

This article is going to involve some code. If you want to play along at home, you can view our repo at https://github.com/dev9com/vagrant-dev-env.

Enter Vagrant

Vagrant is a tool perfectly designed for our use case. It utilizes virtual machines (I use Oracle VirtualBox). VMs used to be clunky, and kind of slow. But we're living in an age where laptops come with 16GB RAM and 500+GB SSD drives, along with 8-core processors. We are living in an age of abundance here, and it would be a shame to let it go to waste :).

The Build

What we are going to build is a development machine image. While companies can benefit from creating this and handing it to new hires, it's just as valuable if you have multiple clients. I can transition between provided hardware with ease, because I'm just using them all as a host for my VM. In addition, I can make a change to the provisioning of one VM, and propagate it quickly to the others.

This VM is going to be a headless VM. That means there is no UI. We will interact with it over SSH. This helps keep it fast and portable. I have no problem using IntelliJ IDEA on Windows or Mac or Linux, but what I always want is my terminal and build tools. So, that's the machine we're going to build.

Initial Setup

First, get Vagrant and VirtualBox installed. Maybe clone our git repo if you want to follow along. That should be all for now!

This is something that only comes with research, but our base image is going to be phusion/ubuntu-14.04-amd64. This is the foundation of all of our images. This one was chosen because it plays really nicely with Docker. Full disclosure, we are Docker's PNW partner, so this is actually important to me :).

Step 1: A Basic Box

The first step in anything software related seems to be hello world. So, to create a Vagrant instance, we create a Vagrantfile. Clever, right? And even better, your Vagrantfile is just Ruby code -- like a Rakefile. The simplest possible Vagrantfile for what we're doing:

box      = 'phusion/ubuntu-14.04-amd64'
version  = 2

Vagrant.configure(version) do |config|
    config.vm.box = box
end

Let's go through this. As I mentioned above, our base box is that Ubuntu distro. You can just as easily choose CentOS, SUSE, CoreOS, or any number of other images. People even have entire dev stacks as one image! The version identifier is just signalling to Vagrant which configuration API to use. I've personally never seen anything except 2, but given the concept of versioned APIs in the REST world, it's not difficult to see how they plan to use it in the future.

So, to run this, we just type vagrant up:

[10:50:48 /ws/dev9/vagrant-dev-env/step1]$ vagrant up
Bringing machine 'default' up with 'virtualbox' provider...
==> default: Importing base box 'phusion/ubuntu-14.04-amd64'...
==> default: Matching MAC address for NAT networking...
==> default: Checking if box 'phusion/ubuntu-14.04-amd64' is up to date...
==> default: Setting the name of the VM: step1_default_1409766665528_9289
==> default: Clearing any previously set forwarded ports...
==> default: Fixed port collision for 22 => 2222. Now on port 2200.
==> default: Clearing any previously set network interfaces...
==> default: Preparing network interfaces based on configuration...
    default: Adapter 1: nat
==> default: Forwarding ports...
    default: 22 => 2200 (adapter 1)
==> default: Booting VM...
==> default: Waiting for machine to boot. This may take a few minutes...
    default: SSH address: 127.0.0.1:2200
    default: SSH username: vagrant
    default: SSH auth method: private key
    default: Warning: Connection timeout. Retrying...
==> default: Machine booted and ready!
==> default: Checking for guest additions in VM...
==> default: Mounting shared folders...
    default: /vagrant => /ws/dev9/vagrant-dev-env/step1

[10:51:23 /ws/dev9/vagrant-dev-env/step1]$

Notice that this took all of about 35 seconds. Most of the output is rather self-explanatory. So, this box is "up" -- how do we use it?

[10:51:23 /ws/dev9/vagrant-dev-env/step1]$ vagrant ssh
Welcome to Ubuntu 14.04 LTS (GNU/Linux 3.13.0-24-generic x86_64)

 * Documentation:  https://help.ubuntu.com/
Last login: Tue Apr 22 19:47:09 2014 from 10.0.2.2
vagrant@ubuntu-14:~$

That's it. There's your Ubuntu VM! Let's say we want to take it down, delete it, and bring it back up:

vagrant@ubuntu-14:~$ exit
Connection to 127.0.0.1 closed.

[10:55:23 /ws/dev9/vagrant-dev-env/step1]$ vagrant destroy -f
==> default: Forcing shutdown of VM...
==> default: Destroying VM and associated drives...

[10:55:30 /ws/dev9/vagrant-dev-env/step1]$ vagrant up
Bringing machine 'default' up with 'virtualbox' provider...
==> default: Importing base box 'phusion/ubuntu-14.04-amd64'...
==> default: Matching MAC address for NAT networking...
==> default: Checking if box 'phusion/ubuntu-14.04-amd64' is up to date...
==> default: Setting the name of the VM: step1_default_1409766945197_31521
==> default: Clearing any previously set forwarded ports...
==> default: Fixed port collision for 22 => 2222. Now on port 2200.
==> default: Clearing any previously set network interfaces...
==> default: Preparing network interfaces based on configuration...
    default: Adapter 1: nat
==> default: Forwarding ports...
    default: 22 => 2200 (adapter 1)
==> default: Booting VM...
==> default: Waiting for machine to boot. This may take a few minutes...
    default: SSH address: 127.0.0.1:2200
    default: SSH username: vagrant
    default: SSH auth method: private key
    default: Warning: Connection timeout. Retrying...
==> default: Machine booted and ready!
==> default: Checking for guest additions in VM...
==> default: Mounting shared folders...
    default: /vagrant => /ws/dev9/vagrant-dev-env/step1

[10:56:02 /ws/dev9/vagrant-dev-env/step1]$

So under a minute to destroy a VM and bring up an identical one. Not bad, Future. Not bad. A box like this is fine and dandy, but we probably want to do more with it.

Step 2: Basic Provisioning

Even at a base level, let's say we want Java. So, let's tweak our Vagrantfile a bit:

box      = 'phusion/ubuntu-14.04-amd64'
version  = 2

Vagrant.configure(version) do |config|
    config.vm.box = box

    config.vm.provision :shell, :inline => "apt-get -qy update"
    config.vm.provision :shell, :inline => "apt-get -qy install openjdk-7-jdk"
end

If you now run vagrant up, you'll get a machine with Java installed:

[11:27:33 /ws/dev9/vagrant-dev-env/step2](git:master+?)
$ vagrant up
Bringing machine 'default' up with 'virtualbox' provider...
==> default: Importing base box 'phusion/ubuntu-14.04-amd64'...
==> default: Matching MAC address for NAT networking...
==> default: Checking if box 'phusion/ubuntu-14.04-amd64' is up to date...
==> default: Setting the name of the VM: step2_default_1409768866354_7342
==> default: Clearing any previously set forwarded ports...
==> default: Fixed port collision for 22 => 2222. Now on port 2201.
==> default: Clearing any previously set network interfaces...
==> default: Preparing network interfaces based on configuration...
    default: Adapter 1: nat
==> default: Forwarding ports...
    default: 22 => 2201 (adapter 1)
==> default: Booting VM...
==> default: Waiting for machine to boot. This may take a few minutes...
    default: SSH address: 127.0.0.1:2201
    default: SSH username: vagrant
    default: SSH auth method: private key
    default: Warning: Connection timeout. Retrying...
==> default: Machine booted and ready!
==> default: Checking for guest additions in VM...
==> default: Mounting shared folders...
    default: /vagrant => /ws/dev9/vagrant-dev-env/step2
==> default: Running provisioner: shell...
    default: Running: inline script

[ clipping a bunch of useless stuff -- you know how it is. ]

==> default: 1 upgraded, 182 newly installed, 0 to remove and 109 not upgraded.
==> default: Need to get 99.4 MB of archives.
==> default: After this operation, 281 MB of additional disk space will be used.
[ ... ]
==> default: done.
==> default: done.

[11:30:15 /ws/dev9/vagrant-dev-env/step2]$ vagrant ssh
Welcome to Ubuntu 14.04 LTS (GNU/Linux 3.13.0-24-generic x86_64)

 * Documentation:  https://help.ubuntu.com/
Last login: Tue Apr 22 19:47:09 2014 from 10.0.2.2

vagrant@ubuntu-14:~$ java -version
java version "1.7.0_65"
OpenJDK Runtime Environment (IcedTea 2.5.1) (7u65-2.5.1-4ubuntu1~0.14.04.2)
OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode)

vagrant@ubuntu-14:~$

And there we go. A scripted buildout of a base Ubuntu box with Java. Of course, shell scripts can and do go wrong. They get progressively more complex, especially as you start having components that mix and match. Additionally, since all developers should be getting familiar with Continuous Delivery concepts, let's take this opportunity to explore a little tool called Puppet.
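Before handing things over to Puppet, here is a sketch of where the shell approach tends to go as the package list grows. Since a Vagrantfile is just Ruby, the list can live in one place and the inline script can be generated from it. The extra packages and the provision_script helper are my own illustration, and the defined?(Vagrant) guard is only there so the helper can be exercised with plain ruby outside of Vagrant.

```ruby
# Sketch of a Vagrantfile that keeps the package list in one place.
# The extra packages are illustrative, not part of the setup above.
packages = %w[openjdk-7-jdk git maven]

# Builds the inline shell script: refresh the index, then install everything.
def provision_script(packages)
  [
    'apt-get -qy update',
    "apt-get -qy install #{packages.join(' ')}"
  ].join("\n")
end

# The guard only lets the helper run under plain `ruby`; inside Vagrant the
# constant is always defined and the block configures the machine as usual.
if defined?(Vagrant)
  Vagrant.configure(2) do |config|
    config.vm.box = 'phusion/ubuntu-14.04-amd64'
    config.vm.provision :shell, :inline => provision_script(packages)
  end
end
```

This is tidier, but it is still a list of commands to run, with no notion of what state the machine is already in.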

Step 3: Buildout with Puppet

Puppet is pretty awesome -- and so are Chef and Ansible. I chose Puppet initially because I could get it working quicker. I'm not making a value judgement on which one works best.

The idea with Puppet is that you use the puppet files to describe the state you want the machine to be in, and Puppet manages getting it there. Vagrant also has first-class support for Puppet. Remember above, how we're provisioning with inline shell scripts? Well, Vagrant also has a Puppet provisioner. If you've never used Puppet before, that's OK, the examples should give you a basic overview of its usage.

To set up a basic Puppet provisioner, let's do something like this in our Vagrantfile:

box      = 'phusion/ubuntu-14.04-amd64'

Vagrant.configure(2) do |config|
    config.vm.box = box

    # Now let puppet do its thing.
    config.vm.provision :puppet do |puppet|
      puppet.manifests_path = 'puppet/manifests'
      puppet.manifest_file = 'devenv.pp'
      puppet.module_path = 'puppet/modules'
      puppet.options = "--verbose"
    end
end

This also seems pretty straightforward. Again, don't worry too much if you don't know Puppet. Those paths are relative to the Vagrantfile, so your directory structure (initially) will look like this:

[12:43:47 /ws/dev9/vagrant-dev-env/step3]$ tree
.
├── Vagrantfile
└── puppet
    ├── manifests
    │   └── devenv.pp
    └── modules

In the provisioner, we're giving it 2 paths. Manifests is where puppet will look for manifest files. A manifest is a basic unit of execution in Puppet. A manifest is made up of one or more resource declarations -- the desired state of a resource. These resource declarations are the basic building blocks. So, to start, let's just get our previous example working in Puppet. Modify your devenv.pp to look like this:

group { 'puppet': ensure => 'present' }

exec { "apt-get update":
  command => "apt-get -yq update",
  path    => ["/bin","/sbin","/usr/bin","/usr/sbin"]
}

exec { "install java":
  command => "apt-get install -yq openjdk-7-jdk",
  require => Exec["apt-get update"],
  path    => ["/bin","/sbin","/usr/bin","/usr/sbin"]
}

This is pretty self-explanatory, with one caveat: the order in the file doesn't matter. Puppet tries to optimize the running and management of dependencies, so the steps will not necessarily be executed in the order you expect. This is why the require declaration exists on the install java exec. We are telling Puppet to execute the apt-get update before this step. Notice also that it's a capital E in the require -- a capitalized type name followed by a title in brackets is how Puppet refers to a resource that has already been declared, as opposed to declaring a new one.
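If the idea of "declare dependencies, let the tool pick the order" is new to you, this small Ruby sketch may help. It uses the standard library's TSort to order two resources by their require edges. To be clear, this only illustrates dependency ordering; it is not how Puppet is implemented, and the Catalog class is made up.

```ruby
require 'tsort'

# Sketch: resources plus their `require` edges, ordered with Ruby's TSort.
# Illustrates dependency ordering only; this is not Puppet's implementation.
class Catalog
  include TSort

  def initialize
    @requires = {}
  end

  # Declares a resource and the resources it requires; returns self to chain.
  def resource(name, requires: [])
    @requires[name] = requires
    self
  end

  def tsort_each_node(&block)
    @requires.each_key(&block)
  end

  def tsort_each_child(name, &block)
    @requires.fetch(name).each(&block)
  end
end

# Declared "backwards" on purpose: the dependent resource comes first.
def example_order
  Catalog.new
         .resource('Exec[install java]', requires: ['Exec[apt-get update]'])
         .resource('Exec[apt-get update]')
         .tsort
end

puts example_order
# => Exec[apt-get update]
# => Exec[install java]
```

Even though install java is declared first, the require edge forces apt-get update to run before it.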

So, let's bring this box up:

[12:56:35 /ws/dev9/vagrant-dev-env/step3]$ vagrant up
Bringing machine 'default' up with 'virtualbox' provider...
==> default: Importing base box 'phusion/ubuntu-14.04-amd64'...
==> default: Matching MAC address for NAT networking...
==> default: Checking if box 'phusion/ubuntu-14.04-amd64' is up to date...
==> default: Setting the name of the VM: step3_default_1409774249245_48069
==> default: Clearing any previously set forwarded ports...
==> default: Fixed port collision for 22 => 2222. Now on port 2202.
==> default: Clearing any previously set network interfaces...
==> default: Preparing network interfaces based on configuration...
    default: Adapter 1: nat
==> default: Forwarding ports...
    default: 22 => 2202 (adapter 1)
==> default: Booting VM...
==> default: Waiting for machine to boot. This may take a few minutes...
    default: SSH address: 127.0.0.1:2202
    default: SSH username: vagrant
    default: SSH auth method: private key
    default: Warning: Connection timeout. Retrying...
==> default: Machine booted and ready!
==> default: Checking for guest additions in VM...
==> default: Mounting shared folders...
    default: /vagrant => /ws/dev9/vagrant-dev-env/step3
    default: /tmp/vagrant-puppet-3/manifests => /ws/dev9/vagrant-dev-env/step3/puppet/manifests
    default: /tmp/vagrant-puppet-3/modules-0 => /ws/dev9/vagrant-dev-env/step3/puppet/modules
==> default: Running provisioner: puppet...
==> default: Running Puppet with devenv.pp...
==> default: stdin: is not a tty
==> default: Notice: Compiled catalog for ubuntu-14.04-amd64-vbox in environment production in 0.07 seconds
==> default: Info: Applying configuration version '1409774267'
==> default: Notice: /Stage[main]/Main/Exec[apt-get update]/returns: executed successfully
==> default: Notice: /Stage[main]/Main/Exec[install java]/returns: executed successfully
==> default: Info: Creating state file /var/lib/puppet/state/state.yaml
==> default: Notice: Finished catalog run in 117.84 seconds

[12:59:48 /ws/dev9/vagrant-dev-env/step3](git:master+?)
$ vagrant ssh
Welcome to Ubuntu 14.04 LTS (GNU/Linux 3.13.0-24-generic x86_64)

Last login: Tue Apr 22 19:47:09 2014 from 10.0.2.2

vagrant@ubuntu-14:~$ java -version
java version "1.7.0_65"
OpenJDK Runtime Environment (IcedTea 2.5.1) (7u65-2.5.1-4ubuntu1~0.14.04.2)
OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode)
vagrant@ubuntu-14:~$

And now we have puppet provisioning our system! The output is also much nicer, and you can get some hint of how Puppet works -- there are stages, it gives us return values, it saves a state file, and there is a concept of environments. Any wonder why Puppet is so popular in the DevOps world? When you hear DevOps folks talking about a VM as a unit of deployment, they're not kidding. It's just a file.

Of course, this is basically cheating. The Puppet way is to describe the state of a system, and this is not describing the state of the system, it's describing commands to run. While some of you may like that, there are different frameworks for that. This is a declarative, stateful framework, so let's not try to turn it into glorified shell scripting. So, we can change that up a bit...

Step 4: Actually Using Puppet

For this step, the Vagrantfile doesn't change. We're just changing the Puppet files. Check this out:

group { 'puppet': ensure => 'present' }

exec { "apt-get update":
  command => "apt-get -yq update",
  path    => ["/bin","/sbin","/usr/bin","/usr/sbin"]
}

package { "openjdk-7-jdk":
  ensure  => installed,
  require => Exec["apt-get update"],
}

Now we're declaring state. We're just telling puppet to make sure openjdk-7-jdk is installed, and run an apt-get update beforehand. Since apt-get update is idempotent on its own, this whole definition is now idempotent. That means we can run it multiple times without issue!
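Here is the idea behind ensure => installed as a tiny Ruby sketch: compare the desired state with the current state and act only on the difference. FakeMachine is a made-up in-memory stand-in, so nothing here touches a real package manager.

```ruby
require 'set'

# Made-up in-memory stand-in for a machine with a package manager.
class FakeMachine
  attr_reader :installed, :install_calls

  def initialize
    @installed = Set.new
    @install_calls = 0
  end

  def install(package)
    @install_calls += 1
    @installed << package
  end
end

# Sketch of "ensure => installed": only act when the state differs.
def ensure_installed(machine, package)
  return :unchanged if machine.installed.include?(package)

  machine.install(package)
  :changed
end

machine = FakeMachine.new
puts ensure_installed(machine, 'openjdk-7-jdk') # => changed
puts ensure_installed(machine, 'openjdk-7-jdk') # => unchanged
puts machine.install_calls                      # => 1
```

The second run finds the package already present and does nothing, which is exactly what makes repeated provisioning safe.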

Let's bring the box up:

[13:36:30 /ws/dev9/vagrant-dev-env/step4](git:master+!?)
$ vagrant up
Bringing machine 'default' up with 'virtualbox' provider...
==> default: Importing base box 'phusion/ubuntu-14.04-amd64'...
==> default: Matching MAC address for NAT networking...
==> default: Checking if box 'phusion/ubuntu-14.04-amd64' is up to date...
==> default: Setting the name of the VM: step4_default_1409776604916_69804
==> default: Clearing any previously set forwarded ports...
==> default: Fixed port collision for 22 => 2222. Now on port 2202.
==> default: Clearing any previously set network interfaces...
==> default: Preparing network interfaces based on configuration...
    default: Adapter 1: nat
==> default: Forwarding ports...
    default: 22 => 2202 (adapter 1)
==> default: Booting VM...
==> default: Waiting for machine to boot. This may take a few minutes...
    default: SSH address: 127.0.0.1:2202
    default: SSH username: vagrant
    default: SSH auth method: private key
    default: Warning: Connection timeout. Retrying...
==> default: Machine booted and ready!
==> default: Checking for guest additions in VM...
==> default: Mounting shared folders...
    default: /vagrant => /ws/dev9/vagrant-dev-env/step4
    default: /tmp/vagrant-puppet-3/manifests => /ws/dev9/vagrant-dev-env/step4/puppet/manifests
    default: /tmp/vagrant-puppet-3/modules-0 => /ws/dev9/vagrant-dev-env/step4/puppet/modules
==> default: Running provisioner: puppet...
==> default: Running Puppet with devenv.pp...
==> default: stdin: is not a tty
==> default: Notice: Compiled catalog for ubuntu-14.04-amd64-vbox in environment production in 0.17 seconds
==> default: Info: Applying configuration version '1409776705'
==> default: Notice: /Stage[main]/Main/Exec[apt-get update]/returns: executed successfully
==> default: Notice: /Stage[main]/Main/Package[openjdk-7-jdk]/ensure: ensure changed 'purged' to 'present'
==> default: Info: Creating state file /var/lib/puppet/state/state.yaml
==> default: Notice: Finished catalog run in 134.04 seconds

There we go! We've declared the state of our machine, and Puppet does its magic. Of course, Puppet can do a whole lot more -- file templating, adding and removing users, setting up configuration, making sure some packages are NOT present, etc. This is YOUR machine -- install git, maven, oh-my-zsh, etc.

Also, keep in mind that Puppet is a really in-demand skill. You might find yourself with a valuable new tool.

This Month in Continuous Delivery: Dev9's Top 5

As August comes to a close, we'd like to share some of the helpful articles and posts we've come across this month. Here are our top 5 Continuous Delivery and DevOps news items for August:

Is Your Organizational Culture Ready for DevOps?: DevOpsGuys Blog

Testing in a Continuous Delivery world: SDTimes

Balancing Quality and Velocity in Agile: InfoQ

A/B Testing + Continuous Delivery = Everyday Product Launches: InfoQ

Continuous Delivery Vs. Continuous Deployment: What's the Diff?: Puppet Labs

Enjoy, and see you next month!

Smarter Acceptance Testing with Personas

A while back, I gave a talk on the combination of Cucumber, PhantomJS, and WebDriver. There is a project on GitHub that contains the sample code. This is a small follow-up to that talk with the idea of how to manage your Cucumber scripts.

Writing BDD

To make the best use of BDD, you should write very rich steps. This is entirely the point of using behavior frameworks. In fact, if your BDD scenarios can't be used as documentation, you've done something wrong. I'm going to assume a knowledge of the Given/When/Then structure. Let's take a look at a bad example:

Scenario: User Signup
  Given A user is signing up for our site,
  When he enters his First Name,
   And enters his Last Name,
   And enters his Email,
   And re-enters his Email,
   And enters a new Password,
   And specifies his Gender,
   And enters his Birthday,
   And submits his request,
  Then an account is created,
   And account name is set as the email address.
   And a confirmation email is sent to Fred.

Now, this would be fine, but we're basically writing a bunch of steps that are quite fine-grained. Compare with something like this:

Scenario: User Signup
  Given A user is signing up for our site
    And the form is filled out correctly 
  When he submits his request
  Then an account is created
   And a confirmation email is sent.

Remember, the point of BDD stuff is to make it closer to how you would describe it when talking to somebody. You won't list out every single field when you're talking about it, so make your code-behind more powerful!
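As a sketch of what "more powerful code-behind" might look like, the step "the form is filled out correctly" could be backed by a helper that returns a known-good form and lets a scenario override only the fields it cares about. The field names and values below are hypothetical.

```ruby
# Sketch of the code-behind for "the form is filled out correctly".
# Field names and values are hypothetical.
VALID_SIGNUP = {
  first_name: 'Test',
  last_name:  'User',
  email:      'test.user@example.com',
  password:   'correct horse battery staple',
  gender:     'male',
  birthday:   '1980-01-31'
}.freeze

# Returns a valid form; overrides replace individual fields.
def signup_form(overrides = {})
  form = VALID_SIGNUP.merge(overrides)
  form[:email_confirmation] ||= form[:email] # the "re-enters his Email" step
  form
end

puts signup_form[:email_confirmation]
# => test.user@example.com
puts signup_form(email: 'other@example.com')[:email_confirmation]
# => other@example.com
```

The nine fine-grained steps collapse into one call, and a scenario about, say, a mismatched email confirmation only has to pass that one override.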

Test Data

The biggest chronic problem with testing of any sort is test data. That is, to fully exercise a complex system, you need your data in a known state, and you need it to be consistent. Creating this data is difficult enough. A tool like Flyway can assist with setup/teardown/reset of data. But there's another problem: How do we know which pre-canned test data set to use? With enough test cases and enough scenarios to exercise, it can be difficult to remember the exact configuration of the test data. Or, if you are writing a test that uses a data set that is 99% correct, changing even that 1% can break other tests that were relying on that data staying unchanged. So how do we deal with this?

In the BDD world, we have a fascinating option. Remember way back when we were all getting on board the Agile/Scrum train? The proper procedure is to write user stories in a manner like so:

As a non-administrative user, I want to modify my own schedules but not the schedules of other users.

This could be translated directly into Gherkin:

Scenario: Schedule Management - User
  Given a non-administrative user
  When I attempt to modify my calendar
  Then the modification is successful

Scenario: Schedule Management - Other Users
  Given a non-administrative user
  When I attempt to modify another user's calendar
  Then the modification is not successful

This looks fine, but "non-administrative user" is quite vague. And what happens when other people want to use code like that? How can we make this more extensible without sacrificing readability or maintainability? Enter personas.

Personas

The user story from above is the classic way most of us were taught to write user stories. However, when Agile was still young, the idea of personas was quite popular. From this site, we can see:

A persona [...] defines an archetypical user of a system, an example of the kind of person who would interact with it. [P]ersonas represent fictitious people which are based on your knowledge of real users.

Personas are different [from actors] because they describe an archetypical instance of an actor. In a use case model we would have a Customer actor, yet with personas we would instead describe several different types of customers to help bring the idea to life.

It is quite common to see a page or two of documentation written for each persona. The goal is to bring your users to life by developing personas with real names, personalities, motivations, and often even a photo. In other words, a good persona is highly personalized.

This is actually a very powerful construct in our BDD world. Instead of generically referring to "a user" or "an admin," we can refer to personas. Let's take that signup example again, but apply a persona:

Scenario: User Signup
  Given Fred Jacobsen is signing up for our site
    And the form is filled out correctly 
  When he submits his request
  Then an account is created
    And a confirmation email is sent.

Not very different, right? But behind the scenes can be a different story. Let's look at what it might have looked like before:

Given /^A user is signing up for our site$/ do
  user = find_user(admin: false, valid: true, ...)
end

Compare to:

# (.+) rather than (\w+), so a full name with a space ("Fred Jacobsen") matches
Given /^(.+) is signing up for our site$/ do |name|
  user = user_lookup(name)
end

In the first case, you need to search for a user. If the data changes underneath you, this function may not return the same result every time. In the second case, however, a user lookup by name will return you a consistent entry. Let's take a look at the Schedule Management example again, but with Personas:
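To show why the lookup by name is stable, here is a minimal sketch of what user_lookup could be. PERSONAS is a hypothetical in-memory table; in a real suite the lookup would hit the seeded test database. Fred's email address is invented for the example.

```ruby
# Sketch of a name-keyed persona lookup. PERSONAS is a hypothetical in-memory
# table; a real suite would query the seeded test database instead.
Persona = Struct.new(:name, :email, :admin, keyword_init: true)

PERSONAS = [
  Persona.new(name: 'Fred Jacobsen', email: 'fred.jacobsen@example.com', admin: false),
  Persona.new(name: 'John Doe', email: 'loverboy85@hotmail.com', admin: false)
].to_h { |persona| [persona.name, persona.freeze] }.freeze

# Fails loudly on an unknown name instead of silently returning "some user".
def user_lookup(name)
  PERSONAS.fetch(name) { raise ArgumentError, "unknown persona: #{name}" }
end

puts user_lookup('John Doe').email # => loverboy85@hotmail.com
```

A typo in a persona name now breaks the test immediately, which is a lot easier to debug than a search that quietly returns a different user.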

Scenario: John can change his own schedule
  Given John Doe is using our app
  When John attempts to modify his calendar
  Then the modification is successful

Scenario: John cannot change Jane's schedule
  Given John Doe is using our app
  When John attempts to modify Jane Smith's schedule
  Then the modification is not successful

Again, we can look the users up by name here. But we can also have some documents behind the scenes explaining the personas. For this example, I'm going to use markdown:

# John Doe #
John Doe is a paying, non-administrative user on our site. He is a 40-year-old dad of 2 boys and 1 girl, and uses our product to manage the children's activities between him and his wife, Heather Doe. He checks his schedule each morning and each evening, but does not check it throughout the day.

## Payment Information ##
John has a monthly subscription, and is up-to-date on payments. 

## Account Setup ##
John has an avatar image and a verified email address. He has not entered his phone number for SMS updates.

This gives us a wealth of information. We know that John is an active member, with up-to-date payments, has an avatar, has a verified email, and doesn't have a phone number. So the test data behind this -- what if we were smart about the way we used Flyway to manage it as well?

file: flyway/personas/001-John_Doe.sql

insert into users(name, email) values ('John Doe', 'loverboy85@hotmail.com');
set @id = last_insert_id(); /* MySQL syntax; use whatever your database supports */
insert into accounts(user_id, status) values (@id, 'active');
insert into profile(user_id, avatar) values (@id, '/images/avatars/johndoe.png');

Now we can have a 1:1 correspondence between a persona and the data that powers it. Say somebody comes along and basically wants everything John Doe has, except he wants somebody with a phone number entered? Instead of modifying John or trying to figure out if other tests will break, we just create a new user and persona:

file: flyway/personas/002-Jenny_Smith.sql

insert into users(name, email) values ('Jenny Smith', 'jsmith17@compuserv.net');
set @id = last_insert_id(); /* MySQL syntax; use whatever your database supports */
insert into accounts(user_id, status) values (@id, 'active');
insert into profile(user_id, avatar, phone) values (@id, '/images/avatars/jennysmith.png', '555-867-5309');
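The same "new persona instead of a modified persona" rule can be sketched in Ruby. Personas are frozen hashes, and a hypothetical derive_persona helper copies one with overrides rather than editing it. The field names mirror the columns in the SQL above.

```ruby
# Sketch: personas as frozen hashes, with a helper that derives a new persona
# instead of editing an existing one. Field names mirror the SQL columns.
JOHN_DOE = {
  name:   'John Doe',
  email:  'loverboy85@hotmail.com',
  status: 'active',
  avatar: '/images/avatars/johndoe.png',
  phone:  nil
}.freeze

# Returns a new frozen persona; the base persona is left untouched.
def derive_persona(base, overrides)
  base.merge(overrides).freeze
end

JENNY_SMITH = derive_persona(
  JOHN_DOE,
  {
    name:   'Jenny Smith',
    email:  'jsmith17@compuserv.net',
    avatar: '/images/avatars/jennysmith.png',
    phone:  '555-867-5309'
  }
)

puts JENNY_SMITH[:status]       # => active
puts JOHN_DOE[:phone].inspect   # => nil
```

Tests that rely on John not having a phone number keep passing, because nothing can change John after the fact.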

Moving Forward

So what have we done, really? In the abstract, we've created named collections of test data. They should be immutable: any change that developers want to make results in a new persona. If we decide a persona is no longer useful, its name also makes it easy to search your code for all usages. Given this setup, it's a small leap to creating a very complex setup:

file: resources/personas/Nevill_Wadsworth_III.md

# Nevill Wadsworth III #
Nevill comes from an old-money family with assets in the tens of millions of dollars. He manages 5 family trusts, and uses our trading system to manage all of their assets. 

## Payment Information ##
Nevill is up-to-date on payments. Payments are deducted automatically via ACH. 

## Account Info ##
The 5 trusts that Nevill manages: 

### Trust 1: Wadsworth Unlimited ###
Wadsworth Unlimited is a small trust with the stocks of 2 companies. This is the dividend account for him. The stocks in this account are: 

  stock |    qty  | purchase price | purchase date
  ------------------------------------------------
  MSFT  | 150,000 |         $27.45 | 10/23/1997
  BRK.A |   1,000 |      $3,544.18 | 09/16/2001

### Trust 2: Wadsworth International ###
Wadsworth International is the trust that manages all of the family's assets outside of the US. For tax purposes, they have not repatriated the money, so it can only be spent outside of the US. The assets in this account are: 

You see where I'm going. Personas don't have to be limited to humans and/or clients, either. They could be companies, external agents like regulatory auditors, or even a pet (if you're running a vet, for example).

The promise of BDD is executable documentation. It's not hard to imagine taking these scenario files, combining them with the persona markdown, combining those with the persona SQL, and creating a fully cross-referenced site of the test cases, combined with the personas, combined with the test data generation. That's left as an exercise for the reader.
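As a head start on that exercise, here is a rough Ruby sketch of the first piece: an index from persona name to the scenarios that mention it. The persona list and the feature text are inlined for illustration; a real tool would read the persona markdown and the .feature files from disk.

```ruby
# Sketch: map each known persona to the scenarios that mention it.
# PERSONA_NAMES and FEATURE are inlined for illustration only.
PERSONA_NAMES = ['John Doe', 'Jane Smith', 'Fred Jacobsen'].freeze

FEATURE = <<~GHERKIN
  Scenario: John can change his own schedule
    Given John Doe is using our app
    When John attempts to modify his calendar
    Then the modification is successful

  Scenario: John cannot change Jane's schedule
    Given John Doe is using our app
    When John attempts to modify Jane Smith's schedule
    Then the modification is not successful
GHERKIN

def persona_index(feature_text, persona_names)
  index = Hash.new { |hash, key| hash[key] = [] }
  # Split the text at the start of every line that begins a scenario.
  feature_text.split(/^(?=[ \t]*Scenario:)/).each do |block|
    title = block[/Scenario:\s*(.+)/, 1]
    next unless title

    persona_names.each do |name|
      index[name] << title.strip if block.include?(name)
    end
  end
  index
end

persona_index(FEATURE, PERSONA_NAMES).each do |name, scenarios|
  puts "#{name}: #{scenarios.join('; ')}"
end
```

Personas that never show up in the index (Fred Jacobsen here) are candidates for retirement.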