A dirty check for Chef cookbooks: testing everything that has changed and only when it is changed / by Matt Wrock

I was recently pinged by a #cheffriend of mine, Alex Vinyar. A few months back, I sent him a bit of code that I wrote at CenturyLink Cloud that runs on our build agents to determine which cookbooks to test after a cookbook commit is pushed to source control. He just mentioned to me that he has gotten some positive feedback on that code so I thought I'd blog on the details of that for others who may be interested. This can also be considered a follow up to my post from almost a year ago on cookbook dependency management where I shared some theory I was pondering at the time on this subject. This post covers how I followed up on that theory with working code.

I'm going to cover two key strategies:

  1. Testing all cookbooks that depend on a changed cookbook
  2. Ensuring that once a cookbook is tested and succeeds, it is never tested again as long as it and its dependencies do not change

I'll share the same code I shared with Alex (and more) in a bit.

Not enough to test just the cookbook changed

Lets consider this scenario: You manage infrastructure for a complex distributed application. You maintain about a baker's dozen cookbooks - see what I did there?...baker...chef...look at me go girl! You have just committed and pushed a change to one of your lower level library cookbooks that exposes functions used by several of your other cookbooks. We'd expect your pipeline to run tests on that library cookbook but is that enough? What if your upstream cookbooks that depend on the library cookbook "blow up" when they converge with the new changes. This is absolutely possible and not just for library cookbooks.

I wanted a way for a cookbook commit to trigger not only its own chefspecs and test-kitchen tests but also the tests of other cookbooks that depend on it.

Kitchen tests provide slow but valuable feedback, don't run more than needed

Here is another scenario: You are now testing the entire dependency tree touched by a change...yay! But failure happens and your build fails. However the failure may not have occurred at the root of your tree (the cookbook that actually changed). Here are some possibilities:

  1. Maybe there was some transient network glitch that caused a upstream cookbook to fail
  2. Gem dependency changes outside of your own managed cookbook collection (you know - the dark of semver - minor build bumps that are really "major")
  3. Your build has batched more than one root cookbook change. One breaks but the others pass

So an hour's worth (or more) of testing 10 cookbooks results in a couple broken kitchen tests. You make some fixes, push again and have to wait for the entire set to build/test all over again.

I have done this before. Just look at my domain name!

However many of these tests may be unnecessary now. Some of the tests that has succeeded in the filed batch may be unaffected by the fix.

Finding dependencies

So here is the class we use at CenturyLink Cloud to determine when one cookbook as been affected by a recent change. This is what I sent to Alex. I'll walk you through some of it later.

require 'berkshelf'
require 'chef'
require 'digest/sha1'
require 'json'

module Promote
  class Cookbook
    def initialize(cookbook_name, config)
      @name = cookbook_name
      @config = config
      @metadata_path = File.join(path, 'metadata.rb')
      @berks_path = File.join(path, "Berksfile")
    end

    # List of all dependent cookbooks including transitive dependencies
    #
    # @return [Hash] a hash keyed on cookbook name containing versions
    def dependencies
      @dependencies ||= get_synced_dependencies_from_lockfile
    end

    # List of cookbooks dependencies explicitly specified in metadata.rb
    #
    # @return [Hash] a hash keyed on cookbook name containing versions
    def metadata_dependencies
      metadata.dependencies
    end

    def version(version=nil)
      metadata.version
    end

    def version=(version)
      version_line = raw_metadata[/^\s*version\s.*$/]
      current_version = version_line[/('|").*("|')/].gsub(/('|")/,"")
      if current_version != version.to_s
        new_version_line = version_line.gsub(current_version, version.to_s)
        new_content = raw_metadata.gsub(version_line, new_version_line)
        save_metadata(new_content)
      end
      version
    end

    # Stamp the metadata with a SHA1 of the last git commit hash in a comment
    #
    # @param [String] SHA1 hash of commit to stamp
    def stamp_commit(commit)
      commit_line = "#sha1 '#{commit}'"
      new_content = raw_metadata.gsub(/#sha1.*$/, commit_line)
      if new_content[/#sha1.*$/].nil?
        new_content += "\n#{commit_line}"
      end
      save_metadata(new_content)
    end

    def raw_metadata
      @raw_metadata ||= File.read(@metadata_path)
    end

    def metadata
      @metadata ||=get_metadata_from_file
    end

    def path
      File.join(@config.cookbook_directory, @name)
    end

    # Performs a berks install if never run, otherwise a berks update
    def sync_berksfile(update=false)
      return if berksfile.nil?

      update = false unless File.exist?("#{@berks_path}.lock")
      update ? berksfile_update : berksfile.install
    end

    # Determine if our dependency tree has changed after a berks update
    #
    # @return [TrueClass,FalseClass]
    def dependencies_changed_after_update?
      old_deps = dependencies
      sync_berksfile(true)
      new_deps = get_dependencies_from_lockfile

      # no need for further inspection is 
      # number of dependencies have changed
      return true unless old_deps.count == new_deps.count

      old_deps.each do |k,v|
        if k != @name
          return true unless new_deps[k] == v
        end
      end

      return false
    end

    # Hash of all dependencies. Guaranteed to change if 
    # dependencies of their versions change
    #
    # @param [environment_cookbook_name] Name of environmnt cookbook
    # @return [String] The generated hash
    def dependency_hash(environment_cookbook_name)
      # sync up with the latest on the server
      sync_latest_app_cookbooks(environment_cookbook_name)
      
      hash_src = ''
      dependencies.each do | k,v |
        hash_src << "#{k}::#{v}::"
      end
      Digest::SHA1.hexdigest(hash_src)
    end

    # sync our environmnt cookbook dependencies 
    # with the latest on the server
    #
    # @return [Hash] the list of cookbooks updated
    def sync_latest_app_cookbooks(environment_cookbook_name)
      result = {}

      #Get a list of all the latest cookbook versions on the server
      latest_server_cookbooks = chef_server_cookbooks(1)
      
      env_cookbook = Cookbook.new(environment_cookbook_name, @config)
      
      # iterate each cookbook in our environment cookbook
      env_cookbook.metadata_dependencies.each do |key|
        cb_key = key[0] # cookbook name

        # skip if its this cookbook or if its not on the server
        next if cb_key == @name || !latest_server_cookbooks.has_key?(cb_key)
        latest_version = latest_server_cookbooks[cb_key]['versions'][0]['version']
        
        # if the server as a later version than this cookbook
        # we do a berks update to true up with the server
        if dependencies.keys.include?(cb_key) &&
          latest_version > dependencies[cb_key].to_s
          
          berksfile_update(cb_key)
          result[cb_key] = latest_version
        end
      end

      # return the updated cookbooks
      result
    end

    private

    def chef_server_cookbooks(versions = 'all')
      Chef::Config[:ssl_verify_mode] = :verify_none
      rest = Chef::REST.new(
        @config.chef_server_url, 
        @config.node_name, 
        @config.client_key
      )
      rest.get_rest("/cookbooks?num_versions=#{versions}")
    end

    def berksfile
      @berksfile ||= get_berksfile
    end

    def get_berksfile
      return unless File.exist?(@berks_path)
      Berkshelf.set_format :null
      Berkshelf::Berksfile.from_file(@berks_path)
    end

    def save_metadata(content)
      File.open(@metadata_path, 'w') do |out|
        out << content
      end

      # force subsequent usages of metadata to refresh from file
      @metadata=nil
      @raw_metadata = nil
    end

    def get_metadata_from_file
      metadata = Chef::Cookbook::Metadata.new
      metadata.from_file(@metadata_path)
      metadata
    end

    def get_synced_dependencies_from_lockfile
      # install to make sure we have a lock file
      berksfile.install

      get_dependencies_from_lockfile
    end

    # get the version of every dependent cookbook from berkshelf
    def get_dependencies_from_lockfile
      berks_deps = berksfile.list
      deps = {}

      berks_deps.each {|dep| deps[dep.name] = dep.locked_version}
      deps
    end

    def berksfile_update(cookbooks = nil)
      cookbooks.nil? ? berksfile.update : berksfile.update(cookbooks)
      @dependencies = nil
    end
  end
end

This class is part of a larger "promote" gem that we use to promote cookbooks along our delivery pipeline. The class represents an individual cookbook and can be instantiated given an individual cookbook name and a config.

The config class (not shown here) is an extremely basic data class that encapsulates the properties of the chef environment.

Some assumptions

This code will only work given a couple conditions are true. One is fairly common and the other not so much. Even if these conditions do not exist in your environment you should be able to groc the logic.

Using Berkshelf

You are using berkshelf to manage your dependencies. This class uses the berkshelf API to find dependencies of other cookbooks.

Uning the Environment Cookbook Pattern

We use the environment cookbook pattern to govern which cookbooks are considered part of a deployable application. You can read more about this pattern here but an environment cookbook is very simple. It is just a metadata.rb and a Bekshelf.lock. The metadata file includes a depends for each of my cookbooks responsible for infrastructure used in the app. The Berkshelf.lock file uses that to build the entire tree of not only "my own" cookbooks but all third party cookbooks as well.

The dependency hash

For the purposes of this post, the dependency_hash method of the above class is the most crucial. This hash represents all cookbook dependencies and their version at the current point in time. If I were to persist this hash (more on that in a bit) and then later regenerate it with a hash that is at all different from the persisted one, I know something has changed. I don't really care where. Any change at all anywhere in a cookbook's dependency tree means it is at risk of breaking and should be tested.

The "Last Known Good" environment

Every time we test a cookbook and it passes, we store its current dependency hash in a special chef environment called LKG (last known good). This is really a synthetic environment with the same structure as any chef environment but the version constraints may look a bit strange:

{
  name: "LKG_platform",
  chef_type: "environment",
  json_class: "Chef::Environment",
  cookbook_versions: {
    p_rabbitmq: "1.1.15.2d88c011bfb691e9de0df593e3f25f1e5bf7cbba",
    p_elasticsearch: "1.1.79.2dd2472fbd47bf32db4ad83be45acb810b788a90",
    p_haproxy: "1.1.47.75dfc55b772fa0eb08b6fbdee02f40849f37ca77",
    p_win: "1.1.121.fa2a465e8b964a1c0752a8b59b4a5c1db9074ee5",
    p_couchbase: "1.1.92.325a02ab22743eccc17756539d3fac57ce5ef952",
    elasticsearch_sidec: "0.1.7.baab30d04181368cd7241483d9d86d2dffe730db",
    p_orchestration: "0.1.48.3de2368d660a4c7dfd6dba2bc2c61beeea337f70"
  }
}

We update this synthetic environment any time a cookbook test runs and passes. Here is a snippet of our Rakefile that does this:

cookbook = Promote::Cookbook.new(cookbook_name, config)
environment = Promote::EnvironmentFile.new("LKG", config)
lkg_versions = environment.cookbook_versions
cookbook_version = "#{cookbook.version.to_s}.#{hash}"

lkg_versions[cookbook_name] = cookbook_version

environment.write_cookbook_versions(lkg_versions)

This is just using the cookbook and environment classes of our Promote gem to write out the new versions and following this, the environment like all of our environments is committed to source control.

The dirty check

We have now seen how we generate a dependency hash and how we persist that on successful tests. Here is where we actually compare our current dependency hash and the last known hash. We need to determine before running integration tests on a given cookbook if it is any different than the last time we succesfully tested it. If we previously tested it and neither it nor any of its dependencies (no matter how deep) have changed, we really do not need or want to test it again.

def is_dirty(
  source_environment,
  lkg_environment,
  cookbook_name,
  dependency_hash = nil
)
  # Get the environment of the last good tests
  lkg_env_file = EnvironmentFile.new(lkg_environment, config)

  # if this cookbook is not in the last good tests, we are dirty
  unless lkg_env_file.cookbook_versions.has_key?(cookbook_name)
    return true
  end

  # get the current environment under test
  source_env_file = EnvironmentFile.new(source_environment, config)
  
  # get the version of this cookbook from the last good tests
  # it will look like 1.1.15.2d88c011bfb691e9de0df593e3f25f1e5bf7cbba
  # thi is the normal version followed by the dependency hash tested
  lkg_parts = lkg_env_file.cookbook_versions[cookbook_name].split('.')

  # Get the "plain" cookbook version
  lkg_v = lkg_parts[0..2].join('.')
  
  # the dependency hash last tested
  lkg_hash = target_parts[3]
  
  # we are dirty if our plain version or dependency hash has changed
  return source_env_file.cookbook_versions[cookbook_name] != lkg_v || 
    dependency_hash != lkg_hash
end

This method takes the environment currently under test, our LKG environment that has all the versions and their dependency hashes last tested, the cookbook to be tested and its current dependency hash. It returns whether or not our cookbook is dirty and needs testing.

Prior to calling this code, our build pipeline edits our "test" environment (the "source_environment" above). This environment holds the cookbook versions of all cookbooks being tested including the last batch of commits. We use a custom kitchen provisioner plugin to ensure Test-Kitchen uses these versions and not just the versions in each cookbooks berkshelf lock file. As I discussed in my dependency post from last year, its important that all changes are tested together.

So there you have it. Now Hurry up and Wait for your dirty cookbooks to run their tests!