Habitat application portability and understanding dynamic linking of ELF binaries / by Matt Wrock

I do not come from a classical computer science background and have spent the vast majority of my career working with Java, C# and Ruby - mostly on Windows. So I have managed to evade the details of exactly how native binaries find their dependencies at compile time and runtime on Linux. It just has not been a concern in the work that I do. If my app complains about missing low level dependencies, I find a binary distribution for Windows (99% of the time these exist and work across all modern Windows platforms) and install the MSI. Hopefully when the app is deployed, those same binary dependencies have been deployed on the production nodes and it would be just super if it's the same version.

Recently I joined the Habitat team at Chef and one of the first things I did to get the feel of using Habitat to build software was to start creating Habitat build plans. The first plan I set out to create was .NET Core. I would soon find out that building .NET Core from source on Linux was probably a bad choice for a first plan. It uses clang instead of GCC, it has lots of cmake files that expect binaries to live in /usr/lib, and it downloads built executables that do not link to Habitat packaged dependencies. Right out of the gate, I got all sorts of build errors as I plodded forward. Most of these errors centered around a common theme: "I can't find X." There were all sorts of issues beyond linking too that I won't get into here, but I'm convinced that if I had known the basics of what this post will attempt to explain, I would have had a MUCH easier time with all the errors and pitfalls I faced.

What is linking and what are ELF binaries?

First, let's define our terms:

ELF

There are no "Lord of the Rings" references to be had here. ELF is the Extensible and linkable format and defines how binary files are structured on Linux/Unix. This can include executable files, shared libraries, object files and more. An ELF file contains a set of headers and a number of sections for things like text, data, etc. One of the key roles of an ELF binary is to inform the operating system how to load a program into memory including all of the symbols it must link to.

Linking

Linking is a key part of the process of building an executable. The other key part is compiling. Often we refer to both jointly as "compiling" but they are really two distinct operations. First the compiler takes source code files and turns them into machine language instructions in the form of object files. On their own, these object files are not enough to run a program.

Linking takes the object files (some might be from source code you wrote) and links them together with external library files to create a functioning program. If your source code calls a function from an external library, the compiler gleefully assumes that function exists and moves on. If it doesn't exist, don't worry, the linker will let you know.
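
As a minimal sketch of those two distinct steps, assuming a hypothetical main.c that calls into the ZeroMQ library:

# step 1: compile only (-c); produces machine code in main.o, nothing external is resolved yet
gcc -c main.c -o main.o

# step 2: link; combine main.o with the zmq shared library into a runnable program.
# this is where a missing function shows up as an "undefined reference" error
gcc main.o -lzmq -o myapp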

Often when we hear about linking, two types are mentioned: static and dynamic. Static linking takes the external machine instructions and embeds them directly into the built executable. If all external dependencies of a program were statically linked, there would be only one executable file and no need for any dependent shared object files to be referenced.

However, we usually dynamically link our external dependencies. Dynamic linking does not embed the external code into the final executable. Instead it just points to an external shared object (.so) file (or .dll file on Windows) and loads that code into the running process at runtime. This has the benefit of being able to update external dependencies without having to ship and package your application each time a dependency is updated. Dynamic linking also results in a smaller application binary since it does not contain the external code.
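
As a small sketch of the difference, assuming a hypothetical hello.c:

# dynamic (the default): the executable records references to shared objects
# and the loader resolves them at runtime
gcc hello.c -o hello
ldd hello            # lists the shared objects (at minimum libc) it will load

# static: the library code is copied into the executable itself
gcc hello.c -o hello-static -static
ldd hello-static     # reports "not a dynamic executable"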

On Unix/Linux systems, the ELF format specifies the metadata that governs what libraries will be linked. These libraries can be in many places on the machine and may exist in more than one place. The metadata in the ELF binary will help determine exactly what files are linked when that binary is executed.
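
That metadata lives in the ELF file's dynamic section; readelf can show which libraries a binary asks for (NEEDED) and any extra search paths baked into it (RPATH/RUNPATH). Again using /bin/ls purely as an example:

# NEEDED entries name the required shared libraries;
# RPATH/RUNPATH entries, if present, name extra directories to search
readelf -d /bin/ls | grep -E 'NEEDED|RPATH|RUNPATH'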

Habitat + dynamic linking = portability

Habitat leverages dynamic linking to provide true application portability. It might not be immediately obvious what this means, why it is important, or whether it is even a good thing. So let's start by describing how applications typically load their dependencies in a normal environment and the role that configuration management systems like Chef play in these environments.

How you manage dependencies today

Let's say you have written an application that depends on the ZeroMQ library. You might use apt-get or yum to install ZeroMQ, and its binaries are likely dropped somewhere into /usr. Now you can build and run your application and it will consume the ZeroMQ libraries installed there. Unless it is told otherwise, the linker will scan the trusted Linux library locations for shared object files to link.
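
On a Debian or Ubuntu style system that might look something like this (the package name varies by distro, so treat it as a sketch):

# install the ZeroMQ shared library from the distro repositories
sudo apt-get install libzmq5

# confirm the runtime linker's cache knows about it and where it lives
ldconfig -p | grep libzmq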

To illustrate this, I have built ZeroMQ from source and it produced libzmq.so.5 and put it in /usr/local/lib. If I examine that shared object with ldd, I can see where it links to its dependencies:

mwrock@ultrawrock:~$ ldd /usr/local/lib/libzmq.so.5
linux-vdso.so.1 =>  (0x00007ffffe05f000)
libunwind.so.8 => /usr/lib/x86_64-linux-gnu/libunwind.so.8 (0x00007f7e92370000)
libsodium.so.18 => /usr/local/lib/libsodium.so.18 (0x00007f7e92100000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f7e91ef0000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f7e91cd0000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f7e91ac0000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f7e917a0000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f7e91490000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f7e910c0000)
/lib64/ld-linux-x86-64.so.2 (0x00007f7e92a00000)
liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007f7e90e80000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f7e90c60000)

They are all linked to the dependencies found in the Linux trusted library locations.

Now the time comes to move to production and just like you needed to install the ZeroMQ libraries in your dev environment, you will need to do the same on your production nodes. We all know this drill and we have probably all been burned at some point - something new is deployed to production and either its dependencies were not there or they were but they were the wrong version.

Configuration Management as solution

Chef fixes this, right? Kind of... it's complicated.

You can absolutely have Chef make sure that your application's dependencies are installed with the correct versions. But what if you have different applications or services on the same node that depend on a different version of the same dependency? It may not be possible to have multiple versions coexist in /usr/lib. Maybe your new version will work or maybe it won't. Especially for some of the lower level dependencies, there is simply no guarantee that compatible versions will exist. If anything, there is one guarantee: different distros will have different versions.

Keeping the automation with the application

Even more important - you want these dependencies to travel with your application. Ideally I want to install my application and know that, by virtue of installing it, everything it needs is there and has not stomped over the dependencies of anything else. I do not want to delegate the installation of its dependencies and the knowledge of which version to install to a separate management layer. Instead, Habitat binds dependencies to the application so that there is no question what your application needs, and installing your application includes the installation of all of its dependencies. Let's look at how this works and see how dynamic linking is at play.

When you build a Habitat plan, you specify each dependency required by your application in the plan itself:

pkg_deps=(core/glibc core/gcc-libs core/libsodium)
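
For context, that array lives in the plan.sh next to the rest of the package metadata. A trimmed-down sketch of what such a plan might look like (the values are illustrative, not the actual plan):

pkg_origin=mwrock
pkg_name=zeromq
pkg_version=4.1.4
# source URL is illustrative; point it at the real upstream release tarball
pkg_source="https://github.com/zeromq/zeromq4-1/releases/download/v${pkg_version}/zeromq-${pkg_version}.tar.gz"
pkg_deps=(core/glibc core/gcc-libs core/libsodium)
pkg_build_deps=(core/gcc core/make)
pkg_lib_dirs=(lib)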

Then when Habitat packages your build into its final, deployable artifact (.hart file), that artifact will include a list of every dependent Habitat package (including the exact version and release):

[35][default:/src:0]# cat /hab/pkgs/mwrock/zeromq/4.1.4/20161225135834/DEPS
core/glibc/2.22/20160612063629
core/gcc-libs/5.2.0/20161208223920
core/libsodium/1.0.8/20161214075415

At install time, Habitat installs your application package and also the packages included in its dependency manifest (the DEPS file shown above) in the pkgs folder under Habitat's root location. Here it will not conflict with any previously installed binaries on the node that might live in /usr. Further, the Habitat build process links your application to these exact package dependencies and ensures that at runtime, these are the exact binaries your application will load.

[36][default:/src:0]# ldd /hab/pkgs/mwrock/zeromq/4.1.4/20161225135834/lib/libzmq.so.5
linux-vdso.so.1 (0x00007fffd173c000)
libsodium.so.18 => /hab/pkgs/core/libsodium/1.0.8/20161214075415/lib/libsodium.so.18 (0x00007f8f47ea4000)
librt.so.1 => /hab/pkgs/core/glibc/2.22/20160612063629/lib/librt.so.1 (0x00007f8f47c9c000)
libpthread.so.0 => /hab/pkgs/core/glibc/2.22/20160612063629/lib/libpthread.so.0 (0x00007f8f47a7e000)
libstdc++.so.6 => /hab/pkgs/core/gcc-libs/5.2.0/20161208223920/lib/libstdc++.so.6 (0x00007f8f47704000)
libm.so.6 => /hab/pkgs/core/glibc/2.22/20160612063629/lib/libm.so.6 (0x00007f8f47406000)
libc.so.6 => /hab/pkgs/core/glibc/2.22/20160612063629/lib/libc.so.6 (0x00007f8f47061000)
libgcc_s.so.1 => /hab/pkgs/core/gcc-libs/5.2.0/20161208223920/lib/libgcc_s.so.1 (0x00007f8f46e4b000)
/hab/pkgs/core/glibc/2.22/20160612063629/lib64/ld-linux-x86-64.so.2 (0x0000560174705000)

Habitat guarantees that the same binaries that were linked at build time will be linked at run time. Even better, it just happens and you don't need a separate management layer to enforce it.

This is how a Habitat package provides portability. Installing and running a Habitat package brings all of its dependencies with it. They do not all live in the same .hart package, but your application's .hart package includes the necessary metadata to let Habitat know what other packages to download and install from the depot. These dependencies may or may not already exist on the node with varying versions, but it doesn't matter because a Habitat application only relies on the packages that reside within Habitat. And even within the Habitat environment, you can have multiple applications that rely on the same dependency but different versions, and these applications can run side by side.
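
In practice, that install step is a single command on the target node; installing the package (identifier taken from the example above) pulls it and everything in its dependency manifest from the depot:

# install the application package plus every package listed in its DEPS manifest
hab pkg install mwrock/zeromq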

The challenge of portability and the Habitat studio

So when you are building a Habitat plan into a .hart package, what keeps that build from pulling dependencies from the default Linux lib directories? What if you do not specify these dependencies in your plan and the build links them from elsewhere? That would break our portability: if your application silently links against libraries from a non-Habitat controlled location, there is no guarantee that those dependencies will be present when you install your application elsewhere. Habitat constructs a build environment called a "studio" to protect against this exact scenario.

The Habitat studio is a clean room environment. The only libraries you will find in this environment are those managed by Habitat. You will find /lib and /usr/lib totally empty here:

[37][default:/src:0]# ls /lib -la
total 8
drwxr-xr-x  2 root root 4096 Dec 24 22:46 .
drwxr-xr-x 26 root root 4096 Dec 24 22:46 ..
lrwxrwxrwx  1 root root    3 Dec 24 22:46 lib -> lib
[38][default:/src:0]# ls /usr/lib -la
total 8
drwxr-xr-x 2 root root 4096 Dec 24 22:46 .
drwxr-xr-x 9 root root 4096 Dec 24 22:46 ..
lrwxrwxrwx 1 root root    3 Dec 24 22:46 lib -> lib

Habitat installs a number of packages into the studio, including familiar Linux utilities and build tools. Every utility and library that Habitat loads into the studio is a Habitat package itself.
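
You get this environment from the hab CLI; from the directory containing your plan, entering and building looks roughly like this:

# enter the clean-room studio; your current directory is mounted at /src inside it
hab studio enter

# from inside the studio, build the plan in the current directory
build .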

[1][default:/src:0]# ls /hab/pkgs/core/
acl       cacerts    gawk      gzip            libbsd         mg       readline    vim
attr      coreutils  gcc-libs  hab             libcap         mpfr     sed         wget
bash      diffutils  glibc     hab-backline    libidn         ncurses  tar         xz
binutils  file       gmp       hab-plan-build  linux-headers  openssl  unzip       zlib
bzip2     findutils  grep      less            make           pcre     util-linux

This can be a double-edged sword. On the one hand it protects us from shipping a package with undeclared dependencies. The darker side is that the source your plan builds may have build scripts that expect dependencies or other build tools to exist in their "usual" homes. If you are unfamiliar with how the standard Linux linker scans for dependencies, discovering what is wrong with your build may be less than obvious.

The rules of dependency scanning

So before we go any further, let's take a look at how the linker works and how Habitat configures its build environment to influence where dependencies are found at both build and run time. The linker looks at a combination of environment variables, CLI options and well-known directory paths, in a strict order of precedence. Here is a direct quote from the man page of ld (the linker binary):

The linker uses the following search paths to locate required shared libraries:

1. Any directories specified by -rpath-link options.
2. Any directories specified by -rpath options.  The difference between -rpath and -rpath-link is that directories specified by -rpath options are included in the executable and used at runtime, whereas the -rpath-link option is only effective at link time. Searching -rpath in this way is only supported by native linkers and cross linkers which have been configured with the --with-sysroot option.
3. On an ELF system, for native linkers, if the -rpath and -rpath-link options were not used, search the contents of the environment variable "LD_RUN_PATH".
4. On SunOS, if the -rpath option was not used, search any directories specified using -L options.
5. For a native linker, search the contents of the environment variable "LD_LIBRARY_PATH".
6. For a native ELF linker, the directories in "DT_RUNPATH" or "DT_RPATH" of a shared library are searched for shared libraries needed by it. The "DT_RPATH" entries are ignored if "DT_RUNPATH" entries exist.
7. The default directories, normally /lib and /usr/lib.
8. For a native linker on an ELF system, if the file /etc/ld.so.conf exists, the list of directories found in that file.

At build time, Habitat sets the $LD_RUN_PATH variable to the lib path of every dependency the plan being built depends on. We can see this in Habitat's build output when we build a Habitat plan:

zeromq: Setting LD_RUN_PATH=/hab/pkgs/mwrock/zeromq/4.1.4/20161225135834/lib:/hab/pkgs/core/glibc/2.22/20160612063629/lib:/hab/pkgs/core/gcc-libs/5.2.0/20161208223920/lib:/hab/pkgs/core/libsodium/1.0.8/20161214075415/lib

This means that at run time, when you run your application built by Habitat, it will load the "habitized" packaged dependencies. This is because setting $LD_RUN_PATH influences how the ELF metadata is constructed (the paths are written into the binary's rpath) and causes it to point to these Habitat package paths.
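
If you want to confirm this on a built artifact, you can dump the dynamic section of the resulting library and check that its rpath points into /hab/pkgs rather than the system library directories (the path shown is the zeromq example from above):

# show the rpath that $LD_RUN_PATH baked into the built library at link time
readelf -d /hab/pkgs/mwrock/zeromq/4.1.4/20161225135834/lib/libzmq.so.5 | grep -E 'RPATH|RUNPATH'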

Patching pre-built binaries

Habitat not only allows one to build packages from source but also supports "binary-only" packages. These are packages made up of binaries downloaded from some external binary repository or distribution site. They are ideal for closed-source software or software that is too complicated or time-consuming to build. However, Habitat cannot control the linking process for these binaries. If you try to execute them in a Habitat studio, you may see runtime failures.
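
To give a feel for what a binary-only plan looks like, here is a rough, illustrative sketch. The callbacks are standard Habitat plan hooks, but the names, version and copy logic are simplified placeholders, not the actual dotnet-core plan:

pkg_origin=mwrock
pkg_name=dotnet-core
pkg_version=1.0.0-preview3-003930
pkg_deps=(core/glibc core/gcc-libs)
pkg_build_deps=(core/patchelf core/findutils)
pkg_bin_dirs=(bin)

# the download is already a set of built binaries, so there is nothing to compile
do_build() {
  return 0
}

# copy the extracted binaries into the package's bin directory (paths illustrative)
do_install() {
  cp -r ./* "$pkg_prefix/bin/"
}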

The dotnet-core package is a good example of this. I ended up giving up on building that plan from source and instead just downloaded the binaries from the public .NET distribution site. Running ldd on the dotnet binary, we see:

[8][default:/src:0]# ldd /hab/pkgs/mwrock/dotnet-core/1.0.0-preview3-003930/20161225145648/bin/dotnet
/hab/pkgs/core/glibc/2.22/20160612063629/bin/ldd: line 117:
/hab/pkgs/mwrock/dotnet-core/1.0.0-preview3-003930/20161225145648/bin/dotnet:
No such file or directory

Well, that's not very clear. ldd can't even show us any of the linked dependencies because the glibc interpreter that the ELF metadata says to use is not where the metadata says it is:

[9][default:/src:1]# file /hab/pkgs/mwrock/dotnet-core/1.0.0-preview3-003930/20161225145648/bin/dotnet
/hab/pkgs/mwrock/dotnet-core/1.0.0-preview3-003930/20161225145648/bin/dotnet:
ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked,
interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32,
BuildID[sha1]=db256f0ac90cd718d8ec2d157b29437ea8bcb37f, not stripped

/lib64/ld-linux-x86-64.so.2 does not exist. We can manually fix this, even after a binary is built, with a tool called patchelf. We declare a build dependency on core/patchelf in our plan and then use the following command:

find -type f -name 'dotnet' \
  -exec patchelf --interpreter "$(pkg_path_for glibc)/lib/ld-linux-x86-64.so.2" {} \;

Now let's try ldd again:

[16][default:/src:130]# ldd /hab/pkgs/mwrock/dotnet-core/1.0.0-preview3-003930/20161225151837/bin/dotnet
linux-vdso.so.1 (0x00007ffe421eb000)
libdl.so.2 => /hab/pkgs/core/glibc/2.22/20160612063629/lib/libdl.so.2 (0x00007fcb0b2cc000)
libpthread.so.0 => /hab/pkgs/core/glibc/2.22/20160612063629/lib/libpthread.so.0 (0x00007fcb0b0af000)
libstdc++.so.6 => not found
libm.so.6 => /hab/pkgs/core/glibc/2.22/20160612063629/lib/libm.so.6 (0x00007fcb0adb1000)
libgcc_s.so.1 => not found
libc.so.6 => /hab/pkgs/core/glibc/2.22/20160612063629/lib/libc.so.6 (0x00007fcb0aa0d000)
/hab/pkgs/core/glibc/2.22/20160612063629/lib/ld-linux-x86-64.so.2 (0x00007fcb0b4d0000)

This is better. It now links our glibc dependencies to the Habitat packaged glibc binaries, but there are still a couple of dependencies that the linker could not find. At least now we can see more clearly what they are.

There is another argument we can pass to patchelf, --set-rpath, which edits the ELF metadata as if $LD_RUN_PATH had been set when the binary was built:

find -type f -name 'dotnet' \
  -exec patchelf --interpreter "$(pkg_path_for glibc)/lib/ld-linux-x86-64.so.2" --set-rpath "$LD_RUN_PATH" {} \;
find -type f -name '*.so*' \
  -exec patchelf --set-rpath "$LD_RUN_PATH" {} \;

So we set the rpath to the $LD_RUN_PATH set in the Habitat environment. We also make sure to do this for each *.so file in the directory where we downloaded the distributable binaries. Finally, ldd finds all of our dependencies:

[19][default:/src:130]# ldd /hab/pkgs/mwrock/dotnet-core/1.0.0-preview3-003930/20161225152801/bin/dotnet
linux-vdso.so.1 (0x00007fff3e9a4000)
libdl.so.2 => /hab/pkgs/core/glibc/2.22/20160612063629/lib/libdl.so.2 (0x00007f1e68834000)
libpthread.so.0 => /hab/pkgs/core/glibc/2.22/20160612063629/lib/libpthread.so.0 (0x00007f1e68617000)
libstdc++.so.6 => /hab/pkgs/core/gcc-libs/5.2.0/20161208223920/lib/libstdc++.so.6 (0x00007f1e6829d000)
libm.so.6 => /hab/pkgs/core/glibc/2.22/20160612063629/lib/libm.so.6 (0x00007f1e67f9f000)
libgcc_s.so.1 => /hab/pkgs/core/gcc-libs/5.2.0/20161208223920/lib/libgcc_s.so.1 (0x00007f1e67d89000)
libc.so.6 => /hab/pkgs/core/glibc/2.22/20160612063629/lib/libc.so.6 (0x00007f1e679e5000)
/hab/pkgs/core/glibc/2.22/20160612063629/lib/ld-linux-x86-64.so.2 (0x00007f1e68a38000)

Every dependency, down to something as low level as glibc, is a Habitat packaged binary declared in our own application's (dotnet-core here) dependencies. This should be fully portable across any 64-bit Linux distribution.