Since the dawn of time, humankind has struggled to produce Windows images under a gigabyte and failed. We have all read the stories from the early Upanishads, we have studied the Zoroastrian calculations, recited the talmudic laws governing SxS, yet we continue to grow ever older as we wait for our Windows pets to find an IP for us to RDP to. Well, hopefully these days are nearing an end. I think it's pretty encouraging that I can now package a Windows VM in a 300MB Vagrant package.
This post is going to walk through the details and pitfalls of creating a Packer template for Windows Nano Vagrant boxes. I have already posted on the basics of Packer templates and Vagrant box packaging, so this post will assume some knowledge of basic Packer and Vagrant concepts.
Windows Nano, a smaller Windows
Windows Nano finally brings us VM images of a size comparable to their Linux cousins. The one I built for VirtualBox is about 307MB. That is 10x smaller than the smallest 2012 R2 box I have packaged, which comes in at around 3GB.
Why so much smaller?
Here are a few highlights:
- No GUI. Really this time. No notepad and no cmd.exe window. It's Windows without windows.
- No SysWOW64. Nano completely abandons 32-bit compatibility, but I'm bummed there will be no SysWowWow128.
- Minimal packages and features in the base image. The Windows team has stripped this OS down to a minimal set of APIs and features. You will likely find some of your go-to utilities missing, but that's OK because there is likely another, and probably better, API that accomplishes the same thing.
Basically Microsoft is letting backwards compatibility slide on this one and producing an OS that does not try to support legacy systems, but is far more effective at managing server cattle.
Installation challenges
Windows Nano does not come packaged in a separate ISO, nor does it ship as a separate image inside the ISO like most of the other server SKUs such as Standard Server or Data Center. Instead you need to build the image from bits in the installation media and extract it.
If you want to host Nano on Hyper-V, running the scripts to build and extract this image is shockingly easy. Even if you want to build a VirtualBox VM, things are not so bad. However, there are more moving parts and some elusive gotchas when preparing a Packer template.
Just show me the template
Before I go into detail, mainly as a cathartic act of self-governed therapy to recover from the past week of yak shaving, let's just show how to start producing and consuming Packer templates for Nano images today. The template can be found here in my packer-templates repository. I'm going to walk through the template and the included scripts, but that is optional reading.
I'm running Packer 0.8.2 and VirtualBox 5.0.4 on Windows 8.1 to build the template.
Known Issues
There were several snags, but here are a couple of items that just didn't work and may trip you up when you first try to build the template or vagrant up:
- I had upgraded to the latest Packer version, 0.8.6 at the time of this post, and had issues with WinRM connectivity, so I reverted to 0.8.2. I do plan to investigate that and alter the template to comply with the latest version, or file issue(s) and/or PRs if necessary.
- Vagrant up will fail, but it may succeed to the extent that you need it to. It will fail to establish a WinRM connection with the box, but it will create a connectable box and can also destroy it. This does mean that you will not have luck using any Vagrant or Packer provisioners. For me, that's fine for now.
The reason for the latter issue is that the WinRM service in Nano expects requests to use codepage 65001 (UTF-8) and will refuse requests that do not. The WinRM ruby gem used by Vagrant uses codepage 437, and you will see exceptions when it tries to connect. Previous Windows versions have accepted both codepages, and I have heard that this will be the case with Nano by the time it officially ships.
Connecting and interacting with the Nano Server
I have been connecting via PowerShell remoting. That of course assumes you are connecting from Windows. Despite what I said above about the limitations of the ruby WinRM gem, it does have a way to override the 437 codepage. However, doing so is not particularly friendly and means you cannot use a lot of the helper methods in the gem.
To connect via PowerShell, run:
# Enable powershell remoting if it is not already enabled
Enable-PSRemoting -Force

# You may change "*" to the name or IP of the machine you want to connect to
Set-Item "wsman:\localhost\client\trustedhosts" -Value "*" -Force

# the password is vagrant
$creds = Get-Credential vagrant

# this assumes you are using a NAT'd network which is the Virtualbox default
# Use the computername or IP of the machine and skip the port arg
# if you are using Hyper-V or another non NAT network
Enter-PSSession -Computername localhost -Port 55985 -Credential $creds
If you do not have a Windows environment from which to run a remote PowerShell session, you can just create a second VM.
Deploying Nano manually
Before going through the Packer template, it would be helpful to understand how one would build a Nano server by hand, without Packer. It's a bit more involved than giving Packer an answer file. There are a few different ways to do this, and some paths work better for different scenarios. I'll just lay out the procedure for building Nano on VirtualBox.
From Windows hosts
Ben Armstrong has a great post on creating Nano VMs for Hyper-V. If you are on Windows and want to create VirtualBox VMs, the instructions for creating the Nano image are nearly identical. The key change is to specify -OEMDrivers instead of -GuestDrivers in the New-NanoServerImage command. -GuestDrivers includes the minimal set of drivers needed for Hyper-V. While it can also create a VirtualBox image that loads and shows the initial Nano login screen, I was unable to actually log in. Using -OEMDrivers adds a larger set of drivers and allows the box to function in VirtualBox. It's interesting to note that a Hyper-V Vagrant box built using -GuestDrivers is 60MB smaller than one using -OEMDrivers.
Here is a script that will pop out a VHD after you mount the Windows Server 2016 Technical Preview 3 ISO:
cd d:\NanoServer
. .\new-nanoserverimage.ps1
mkdir c:\dev\nano
$adminPassword = ConvertTo-SecureString "Pass@word1" -AsPlainText -Force

New-NanoServerImage `
  -MediaPath D:\ `
  -BasePath c:\dev\nano\Base `
  -TargetPath c:\dev\nano\Nano-image `
  -ComputerName Nano `
  -OEMDrivers `
  -ReverseForwarders `
  -AdministratorPassword $adminPassword
Now create a new VirtualBox VM and attach it to the VHD created above.
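If you want to script that step as well, VBoxManage can create the VM and attach the VHD from PowerShell. This is a minimal sketch under a few assumptions: the VHD lives at c:\dev\nano\Nano-image\Nano.vhd, VBoxManage.exe is on the PATH, and the VM name, OS type and memory values are just illustrative.

# minimal sketch: create a VirtualBox VM and attach the Nano VHD (paths/names assumed)
$vhd = "c:\dev\nano\Nano-image\Nano.vhd"
& VBoxManage createvm --name "Nano" --ostype "Windows2012_64" --register
& VBoxManage modifyvm "Nano" --memory 2048 --cpus 2
& VBoxManage storagectl "Nano" --name "SATA" --add sata --controller IntelAhci
& VBoxManage storageattach "Nano" --storagectl "SATA" --port 0 --device 0 --type hdd --medium $vhd
& VBoxManage startvm "Nano"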
From Mac or Linux hosts
You have no PowerShell here, so the instructions are different. Basically you need to either create or use an existing Windows VM. Make sure you have a shared folder set up so that you can easily copy the Nano VHD from the Windows VM to your host, and then create the VirtualBox VM using that VHD as its storage.
That all seems easy, why Packer?
So you may very well be wondering at this point, "It's just a handful of steps to create a Nano VM. Your Packer template has multiple scripts and probably 100 lines of PowerShell. What is the advantage of using Packer?"
First, there might not be one. If you want to create one instance to play around with on a single host, don't care about supporting other instances on other hosts, and have no scenarios where you need to ensure that multiple nodes come from an identically built image, then Packer may not be the right tool for you.
Here are some scenarios where Packer shines:
- Guaranteed identical images - If all images come from the same template, you know that they are all the same and you have "executable documentation" on how they were produced.
- Immutable Infrastructure - If I have production clusters that I routinely tear down and rebuild/replace or a continuous delivery pipeline that involves running tests on ephemeral VMs that are freshly built for each test suite, I can't be futzing around on each node, copying WIMs and VHDs.
- Multi-platform - If I need to create both Linux and Windows environments, I'd prefer to use a single tool to pump out the base images.
- Single-click, low-friction box sharing - For the thousands and thousands of Vagrant users out there, many of whom do not spend much time on Windows, giving them a Vagrant box is the best way to ensure they have a positive experience provisioning the right image, and Packer is the best tool for creating Vagrant boxes.
Walking through the template
So now we will step through the key parts of the template and scripts, highlighting areas that stray from the practices you would normally see in Windows template work and dwelling on Nano behavior that may catch you off guard.
High level flow
First a quick summary of what the template does:
- Installs Windows Server 2016 Core on a new VirtualBox VM
- A PowerShell script launched from the answer file creates the Nano image, mounts it, copies it to an empty partition and then updates the default boot record to boot from that partition.
- The machine reboots into Nano
- Some WinRM tweaks are made, the Windows Server 2016 partition is removed and the Nano partition is extended over it.
- "Zap" unused space on disk.
- Packer exports the VM to a .vmdk and packages it into a .box file.
Three initial disk partitions
We assume that there is no Windows anywhere (because this reflects many build environments), so we will be installing two operating systems: the larger Windows Server 2016 and Nano. We build Nano from the former. The third partition is a system partition. It's easier to have a separate partition for the master boot record that we don't have to touch or move around in the process.
It is important that the Windows Server 2016 partition be physically located at the end of the disk. Otherwise we will be stuck with a gap in the disk after we remove it.
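To make the layout concrete, here is a rough sketch of how the disk is expected to look after the initial install; the sizes and ordering in the comments are my own illustration, not values lifted from the template.

# illustrative only: inspect the layout from the Server 2016 install
#   Partition 1 - small system partition holding the boot record (never moves)
#   Partition 2 - empty partition that will receive the Nano image
#   Partition 3 - Windows Server 2016, at the end of the disk so it can be removed later
Get-Partition -DiskNumber 0 | Format-Table PartitionNumber, DriveLetter, Size, Type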
One may find it odd that our Autounattend.xml file installs Server 2016 from an image named "Windows Server 2012 R2 SERVERDATACENTERCORE." It is odd but correct. That's cool. This is all beta still, and I'm sure this is just one detail yet to be ironed out. There is probably some horrendously friction-laden process involved in changing the image name. One thing that tripped me up a bit is that there are four images in the ISO:
C:\dev\test> Dism /Get-ImageInfo /ImageFile:d:\sources\install.wim

Deployment Image Servicing and Management tool
Version: 10.0.10240.16384

Details for image : d:\sources\install.wim

Index : 1
Name : Windows Server 2012 R2 SERVERSTANDARDCORE
Description : Windows Server 2012 R2 SERVERSTANDARDCORE
Size : 9,621,044,487 bytes

Index : 2
Name : Windows Server 2012 R2 SERVERSTANDARD
Description : Windows Server 2012 R2 SERVERSTANDARD
Size : 13,850,658,303 bytes

Index : 3
Name : Windows Server 2012 R2 SERVERDATACENTERCORE
Description : Windows Server 2012 R2 SERVERDATACENTERCORE
Size : 9,586,595,551 bytes

Index : 4
Name : Windows Server 2012 R2 SERVERDATACENTER
Description : Windows Server 2012 R2 SERVERDATACENTER
Size : 13,847,190,006 bytes

The operation completed successfully.
Images 3 and 4, the DataCenter ones, are the only ones installable from an answer file.
Building Nano
I think .\scripts\nano_create.ps1 is pretty straightforward. We build the Nano image as discussed earlier in this post and copy it to a permanent partition.
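Reduced to a sketch, that copy step looks something like the following; the drive letters and exact commands are assumptions for illustration, and the authoritative logic is in nano_create.ps1 in the repository.

# sketch: mount the freshly built Nano VHD, copy its contents onto the empty
# partition (assumed N:), then point the boot files on the system partition
# (assumed S:) at the new Windows directory
Mount-DiskImage -ImagePath c:\dev\nano\Nano-image\Nano.vhd
& robocopy V:\ N:\ /E /COPYALL /DCOPY:T    # V: assumed to be where the VHD mounted
& bcdboot N:\Windows /s S: /f BIOS         # make the Nano partition the default boot target
Dismount-DiskImage -ImagePath c:\dev\nano\Nano-image\Nano.vhd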
What might seem odd is the last few lines that set up WinRM. Why do we do this when we are about to blow away this OS and never use WinRM? We do it because of the way the VirtualBox builder works in Packer: it waits for WinRM to become available before moving forward in the build process. So this is done simply as a signal to Packer.
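The signal itself is nothing exotic; something along these lines is all Packer needs to see (a sketch of the idea rather than a verbatim excerpt of the script, and it runs on the full Server 2016 install, where these cmdlets still exist):

# make WinRM reachable so Packer's WinRM communicator can connect
Enable-PSRemoting -Force
Set-Item WSMan:\localhost\Service\AllowUnencrypted -Value $true
Set-Item WSMan:\localhost\Service\Auth\Basic -Value $true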
A signal to what? The VirtualBox builder will now invoke any "provisioners" in the template and then issue the template's shutdown command. We don't use any provisioners, which brings us to our first road bump.
Nano forces a codepage incompatible with Packer and Vagrant
On the one hand, it is good to see Nano using a UTF-8 codepage (65001). However, previous versions of Windows have traditionally used the old MS-DOS codepage (437), and both the ruby WinRM gem used by Vagrant and the Go WinRM package used by Packer are hard-coded to use 437. At this time, Nano will not accept 437, so any attempt by Vagrant or Packer to establish WinRM communication will fail with this error:
An error occurred executing a remote WinRM command.

Shell: powershell
Command: hostname
if ($?) { exit 0 } else { if($LASTEXITCODE) { exit $LASTEXITCODE } else { exit 1 } }
Message: [WSMAN ERROR CODE: 2150859072]: <f:WSManFault Code='2150859072' Machine='192.168.1.130' xmlns:f='http://schemas.microsoft.com/wbem/wsman/1/wsmanfault'><f:Message><f:ProviderFault path='%systemroot%\system32\winrscmd.dll' provider='Shell cmd plugin'>The WinRS client cannot process the request. The server cannot set Code Page. You may want to use the CHCP command to change the client Code Page to 437 and receive the results in English. </f:ProviderFault></f:Message></f:WSManFault>
This means Packer provisioners will not work, and we need to take a different route to provisioning.
One may think this is a show stopper for provisioning Windows images, and it is for some scenarios, but for my initial Packer use case that's OK, and I hear that Nano will accept 437 before it "ships." Note that this only seems to be the case with Nano and not Windows Server 2016.
Cut off from WinRM configuration APIs
Both Vagrant and Packer expect to communicate over unencrypted WinRM using basic authentication. I know I just said that Vagrant and Packer can't talk WinRM to Nano at all, but I hit a challenge with WinRM before discovering the codepage issue. When trying to allow unencrypted WinRM and basic auth, I found that the two most popular methods for tweaking WinRM are not usable on Nano.
These methods include:
- Using the winrm command line utility
- Using the WSMan Powershell provider
The first simply does not exist. Normally the winrm command is c:\windows\system32\winrm.cmd, a tiny wrapper around cscript.exe, the scripting engine used to run VBScripts. Well, there is no cscript or wscript, so no Visual Basic runtime at all. Interestingly, winrm.vbs does exist. Feels like a sick joke.
So we could use the COM API to do the configuration. If you like COM constants and HRESULTs, this is totally for you. The easier approach, at least for my purposes, is to simply flip the registry keys to get the settings I want:
REG ADD HKLM\Software\Microsoft\Windows\CurrentVersion\WSMAN\Service /v allow_unencrypted /t REG_DWORD /d 1 /f
REG ADD HKLM\Software\Microsoft\Windows\CurrentVersion\WSMAN\Service /v auth_basic /t REG_DWORD /d 1 /f
REG ADD HKLM\Software\Microsoft\Windows\CurrentVersion\WSMAN\Client /v auth_basic /t REG_DWORD /d 1 /f
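To sanity-check that the values landed, the same reg.exe used for the writes can read them back:

REG QUERY HKLM\Software\Microsoft\Windows\CurrentVersion\WSMAN\Service
REG QUERY HKLM\Software\Microsoft\Windows\CurrentVersion\WSMAN\Client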
No modules loaded in PowerShell scripts run from SetupComplete.cmd
SetupComplete.cmd is a special file that can sit in windows\setup\scripts, and if it does, it will be run on first boot and then never again. We use this because, as mentioned before, we can't use Packer provisioners since WinRM is not an option. I have never used this file before, so it's possible that what follows is not specific to Nano, but that would be weird. I was wondering why the PowerShell script I called from this file was not being called at all. Everything seemed to go fine, no errors, but my code was definitely not being run. Kinda like debugging scheduled tasks.
First, Start-Transcript is not present on Nano. So that was to blame for the lack of errors. I switched to old-school redirection:
cmd.exe /c C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe -command . c:\windows\setup\scripts\nano_cleanup.ps1 > c:\windows\setup\scripts\cleanup.txt
Next I started seeing errors about other missing cmdlets like Out-File. That seemed strange, so I had the script run Get-Module. The result was an empty list of modules, so I added loading of the basic PowerShell modules and the Storage module, which would normally be auto-loaded into my session:
Import-Module C:\windows\system32\windowspowershell\v1.0\Modules\Microsoft.PowerShell.Utility\Microsoft.PowerShell.Utility.psd1
Import-Module C:\windows\system32\windowspowershell\v1.0\Modules\Microsoft.PowerShell.Management\Microsoft.PowerShell.Management.psd1
Import-Module C:\windows\system32\windowspowershell\v1.0\Modules\Storage\Storage.psd1
Not everything you expect is on Nano but likely everything you need
As I mentioned above, Start-Transcript and cscript.exe are missing, but they are not the only things. Here are some other commands I noticed were gone:
- diskpart
- bcdboot
- Get-WMIObject
- Restart-Computer
I'm sure there are plenty of others, but these all have alternatives that I could use.
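As an example of the kind of substitutions involved, here are plausible stand-ins; treat them as suggestions rather than an authoritative mapping.

# Get-CimInstance in place of Get-WMIObject
Get-CimInstance -ClassName Win32_OperatingSystem

# Storage module cmdlets in place of diskpart
Get-Partition -DiskNumber 0
Resize-Partition -DriveLetter C -Size (Get-PartitionSupportedSize -DriveLetter C).SizeMax

# shutdown.exe (assuming it is present) in place of Restart-Computer
shutdown /r /t 0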
Different arguments to powershell.exe
A powershell /? will reveal a command syntax slightly different from what one is used to:
C:\dev\test> Enter-PSSession -ComputerName 192.168.1.134 -Credential $c
[192.168.1.134]: PS C:\Users\vagrant\Documents> powershell /?

USAGE: powershell [-Verbose] [-Debug] [-Command] <CommandLine>

CoreCLR is searched for in the directory that powershell.exe is in,
then in %windir%\system32\CoreClrPowerShellExt\v1.0\.
No -ExecutionPolicy, no -File, and others are missing too. I imagine this could break some existing scripts.
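For example, a call that would normally hand a script to -File has to be rewritten in terms of -Command; the script path here is purely illustrative.

# full Windows: powershell.exe -ExecutionPolicy Bypass -File c:\scripts\setup.ps1
# Nano TP3: only -Command is available, so dot-source the script instead
powershell.exe -Command ". c:\scripts\setup.ps1"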
No 32 bit
I knew this going in but was still caught off guard when sdelete.exe failed to work. I use sdelete, a Sysinternals utility, for zeroing out free space on disk, which leads to a dramatically smaller image size when we are done. Well, I'm guessing it was compiled for 32-bit, because I got complaints about the executable image being incompatible with Nano.
In the end this turned out to be for the best; I found a pure PowerShell alternative to sdelete, which I adapted for my limited needs:
$FilePath = "c:\zero.tmp"
$Volume = Get-Volume -DriveLetter C
$ArraySize = 64kb
$SpaceToLeave = $Volume.Size * 0.05
$FileSize = $Volume.SizeRemaining - $SpaceToLeave
$ZeroArray = New-Object byte[]($ArraySize)

$Stream = [io.File]::OpenWrite($FilePath)
try {
    $CurFileSize = 0
    while($CurFileSize -lt $FileSize) {
        $Stream.Write($ZeroArray, 0, $ZeroArray.Length)
        $CurFileSize += $ZeroArray.Length
    }
}
finally {
    if($Stream) {
        $Stream.Close()
    }
}
Del $FilePath
Blue Screens of Death
So I finally got the box built and was generally delighted with its size (310MB). However, when I launched the Vagrant box, the machine blue screened, reporting that a critical process had died. All of the above issues had made this a longer haul than I expected, but it turned out that troubleshooting the blue screens was the biggest time suck and sent me on hours of wild goose chases and red herrings. I almost wrote a separate post dedicated to this issue, but I'm gonna try to keep it relatively brief here (not a natural skill).
What was frustrating here is that I knew this could work. I had several successful tests, but with slightly different execution flows that I was tweaking along the way, and it certainly did not like my final template and scripts. I would get the CRITICAL_PROCESS_DIED blue screen twice, and then it would stop at a display of error code 0xc0000225 and the message "a required device isn't connected or can't be accessed."
Based on some searching, I thought that there was something wrong somewhere in the boot record. After all, I was messing with deleting and resizing partitions and changing the boot record, compounded by the fact that I am not an expert in that area. However, lots of futzing with diskpart, bcdedit, bcdboot, and bootrec got me nowhere. I also downloaded the Technical Preview 3 debug symbols to analyze the memory dump, but there was nothing interesting there. Just a report that the process that died was wininit.exe.
Trying to manually reproduce this, I found that the final machine produced by Packer was just fine. Packer exports the VM to a new .vmdk virtual disk. Trying to create a machine from that would produce blue screens. Further, manually cloning a .vdi had the same effect: more blue screens. Finally, I tried attaching a new VM to the same disk that worked and made sure the VM settings were identical to the working machine. This failed too, which seemed very odd. I then discovered that removing the working machine and manually editing the broken machine's .vbox XML to have the same UUID as the working one fixed things. After more researching, I found out that VirtualBox has a modifiable setting called a hardware UUID. If none is supplied, it uses the machine's UUID. So I cloned another box from the working machine, validated that it blue screened and then ran:
vboxmanage modifyvm <machine name> --hardwareuuid "{same uuid as the working box}"
Voila! The box came to life. So I could fix this by telling the Packer template to stamp an artificial hardware UUID at startup:
"vboxmanage": [ [ "modifyvm", "{{.Name}}", "--natpf1", "guest_winrm,tcp,,55985,,5985" ], [ "modifyvm", "{{.Name}}", "--memory", "2048" ], [ "modifyvm", "{{.Name}}", "--vram", "36" ], [ "modifyvm", "{{.Name}}", "--cpus", "2" ], [ "modifyvm", "{{.Name}}", "--hardwareuuid", "02f110e7-369a-4bbc-bbe6-6f0b6864ccb6" ] ],
then add the exact same GUID to the Vagrantfile template:
config.vm.provider "virtualbox" do |vb|
  vb.customize ["modifyvm", :id, "--hardwareuuid", "02f110e7-369a-4bbc-bbe6-6f0b6864ccb6"]
  vb.gui = true
  vb.memory = "1024"
end
This ensures that vagrant "up"s the box with the same hardware UUID that it was created with. The actual ID does not matter, and I don't think there is any harm, at least for test purposes, in having duplicate hardware UUIDs.
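If you ever need to check which hardware UUID a machine ended up with, VBoxManage can report it; substitute the name of your imported VM.

# prints a hardwareuuid="..." line among the machine settings
& VBoxManage showvminfo "nano-box-name" --machinereadable | Select-String hardwareuuid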
I hoped that a similar strategy would work for Hyper-V by changing its BIOSGUID using a PowerShell script like this:
# Virtual System Management Service
$VSMS = Get-CimInstance -Namespace root/virtualization/v2 -Class Msvm_VirtualSystemManagementService

# Virtual Machine
$VM = Get-CimInstance -Namespace root/virtualization/v2 -Class Msvm_ComputerSystem -Filter "ElementName='Demo-VM'"

# Setting Data
$SD = $VM | Get-CimAssociatedInstance -ResultClassName Msvm_VirtualSystemSettingData -Association Msvm_SettingsDefineState

# Update bios uuid
$SD.BIOSGUID = "some guid"

# Create embedded instance
$cimSerializer = [Microsoft.Management.Infrastructure.Serialization.CimSerializer]::Create()
$serializedInstance = $cimSerializer.Serialize($SD, [Microsoft.Management.Infrastructure.Serialization.InstanceSerializationOptions]::None)
$embeddedInstanceString = [System.Text.Encoding]::Unicode.GetString($serializedInstance)

# Modify the system settings
Invoke-CimMethod -CimInstance $VSMS -MethodName ModifySystemSettings @{SystemSettings = $embeddedInstanceString}
Thanks to this post for the example. It did not work, but the Hyper-V blue screens seem to be "self-healing." I posted more details in the Vagrant box readme on Atlas.
No sysprep
I suspect that the above is somehow connected to OS activation. I saw lots of complaints on Google about needing to do something similar with the hardware UUID in order to preserve the Windows activation of a cloned machine. I also noted that if I cloned the box manually before Packer rebooted into Nano, thereby letting the clone run the initial setup, things worked.
Ideally, the fix here would be to leave the hardware UUID alone and just sysprep the machine as the final Packer step. This is what I do for 2012 R2. However, from what I can tell, there is no sysprep available for Nano. I really hope that there will be.
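For comparison, the usual final step on a 2012 R2 image is a standard sysprep generalize pass, roughly like the following (the exact switches and unattend path here are illustrative); none of this applies to Nano today.

# generalize the image and shut down so Packer can export a clean, reusable box
C:\Windows\System32\Sysprep\sysprep.exe /generalize /oobe /shutdown /unattend:C:\Windows\Temp\unattend.xml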
Finally a version of Windows for managing cattle servers
You may have heard the cattle vs. pets analogy that compares "special snowflake" servers to cloud-based clusters. The idea is that one treats cloud instances like cattle. There is no emotional attachment or special treatment given to one server over another. At this scale, one can't afford to. We don't have the time or resources to access our instances via remote desktop and click buttons or drag windows. If one becomes sick, we put it out of its misery quickly and replace it.
Nano is lightweight and made to be accessed remotely. I'm really interested to see how this progresses.