DIstributed Computing Environments (DICE) Team
We are a group of computer scientists and IT experts from the Department of Computer Science AGH and ACC Cyfronet AGH. We are a curiosity- and research-driven team, specializing in large-scale distributed computing, HPC, Web and Cloud technologies. We develop new methods, tools and environments and we apply these solutions in e-Science, healthcare and industrial domains.



To bake or not to bake - or how to create VM images in hybrid clouds.

Posted by Maciej Malawski at Jul 25, 2014 12:06 AM
Public, private and hybrid IaaS clouds give you root access to virtual machine instances and the power to install everything you need, apply all the tweaks to the OS and applications, and save the VM image as a new template. The question arises: when I change something, should I save it as a new template again (we call this “baking” a new image), or should I keep a set of scripts (we call this the devops way) that automate the installation process and can be applied to a fresh image? Should I bake or should I not? Unsurprisingly, both ways have pros and cons.

The advantages of baking are simple to grasp:

  • You use a simple method of manual installation as you do on your laptop, and you can periodically click the “save as” button to bake your updated image.

  • The method is so simple even your grandma can do it (or at least your users can do it, even if they are more interested in chemistry or biology than in cloud, as many of our scientist users actually are!)

  • The baked image can be easily exported and imported into another cloud, which makes bursting a pleasure. Of course, your clouds need to support import and export features, and many of them still don’t. And details such as image format compatibility, networking, security and configuration should not be forgotten when you want to bake a portable image!

The disadvantages of baking images are also somewhat important:

  • When you bake an image from a manually configured VM instance, you may forget how the installation was actually done and which configuration tweaks were performed. So the process leaves no trace and may be hard to repeat and reproduce.

  • Sometimes your cloud does not allow image export or import, so you may get stuck with a tasty baked image that you cannot feed to another cloud. That is not so nice…

  • Even if your cloud supports export/import, the baked image may be quite heavy to move around. Typically an Ubuntu image with some software installed can weigh around 30 GB, so saving it may take half an hour and sending it via your uplink to a public cloud may take ages.
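
To get a feel for what “ages” means, here is a back-of-the-envelope sketch. The 30 GB image size comes from the text above; the uplink speeds are illustrative assumptions, not measurements from our clouds, and protocol overhead is ignored.

```python
def transfer_time_hours(size_gb: float, uplink_mbps: float) -> float:
    """Ideal time in hours to push size_gb gigabytes through an uplink_mbps link."""
    size_megabits = size_gb * 1000 * 8  # 1 GB = 1000 MB, 1 byte = 8 bits
    seconds = size_megabits / uplink_mbps
    return seconds / 3600


if __name__ == "__main__":
    image_gb = 30  # image size quoted in the text
    for mbps in (10, 100, 1000):  # assumed uplink speeds, for illustration only
        print(f"{mbps:>5} Mbit/s: {transfer_time_hours(image_gb, mbps):.2f} h")
```

With an assumed 10 Mbit/s uplink, the ideal transfer time for 30 GB already exceeds six hours, so the image size matters a lot when you plan to burst.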

Using tools that automate infrastructure and software provisioning also has several pros and cons to account for. The advantages are:

  • You can use your favorite devops tools: Chef, Puppet, Ansible, you name it. They help you code your software installation as scripts that are much lighter than the baked images themselves, so you can store them in a Git repo, edit them, version them, and so forth in a snap.

  • The scripts can use the existing community recipes and cookbooks (speaking in Chef jargon) to cook a desired image from an empty OS template and repeat it on another cloud or inside your own VirtualBox or Vagrant development sandbox.

  • If you lose an image - no problem! You can rebuild it from scratch using your scripts.

  • The biggest advantage is flexibility. Let me give an example: I ran some experiments with an NFS server installed on Ubuntu. After some stress tests (100+ clients on other VMs) I found some scalability issues with the NFS installation. After changing a few lines of the Chef scripts I was able to redeploy the NFS server on a CentOS image, and found out that the NFS server version shipped with this OS performs much better. This is what we call a “programmable infrastructure”!
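
The Chef recipes from that experiment are not reproduced here, but the idea of switching the OS by changing a few lines can be sketched in plain Python. The package names below are the usual ones for the NFS server on Ubuntu and CentOS; the dictionary and function names are hypothetical, and this illustrates the pattern rather than the actual recipe.

```python
from typing import List

# Hypothetical mapping from platform to (package manager, NFS server package).
NFS_SERVER_PACKAGE = {
    "ubuntu": ("apt-get", "nfs-kernel-server"),
    "centos": ("yum", "nfs-utils"),
}


def nfs_install_command(platform: str) -> List[str]:
    """Return the command that installs the NFS server on the given platform."""
    try:
        manager, package = NFS_SERVER_PACKAGE[platform.lower()]
    except KeyError:
        raise ValueError(f"no NFS package mapping for platform: {platform}")
    return [manager, "install", "-y", package]


if __name__ == "__main__":
    # Redeploying on another OS means changing only the platform argument.
    for platform in ("ubuntu", "centos"):
        print(platform, "->", " ".join(nfs_install_command(platform)))
```

Real devops tools hide this mapping behind their own package abstractions, which is why moving the server to another OS can be a small change.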

Of course, the devops way is not for everybody:

  • Probably your grandma would have to learn some devops tools and change her way of doing things. This may be too much for some users, for whom manual installation and configuration of software is already a challenge.

  • The process of software provisioning requires package downloading, installation, or even compilation from source, which of course takes the precious time of your hourly billed VM instance.

  • Not all software comes with pre-built Chef or Puppet scripts, so your only option may be to carve them by hand.

  • A mistake or bug in your script may surface only after the machine is started and the installation process reaches the faulty line. So, debugging your devops code can cost you some hair pulling.

 

So, to bake or not to bake? Let’s take a look at how we deal with it at DICE and in our projects.

In VPH-Share (http://www.vph-share.eu/), we develop the Atmosphere cloud platform for biomedical researchers in the Virtual Physiological Human community. The platform manages Atomic Services, which are VM images that can be developed, run and shared between the researchers. The main method is the bake approach, so each developer can create and save new versions of VM templates as the development process goes on. This approach is convenient since the users range from medical informaticians and computational physicists who are not very familiar with software engineering tools, to more experienced research programmers and cloud admins who can manage advanced software and operating system configuration.

In VPH-Share, we run an OpenStack cloud at Cyfronet, and another installation runs at Vienna University. So an image that is baked and saved at the Cyfronet site needs to be propagated to the Vienna site. For that, we developed an rsync-based template propagation mechanism. The images are not synchronized automatically; the sync process has to be started manually for the services that need to be replicated.
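
Our propagation scripts are not shown in this post. As a rough sketch of what a manually triggered, rsync-based sync could look like, the snippet below builds an rsync command for one image file. The host name, paths and function names are made-up placeholders; only the rsync options (`--archive`, `--sparse`, `--partial`, `--compress`) are real.

```python
import subprocess
from typing import List


def build_rsync_command(image_path: str, remote_host: str, remote_dir: str) -> List[str]:
    """Build an rsync command that copies one image file to a remote directory."""
    return [
        "rsync",
        "--archive",   # preserve permissions, timestamps, etc.
        "--sparse",    # handle sparse disk image files efficiently
        "--partial",   # keep partially transferred files so a retry can resume
        "--compress",  # compress data on the wire
        image_path,
        f"{remote_host}:{remote_dir.rstrip('/')}/",
    ]


def propagate(image_path: str, remote_host: str, remote_dir: str,
              dry_run: bool = True) -> List[str]:
    """Print the command (dry run) or actually run it; returns the command."""
    cmd = build_rsync_command(image_path, remote_host, remote_dir)
    if dry_run:
        print(" ".join(cmd))
    else:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    propagate("/var/images/atomic-service.img", "remote-site.example.org", "/var/images")
```

Keeping the command construction separate from its execution makes it easy to review what would be copied before starting a long transfer.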

Another issue we ran into came up while bursting into Amazon EC2 or other clouds. When we did a survey of cloud providers last year (http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6546092&tag=1), none of the major cloud providers offered image import/export functionality. This, however, has changed during the past year! Most notably, Rackspace became more open thanks to their involvement in the OpenStack project, which makes it easy to migrate an image from a private OpenStack installation to their public hosting. Moreover, Amazon, long known for its proprietary AMI image format and the specifics of its paravirtualized Xen-based instances, recently announced an image import/export feature for its new-generation m3.* instance types. This is partly because these instances use HVM virtualization, which does not require any paravirtualization support in the kernel and thus makes image import from our KVM-based cloud possible.

Amazon actually allows you to import VM images in RAW, OVF, VMDK and other formats (http://aws.amazon.com/ec2/vm-import/). Please note that Ubuntu versions from 12.04 onward are supported; older ones have incompatible kernels. We have recently tested that this is not a blocker, since even some of our oldest Atomic Services based on Ubuntu 10.04 (the LTS release when we started the project) can be quite easily dist-upgraded to 12.04 without much harm to the installed software. Thanks to this import feature, we can now burst into EC2 with our pre-baked images, and the process is automated using our recently developed scripts.
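
The Ubuntu version rule mentioned above is easy to encode as a pre-flight check before attempting an import. This is a sketch based only on the statement in this post that Ubuntu 12.04 and newer are supported; the function names are hypothetical, and the check does not call any Amazon API.

```python
from typing import Tuple

# Minimum Ubuntu release stated in the text as supported for import.
MIN_SUPPORTED = (12, 4)


def parse_ubuntu_version(version: str) -> Tuple[int, int]:
    """Turn a release string such as '12.04' or '12.04.5' into (12, 4)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def needs_dist_upgrade(version: str) -> bool:
    """True if the image should be dist-upgraded before trying to import it."""
    return parse_ubuntu_version(version) < MIN_SUPPORTED


if __name__ == "__main__":
    for release in ("10.04", "12.04", "14.04"):
        action = "dist-upgrade first" if needs_dist_upgrade(release) else "ready to import"
        print(f"Ubuntu {release}: {action}")
```

Comparing versions as integer tuples avoids the pitfalls of comparing release strings or floats directly.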

The baking way is not the only one we pursue. In the PaaSage project (http://www.paasage.eu/), we work on applying Model Driven Engineering (MDE) to cloud development, using tools such as CloudML (http://cloudml.org/). Our HyperFlow (https://github.com/dice-cyfronet/hyperflow) workflow engine is adapted to this method by following the devops way. We have developed a set of Chef scripts for deployment of the HyperFlow engine and workflow applications (https://github.com/malawski/hyperflow-deployment). These scripts will be further integrated with CloudML and Cloudify (http://getcloudify.org/), which are used in PaaSage. By adhering to the MDE paradigm, the applications will become more portable across hybrid clouds, as in “develop once, execute on every cloud”.

So, again, to bake, or not to bake? The answer is not so simple, since both ways get better tool support as clouds evolve. It also depends on how often you update your images: occasionally, or on a daily basis as in continuous deployment or continuous delivery. Finally, perhaps the answer depends on whether you feel you belong more to the developer, admin, or devops category.
