From: Jagane Sundar <jagane@sundar.org>
To: kvm@vger.kernel.org
Subject: A Live Backup feature for KVM
Date: Sat, 23 Apr 2011 16:17:59 -0700
Message-ID: <4DB35E27.4030404@sundar.org>

Hello All,

I would like to get your input on a KVM feature that I am
currently developing.

It performs full and incremental disk backups of running
KVM VMs, where a backup is defined as a snapshot of the
disk state of all virtual disks configured for the VM.

This backup mechanism is built by modifying the qemu-kvm
userland process, and works as follows:
- If a VM is configured for backup, qemu-kvm maintains a
    dirty blocks list since the last backup. Note that this
    is different from the dirty blocks list currently
    maintained for block migration purposes in that it is
    persistent across VM reboots.
- qemu-kvm creates a thread and listens for backup clients.
- A backup client connects to qemu-kvm and initiates an
    incremental backup.
       * A snapshot of each virtual disk is created by
         qemu-kvm. This is as simple as saving the dirty
         blocks map in the snapshot structure.
       * The dirty blocks are now transferred over to the
         backup client.
       * While this transfer is in progress, if any blocks
         are written by the VM, the livebackup code
         intercepts these writes, saves the old blocks in
         a qcow2 file, and then allows the write to progress.
       * When the transfer of all dirty blocks in the
         incremental backup is completed, then the snapshot
         is destroyed.
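
To make the steps above concrete, here is a minimal sketch of
the dirty-block tracking and copy-before-write logic. It is
not qemu-kvm code: the class, the method names, the 4 KiB
block size and the JSON file used to persist the dirty map
are all assumptions made for illustration, and an in-memory
dict stands in for the qcow2 file that holds the old blocks.
The sketch also assumes that only blocks belonging to the
snapshot need their old contents preserved.

```python
import json

BLOCK_SIZE = 4096  # assumed tracking granularity, not from the proposal


class LiveBackupDisk:
    """Toy model of one virtual disk with livebackup-style tracking."""

    def __init__(self, num_blocks):
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]
        self.dirty = set()    # blocks written since the last backup
        self.snapshot = None  # dirty set frozen when a backup starts
        self.cow = {}         # old block contents (stands in for the qcow2 file)

    def guest_write(self, idx, data):
        # Intercept the write: if a backup is in progress and this block
        # belongs to the snapshot, save the old contents once before
        # letting the write proceed.
        in_backup = self.snapshot is not None
        if in_backup and idx in self.snapshot and idx not in self.cow:
            self.cow[idx] = self.blocks[idx]
        self.blocks[idx] = data
        self.dirty.add(idx)

    def save_dirty_map(self, path):
        # Persist the dirty map so it survives a VM reboot.
        with open(path, "w") as f:
            json.dump(sorted(self.dirty), f)

    def load_dirty_map(self, path):
        with open(path) as f:
            self.dirty = set(json.load(f))

    def begin_backup(self):
        # Creating the snapshot is just saving the current dirty map;
        # tracking for the next backup starts from an empty set.
        self.snapshot = set(self.dirty)
        self.dirty = set()
        self.cow = {}

    def backup_blocks(self):
        # Yield each snapshot block as it was when the backup began.
        for idx in sorted(self.snapshot):
            yield idx, self.cow.get(idx, self.blocks[idx])

    def end_backup(self):
        # Destroy the snapshot once all blocks have been transferred.
        self.snapshot = None
        self.cow = {}
```

A block written while the backup is running is still sent
with its pre-backup contents, and the write is recorded in
the dirty map for the next incremental backup.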

I have considered other technologies that may be utilized
to solve the same problem such as LVM snapshots. It is
possible to create a new LVM partition for each virtual disk
in the VM. When a VM needs to be backed up, each of these LVM
partitions is snapshotted. At this point things get messy
- I don't really know of a good way to identify the blocks
that were modified since the last backup. Also, once these
blocks are identified, we need a mechanism to transfer
them over a TCP connection to the backup server. Perhaps
there is a way to export the 'dirty blocks' map to userland
and use a daemon to transfer the blocks. Or maybe a kernel
thread capable of listening on TCP sockets and transferring
the blocks over to the backup client (I don't know if this
is possible).
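
As a rough illustration of the userland daemon option, the
sketch below frames dirty blocks for transfer over a stream
socket. The wire format (a 64-bit block index and a 32-bit
length, followed by the data, with a zero-length frame as
the end marker) and the function names are assumptions made
up for this example; nothing here is an existing protocol.

```python
import socket
import struct

# Assumed frame header: block index (u64) and payload length (u32),
# both in network byte order.
HEADER = struct.Struct("!QI")


def send_blocks(sock, blocks):
    """Send (index, data) pairs, then a zero-length end marker."""
    for idx, data in blocks:
        sock.sendall(HEADER.pack(idx, len(data)) + data)
    sock.sendall(HEADER.pack(0, 0))


def _recv_exact(sock, n):
    """Read exactly n bytes or raise if the peer closes early."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-frame")
        buf.extend(chunk)
    return bytes(buf)


def recv_blocks(sock):
    """Receive frames until the end marker; return {index: data}."""
    received = {}
    while True:
        idx, length = HEADER.unpack(_recv_exact(sock, HEADER.size))
        if length == 0:
            return received
        received[idx] = _recv_exact(sock, length)
```

The backup client would call recv_blocks() on its end of the
connection and apply the returned blocks to its copy of the
disk image.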

In any case, my first attempt is to implement this in the
qemu-kvm userland binary.

The benefit to the end user of this technology is this: Today
IaaS cloud platforms such as EC2 provide you with the ability
to have two types of virtual disks in VM instances
1. Ephemeral virtual disks that are lost if there is a
     hardware failure
2. EBS storage volumes which are costly.

I think that an efficient disk backup mechanism will enable
a third type of virtual disk - one that is backed up, perhaps
every hour or so. So a cloud operator using KVM virtual
machines can offer three types of VMs:
1. An ephemeral VM that is lost if a hardware failure happens
2. A backed up VM that can be restored from the last hourly
     backup
3. A fully highly-available VM running off of a NAS or SAN
     or some such shared storage.

VMware has extensive support for backing up running Virtual
Machines in their products. It is called VMware Consolidated
Backup. A lot of it seems to be targeted at Windows VMs,
with hooks provided into Microsoft's Volume Shadow Copy
Service (VSS) running in the guest.

My proposal will also eventually need the capability to run an
agent in the guest for sync'ing the filesystem, flushing
database caches, etc. I am also unsure whether just sync'ing
an ext3 or ext4 FS and then snapshotting is adequate for
backup purposes.

I want to target this feature squarely at the cloud use model,
with automated backups scheduled for instances created using
an EC2 or OpenStack API.

Please let me know if you find this feature interesting. I am
looking forward to feedback on any and all aspects of this
design. I would like to work with the KVM community to
contribute this feature to the KVM code base.

Thanks,
Jagane Sundar



Thread overview: 5+ messages
2011-04-23 23:17 Jagane Sundar [this message]
2011-04-24  8:32 ` A Live Backup feature for KVM Stefan Hajnoczi
2011-04-25  8:16   ` Jagane Sundar
2011-04-25 13:34     ` Stefan Hajnoczi
2011-04-26  3:31       ` Jagane Sundar
