qemu-devel.nongnu.org archive mirror
From: Kevin Wolf <kwolf@redhat.com>
To: Dietmar Maurer <dietmar@proxmox.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH 1/5] RFC: Efficient VM backup for qemu (v1)
Date: Wed, 21 Nov 2012 11:48:36 +0100
Message-ID: <50ACB184.5080204@redhat.com>
In-Reply-To: <1353488464-82756-1-git-send-email-dietmar@proxmox.com>

Am 21.11.2012 10:01, schrieb Dietmar Maurer:
> +Some storage types/formats support internal snapshots using some kind
> +of reference counting (rados, sheepdog, dm-thin, qcow2). It would be possible
> +to use that for backups, but for now we want to be storage-independent.
> +
> +Note: It turned out that taking a qcow2 snapshot can take a very long
> +time on larger files.

Hm, really? What are "larger files"? It has always been relatively quick
when I tested it, though internal snapshots are not my focus, so that
need not mean much.

If this is really an important use case for someone, I think qcow2
internal snapshots still have some potential for relatively easy
performance optimisations.

But that's just an aside...

> +
> +=Make it more efficient=
> +
> +To be more efficient, we simply need to avoid unnecessary steps. The
> +following steps are always required:
> +
> +1.) read old data before it gets overwritten
> +2.) write that data into the backup archive
> +3.) write new data (VM write)
> +
> +As you can see, this involves only one read and two writes.

Looks like a nice approach to backup indeed.
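
The three quoted steps can be sketched as follows. This is only an illustration in Python, not the code from the patch: the class BackupOnWrite, the in-memory disk and the tiny CLUSTER_SIZE are all made-up names and values.

```python
CLUSTER_SIZE = 4  # unrealistically small, for illustration only


class BackupOnWrite:
    """Hypothetical copy-before-write wrapper around an in-memory disk."""

    def __init__(self, disk: bytearray):
        self.disk = disk
        self.archive = []   # (cluster_index, old_data) records, in arrival order
        self.saved = set()  # clusters whose old data is already in the archive

    def write_cluster(self, index: int, data: bytes) -> None:
        assert len(data) == CLUSTER_SIZE
        start = index * CLUSTER_SIZE
        end = start + CLUSTER_SIZE
        if index not in self.saved:
            old = bytes(self.disk[start:end])   # 1.) read old data
            self.archive.append((index, old))   # 2.) write it to the backup
            self.saved.add(index)
        self.disk[start:end] = data             # 3.) the VM write itself


disk = bytearray(b"AAAABBBBCCCC")
backup = BackupOnWrite(disk)
backup.write_cluster(2, b"zzzz")
backup.write_cluster(0, b"xxxx")
backup.write_cluster(2, b"yyyy")  # cluster 2 was saved already, so only step 3 runs
```

Note that the archive receives cluster 2 before cluster 0, i.e. in the order the guest happens to write, not in image order.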

The question is how to fit this into the big picture of qemu's live
block operations. Much of it looks like an active mirror (which is still
to be implemented), with the difference that it doesn't write the new,
but the old data, and that it keeps a bitmap of clusters that should not
be mirrored.

I'm not sure if this means that code should be shared between these two
or if the differences are too big. However, both of them have things in
common regarding the design. For example, both have a background part
(copying the existing data) and an active part (mirroring/backing up
data on writes). Block jobs are the right tool for the background part.
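
To make the split concrete, here is a minimal single-threaded sketch of how a background part and an active part could share one bitmap. It is an assumption-laden toy: backup_with_concurrent_writes, the guest_writes schedule and CLUSTER_SIZE are hypothetical, and real block jobs run concurrently with guest I/O rather than in lockstep.

```python
CLUSTER_SIZE = 4  # unrealistically small, for illustration only


def backup_with_concurrent_writes(disk, guest_writes):
    """Return {cluster: data} holding the disk contents as of backup start.

    guest_writes maps a background step number to one (cluster, data) write
    that the guest issues just before that step; this stands in for real
    concurrency between the job and the guest.
    """
    n_clusters = len(disk) // CLUSTER_SIZE
    done = [False] * n_clusters  # bitmap: cluster already in the archive
    archive = {}

    def save(index):
        # Copy a cluster into the archive at most once.
        if not done[index]:
            start = index * CLUSTER_SIZE
            archive[index] = bytes(disk[start:start + CLUSTER_SIZE])
            done[index] = True

    for step in range(n_clusters):
        if step in guest_writes:
            index, data = guest_writes[step]
            save(index)  # active part: back up old data before the write
            start = index * CLUSTER_SIZE
            disk[start:start + CLUSTER_SIZE] = data
        save(step)       # background part: copy clusters not yet saved
    return archive


disk = bytearray(b"AAAABBBBCCCC")
archive = backup_with_concurrent_writes(disk, {0: (2, b"zzzz"), 2: (0, b"xxxx")})
snapshot = b"".join(archive[i] for i in range(len(archive)))
```

In this toy run the archive reproduces the image as it was when the backup started, while the disk itself ends up with the guest's new data.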

The active part is a bit more tricky. You're putting some code into
block.c to achieve it, which is kind of ugly. We have been talking about
"block filters" previously that would provide a generic infrastructure,
and at least in the mid term the additions to block.c must disappear.
(Same for block.h and block_int.h - keep things as separated from the
core as possible.) Maybe we should introduce this infrastructure now.

Another interesting point is how (or whether) to link block jobs with
block filters. I think when the job is started, the filter should be
inserted automatically, and when you cancel it, it should be stopped.
When you pause the job... no idea. :-)
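
A rough sketch of that coupling, purely as an assumption about how it might look: Device, BackupFilter and BackupJob are hypothetical names, not existing qemu interfaces, and pausing is deliberately left out because its semantics are open.

```python
class Device:
    """Hypothetical block device with a chain of write filters."""

    def __init__(self):
        self.filters = []
        self.log = []

    def write(self, cluster):
        # Every inserted filter sees the write before it reaches the device.
        for f in self.filters:
            f.before_write(self, cluster)
        self.log.append(("write", cluster))


class BackupFilter:
    """Hypothetical filter standing in for the active part of the backup."""

    def before_write(self, device, cluster):
        device.log.append(("backup", cluster))


class BackupJob:
    """Hypothetical job that owns its filter for exactly its own lifetime."""

    def __init__(self, device):
        self.device = device
        self.filter = None

    def start(self):
        # Starting the job inserts the filter automatically.
        self.filter = BackupFilter()
        self.device.filters.append(self.filter)

    def cancel(self):
        # Cancelling the job removes the filter again.
        if self.filter is not None:
            self.device.filters.remove(self.filter)
            self.filter = None


dev = Device()
job = BackupJob(dev)
dev.write(0)   # before the job: plain write
job.start()
dev.write(1)   # while the job runs: the filter sees the write first
job.cancel()
dev.write(2)   # after cancel: plain write again
```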

> +
> +To make that work, our backup archive needs to be able to store image
> +data 'out of order'. It is important to notice that this will not work
> +with traditional archive formats like tar.

> +* works on any storage type and image format.
> +* we can define a new and simple archive format, which is able to
> +  store sparse files efficiently.

> +
> +Note: Storing sparse files is a mess with existing archive
> +formats. For example, tar requires information about holes at the
> +beginning of the archive.

> +* we need to define a new archive format
> +
> +Note: Most existing archive formats are optimized to store small files
> +including file attributes. We simply do not need that for VM archives.
> +
> +* archive contains data 'out of order'
> +
> +If you want to access image data in sequential order, you need to
> +re-order archive data. It would be possible to do that on the fly,
> +using temporary files.
> +
> +Fortunately, a normal restore/extract works perfectly with 'out of
> +order' data, because the target files are seekable.
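
Why a seekable target makes 'out of order' data harmless can be shown with a small sketch. The record layout here, a list of (cluster_index, data) pairs, is an assumption for illustration and is not the layout defined in vma_spec.txt; restore and CLUSTER_SIZE are made-up names too.

```python
import io

CLUSTER_SIZE = 4  # unrealistically small, for illustration only


def restore(records, target):
    """Write each (cluster_index, data) record at its own offset in target."""
    for index, data in records:
        target.seek(index * CLUSTER_SIZE)  # jump to where this cluster belongs
        target.write(data)


# Records in the order they happened to reach the archive.
records = [(2, b"CCCC"), (0, b"AAAA"), (1, b"BBBB")]
target = io.BytesIO()  # stands in for a seekable target file
restore(records, target)
image = target.getvalue()
```

The restore never needs to sort or buffer anything; a non-seekable consumer such as a pipe is what would force the re-ordering via temporary files mentioned above.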

> +=Archive format requirements=
> +
> +The basic requirement for such a new format is that we can store image
> +data 'out of order'. It is also very likely that we have less than 256
> +drives/images per VM, and we want to be able to store VM configuration
> +files.
> +
> +We have defined a very simple format with those properties, see:
> +
> +docs/specs/vma_spec.txt
> +
> +Please let us know if you know an existing format which provides the
> +same functionality.

Essentially, what you need is an image format. You want to be
independent from the source image formats, but you're okay with using a
specific format for the backup (or you wouldn't have defined a new
format for it).

The one special thing that you need is storing multiple images in one
file. There's something like this already in qemu: qcow2 with its
internal snapshots is basically a flat file system.

Not saying that this is necessarily the best option, but I think reusing
existing formats and implementation is always a good thing, so it's an
idea to consider.

Kevin
