From: Stefan Hajnoczi <stefanha@gmail.com>
To: Ryan Harper <ryanh@us.ibm.com>
Cc: Kevin Wolf <kwolf@redhat.com>, qemu-devel <qemu-devel@nongnu.org>,
Christoph Hellwig <hch@lst.de>
Subject: Re: [Qemu-devel] Block layer roadmap on wiki
Date: Mon, 22 Aug 2011 18:58:58 +0100
Message-ID: <20110822175858.GA28175@stefanha-thinkpad.localdomain>
In-Reply-To: <20110822142712.GR5792@us.ibm.com>

On Mon, Aug 22, 2011 at 09:27:12AM -0500, Ryan Harper wrote:
> * Stefan Hajnoczi <stefanha@gmail.com> [2011-08-22 08:35]:
> > At KVM Forum Kevin, Christoph, and I had an opportunity to get
> > together for a Block Layer BoF. We went through the recent "roadmap"
> > mailing list thread and touched on each proposed feature.
> >
> > Here is the block layer roadmap wiki page:
> > http://wiki.qemu.org/BlockRoadmap
> >
> > Kevin: I have moved the runtime WCE toggling to QEMU 1.0 since you
> > mentioned you want it for the next release.
> >
> > My main take-away from the BoF was that integrating support for host
> > block devices and storage appliances will allow us to reduce the
> > amount of effort spent on image formats. In order to make image
> > formats support the desired features and performance we end up
> > implementing much of the storage stack and file systems in userspace -
> > code that is duplicated and cannot take advantage of the existing
> > storage stack.
>
> +1
>
> >
> > Storage management features are not just available in remote SAN and
> > NAS appliances anymore. For local storage, btrfs has file-level
> > clones and thin-dev is significantly improving LVM snapshots.
> >
> > Thin-dev is bringing a much more efficient and scalable snapshot model
> > to LVM. This device-mapper feature will make LVM attractive for high
> > performance I/O without giving up snapshot and clone features. It
> > also supports cloning off block devices that are not in the pool (e.g.
> > external storage, much like QEMU's backing files feature):
> > https://github.com/jthornber/linux-2.6/tree/thin-dev
> >
> > This will not replace image formats overnight because image formats
> > are still widely used and will continue to be useful for
> > transferring and sharing disk images. But focussing on the larger
>
> Any thoughts on how to make this easily usable for LVM? If there were
> an export/import to/from file to LVM? is that sufficient? Anything
> like this in existence?

Forgot to mention a major advantage of a raw-oriented storage stack: we need
good support for raw + storage appliance anyway. Users want to hook up their
SAN or NAS just like they can with other hypervisors. Time spent on image
formats would be better spent fleshing out integration with LVM, btrfs, SAN,
NAS, and friends.

Back to import/export, which serves two purposes:

1. Efficient transport. Uploading and downloading image files in a
   compact form that represents zero blocks efficiently and perhaps
   compresses data.

2. Compatibility with other hypervisors and external tools. Here it's
   all about using a well-defined file format.
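
To illustrate the "efficient transport" point, here is a minimal sketch of a
compact export that skips zero blocks and compresses the rest. The block size,
the function names, and the (size, chunks) container are all hypothetical and
do not correspond to any existing QEMU image format.

```python
import zlib

BLOCK_SIZE = 4096  # hypothetical granularity for zero detection


def data_extents(raw, block_size=BLOCK_SIZE):
    """Return (offset, length) runs of blocks that contain non-zero data."""
    extents = []
    zero_block = bytes(block_size)
    for offset in range(0, len(raw), block_size):
        block = raw[offset:offset + block_size]
        if block == zero_block[:len(block)]:
            continue  # zero block: nothing needs to be transported
        if extents and extents[-1][0] + extents[-1][1] == offset:
            # This block directly follows the previous run, so extend it.
            start, length = extents[-1]
            extents[-1] = (start, length + len(block))
        else:
            extents.append((offset, len(block)))
    return extents


def export_compact(raw):
    """Return (virtual size, list of (offset, compressed data)) for a raw image."""
    chunks = [(off, zlib.compress(raw[off:off + length]))
              for off, length in data_extents(raw)]
    return len(raw), chunks


def import_compact(size, chunks):
    """Rebuild the raw image; regions without a chunk read back as zeros."""
    out = bytearray(size)
    for off, payload in chunks:
        data = zlib.decompress(payload)
        out[off:off + len(data)] = data
    return bytes(out)
```

Only the extents that hold data are transferred, so a mostly empty disk image
costs little to upload or download, and the import side recreates the zero
regions implicitly.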

In order to pull off a raw-oriented storage stack we need to do
import/export well. So this is an area where we have to focus.

Image streaming is a good approach for import because it allows the VM
to start instantly (even before the image is fully imported). A
qemu-nbd process serves up image data and we stream into a logical
volume.
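
The streaming idea can be modelled in a few lines. This is only a toy
simulation under stated assumptions: the source image is a bytes object
standing in for what qemu-nbd would serve, the logical volume is a bytearray,
and the class and method names are hypothetical.

```python
class StreamingImport:
    """Toy model: guest reads pull missing blocks on demand while a
    background task copies the remaining blocks into the volume."""

    def __init__(self, source, block_size=4):
        self.source = source                  # stands in for the remote image
        self.block_size = block_size
        self.volume = bytearray(len(source))  # stands in for the logical volume
        nblocks = (len(source) + block_size - 1) // block_size
        self.present = [False] * nblocks      # which blocks are already local

    def _populate(self, index):
        """Copy one block from the source into the volume if it is missing."""
        if not self.present[index]:
            start = index * self.block_size
            end = min(start + self.block_size, len(self.source))
            self.volume[start:end] = self.source[start:end]
            self.present[index] = True

    def guest_read(self, offset, length):
        """Serve a guest read, fetching any blocks that are not yet local."""
        if length <= 0:
            return b""
        first = offset // self.block_size
        last = (offset + length - 1) // self.block_size
        for index in range(first, min(last, len(self.present) - 1) + 1):
            self._populate(index)
        return bytes(self.volume[offset:offset + length])

    def stream_step(self):
        """Copy one missing block in the background; return False when done."""
        for index, present in enumerate(self.present):
            if not present:
                self._populate(index)
                return True
        return False
```

The guest can issue guest_read() from the first moment, and once
stream_step() returns False the volume no longer depends on the source.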

For export we can do a FUSE file system that presents logical volumes as image
files. That way existing applications can get at the data as if there were
real image files sitting on the file system. Sequential read access is easy
for all formats; random read is more difficult but should be doable for most
formats (the exception would be stream-compressed formats that are not designed
for random access).
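
A small sketch of why stream-compressed data is the awkward case for random
reads. It uses zlib from the Python standard library purely as an example of a
stream compressor; the function names are hypothetical and no FUSE binding is
involved.

```python
import zlib


def read_raw(volume, offset, size):
    """Random read on raw data: seek straight to the offset."""
    return volume[offset:offset + size]


def read_stream_compressed(blob, offset, size, chunk=64):
    """Random read on a zlib stream: decompress from the very start and
    discard output until the requested range is reached.

    Returns (data, number of compressed bytes that had to be consumed).
    """
    decomp = zlib.decompressobj()
    produced = 0   # uncompressed bytes produced so far
    consumed = 0   # compressed bytes fed to the decompressor so far
    out = bytearray()
    for pos in range(0, len(blob), chunk):
        piece_in = blob[pos:pos + chunk]
        piece = decomp.decompress(piece_in)
        consumed = pos + len(piece_in)
        # Keep only the part of this piece that overlaps the requested range.
        lo = max(offset - produced, 0)
        hi = min(offset + size - produced, len(piece))
        if lo < hi:
            out += piece[lo:hi]
        produced += len(piece)
        if produced >= offset + size:
            break  # everything requested has been produced
    return bytes(out), consumed
```

For raw data the cost of a read is independent of the offset, whereas here a
read near the end of the image has to work through everything before it.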

So moving to a raw-oriented storage stack does not mean we get rid of
image formats. We still need them but they are outside the critical I/O
path. Their role is changed since we don't push features into the
formats anymore.

Side note: iSCSI vs NBD came up during the BoF. Although NBD has not
seen maintenance or activity recently, it's perfectly possible to build
on it. The first feature we need is a flush command (so that NBD can do
non-O_DSYNC accesses for speed). At that point we have a bare-bones
remote block protocol that can be used for migration and for connecting
up userspace image formats. iSCSI is more complex and suited for
permanent storage, whereas NBD is simple but perhaps not a protocol we
want to access data over for a long period of time.
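
To make the flush argument concrete, here is a toy model of a server backend
that counts how often it has to force data to stable storage. It is not the
NBD wire protocol; the class, its methods, and the dsync parameter are
hypothetical stand-ins for "every write is synchronous" versus "write-back
with an explicit flush".

```python
class ToyBackend:
    """Counts sync operations for dsync-style writes vs. explicit flushes."""

    def __init__(self, size):
        self.stable = bytearray(size)  # contents that would survive a crash
        self.pending = []              # (offset, data) writes not yet stable
        self.syncs = 0                 # number of expensive sync operations

    def write(self, offset, data, dsync=False):
        """Queue a write; with dsync=True it is made stable immediately."""
        self.pending.append((offset, bytes(data)))
        if dsync:
            self.flush()

    def flush(self):
        """Make all pending writes stable, in the order they were issued."""
        for offset, data in self.pending:
            self.stable[offset:offset + len(data)] = data
        self.pending.clear()
        self.syncs += 1
```

With dsync=True on every write the backend pays one sync per request. With a
flush command the client decides when durability is needed, so many writes can
share a single sync.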

Stefan