Re: [Qemu-devel] RFC: moving fsfreeze support from the userland guest agent to the guest kernel


From: Michael Roth <mdroth@linux.vnet.ibm.com>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: Jes Sorensen <Jes.Sorensen@redhat.com>,
	qemu-devel@nongnu.org, Luiz Capitulino <lcapitulino@redhat.com>
Subject: Re: [Qemu-devel] RFC: moving fsfreeze support from the userland guest agent to the guest kernel
Date: Wed, 27 Jul 2011 11:07:13 -0500
Message-ID: <4E3037B1.7070301@linux.vnet.ibm.com>
In-Reply-To: <20110727152457.GK18528@redhat.com>

On 07/27/2011 10:24 AM, Andrea Arcangeli wrote:
> Hello everyone,
>
> I've been thinking about the current design of the fsfreeze feature
> used by libvirt.
>
> It currently relies on a userland agent in the guest talking to qemu
> over some vmchannel communication. The guest agent would walk the
> filesystems in the guest and call the fsfreeze ioctl on them.
>
> fsfreeze is an optional feature; it's not required to do safe
> snapshots. After fsfreeze (regardless of whether it is available) QEMU
> must still block all I/O for all qemu blkdevices before the image is
> saved, to allow safe snapshotting of non-linux guests. Then if a VM is
> restarted from the snapshot it becomes identical to a fault tolerance
> fallback with nfs or drbd in a highly available
> configuration. Fsfreeze just provides some further (minor) benefit on
> top of that (which probably won't be available for non-linux guests
> any time soon).
>
> The benefits this optional fsfreeze feature provides to the snapshot
> are:
>
> 1) more peace of mind by not relying on the kernel journal replay code
> when snapshotting journaled/cow filesystems like ext4/btrfs/xfs
>
> 2) all dirty outstanding cache is flushed, which reduces the chances
> of running into userland journaling data replay bugs if userland is
> restarted on the snapshot
>
> 3) allows safe live snapshotting of non-journaled fs like vfat/ext2 on
> linux (not so common, and vfat on non-linux guest won't benefit)
>
> 4) allows mounting the snapshotted image readonly without requiring
> metadata journal replay
>
> Problem is that having a daemon in guest userland is not my
> preference, considering it can be done with a virtio-fsfreeze.ko
> kernel module in guest without requiring any userland modification to
> the guest (and no interprocess communication through vmchannel
> or similar way).
>
> This means a kernel upgrade in the guest that adds the
> virtio-fsfreeze.ko virtio paravirt driver would be enough to be able
> to provide fsfreeze during snapshots.
>
> A virtio-fsfreeze.ko would certainly be more developer friendly, you
> could just build the kernel and even boot it with -kernel bzImage
> (after building it with VIRTIO_FSFREEZE=y). Then it'd just work
> without any daemon or vmchannel or any other change to the guest
> userland.
>
> I could see some advantage in not having to modify qemu if libvirt was
> talking directly to the guest agent, so as to avoid any knowledge in
> qemu about FSFREEZE. But it's not even like that: I see FSFREEZE guest
> agent patches floating around. So if qemu has to be modified and be
> aware of the fsfreeze feature in the userland guest agent (and not
> just asked to block all I/O which doesn't require any guest knowledge
> and in turn it'd remain agnostic about fsfreeze) I think it'd be
> better if the fsfreeze qemu code would just go into a virtio backend.
>
> There is also an advantage in reliability as there's no more need to
> worry about mlocking the memory of the userland guest agent, making
> sure no lib is calling any I/O function to be able to defreeze the
> filesystems later, making sure the oom killer or a wrong kill -9
> $RANDOM isn't killing the agent by mistake while the I/O is blocked
> and the copy is going. The guest kernel is a more reliable and natural
> place to call fsfreeze through a virtio-fsfreeze guest driver without
> having to spend time worrying about the reliability of the
> guest-agent feature. It'd surely also waste less memory in the guest
> (not that the agent takes much memory, but a few kbytes of .text of a
> kernel module for this would surely take a fraction of the mlocked
> RAM the agent would take; the RAM saving is the least interesting
> part of course).
>
> If there was no hypervisor behind the kernel, it could only be the
> userland starting a fsfreeze, so we shouldn't be fooled into thinking
> userland is the best place to start a fsfreeze invocation, it's
> most certainly not, but on the host (without virt) there's no other
> thing that could possibly ask for it. But here we have a hypervisor
> behind the guest kernel that asks for it, so starting the fsfreeze
> through a virtio-fsfreeze.ko kernel module loaded into the guest
> kernel (or linked into the guest kernel) sounds like a cleaner and
> more reliable solution (maybe simpler too).
>
> It'd certainly be a friendlier solution for developers to test or
> run it, libvirt would talk only with qemu, and qemu would only talk
> with the guest kernel without requiring any modification to the guest
> userland. My feeling is that usually what feels much simpler to use
> for developers tends to be a better solution (not guaranteed) and to
> me a virtio-fsfreeze.ko solution would look much simpler to use.
>
> There are drawbacks, like the fact that respinning an update to the
> fsfreeze code would then require an upgrade of the guest kernel
> instead of a package update. But there are advantages too in terms of
> coverage, as an updated kernel would also run on top of an older guest
> userland that may not have an agent package to install through a
> repository.
>
> In any case, if virtio-fsfreeze.ko doesn't register with the qemu
> virtio-fsfreeze backend, the qemu monitor command should still just
> work and allow snapshotting by only blocking all I/O, which is
> more than enough for a non-buggy guest capable of fault tolerance
> against power loss.
>
> I understand an agent may be needed for other features, but I think
> whenever a feature is better suited to not requiring userland guest
> support, it shouldn't require it. To me, requiring modifications to
> the guest userland looks like the least transparent and most intrusive
> possible way to implement a libvirt feature, so it should be used only
> when it has advantages, and I see mostly disadvantages here.
>

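For reference, the agent-side approach described above (walk the guest's
mounted filesystems and issue the freeze ioctl on each) could look roughly
like the sketch below. This is only an illustration, not the agent's
actual code: the helper names, the list of skipped filesystem types, and
the "only /dev-backed mounts" rule are all assumptions made for the
example. The ioctl numbers are the FIFREEZE/FITHAW values from
<linux/fs.h> for the common ioctl encoding.

```python
import fcntl
import os

# _IOWR('X', 119, int) and _IOWR('X', 120, int) from <linux/fs.h>.
# These values hold for the common ioctl encoding (e.g. x86, ARM); a few
# architectures encode the direction bits differently.
FIFREEZE = 0xC0045877
FITHAW = 0xC0045878

# Assumption for this sketch: pseudo and network filesystems are skipped.
SKIP_FSTYPES = {"proc", "sysfs", "devtmpfs", "devpts", "tmpfs",
                "cgroup", "nfs", "nfs4", "cifs"}


def freezable_mountpoints(mounts_text):
    """Pick candidate mount points out of /proc/mounts-style text."""
    seen = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        device, mountpoint, fstype = fields[0], fields[1], fields[2]
        # Hypothetical policy: only block-device-backed local filesystems.
        if fstype in SKIP_FSTYPES or not device.startswith("/dev/"):
            continue
        if mountpoint not in seen:
            seen.append(mountpoint)
    return seen


def _fs_ioctl(mountpoint, request):
    """Open the mount point and issue a freeze or thaw ioctl on it."""
    fd = os.open(mountpoint, os.O_RDONLY)
    try:
        fcntl.ioctl(fd, request, 0)
    finally:
        os.close(fd)


def thaw_all(frozen):
    """Thaw filesystems in the reverse of the order they were frozen."""
    for mountpoint in reversed(frozen):
        _fs_ioctl(mountpoint, FITHAW)


def freeze_all(mountpoints):
    """Freeze each mount point; undo the partial freeze if one fails."""
    frozen = []
    try:
        for mountpoint in mountpoints:
            _fs_ioctl(mountpoint, FIFREEZE)
            frozen.append(mountpoint)
    except OSError:
        thaw_all(frozen)
        raise
    return frozen
```

Actually freezing requires root in the guest, and whatever process does
it must not touch a frozen filesystem before thawing, which is the
reliability concern raised above.
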
One thing worth mentioning is that the current host-side interface to 
the guest agent is not what we're hoping to build libvirt interfaces 
around. It's a standalone, out-of-band tool for now, but when QMP is 
converted to QAPI the guest agent interfaces will be exposed to the host 
transparently as normal QMP commands. libvirt shouldn't be able to tell 
the difference between a guest-agent-induced fsfreeze and a 
guest-kernel-induced fsfreeze (except perhaps to identify extended 
capabilities in a particular case):

http://wiki.qemu.org/Features/QAPI/GuestAgent
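
To make "normal QMP commands" a bit more concrete, here is a rough sketch
of what issuing a freeze could look like from the host side once the
commands are exposed that way. The command name guest-fsfreeze-freeze is
an assumption borrowed from the guest agent's naming and is not spelled
out in this message; the helper functions are hypothetical. Only the
general QMP message shape ("execute" / "arguments" in the request,
"return" or "error" in the reply) is relied upon.

```python
import json


def qmp_request(command, arguments=None):
    """Serialize a QMP-style command as a JSON string."""
    message = {"execute": command}
    if arguments:
        message["arguments"] = arguments
    return json.dumps(message)


def qmp_result(reply_text):
    """Return the payload of a QMP-style reply, or raise on an error reply."""
    reply = json.loads(reply_text)
    if "error" in reply:
        raise RuntimeError(reply["error"].get("desc", "unknown error"))
    return reply["return"]


# Assumed command name; the caller does not need to know whether a
# userland agent or a guest kernel driver ends up doing the freeze.
freeze_request = qmp_request("guest-fsfreeze-freeze")
```
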

Another thing to note is that snapshotting is not necessarily something 
that should be completely transparent to the guest. One of the planned 
future features for the guest agent (mentioned in the snapshot wiki, and 
a common use case that I've seen come up elsewhere as well in the 
context of database applications), is a way for userspace applications 
to register callbacks to be made in the event of a freeze (dumping 
application-managed caches to disk and things along that line). The 
implementation of this would likely be a directory where applications can 
place scripts that get called in the event of a freeze, something 
that would require a user-space daemon anyway.
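
A minimal sketch of that hook-directory idea, assuming a layout that has
not actually been decided: every executable file in the directory is run
in name order with the event ("freeze" or "thaw") as its only argument.
The function names, the argument convention, and the timeout are all
hypothetical.

```python
import os
import subprocess


def hook_scripts(hook_dir):
    """Return the executable regular files in hook_dir, sorted by name."""
    scripts = []
    for name in sorted(os.listdir(hook_dir)):
        path = os.path.join(hook_dir, name)
        if os.path.isfile(path) and os.access(path, os.X_OK):
            scripts.append(path)
    return scripts


def run_hooks(hook_dir, event, timeout=30):
    """Run each hook with the event name as its single argument.

    Returns a list of (path, returncode) pairs. A hook that exceeds the
    timeout is killed and reported with a returncode of None.
    """
    results = []
    for path in hook_scripts(hook_dir):
        try:
            proc = subprocess.run([path, event], timeout=timeout)
            results.append((path, proc.returncode))
        except subprocess.TimeoutExpired:
            results.append((path, None))
    return results
```

A database package could then drop in a script that flushes its caches on
"freeze" and resumes on "thaw". The hooks have to run before the
filesystems are frozen, since a script writing to a frozen filesystem
would block.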

Also, in terms of supporting older guests, the proposed guest tools ISO 
(akin to virtualbox/vmware guest tools):

http://lists.gnu.org/archive/html/qemu-devel/2011-06/msg02239.html

would give us a distribution channel that doesn't require any 
involvement from distro maintainers. A distro package to bootstrap the 
agent would still be preferable, but the ISO approach seems to work 
well in practice. And for managed environments getting custom packages 
installed generally isn't as much of a problem as requiring reboots or 
kernel changes.

> This is just a suggestion; I think the agent should work too.
>
> Thanks a lot,
> Andrea
>
