From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <4E317C24.3000102@linux.vnet.ibm.com>
Date: Thu, 28 Jul 2011 10:11:32 -0500
From: Michael Roth
References: <20110727152457.GK18528@redhat.com> <1311821631.9256.11.camel@nexus.oss.ntt.co.jp> <20110728080313.GE3087@redhat.com>
In-Reply-To: <20110728080313.GE3087@redhat.com>
Subject: Re: [Qemu-devel] RFC: moving fsfreeze support from the userland guest agent to the guest kernel
To: Andrea Arcangeli
Cc: Jes Sorensen, Fernando Luis Vázquez Cao, qemu-devel@nongnu.org, Luiz Capitulino

On 07/28/2011 03:03 AM, Andrea Arcangeli wrote:
> On Thu, Jul 28, 2011 at 11:53:50AM +0900, Fernando Luis Vázquez Cao wrote:
>> On Wed,
>> 2011-07-27 at 17:24 +0200, Andrea Arcangeli wrote:
>>> making sure no lib is calling any I/O function to be able to defreeze the
>>> filesystems later, making sure the oom killer or a wrong kill -9
>>> $RANDOM isn't killing the agent by mistake while the I/O is blocked
>>> and the copy is going.
>>
>> Yes, with the current API, if the agent is killed while the filesystems
>> are frozen we are screwed.
>>
>> I have just submitted patches that implement a new API that should make
>> the virtualization use case more reliable. Basically, I am adding a new
>> ioctl, FIGETFREEZEFD, which freezes the indicated filesystem and returns
>> a file descriptor; as long as that file descriptor is held open, the
>> filesystem remains frozen. If the freeze file descriptor is closed (be it
>> through an explicit call to close(2) or as part of process exit
>> housekeeping), the associated filesystem is automatically thawed.
>>
>> - fsfreeze: add ioctl to create a fd for freeze control
>>   http://marc.info/?l=linux-fsdevel&m=131175212512290&w=2
>> - fsfreeze: add freeze fd ioctls
>>   http://marc.info/?l=linux-fsdevel&m=131175220612341&w=2
>
> This is probably how the API should have been implemented originally
> instead of FIFREEZE/FITHAW.
>
> It looks a bit overkill though. I would think it'd be enough to have
> the fsfreeze forced at FIGETFREEZEFD, and the only way to thaw be by
> closing the file, without requiring any of
> FS_FREEZE_FD/FS_THAW_FD/FS_ISFROZEN_FD. But I guess you have use cases

One of the crappy things about the current implementation is the
inability to determine whether or not a filesystem is frozen. At least
in the context of the guest agent, it'd be nice if
guest-fsfreeze-status checked the actual system state rather than some
internal state that may not necessarily reflect reality (if we freeze,
and some other application thaws, we currently still report the state as
frozen).
Also in the context of the guest agent: we are indeed screwed if the
agent gets killed while in a frozen state, and we remain screwed even if
it's restarted, since we have no way of determining whether or not we're
in a frozen state and thus should disable logging operations.

We could check status by looking for a failure from the freeze
operation, but if you're just interested in getting the state, having to
potentially induce a freeze just to get at the state is really heavy-handed.

So having an open operation that doesn't force a freeze/thaw/status
operation serves some fairly common use cases, I think.

> for those if you implemented it, maybe to check if root is stepping on
> its own toes by checking if the fs is already frozen before freezing
> it and returning failure if it is; running an ioctl instead of opening and
> closing the file isn't necessarily better. At the very least the
> get_user(should_freeze, argp) doesn't seem so necessary, it just
> complicates the ioctl API a bit without much gain. I think it'd be
> cleaner if FS_FREEZE_FD was the only way to freeze then.
>
> It's certainly a nice reliability improvement and safer API.
>
> Now if you add a file descriptor to epoll/poll that userland can open
> and talk to, to know when a fsfreeze is asked for on a certain fs, a
> fsfreeze userland agent (not virt related either) could open it and start
> the scripts if that filesystem is being fsfrozen before calling
> freeze_super().
>
> Then a PARAVIRT_FSFREEZE=y/m driver could just invoke the fsfreeze
> without any dependency on a virt-specific guest agent.
>
> Maybe Christoph's right that there are filesystems in userland (not sure
> how the storage is related, it's all about filesystems and apps as far
> as I can see, and it's all blkdev agnostic) that may make things more
> complicated, but those usually have a kernel backend too (like
> fuse).
> I may not see the full picture of the filesystems in userland or
> how the storage agent in guest userland relates to this.
>
> If you believe having libvirt talking QMP/QAPI over a virtio-serial
> vmchannel with some virt-specific guest userland agent, bypassing qemu
> entirely, is better, that's ok with me, but there should be a strong
> reason for it, because the paravirt_fsfreeze.ko approach with a small
> qemu backend and a qemu monitor command that starts paravirt-fsfreeze
> in the guest before going ahead blocking all I/O (to provide backwards
> compatibility and reliable snapshots for guest OSes that won't have the
> paravirt fsfreeze too) looks more reliable, more compact and simpler
> to use to me. I'll surely be ok either way though.
>
> Thanks,
> Andrea