From mboxrd@z Thu Jan  1 00:00:00 1970
From: Fernando Luis Vazquez Cao <fernando_b1@lab.ntt.co.jp>
Subject: Re: [PATCH 8/9] fsfreeze: add vfs ioctl to check freeze state
Date: Wed, 10 Oct 2012 11:17:25 +0900
Message-ID: <5074DAB5.6080504@lab.ntt.co.jp>
References: <1349414653.7347.2.camel@nexus.lab.ntt.co.jp> <1349415809.7347.15.camel@nexus.lab.ntt.co.jp> <20121008150516.GE9243@quack.suse.cz> <5073F272.3080103@lab.ntt.co.jp> <20121009145554.GD15790@quack.suse.cz>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Al Viro <viro@zeniv.linux.org.uk>,
	Josef Bacik <jbacik@fusionio.com>,
	Eric Sandeen <sandeen@redhat.com>,
	Dave Chinner <dchinner@redhat.com>,
	Christoph Hellwig <hch@infradead.org>,
	linux-fsdevel@vger.kernel.org
To: Jan Kara <jack@suse.cz>
Return-path: <linux-fsdevel-owner@vger.kernel.org>
Received: from tama500.ecl.ntt.co.jp ([129.60.39.148]:59296 "EHLO
	tama500.ecl.ntt.co.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751569Ab2JJCRx (ORCPT
	<rfc822;linux-fsdevel@vger.kernel.org>);
	Tue, 9 Oct 2012 22:17:53 -0400
In-Reply-To: <20121009145554.GD15790@quack.suse.cz>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID: <linux-fsdevel.vger.kernel.org>

On 2012/10/09 23:55, Jan Kara wrote:
> On Tue 09-10-12 18:46:26, Fernando Luis Vazquez Cao wrote:
>> Regarding your concern about the ioctl approach, when a frozen
>> filesystem is detached from the namespace it can still be reached
>> through the block device it is sitting on (well... with the exception
>> of btrfs which has some issues that I am working on) and this is the
>> reason I added a block device level check ioctl too. That said, if one
>> day we have a filesystem which is not block device based and supports
>> fsfreeze (ioctl_fsfreeze() returns -EOPNOTSUPP if the superblock has
>> no ->freeze_fs operation, which is the case for all virtual
>> filesystems and NAS drivers that we have) the two check ioctls would
>> not cover that case.
>    In principle, there are filesystems which operate e.g. on MTD and thus do
> not have a block device. So far none of these seem to support freezing but
> in principle there's no reason they couldn't. And for these filesystems your
> ioctls won't help.

Such filesystems fall into category 1) below, which means they would
be thawed automatically on umount if frozen. As long as the
filesystem remains mounted, the check ioctl can be used.
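
To make the umount rule concrete for both categories, here is a toy
userspace model in C. It is a sketch, not kernel code: struct sb_model
and the model_*() helpers are hypothetical names, and only the idea
that a freeze pins the superblock with an extra active reference
follows freeze_super() as described in this thread. A filesystem
without a block device simply never has a non-zero bdev_freeze_count.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Toy userspace model of the proposed umount rule, not kernel code.
 * struct sb_model and the model_* functions are hypothetical; only the
 * "freeze pins an active reference" behavior follows freeze_super().
 */
struct sb_model {
	int  active_refs;       /* stands in for sb->s_active */
	bool frozen;            /* superblock frozen via either API */
	int  bdev_freeze_count; /* freeze_bdev() nesting; stays 0 without a bdev */
};

/* FIFREEZE path: freezing takes an extra active reference. */
static int model_freeze_super(struct sb_model *sb)
{
	if (sb->frozen)
		return -EBUSY;
	sb->frozen = true;
	sb->active_refs++;
	return 0;
}

/* FITHAW path: thawing drops the reference taken at freeze time. */
static int model_thaw_super(struct sb_model *sb)
{
	if (!sb->frozen)
		return -EINVAL;
	sb->frozen = false;
	sb->active_refs--;
	return 0;
}

/*
 * Proposed umount: thaw automatically unless a block device level freeze
 * is active (never the case for filesystems without a bdev), then drop
 * the mount's own reference. Returns true if the superblock was thawed.
 */
static bool model_umount(struct sb_model *sb)
{
	bool thawed = false;

	if (sb->frozen && sb->bdev_freeze_count == 0)
		thawed = (model_thaw_super(sb) == 0);

	sb->active_refs--;
	assert(sb->active_refs >= 0);
	return thawed;
}
```

When a block device level freeze is active, the model leaves the
superblock frozen and pinned after umount, which is acceptable only
because it can still be reached through the block device.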


>> I think that to cover all cases without adding a completely new API we
>> need to do the following:
>>
>> 1) Filesystems which are not tied to a block device (virtual
>>    filesystems, NAS, etc):
>>
>>    As soon as the filesystem is removed from the namespace the
>>    superblock based fsfreeze ioctls become useless; if we let a umount
>>    of a frozen filesystem succeed we would not be able to thaw it (well
>>    we could use emergency thaw but it would be overkill). Since we do
>    Actually, you can always mount the filesystem again (you will essentially
> just attach the superblock to the namespace again) and thaw the filesystem.
> So this is not a big issue.

The problem is that the second mount may generate write I/O while
the filesystem is still frozen. We would need to audit all
filesystems for this (which I am fine with if there is a sensible
use case).


>>    not want to break lazy umounts the only viable solution is thawing
>>    the superblock automatically on umount (releasing the active
>>    reference taken in freeze_super() to be more precise).
>    I'm not against this. As you write below, you cannot really thaw a
> freeze coming via the block device, so you end up with somewhat
> inconsistent behavior (umount thaws only freezes done via ioctl), but
> after all, freeze of a filesystem and freeze of a block device *are*
> somewhat different requests, so the inconsistency can be justified.
>
> Do I understand correctly that once we do this, you won't need ioctls
> for querying the freeze state?

I would still want the check ioctls. For example, in some cases the
freeze/thaw process is controlled by a daemon which can die, and
with the current API there is no way to check what state the
filesystems were left in (well, we have emergency thaw, but it thaws
all filesystems, which may not be what we want, i.e. overkill).
I have heard a lot of complaints about this from users.

Virtualization is a special case of this where the freeze of a guest
filesystem can be initiated from the hypervisor and carried out by
a guest agent behind the back of the guest's administrator.
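
As a sketch of what a recovery tool can do with the current API,
assuming a freeze daemon has died and we know which mount points it
managed, the closest available operation is to issue FITHAW (from
<linux/fs.h>) on each one and treat EINVAL as "was not frozen", which
is how thaw_super() reports an unfrozen superblock on the kernels
discussed here. thaw_if_frozen() is a hypothetical helper name. The
catch is that it thaws as a side effect, which is exactly why a
side-effect-free check ioctl would be useful.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>   /* FITHAW */
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Recovery helper for a freeze daemon that died: thaw only the given
 * mount point instead of every filesystem (as emergency thaw would).
 * Returns 1 if it was frozen and has now been thawed, 0 if it was not
 * frozen (FITHAW fails with EINVAL), or -errno on any other failure.
 * Needs CAP_SYS_ADMIN; note that it cannot merely *query* the state.
 */
static int thaw_if_frozen(const char *mountpoint)
{
	assert(mountpoint != NULL);

	int fd = open(mountpoint, O_RDONLY);
	if (fd < 0)
		return -errno;

	int ret = 0;
	if (ioctl(fd, FITHAW, 0) == 0)
		ret = 1;
	else if (errno != EINVAL)
		ret = -errno;

	close(fd);
	return ret;
}
```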


>> 2) Block device based filesystems:
>>
>>    These can be reached through the block device they are sitting on
>>    even if the filesystem was detached from the namespace, and they
>>    have the particularity that they can be frozen using two different
>>    APIs: a block device level one and the ioctls. When a filesystem
>>    was frozen using the former, which only has in-kernel users such
>>    as dm, automatically thawing the filesystem on umount is arguably
>>    too rude (we can end up breaking the filesystem level consistency
>>    of a storage snapshot). If we care about this, we could modify
>>    sys_umount() so that the filesystem is automatically thawed if and
>>    only if there are no block device level freezes active. This
>>    behavior would be consistent with case 1) above (the premise here
>>    is that both fsfreeze and umount are userspace controlled
>>    operations and the administrator should know what they are doing)
>>    and is the least likely to cause surprises to freeze_bdev() users.
>>
>>    It would also be nice to have a block device level thaw ioctl for
>>    emergency cases (for example, a scenario where thaw_bdev() was not
>>    called and the freeze counter was left in an inconsistent state;
>>    freeze_bdev() and thaw_bdev() are exported symbols and in many cases
>>    we cannot control what external modules do).
>    Umm, I don't know. I'd rather forbid thawing via ioctl when the device is
> frozen via the block device; that should solve possible issues caused by
> buggy userspace, and the rest is a kernel bug - emergency thaw is for
> that...

I considered that approach myself and would be OK with it.
I guess I will implement both and let Al decide.
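
The two alternatives can be contrasted in another toy model (again
hypothetical names, not kernel code). model_ioctl_thaw() refuses an
ioctl thaw while a block device level freeze is held, as Jan suggests,
and model_emergency_bdev_thaw() discards a leaked freeze_bdev() count,
as the proposed block device level emergency thaw would. The -EBUSY
value for the refused thaw is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical toy model; names do not mirror kernel code. */
struct bdev_sb_model {
	bool frozen;            /* superblock frozen (by either path) */
	int  bdev_freeze_count; /* outstanding freeze_bdev() calls */
};

/* Jan's suggestion: refuse an ioctl thaw while a bdev freeze is held. */
static int model_ioctl_thaw(struct bdev_sb_model *m)
{
	if (!m->frozen)
		return -EINVAL;
	if (m->bdev_freeze_count > 0)
		return -EBUSY;          /* errno choice is an assumption */
	m->frozen = false;
	return 0;
}

/* Balanced in-kernel release, as a well-behaved thaw_bdev() caller does. */
static void model_thaw_bdev(struct bdev_sb_model *m)
{
	assert(m->bdev_freeze_count > 0);
	if (--m->bdev_freeze_count == 0)
		m->frozen = false;
}

/* Proposed emergency bdev thaw: drop whatever count a buggy module leaked. */
static int model_emergency_bdev_thaw(struct bdev_sb_model *m)
{
	int leaked = m->bdev_freeze_count;

	m->bdev_freeze_count = 0;
	m->frozen = false;
	return leaked;
}
```

With Jan's rule alone, a leaked count can only be cleared by the
existing emergency thaw; the second helper targets just one device.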

Thanks,
Fernando