From: Fernando Luis Vazquez Cao <fernando_b1@lab.ntt.co.jp>
To: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>, Al Viro <viro@zeniv.linux.org.uk>,
Dave Chinner <dchinner@redhat.com>,
Christoph Hellwig <hch@infradead.org>,
linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 6/8] fsfreeze: add vfs ioctl to check freeze state
Date: Fri, 14 Sep 2012 10:46:33 +0900 [thread overview]
Message-ID: <50528C79.9000804@lab.ntt.co.jp> (raw)
In-Reply-To: <20120914001532.GO11511@dastard>
On 2012/09/14 09:15, Dave Chinner wrote:
> On Thu, Sep 13, 2012 at 05:19:21PM +0900, Fernando Luis Vazquez Cao wrote:
>> The problem is that we allow users to umount a frozen
>> filesystem but we have neither a bdev level fsfreeze
>> API to thaw it after that nor check ioctls.
> Yes, I know they don't exist, but you have to justify why they are
> needed. So far you haven't.
>
>> I proposed returning EBUSY when userspace tries to umount a
>> frozen filesystem, but this would break lazy umount, which
>> I was told by Al Viro and Josef Bacik is a no go, so I discarded
>> this approach.
>>
>> I will follow Al's advice a few years ago and do the following:
>> 1- Let userspace umount frozen filesystems.
> Which it already can.
Of course. I just meant that I will leave things as they are in that regard.
>> 2- Provide a bdev level ioctl to unfreeze a umounted filesystem.
> The only time a block device knows about the frozen state of the
> superblock is when dm-snapshot drives the freeze from the block
> device. There are special hooks in, e.g. mount_bdev() for checking
> whether there is an active bdev freeze and this prevents new mounts
> from occurring while a bdev freeze is in progress.
Yes, and I tried to retain that feature in my patch set.
> DM also removes the freeze state itself, so this is not a persistent state.
Unfortunately freeze_bdev and thaw bdev are symbols
exported to modules.
> IOWs, there are two specific freeze types - one a superblock (user)
> level freeze, and the other is a block device (kernel) level freeze.
> What you are proposing here means that the user, in certain
> circumstances, needs to manipulate superblock level freezes from the
> block level because they superblock is no longer visible to the
> user. It's a recipe for confusion and convoluted bugs, and it sure
> as hell won't work for all filesystems. e.g. see 18e9e51 ("Introduce
> freeze_super and thaw_super for the fsfreeze ioctl") as the reason
> why superblock level freezes exist and why trying to thaw from the
> bdev level doesn't work.
I discussed the issue mentioned in that commit with Josef
Bacik and Chris Mason and decided to fix those filesystems
for which it would not work (we need to fix the btrfs issue
anyway).
> Indeed, what happens when the superblock freeze is driven from
> dm-snapshot, and the user unmounts the fs and runs the blockdev
> ioctl to drop the freeze reference that dm-snapshot holds?
> That would free the superblock out from underneath DM, so this is
> a can of worms I'd prefer not to open.
The prototype I wrote for the blockdev ioctl
grabs the bdev->bd_fsfreeze_mutex and checks
bdev->bd_fsfreeze_count. If bd_fsfreeze_count>1
it returns EBUSY, else it calls freeze_super so
that the thawed and subsequently released.
I hope this addresses your concerns. That said
I am not married to this aproach, if we decide
that filesystems should be thawed on
umount (which I was told is a no-go in a previous
LSF) I am willing to implement it.
>> I will also:
>> 3- add the check ioctls so that users can check whether a
>> filesystem is frozen or not and avoid surprises.
> Which only solves the problem for users that know they have to check
> the state in certain corner cases. That doesn't fix the underlying
> problem.
>
> The reason this problem exists is that a active superblock level
> freeze holds a reference to the superblock. This has interesting
> side effects:
>
> $ sudo xfs_freeze -f /mnt/scratch
> $ sudo umount /mnt/scratch
> $ sudo mount /dev/vdc /mnt/scratch
> $ sudo xfs_io -f -c "pwrite 0 64k" /mnt/scratch/foo
> <blocked on frozen state>
> ^Z
> [1]+ Stopped sudo xfs_io -f -c "pwrite 0 64k" /mnt/scratch/foo
> $ sudo xfs_freeze -u /mnt/scratch
> $
> $ fg
> sudo xfs_io -f -c "pwrite 0 64k" /mnt/scratch/foo
> wrote 65536/65536 bytes at offset 0
> 64 KiB, 16 ops; 0.0000 sec (inf EiB/sec and inf ops/sec)
> $
>
> The freeze persists over unmount/mount, but we write to the
> filesystem during unmount, and run log recovery and write stuff to
> the filesystem as part of the normal mount process. IOWs, no
> filesystem checks for SB_UNFROZEN in either the unmount or mount
> path and so we violate the freeze condition by writing to the block
> device. This means that as it stands, an unmount or mount violates
> the frozen state guarantee and unmount/mount effectively imply that
> the filesystem is no longer frozen. Hence silent snapshot image
> corruption is very possible, and the fact that it is silent is very
> worrying.
That is why my original proposal was to return EBUSY
when userspace tries to umount a frozen filesystem.
However this breaks lazy umount and I was told that
auto-thaw on umount is not acceptable, so I came
with the current patch set. If the behaviour implemented
in this patches is considered the right one I guess
filesystems would need to get fixed to check for
SB_UNFROZEN (I need to check whether this is
realistic or not).
> As it is, the requirement that we allow unmounting of frozen
> filesystems implies that an unmount operation must also be a thaw
> operation. The filesystem is no longer visible to the user, and they
> have no method of controlling it's state anymore, so the user has
> said to the kernel "it's all yours now". If the filesystem was
> frozen when the unmount is issued, then the user is saying "I don't
> care about the frozen state anymore". If they did care, then they
> wouldn't be unmounting the filesystem without having first issued a
> thaw operation.
>
> FWIW, while we are unmounting the filesystem, we hold the s_umount lock
> exclusively, so the filesystem cannot be thawed while an unmount is
> in progress. Hence there is no reason why the unmount can't thaw the
> superblock and take away it's reference as part of the unmount.
I was told that behaviour was too harsh during previous
discussions. I do not dislike it personally though. I guess
this is Al's call.
> If we do this, all of the problems with persistent frozen superblock
> state after unmount go away. At this point, there is no need for
> block device level ioctls to clean up a persistent frozen superblock
> becuase they don't exist anymore. That means there is no need for
> ioctls to check the frozen state of the filesystem before unmount,
> either, because there is no problem to avoid....
Check ioctls are still needed IMHO. For example, there
is one issue we run into quite often in virtualization
environments:
We have Linux guest OS where fsfreeze in performed by a
guest agent controlled by the hypervisor. At the behest of
the hypervisor the agent freezes a guest filesystem and, for
whatever reason (a bug for example), the agent dies after
that. The filesystem was frozen without the guest's users
being aware of it, so these users will find that writes to
the filesystem block and they have no way to know
what is going on because we do not have check ioctls.
In such cases I was told by costumers that they would
like to use either emergency thaw (which this patch set
attempts to fix) or a check ioctl.
Thanks,
Fernando
next prev parent reply other threads:[~2012-09-14 1:46 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-07-12 8:57 [PATCH 0/8 v2] fsfreeze: miscellaneous fixes and cleanups Fernando Luis Vázquez Cao
2012-07-12 9:02 ` [PATCH 1/8] fsfreeze: Prevent emergency thaw from looping infinitely Fernando Luis Vázquez Cao
2012-07-13 12:59 ` Jan Kara
2012-07-17 5:13 ` Fernando Luis Vazquez Cao
2012-07-12 9:04 ` [PATCH 2/8] fsfreeze: emergency thaw will deadlock on s_umount Fernando Luis Vázquez Cao
2012-07-13 13:17 ` Jan Kara
2012-08-09 6:00 ` Fernando Luis Vazquez Cao
2012-09-13 6:23 ` Fernando Luis Vazquez Cao
2012-07-12 9:05 ` [PATCH 3/8] fsfreeze: freeze_super and thaw_bdev don't play well together Fernando Luis Vázquez Cao
2012-07-13 13:45 ` Jan Kara
2012-08-09 9:00 ` Fernando Luis Vazquez Cao
2012-07-12 9:07 ` [PATCH 4/8] fsfreeze: switch to using super methods where possible Fernando Luis Vázquez Cao
2012-07-13 13:50 ` Jan Kara
2012-07-12 9:09 ` [PATCH 5/8] fsfreeze: move emergency thaw code to fs/super.c Fernando Luis Vázquez Cao
2012-07-13 13:53 ` Jan Kara
2012-07-12 9:10 ` [PATCH 6/8] fsfreeze: add vfs ioctl to check freeze state Fernando Luis Vázquez Cao
2012-07-13 13:54 ` Jan Kara
2012-07-15 22:45 ` Dave Chinner
2012-09-13 6:19 ` Fernando Luis Vazquez Cao
2012-09-13 7:18 ` Dave Chinner
2012-09-13 8:19 ` Fernando Luis Vazquez Cao
2012-09-14 0:15 ` Dave Chinner
2012-09-14 1:46 ` Fernando Luis Vazquez Cao [this message]
2012-09-14 6:28 ` Dave Chinner
2012-09-14 8:18 ` Fernando Luis Vazquez Cao
2012-09-13 6:11 ` Fernando Luis Vazquez Cao
2012-07-12 9:11 ` [PATCH 7/8] fsfreeze: add block device " Fernando Luis Vázquez Cao
2012-07-12 9:12 ` [PATCH 8/8] fsfreeze: update Documentation/filesystems/Locking Fernando Luis Vázquez Cao
2012-07-13 14:11 ` Jan Kara
2012-07-17 1:42 ` Fernando Luis Vazquez Cao
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=50528C79.9000804@lab.ntt.co.jp \
--to=fernando_b1@lab.ntt.co.jp \
--cc=david@fromorbit.com \
--cc=dchinner@redhat.com \
--cc=hch@infradead.org \
--cc=jack@suse.cz \
--cc=linux-fsdevel@vger.kernel.org \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).