From: Dave Chinner <david@fromorbit.com>
To: Jan Kara <jack@suse.cz>
Cc: LKML <linux-kernel@vger.kernel.org>,
linux-fsdevel@vger.kernel.org, Al Viro <viro@ZenIV.linux.org.uk>,
Christoph Hellwig <hch@infradead.org>,
dchinner@redhat.com, sandeen@redhat.com,
Kamal Mostafa <kamal@canonical.com>, Ben Myers <bpm@sgi.com>,
Alex Elder <elder@kernel.org>,
xfs@oss.sgi.com
Subject: Re: [PATCH 11/19] xfs: Convert to new freezing code
Date: Tue, 13 Mar 2012 10:48:17 +1100 [thread overview]
Message-ID: <20120312234817.GA5091@dastard> (raw)
In-Reply-To: <20120312175551.GI5998@quack.suse.cz>
On Mon, Mar 12, 2012 at 06:55:51PM +0100, Jan Kara wrote:
> On Mon 12-03-12 09:45:37, Dave Chinner wrote:
> > On Fri, Mar 09, 2012 at 03:22:53PM +0100, Jan Kara wrote:
> > It will do work as other things push inodes into reclaim. e.g.
> > filestreams inode expiry. And it will still run to reclaim clean
> > inodes...
> Sure. But reclaim will start a transaction only if inode is dirty. And
> inode must not be dirty on a frozen filesystem.
That's what I've been trying to tell you - even clean inodes can be
dirtied in reclaim on XFS, so ensuring the FS is clean is not a good
enough guarantee. And not just through the shrinkers - just closing
a file descriptor can trigger truncation as well via .release. See
xfs_release:
/* If this is a read-only mount, don't do this (would generate I/O) */
if (mp->m_flags & XFS_MOUNT_RDONLY)
goto out;
> > > xfs_flush_worker() - everything is clean so no IO should happen
> > > xfs_sync_worker() - again the same if I understand the code right
> >
> > xfs_sync_worker() will always trigger a log force, so if there is
> > anything that has dirtied the journal, it will trigger IO. We have
> > no protection against the journal being dirtied anymore, so no
> > guarantees can be given here.
> Yes, it would be a bug if something dirtied a journal while the
> filesystem is frozen. To catch issues like this, I've added WARN_ON in
> xfs_trans_alloc() to actually warn when transaction would be started on a
> frozen filesystem. My testing never triggered it with the latest patch set.
Just hold a file descriptor open for a large write over the freeze
(don't get it caught in the freeze), and then close it after the
filesystem is frozen. Perhaps even drop the slab caches.
> > Basically, your patchset creates a "shell" around the outside of the
> > filesystem that catches all the external modifications that can
> > occur through the VFS and ioctls. The "shell" is now the only layer
> > of defense because the patchset removes the layer of protection that
> > prevents internal modifications from occurring.
> Yes. I'd just note that the "shell" is there already to provide reliable
> remounting read only.
But it doesn't provide reliable read-only behaviour - internal
filesystem triggers still need to check and disarm to make read-only
mounts reliable. Indeed, XFS has many internal checks for read only
mounts, and the places we are talking about here are similar to the
places where you've removed freeze protection from by gutting the
transaction level freezing.
> I just had to change some properties of the shell to
> be usable for freezing as well. But the shell has to be maintained
> regardless of how we decide to do freezing.
>
> Also believe me that I've initially tried to fix the freezing without the
> external shell. But it just isn't enough to prevent dirty inodes from being
> created (e.g. from directory operations) while sync during freezing is
> running.
Sure - the old mechanism was "freeze data" which only protected data
modification paths, followed by "freeze trans" which protected
metadata modification paths (like directory operations). You started
by removing the metadata modification path protection - it is any
wonder that those operations continued to dirty inodes until you
expanded the "freeze data" shell to mean "freeze data and metadata"?
> Sure I could keep the freezing mechanism on transaction start. But it
> seemed like a cleaner approach to me to protect all the places properly
> with the generic mechanism than having two mechanisms and unclear
> (especially locking) interactions between them.
Once again, that's a view that does not take into account that
modifcations don't all come through the VFS.
It's like the hard shelli/soft center security model - it protects
well from external attackers, but once they get inside, there's no
protection. Indeed, there is zero protection from inside jobs, and
thats where multiple layers of defense are needed.
Your freeze changes provide us with a hard outer shell, but it's got
a really, really soft center. We can't use the outer shell defenses
deep inside the shell because of the locking orders that have been
defined (freeze protection must be outermost), so we need another
layer....
> But in the end, if you guys really feel strongly about it and decide that
> XFS wants to keep it's mechanism of freezing on transaction start, then I
> won't stop you. Although it would mean XFS would have to have three
> counters to protect freezing while other filesystems will need only two.
There is two counters only to avoid lockdep problems because the
different entry points to the outer shell have different locking
orders. I fail to see why adding a third counter to provide an
innermost freeze lock that retains the current level of protection
is a big deal because all that it really needs to work is a
different lock order.
Also, ext4 could retain it's current outer/inner protection, so I
don't see that this behaviour is XFS specific.....
> Anyway I'm confident enough to remove that mechanism from ext3/4 and ocfs2
> but I don't know XFS internals enough to really argue...
... especially given the reported ext4 regression. I have much less
confidence than you that the outer shell is sufficient for ext4 or
any other complex filesystem.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
next prev parent reply other threads:[~2012-03-12 23:48 UTC|newest]
Thread overview: 37+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-03-05 16:00 [PATCH 00/19] Fix filesystem freezing deadlocks Jan Kara
2012-03-05 16:00 ` [PATCH 01/19] mm: Make default vm_ops provide ->page_mkwrite handler Jan Kara
2012-03-23 22:45 ` Andrew Morton
2012-03-27 7:55 ` Jan Kara
2012-03-27 21:38 ` Andrew Morton
2012-03-27 22:08 ` Jan Kara
2012-03-27 22:45 ` Dave Chinner
2012-03-28 9:48 ` Jan Kara
2012-03-05 16:01 ` [PATCH 02/19] fs: Push mnt_want_write() outside of i_mutex Jan Kara
2012-03-05 16:01 ` [PATCH 03/19] fat: " Jan Kara
2012-03-05 16:01 ` [PATCH 04/19] btrfs: " Jan Kara
2012-03-05 16:01 ` [PATCH 05/19] nfsd: " Jan Kara
2012-03-05 16:01 ` [PATCH 06/19] fs: Improve filesystem freezing handling Jan Kara
2012-03-05 16:01 ` [PATCH 07/19] fs: Add freezing handling to mnt_want_write() / mnt_drop_write() Jan Kara
2012-03-05 16:01 ` [PATCH 08/19] fs: Skip atime update on frozen filesystem Jan Kara
2012-03-05 16:01 ` [PATCH 09/19] fs: Protect write paths by sb_start_write - sb_end_write Jan Kara
2012-03-05 16:01 ` [PATCH 10/19] ext4: Convert to new freezing mechanism Jan Kara
2012-03-07 22:32 ` Kamal Mostafa
2012-03-08 9:05 ` Jan Kara
2012-03-05 16:01 ` [PATCH 11/19] xfs: Convert to new freezing code Jan Kara
2012-03-08 23:20 ` Dave Chinner
2012-03-09 8:23 ` Jan Kara
2012-03-09 14:22 ` Jan Kara
2012-03-11 22:45 ` Dave Chinner
2012-03-12 17:55 ` Jan Kara
2012-03-12 23:48 ` Dave Chinner [this message]
2012-03-13 21:30 ` Jan Kara
2012-03-14 3:00 ` Dave Chinner
2012-03-05 16:01 ` [PATCH 12/19] ocfs2: Convert to new freezing mechanism Jan Kara
2012-03-05 16:01 ` [PATCH 13/19] gfs2: " Jan Kara
2012-03-05 16:01 ` [PATCH 14/19] fuse: " Jan Kara
2012-03-05 16:01 ` [PATCH 15/19] ntfs: " Jan Kara
2012-03-05 16:01 ` [PATCH 16/19] nilfs2: " Jan Kara
2012-03-05 16:01 ` [PATCH 17/19] btrfs: " Jan Kara
2012-03-05 16:01 ` [PATCH 18/19] fs: Remove old " Jan Kara
2012-03-05 16:01 ` [PATCH 19/19] fs: Refuse to freeze filesystem with open but unlinked files Jan Kara
2012-03-11 20:22 ` [PATCH 00/19] Fix filesystem freezing deadlocks Kamal Mostafa
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20120312234817.GA5091@dastard \
--to=david@fromorbit.com \
--cc=bpm@sgi.com \
--cc=dchinner@redhat.com \
--cc=elder@kernel.org \
--cc=hch@infradead.org \
--cc=jack@suse.cz \
--cc=kamal@canonical.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=sandeen@redhat.com \
--cc=viro@ZenIV.linux.org.uk \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox