From: Jan Kara <jack@suse.cz>
To: linux-fsdevel@vger.kernel.org
Cc: Surbhi Palande <csurbhi@gmail.com>,
	Kamal Mostafa <kamal@canonical.com>,
	Christoph Hellwig <hch@infradead.org>,
	Dave Chinner <dchinner@redhat.com>,
	Al Viro <viro@ZenIV.linux.org.uk>
Subject: [RFC] How to fix broken freezing?
Date: Fri, 6 Jan 2012 15:09:31 +0100
Message-ID: <20120106140931.GD20291@quack.suse.cz>

  Hello,

  I was looking at what causes a filesystem to have dirty data after it is
frozen. After some thought I realized the freezing code is inherently racy
and all filesystems (ext3, ext4, xfs) can have dirty data on a frozen
filesystem.

The race is basically the following:
	Task 1					Task 2
freeze_super()				__generic_file_aio_write()
  ...					  vfs_check_frozen(sb, SB_FREEZE_WRITE)
  sb->s_frozen = SB_FREEZE_WRITE;
  sync_filesystem(sb);
					  do the write
					    /* Here we create dirty data
					     * which is left on frozen fs */
  sb->s_frozen = SB_FREEZE_TRANS;
  ...
  ->freeze_fs()

The problem is that you can never make the check for a frozen filesystem
race-free with the current s_frozen scheme - the filesystem can always be
frozen the instant after you check for it, and you end up creating dirty
data on a frozen filesystem.
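
To make the interleaving concrete, here is a minimal userspace sketch that
replays it step by step in a single thread. Everything prefixed toy_ is
hypothetical and exists only for this illustration; the check is modelled
on the s_frozen < level test that vfs_check_frozen() waits on, and the
dirty counter is an assumed stand-in for dirty pages.

```c
#include <assert.h>
#include <stdbool.h>

/* Freeze levels, named after the kernel's s_frozen states. */
enum { SB_UNFROZEN = 0, SB_FREEZE_WRITE = 1, SB_FREEZE_TRANS = 2 };

/* Hypothetical toy superblock: only the two fields the race needs. */
struct toy_sb {
	int s_frozen;	/* current freeze level */
	int dirty;	/* dirty pages not yet written back */
};

/*
 * Models the condition vfs_check_frozen(sb, level) waits for.
 * The real helper sleeps until it holds; this one only reports it.
 */
static bool toy_may_write(const struct toy_sb *sb, int level)
{
	return sb->s_frozen < level;
}

/*
 * Replays the interleaving from the diagram and returns the number of
 * dirty pages left behind once the filesystem is considered frozen.
 */
static int toy_replay_race(void)
{
	struct toy_sb sb = { SB_UNFROZEN, 0 };

	/* Task 2: the check passes, fs is not frozen yet. */
	bool ok = toy_may_write(&sb, SB_FREEZE_WRITE);

	/* Task 1: start freezing and write everything back. */
	sb.s_frozen = SB_FREEZE_WRITE;
	sb.dirty = 0;			/* stands in for sync_filesystem() */

	/* Task 2: acts on the stale check result and dirties a page. */
	if (ok)
		sb.dirty++;

	/* Task 1: finish freezing, unaware of the new dirty page. */
	sb.s_frozen = SB_FREEZE_TRANS;

	return sb.dirty;
}
```

Calling toy_replay_race() leaves one dirty page on the "frozen" toy
superblock. Nothing in the check itself is wrong; the window sits between
the check and the write.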

The question is what to do with this problem. I outline the possibilities
that come to my mind below:
1) Ignore the problem - depending on the exact fs details this could lead to
   the fs snapshot being corrupted. Also, the flusher thread can hang on the
   frozen filesystem (e.g. because of sync(1)), creating all sorts of
   secondary issues. So I don't think this is really an option.
2) Have a rwlock in the superblock that is held for writing while
   filesystem freezing is in progress and held for reading by the filesystem
   while a transaction is running, except for transactions that are required
   to do writeback. This is kind of ugly but, at least for ext3/4, relatively
   easy to implement.
3) Have the same rwlock, but have the VFS take it in the kernel entry
   points which modify a filesystem. A lot of these places are already
   guarded by a mnt_want_write/mnt_drop_write pair so we could hook into it,
   but there are entry points which use a file descriptor and thus are not
   guarded by mnt_want_write/mnt_drop_write, so we would have to modify
   these places. Note that this in particular also means ioctl calls and
   such, so it won't be trivial to catch all the places. This approach looks
   the cleanest to me but it's quite some work and it's a bit fragile - it
   requires everyone adding an entry point that modifies the filesystem to
   think of fs freezing.
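
The locking idea shared by options 2 and 3 can be sketched in userspace as
follows. This is only an illustration under stated assumptions, not a
proposed kernel implementation: a try-only reader/writer lock built from
one C11 atomic counter stands in for the rwlock, all toy_ names are
hypothetical, and where this sketch fails a real implementation would
block instead.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel API. */
struct toy_sb {
	atomic_int lock;	/* -1: held for writing (freeze), n >= 0: n readers */
	int dirty;		/* dirty pages; unprotected, fine for a one-thread demo */
};

/* Modification entry point: take the lock for reading. */
static bool toy_write_begin(struct toy_sb *sb)
{
	int v = atomic_load(&sb->lock);

	while (v >= 0) {
		/* On failure v is reloaded with the current lock value. */
		if (atomic_compare_exchange_weak(&sb->lock, &v, v + 1))
			return true;
	}
	return false;		/* a freeze holds the lock */
}

/* The modification itself; only legal between begin and end. */
static void toy_do_write(struct toy_sb *sb)
{
	sb->dirty++;
}

/* Drop the read side once the modification is complete. */
static void toy_write_end(struct toy_sb *sb)
{
	atomic_fetch_sub(&sb->lock, 1);
}

/*
 * Freeze: take the lock for writing, then sync. Because the lock is only
 * granted when no modification is in flight, nothing can dirty the
 * filesystem after the sync until toy_thaw() releases the lock.
 */
static bool toy_freeze(struct toy_sb *sb)
{
	int expected = 0;

	if (!atomic_compare_exchange_strong(&sb->lock, &expected, -1))
		return false;	/* a modification is still running */
	sb->dirty = 0;		/* stands in for sync_filesystem() */
	return true;
}

/* Thaw: release the write side so modifications may start again. */
static void toy_thaw(struct toy_sb *sb)
{
	atomic_store(&sb->lock, 0);
}
```

In this model toy_freeze() fails while a toy_write_begin() /
toy_write_end() section is open, and toy_write_begin() fails between
toy_freeze() and toy_thaw(), so the check and the write can no longer be
separated by a freeze. The sketch ignores the writeback exception from
option 2 entirely.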

What do people think about this? Any other ideas on how to solve the
problem?

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR
