From: Dmitry Monakhov <dmonakhov@openvz.org>
To: Al Viro <viro@ZenIV.linux.org.uk>
Cc: linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] fs: fix filesystem_sync vs write race on rw=>ro remount
Date: Mon, 25 Jan 2010 02:01:17 +0300
Message-ID: <87my03w1j6.fsf@openvz.org>
In-Reply-To: <20100124213707.GY19799@ZenIV.linux.org.uk> (Al Viro's message of "Sun, 24 Jan 2010 21:37:07 +0000")

Al Viro <viro@ZenIV.linux.org.uk> writes:

> On Mon, Jan 25, 2010 at 12:15:51AM +0300, Dmitry Monakhov wrote:
>
>> > It's not a solution.  You get an _attempted_ remount ro making writes
>> > fail, even if it's going to be unsuccessful.  No go...
>> We have two options for new writers:
>> 1) Fail it via -EROFS
>>    Yes, remount may fail, but it is really unlikely.
>> 2) Defer(block) new writers on until we complete or fail remount
>>    for example like follows. Do you like second solution ?
>
> Umm...  I wonder what the locking implications would be...  Frankly,
> I suspect that what we really want is this:
> 	* per-superblock write count of some kind, bumped when we decide
> that writeback is inevitable and dropped when we are done with it (the
> same thing goes for async part of unlink(), etc.)
> 	* fs_may_remount_ro() checking that write count
> So basically we try to push those short-term writers to completion and
> if new ones had come while we'd been doing that (or some are really
> stuck) we fail remount with -EBUSY.
>
> As a short-term solution the second patch would do probably (-stable and .33),
> but in the next cycle I'd rather see something addressing the real problem.
> fs_may_remount_ro() in its current form is really broken by design - it
> should not scan any lists (which is where your race comes from, BTW)
That is not quite true. The race is not caused only by
fs_may_remount_ro() being non-atomic; it exists because remount
has two stages:
1) fs_may_remount_ro()
2) sync_filesystem()
Even if we make the first stage atomic, we still have a race between
the second stage and new writers.
BTW: your idea of a per-sb counter may be useful here, but it should
not be a reference count; it can be used as a change counter, like
i_version. For example:
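mnt_want_write()
{
   /* bumped on entry: any new writer changes the count */
   mnt->mnt_sb->s_wr_count++;
}
mnt_drop_write()
{
   /* bumped again on exit (never dropped): it is a change counter */
   mnt->mnt_sb->s_wr_count++;
}
do_remount_sb()
{
    cur = mnt->mnt_sb->s_wr_count;
    /* fs_may_remount_ro() returns nonzero when remount is allowed */
    if (!fs_may_remount_ro())
         return -EBUSY;
    sync_filesystem();
    /* someone started or finished a write while we were syncing */
    if (cur != mnt->mnt_sb->s_wr_count)
         return -EBUSY;
}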
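
To make the idea concrete, here is a minimal userspace sketch of the
same scheme. It is only a model, not kernel code: struct sb_model, the
model_* helpers and the during_sync hook are hypothetical names made up
for illustration, and C11 atomics stand in for whatever locking the real
code would need. It also does not try to close the window between the
final comparison and actually flipping the read-only flag.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct super_block. */
struct sb_model {
    atomic_ulong s_wr_count;   /* change counter: only ever increases */
    atomic_int   open_writers; /* what the stage-1 check looks at */
    bool         read_only;
};

static void model_init(struct sb_model *sb)
{
    atomic_init(&sb->s_wr_count, 0);
    atomic_init(&sb->open_writers, 0);
    sb->read_only = false;
}

/* Models mnt_want_write(): register a writer and bump the counter. */
static void model_want_write(struct sb_model *sb)
{
    atomic_fetch_add(&sb->open_writers, 1);
    atomic_fetch_add(&sb->s_wr_count, 1);
}

/* Models mnt_drop_write(): bump the counter again, then unregister. */
static void model_drop_write(struct sb_model *sb)
{
    assert(atomic_load(&sb->open_writers) > 0); /* drop without want is a bug */
    atomic_fetch_add(&sb->s_wr_count, 1);
    atomic_fetch_sub(&sb->open_writers, 1);
}

/* Stage 1, same convention as fs_may_remount_ro(): true means "allowed". */
static bool model_may_remount_ro(struct sb_model *sb)
{
    return atomic_load(&sb->open_writers) == 0;
}

/*
 * Stage 2 stand-in for sync_filesystem(). The optional hook lets a
 * caller inject a writer that shows up while the sync is in progress.
 */
static void model_sync(struct sb_model *sb,
                       void (*during_sync)(struct sb_model *))
{
    if (during_sync)
        during_sync(sb);
}

/* Returns 0 on success, -EBUSY if a writer was active or raced with sync. */
static int model_remount_ro(struct sb_model *sb,
                            void (*during_sync)(struct sb_model *))
{
    unsigned long cur = atomic_load(&sb->s_wr_count);

    if (!model_may_remount_ro(sb))
        return -EBUSY;
    model_sync(sb, during_sync);
    /* A writer that came and went during the sync still changed the count. */
    if (cur != atomic_load(&sb->s_wr_count))
        return -EBUSY;
    sb->read_only = true;
    return 0;
}

/* A short-lived writer that starts and finishes inside the sync window. */
static void short_writer(struct sb_model *sb)
{
    model_want_write(sb);
    model_drop_write(sb);
}
```

On a freshly initialized model, model_remount_ro(&sb, short_writer)
returns -EBUSY even though the writer is already gone by the time of the
final check. A plain reference count would be back at zero there and
miss it, which is why the counter is bumped in both directions. A
following model_remount_ro(&sb, NULL) with no racing writer returns 0.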


