From: Andrea Arcangeli <andrea@suse.de>
To: Pavel Machek <pavel@suse.cz>
Cc: Andrew Morton <akpm@digeo.com>,
Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>,
linux-kernel@vger.kernel.org
Subject: Re: 2.0, 2.2, 2.4, 2.5: fsync buffer race
Date: Mon, 10 Feb 2003 14:07:04 +0100 [thread overview]
Message-ID: <20030210130704.GO31401@dualathlon.random> (raw)
In-Reply-To: <20030204231652.GC128@elf.ucw.cz>
On Wed, Feb 05, 2003 at 12:16:53AM +0100, Pavel Machek wrote:
> Hi!
>
> > > there's a race condition in filesystem
> > >
> > > let's have a two inodes that are placed in the same buffer.
> > >
> > > call fsync on inode 1
> > > it goes down to ext2_update_inode [update == 1]
> > > it calls ll_rw_block at the end
> > > ll_rw_block starts to write buffer
> > > ext2_update_inode waits on buffer
> > >
> > > while the buffer is writing, another process calls fsync on inode 2
> > > it goes again to ext2_update_inode
> > > it calls ll_rw_block
> > > ll_rw_block sees buffer locked and exits immediatelly
> > > ext2_update_inode waits for buffer
> > > the first write finished, ext2_update_inode exits and changes made by
> > > second proces to inode 2 ARE NOT WRITTEN TO DISK.
> > >
> >
> > hmm, yes. This is a general weakness in the ll_rw_block() interface. It is
> > not suitable for data-integrity writeouts, as you've pointed out.
> >
> > A suitable fix would be do create a new
> >
> > void wait_and_rw_block(...)
> > {
> > wait_on_buffer(bh);
> > ll_rw_block(...);
> > }
> >
> > and go use that in all the appropriate places.
> >
> > I shall make that change for 2.5, thanks.
>
> Should this be fixed at least in 2.4, too? It seems pretty serious for
> mail servers (etc)...
actually the lowlevel currently is supposed to take care of that if it
writes directly with ll_rw_block like fsync_buffers_list takes care of
it before calling ll_rw_block. But the whole point is that normally the
write_inode path only marks the buffer dirty, it never writes directly,
and no dirty bitflag can be lost and we flush those dirty buffers right
with the proper wait_on_buffer before ll_rw_block. So I don't think it's
happening really in 2.4, at least w/o mounting the fs with -osync.
But I thought about about Mikulas suggestion of adding lock_buffer
in place of the test and set bit. This looks very appealing. the main
reason we didn't do that while fixing fsync_buffers_list a few months
ago in 2.4.20 or so, is been that it is a very risky change, I mean, I'm
feeling like it'll break something subtle.
In theory the only thing those bitflags controls are the coherency of
the buffer cache here, the rest is serialized by design at the
highlevel. And the buffer_cache should be ok with the lock_buffer(),
since we'll see the buffer update and we'll skip the write if we race
with other reads (we'll sleep in lock_buffer rather than in
wait_on_buffer). For the writes we'll overwrite the data one more time
(the feature incidentally ;). so it sounds safe. Performance wise it
shouldn't matter, for read it can't matter infact.
So I guess it's ok to make that change, then we could even move the
wait_on_buffer from fsync_buffers_list to the xfs buffer_delay path of
write_buffer. I'm tempted to make this change for next -aa. I feel it
makes more sense and it's a better fix even if if risky. Can anybody see
a problem with that?
Andrea
next prev parent reply other threads:[~2003-02-10 12:57 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2003-02-02 23:32 2.0, 2.2, 2.4, 2.5: fsync buffer race Mikulas Patocka
2003-02-03 0:00 ` Andrew Morton
2003-02-03 1:13 ` Mikulas Patocka
2003-02-03 1:20 ` Andrew Morton
2003-02-03 9:29 ` Mikulas Patocka
2003-02-04 23:16 ` Pavel Machek
2003-02-05 15:13 ` Mikulas Patocka
2003-02-10 13:07 ` Andrea Arcangeli [this message]
2003-02-10 16:28 ` Mikulas Patocka
2003-02-10 16:57 ` Linus Torvalds
2003-02-10 20:40 ` Andrew Morton
2003-02-10 21:18 ` Andrea Arcangeli
2003-02-10 21:44 ` Andrew Morton
2003-02-10 21:59 ` Andrea Arcangeli
2003-03-11 13:58 ` Andrea Arcangeli
2003-03-14 6:42 ` Andrea Arcangeli
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20030210130704.GO31401@dualathlon.random \
--to=andrea@suse.de \
--cc=akpm@digeo.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mikulas@artax.karlin.mff.cuni.cz \
--cc=pavel@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox