Re: [PATCH RESEND] Minimal fix for private_list handling races

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Nick Piggin <nickpiggin@yahoo.com.au>
To: Jan Kara <jack@suse.cz>
Cc: linux-kernel@vger.kernel.org, Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH RESEND] Minimal fix for private_list handling races
Date: Fri, 25 Jan 2008 19:34:07 +1100	[thread overview]
Message-ID: <200801251934.08160.nickpiggin@yahoo.com.au> (raw)
In-Reply-To: <20080123154818.GD10144@duck.suse.cz>

On Thursday 24 January 2008 02:48, Jan Kara wrote:
> On Thu 24-01-08 02:05:16, Nick Piggin wrote:
> > On Thursday 24 January 2008 00:30, Jan Kara wrote:
> > > On Wed 23-01-08 12:00:02, Nick Piggin wrote:
> > > > On Wednesday 23 January 2008 04:10, Jan Kara wrote:
> > > > >   Hi,
> > > > >
> > > > >   as I got no answer for a week, I'm resending this fix for races
> > > > > in private_list handling. Andrew, do you like them more than the
> > > > > previous version?
> > > >
> > > > FWIW, I reviewed this, and it looks OK although I think some comments
> > > > would be in order.
> > >
> > >   Thanks.
> > >
> > > > What would be really nice is to avoid the use of b_assoc_buffers
> > > > completely in this function like I've attempted (untested). I don't
> > > > know if you'd actually call that an improvement...?
> > >
> > >   I thought about this solution as well. But main issue I had with this
> > > solution is that currently, you nicely submit all the metadata buffers
> > > at once, so that block layer can sort them and write them in nice
> > > order. With the array you submit buffers by 16 (or any other fixed
> > > amount) and in mostly random order... So IMHO fsync would become
> > > measurably slower.
> >
> > Oh, I don't know the filesystems very well... which ones would
> > attach a large number of metadata buffers to the inode?
>
>   This logic is actually used only by a few filesystems - ext2 and UDF are
> probably the most common ones. For example for ext2, the indirect blocks
> are on the list if the file is freshly written, so that is roughly around
> 1MB of metadata per 1GB of data (for 4KB blocks, with 1KB blocks it is 4MB
> per 1GB). Because seeks are expensive, you could really end up with the
> write being 16 times slower when you do it in 16 passes instead of one...

Yeah, that's fair enough I suppose. I wasn't thinking you'd have a
huge newly dirtied file, but it could happen. I don't want to cause
regressions.


> > With that in mind, doesn't your first patch suffer from a race due to
> > exactly this unlocked list_empty check when you are removing clean
> > buffers from the queue?
> >
> >                              if (!buffer_dirty(bh) && !buffer_locked(bh))
> > mark_buffer_dirty()
> > if (list_empty(&bh->b_assoc_buffers))
> >      /* add */
> >                                  __remove_assoc_queue(bh);
> >
> > Which results in the buffer being dirty but not on the ->private_list,
> > doesn't it?
>
>   Hmm, I'm not sure about which patch you speak. Logic with removing clean
> buffers has been in the first version (but there mark_buffer_dirty_inode()
> was written differently).

Ah, yes I see I missed that. I like that a lot better.


> In the current version, we readd buffer to 
> private_list if it is found dirty in the second while loop of
> fsync_buffers() and that should be enough.

Sure, I think there is still a data race though, but if there is one
it's already been there for a long time and nobody cares too much about
those anyway.


> > But let's see... there must be a memory ordering problem here in existing
> > code anyway, because I don't see any barriers. Between b_assoc_buffers
> > and b_state (via buffer_dirty); fsync_buffers_list vs
> > mark_buffer_dirty_inode, right?
>
>   I'm not sure. What exactly to you mean? BTW: spin_lock is a memory
> barrier, isn't it?

In existing code:

mark_buffer_dirty_inode():              fsync_buffers_list():
 test_set_buffer_dirty(bh);              list_del_init(&bh->b_assoc_buffers);
 if (list_empty(&bh->b_assoc_buffers))   if (buffer_dirty(bh)) {
     ...                                   list_add(&bh->b_assoc_buffers, );

These two code sequences can run concurrently because only fsync_buffers_list
takes the lock.

So fsync_buffers_list can speculatively load bh->b_state before
its stores to clear b_assoc_buffers propagate to the CPU running
mark_buffer_dirty_inode.

So if there is a !dirty buffer on the list, then fsync_buffers_list will
remove it from the list, but mark_buffer_dirty_inode won't see it has been
removed from the list and won't re-add it. I think.

This is actually even possible to hit on x86 because they reorder loads
past stores. It needs a smp_mb() before if (buffer_dirty(bh) {}.

Actually I very much dislike testing list entries locklessly, because they
are not trivially atomic operations like single stores... which is another
reason why I like your first patch.

next prev parent reply	other threads:[~2008-01-25  8:47 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-01-22 17:10 [PATCH RESEND] Minimal fix for private_list handling races Jan Kara
2008-01-23  1:00 ` Nick Piggin
2008-01-23 13:30   ` Jan Kara
2008-01-23 15:05     ` Nick Piggin
2008-01-23 15:48       ` Jan Kara
2008-01-25  8:34         ` Nick Piggin [this message]
2008-01-25 18:16           ` Jan Kara
2008-01-28 22:16           ` Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200801251934.08160.nickpiggin@yahoo.com.au \
    --to=nickpiggin@yahoo.com.au \
    --cc=akpm@linux-foundation.org \
    --cc=jack@suse.cz \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox