From: Dave Chinner <david@fromorbit.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: Ben Myers <bpm@sgi.com>,
xfs@oss.sgi.com, Dave Chinner <dchinner@redhat.com>
Subject: Re: [PATCH 4/4] xfs: fix the logspace waiting algorithm
Date: Mon, 5 Dec 2011 13:53:16 +1100 [thread overview]
Message-ID: <20111205025316.GD7046@dastard> (raw)
In-Reply-To: <20111202115141.GA21643@infradead.org>
On Fri, Dec 02, 2011 at 06:51:41AM -0500, Christoph Hellwig wrote:
> On Thu, Dec 01, 2011 at 01:51:28PM -0600, Ben Myers wrote:
> > Process A reads from the grant reserve head at 2641 (and there currently is enough space)
> > Process B wakes at either 2646 or 2650, in xlog_reserveq_wait, locks, and reads from the grant reserve head (and currently there is enough space)
> > Process B removes itself from the list
> > Process A reads from the reservq list and finds it to be empty
> > Process A finds that there was enough space at 2646
> > Process B returns from xlog_reserveq_wait, unlocks, grants space at 2656, returns
> > Process A grants log space at 2656, and returns
> >
> > AFAICS there is nothing that prevents these guys from granting the same
> > space when you approach free_bytes >= need_bytes concurrently.
> >
> > This lockless stuff is always a mind job for me. I'll take another look at
> > some of the other aspects of the patch. Even if it doesn't resolve my
> > question about the lockless issue, it seems to resolve Chandra's race.
>
> Indeed, I think we have this race. Then again I I think we had
> exactly the same one before, too.
What the current code does is this (ignoring this patch):
again:
check free space
if (no free space) {
add to list;
sleep;
space becomes available
wake up
goto again;
}
remove from list
grant space
do {
grant calc
} while (head does not move)
The problem being talked about, as I understand it, is that free
space can change between the check and the grant, resulting in
process B also seeing free space because the space process A thinks
it has but has not yet accounted for it.
It seems to me that this can happen regardless of whether we have a
queue of waiters or not, so the change to how the waiters are woken
in Christoph's patch is irrelevant.
> The only way to fix it would be
> to do a sort of double cmpxchg that only moves the grant head forward
> if it's still in available space vs the tails lsn.
Actually, it's quite simple to fix simply by having the free space
check return the grant head value at the time of the check and then
using that as the old value in the cmpxchg loop for updating the
grant head. That is:
again:
free = xlog_space_left(..... &old_head);
......
if (xlog_grant_add_space(log, l_grant_reserve_head, need_bytes, old_head)
goto again;
and xlog_grant_log_space() returns 1 if the old_head does not match
the current head so we go araound again.
I don't think we don't need to do this for the write head update in
xlog_grant_log_space() because once we have been granted reserve
head space we are guarantee to have write head space.
We do, however, need to make the same change for the write head in
xlog_regrant_write_log_space() as that has potential for the same
race when two permanent transactions commit and race to re-reserve
the write space that was just consumed by the committed transaction.
And to retain existing behaviour for other callers, perhaps an
old_head value of -1 (NULL LSN value) can be passed to indicate that
current head value should be used and to loop until the change is
made....
Does that make sense? It retains the same lockless behaviour, but
closes the race condition of the grant head changing from the time
of read to the time of actual modification....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
next prev parent reply other threads:[~2011-12-05 2:53 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-11-28 8:17 [PATCH 0/4] xfs fixes for Linux 3.2-rc3 Christoph Hellwig
2011-11-28 8:17 ` [PATCH 1/4] xfs: fix attr2 vs large data fork assert Christoph Hellwig
2011-11-28 18:01 ` Ben Myers
2011-11-28 18:03 ` Christoph Hellwig
2011-11-28 8:17 ` [PATCH 2/4] xfs: validate acl count Christoph Hellwig
2011-11-28 8:17 ` [PATCH 3/4] xfs: force buffer writeback before blocking on the ilock in inode reclaim Christoph Hellwig
2011-11-28 8:17 ` [PATCH 4/4] xfs: fix the logspace waiting algorithm Christoph Hellwig
2011-11-30 23:56 ` Ben Myers
2011-12-01 7:21 ` Christoph Hellwig
2011-12-01 19:51 ` Ben Myers
2011-12-02 11:51 ` Christoph Hellwig
2011-12-05 2:53 ` Dave Chinner [this message]
2011-12-05 9:05 ` Christoph Hellwig
2011-12-01 18:32 ` Ben Myers
2011-12-01 20:43 ` Chandra Seetharaman
2011-12-01 19:28 ` Chandra Seetharaman
2011-12-02 11:16 ` Christoph Hellwig
2011-12-02 16:02 ` Chandra Seetharaman
2011-11-29 19:23 ` [PATCH 0/4] xfs fixes for Linux 3.2-rc3 Ben Myers
2011-11-30 8:54 ` Christoph Hellwig
2011-11-30 8:58 ` [PATCH 5/4] xfs: fix nfs export of 64-bit inodes numbers on 32-bit kernels Christoph Hellwig
2011-12-02 16:07 ` Ben Myers
2011-12-02 17:29 ` Christoph Hellwig
2011-12-06 16:39 ` Ben Myers
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20111205025316.GD7046@dastard \
--to=david@fromorbit.com \
--cc=bpm@sgi.com \
--cc=dchinner@redhat.com \
--cc=hch@infradead.org \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox