* deadlocked xfs
@ 2008-07-11 2:33 Eric Sandeen
2008-07-11 2:48 ` Mark Goodwin
0 siblings, 1 reply; 17+ messages in thread
From: Eric Sandeen @ 2008-07-11 2:33 UTC (permalink / raw)
To: xfs-oss
while running fs_mark:
fs_mark -d /mnt/test -D 256 -n 100000 -t 4 -s 20480 -F -S 0
which runs 4 threads making 20k files, things got stuffed up in the log,
see sysrq-w output at:
http://sandeen.fedorapeople.org/xfs/deadxfs
2 threads down this path:
[<ffffffffa03c89f5>] :xfs:xlog_state_get_iclog_space+0x83/0x187
[<ffffffffa03c8c1a>] :xfs:xlog_write+0x121/0x486
[<ffffffffa03c9043>] :xfs:xfs_log_write+0x2a/0x50
this is on a 4 cpu AMD.
I made the fs with:
mkfs.xfs -dsize=1t,agcount=32 -l lazy-count=1 -f /dev/sdb
and mounted with:
mount -o nobarrier,logbsize=256k,logbufs=8,inode64 /dev/sdb /mnt/test
has this been seen before?
Thanks,
-Eric
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: deadlocked xfs
2008-07-11 2:33 deadlocked xfs Eric Sandeen
@ 2008-07-11 2:48 ` Mark Goodwin
2008-07-11 2:56 ` Eric Sandeen
` (2 more replies)
0 siblings, 3 replies; 17+ messages in thread
From: Mark Goodwin @ 2008-07-11 2:48 UTC (permalink / raw)
To: Eric Sandeen; +Cc: xfs-oss
Thanks for the report Eric. This looks very similar to a
deadlock Lachlan recently hit in the patch for
"Use atomics for iclog reference counting"
http://oss.sgi.com/archives/xfs/2008-02/msg00130.html
It seems this patch can cause deadlocks under heavy log traffic.
I don't think anyone has a fix yet ... Lachlan is out this week,
but Tim can follow-up here ...
Cheers
-- Mark
Eric Sandeen wrote:
> while running fs_mark:
>
> fs_mark -d /mnt/test -D 256 -n 100000 -t 4 -s 20480 -F -S 0
>
> which runs 4 threads making 20k files, things got stuffed up in the log,
> see sysrq-w output at:
>
> http://sandeen.fedorapeople.org/xfs/deadxfs
>
> 2 threads down this path:
>
> [<ffffffffa03c89f5>] :xfs:xlog_state_get_iclog_space+0x83/0x187
> [<ffffffffa03c8c1a>] :xfs:xlog_write+0x121/0x486
> [<ffffffffa03c9043>] :xfs:xfs_log_write+0x2a/0x50
>
> this is on a 4 cpu AMD.
>
> I made the fs with:
>
> mkfs.xfs -dsize=1t,agcount=32 -l lazy-count=1 -f /dev/sdb
>
> and mounted with:
>
> mount -o nobarrier,logbsize=256k,logbufs=8,inode64 /dev/sdb /mnt/test
>
> has this been seen before?
>
> Thanks,
>
> -Eric
>
>
--
Mark Goodwin markgw@sgi.com
Engineering Manager for XFS and PCP Phone: +61-3-99631937
SGI Australian Software Group Cell: +61-4-18969583
-------------------------------------------------------------
* Re: deadlocked xfs
2008-07-11 2:48 ` Mark Goodwin
@ 2008-07-11 2:56 ` Eric Sandeen
2008-07-11 3:21 ` Timothy Shimmin
2008-07-11 3:22 ` Dave Chinner
2 siblings, 0 replies; 17+ messages in thread
From: Eric Sandeen @ 2008-07-11 2:56 UTC (permalink / raw)
To: markgw; +Cc: xfs-oss
Mark Goodwin wrote:
> Thanks for the report Eric. This looks very similar to a
> deadlock Lachlan recently hit in the patch for
> "Use atomics for iclog reference counting"
> http://oss.sgi.com/archives/xfs/2008-02/msg00130.html
>
> It seems this patch can cause deadlocks under heavy log traffic.
> I don't think anyone has a fix yet ... Lachlan is out this week,
> but Tim can follow-up here ...
Thanks, I'll look into that.
Forgot to mention: this was on 2.6.26-rc2.
-Eric
> Cheers
> -- Mark
>
> Eric Sandeen wrote:
>> while running fs_mark:
>>
>> fs_mark -d /mnt/test -D 256 -n 100000 -t 4 -s 20480 -F -S 0
>>
>> which runs 4 threads making 20k files, things got stuffed up in the log,
>> see sysrq-w output at:
>>
>> http://sandeen.fedorapeople.org/xfs/deadxfs
>>
>> 2 threads down this path:
>>
>> [<ffffffffa03c89f5>] :xfs:xlog_state_get_iclog_space+0x83/0x187
>> [<ffffffffa03c8c1a>] :xfs:xlog_write+0x121/0x486
>> [<ffffffffa03c9043>] :xfs:xfs_log_write+0x2a/0x50
>>
>> this is on a 4 cpu AMD.
>>
>> I made the fs with:
>>
>> mkfs.xfs -dsize=1t,agcount=32 -l lazy-count=1 -f /dev/sdb
>>
>> and mounted with:
>>
>> mount -o nobarrier,logbsize=256k,logbufs=8,inode64 /dev/sdb /mnt/test
>>
>> has this been seen before?
>>
>> Thanks,
>>
>> -Eric
>>
>>
>
* Re: deadlocked xfs
2008-07-11 2:48 ` Mark Goodwin
2008-07-11 2:56 ` Eric Sandeen
@ 2008-07-11 3:21 ` Timothy Shimmin
2008-07-11 4:04 ` Dave Chinner
2008-07-11 3:22 ` Dave Chinner
2 siblings, 1 reply; 17+ messages in thread
From: Timothy Shimmin @ 2008-07-11 3:21 UTC (permalink / raw)
To: Eric Sandeen; +Cc: markgw, xfs-oss
Mark Goodwin wrote:
> Thanks for the report Eric. This looks very similar to a
> deadlock Lachlan recently hit in the patch for
> "Use atomics for iclog reference counting"
> http://oss.sgi.com/archives/xfs/2008-02/msg00130.html
>
> It seems this patch can cause deadlocks under heavy log traffic.
> I don't think anyone has a fix yet ... Lachlan is out this week,
> but Tim can follow-up here ...
>
> Cheers
> -- Mark
>
Okay, what Mark is talking about is pv#983925.
Details below from the bug - from Lachlan and me.
I'm sorry that this info didn't go out sooner.
I'm not sure if Lachlan got any further with this but he's
been away.
--Tim
Lachlan wrote:
[....stuff deleted...]
> It's the same signature every time, with the current iclog in WANT_SYNC state
> and the rest of the iclogs in DO_CALLBACK state. The current iclog should
> transition from WANT_SYNC to SYNCING to DONE_SYNC and eventually to ACTIVE so
> the threads above can continue, but it's not happening.
>
> I suspect this mod is the culprit:
>
> mod xfs-linux-melb:xfs-kern:30505a header
> ==========================================
> - SM_Location: longdrop.melbourne.sgi.com:/isms/linux/2.6.x-xfs-melb
> - Workarea: chook.melbourne.sgi.com:/build/dgc/isms/2.6.x-xfs
> - xfs-linux-melb:xfs-kern:30505a 02/15/08
> - PV Incidents affected: 975671
> - Inspected by: tes@sgi.com
> - Description:
> Use atomics for iclog reference counting
>
> Now that we update the log tail LSN less frequently on
> transaction completion, we pass the contention straight to
> the global log state lock (l_iclog_lock) during transaction
> completion.
>
> We currently have to take this lock to decrement the iclog
> reference count. There is a reference count on each iclog,
> so we need to take the global lock for all refcount changes.
>
> When large numbers of processes are all doing small transactions,
> the iclog reference counts will be quite high, and the state change
> that absolutely requires the l_iclog_lock is the exception rather than
> the norm.
>
> Change the reference counting on the iclogs to use atomic_inc/dec
> so that we can use atomic_dec_and_lock during transaction completion
> and avoid the need for grabbing the l_iclog_lock for every reference
> count decrement except the one that matters - the last.
> - Files affected: xfsidbg.c 1.343, xfs_log.c 1.347,
> xfs_log_priv.h 1.125
> - Author: dgc
> - xfsidbg.c:
> use correct atomic accessor for the iclog refcount.
> - xfs_log.c:
> Reduce contention on the iclog state lock by using atomic
> reference counters for the iclogs and only grabbing the
> iclog lock on transaction completion when the last reference
> to the iclog is being removed.
> - xfs_log_priv.h:
> change ic_refcount to an atomic_t.
>
>
> It made this change, which moved the refcount 'decrement and test' operation
> outside the l_icloglock spinlock and can therefore race.
>
> @@ -2819,18 +2819,21 @@ xlog_state_release_iclog(
> {
> int sync = 0; /* do we sync? */
>
> - spin_lock(&log->l_icloglock);
> + if (iclog->ic_state & XLOG_STATE_IOERROR)
> + return XFS_ERROR(EIO);
> +
> + ASSERT(atomic_read(&iclog->ic_refcnt) > 0);
> + if (!atomic_dec_and_lock(&iclog->ic_refcnt, &log->l_icloglock))
> + return 0;
> +
> if (iclog->ic_state & XLOG_STATE_IOERROR) {
> spin_unlock(&log->l_icloglock);
> return XFS_ERROR(EIO);
> }
>
> The above code will race with this code in xlog_state_get_iclog_space():
>
> spin_lock(&log->l_icloglock);
> atomic_inc(&iclog->ic_refcnt); /* prevents sync */
>
> ...
>
> if (iclog->ic_size - iclog->ic_offset < 2*sizeof(xlog_op_header_t)) {
> xlog_state_switch_iclogs(log, iclog, iclog->ic_size);
>
> /* If I'm the only one writing to this iclog, sync it to disk */
> if (atomic_read(&iclog->ic_refcnt) == 1) {
> spin_unlock(&log->l_icloglock);
> if ((error = xlog_state_release_iclog(log, iclog)))
> return error;
> } else {
> atomic_dec(&iclog->ic_refcnt);
> spin_unlock(&log->l_icloglock);
> }
> goto restart;
> }
>
> Previously xlog_state_release_iclog() would wait for the spinlock to be
> released above but will now do the atomic_dec_and_lock() while the above
> code is executing. The above atomic_read() will return 2, the
> atomic_dec_and_lock() in xlog_state_release_iclog() will return false and
> return without syncing the iclog and then the above code will decrement the
> atomic counter. The iclog never gets out of WANT_SYNC state and everything
> gets stuck behind it.
>
>
> ==========================
> ADDITIONAL INFORMATION (ADD)
> From: tes@sgi.com (BugWorks)
> Date: Jul 01 2008 09:59:49PM
> ==========================
>
> Hi,
>
> Okay, I've just been trying to understand what Lachlan was saying
> about the problem and Lachlan kindly gave me an explanation.
>
> Based on that, below is my understanding so I remember what is going
> wrong here ;-) Correct me if I got this wrong.
>
>
> Bad scenario (current code)
> ----------------------------
> Process A has a reference on iclog
> and iclog-refcnt == 1
>
> Process B tries to get some log space,
> and takes lock and bumps count to 2
> Process B tests to see if its refcnt is just 1
> but it isn't so it moves on, _up_ to the refcnt decrement
>
> Process A decides to release its iclog
> and calls xlog_state_release_iclog.
> It does a decrement WITHOUT taking the lock.
> The refcnt goes from 2 to 1, and so it just
> returns without sync'ing.
>
> Process B now does its decrement, taking
> the refcnt from 1 to 0, and releases the lock.
>
>
> Good scenario (with previous code)
> -----------------------------------
> Process A has a reference on iclog
> and iclog-refcnt == 1
>
> Process B tries to get some logspace,
> and takes lock and bumps count to 2.
> Process B tests to see if its refcnt is just 1
> but it isn't so it moves on, _up_ to the refcnt decrement.
>
> Process A decides to release its iclog
> and calls xlog_state_release_iclog.
> However, it can't get the lock yet and so it waits.
>
> Process B now decrements its refcnt from 2
> back to 1 and releases the lock.
>
> Process A now can get its lock and decrements
> the refcnt, finds it goes from 1 to zero and
> can now do the sync'ing.
> -----------------------------------
>
> --Tim
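[Editorial note: the bad interleaving Tim describes can be replayed deterministically with a plain integer standing in for iclog->ic_refcnt. This is an illustrative sketch of the scenario only, not the kernel code; the function name and the `synced` flag are invented for the illustration.]

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Deterministic replay of the bad scenario: an int stands in for
 * iclog->ic_refcnt, and 'synced' records whether either thread ever
 * saw itself as the last reference and issued the sync.
 */
static bool replay_bad_scenario(void)
{
    int refcnt = 1;                  /* Process A holds one reference */
    bool synced = false;

    /* B, in xlog_state_get_iclog_space(), under l_icloglock: */
    refcnt++;                        /* refcnt = 2 */
    bool b_is_last = (refcnt == 1);  /* false: B will take the else path */

    /* A releases its iclog. Since the atomics patch, the
     * atomic_dec_and_lock() no longer waits for l_icloglock:
     * it sees 2 -> 1, not zero, and returns without syncing. */
    refcnt--;                        /* refcnt = 1, no sync */
    if (refcnt == 0)
        synced = true;               /* not reached here */

    /* B resumes its else branch and drops its reference with a
     * plain atomic_dec(), which never syncs. */
    if (!b_is_last)
        refcnt--;                    /* refcnt = 0, but nobody synced */

    assert(refcnt == 0);
    return synced;                   /* false: iclog stuck in WANT_SYNC */
}
```

Running the replay shows the count reaching zero with no thread having taken the sync path, which is exactly the stuck WANT_SYNC state in the sysrq-w output.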
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: deadlocked xfs
2008-07-11 2:48 ` Mark Goodwin
2008-07-11 2:56 ` Eric Sandeen
2008-07-11 3:21 ` Timothy Shimmin
@ 2008-07-11 3:22 ` Dave Chinner
2008-07-11 3:50 ` Mark Goodwin
2 siblings, 1 reply; 17+ messages in thread
From: Dave Chinner @ 2008-07-11 3:22 UTC (permalink / raw)
To: Mark Goodwin; +Cc: Eric Sandeen, xfs-oss
On Fri, Jul 11, 2008 at 12:48:11PM +1000, Mark Goodwin wrote:
> Thanks for the report Eric. This looks very similar to a
> deadlock Lachlan recently hit in the patch for
> "Use atomics for iclog reference counting"
> http://oss.sgi.com/archives/xfs/2008-02/msg00130.html
>
> It seems this patch can cause deadlocks under heavy log traffic.
> I don't think anyone has a fix yet ... Lachlan is out this week,
> but Tim can follow-up here ...
Nice to know - why didn't anyone email me or report this to the
list when the bug was first found? I mean, I wrote that code, I know
what it is supposed to be doing, and as a result should be able to
help find and fix the bug. Can you please post what details you have
about the problem (test case, stack traces, debugging info, etc.)
so I can try to find the problem.
This is a regression in the mainline kernel, which is due to
be released probably in the next couple of days. Having a little
bit of time to try and find the bug would have been nice...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: deadlocked xfs
2008-07-11 3:22 ` Dave Chinner
@ 2008-07-11 3:50 ` Mark Goodwin
2008-07-11 4:02 ` Eric Sandeen
` (2 more replies)
0 siblings, 3 replies; 17+ messages in thread
From: Mark Goodwin @ 2008-07-11 3:50 UTC (permalink / raw)
To: Mark Goodwin, Eric Sandeen, xfs-oss
Dave Chinner wrote:
> On Fri, Jul 11, 2008 at 12:48:11PM +1000, Mark Goodwin wrote:
>> Thanks for the report Eric. This looks very similar to a
>> deadlock Lachlan recently hit in the patch for
>> "Use atomics for iclog reference counting"
>> http://oss.sgi.com/archives/xfs/2008-02/msg00130.html
>>
>> It seems this patch can cause deadlocks under heavy log traffic.
>> I don't think anyone has a fix yet ... Lachlan is out this week,
>> but Tim can follow-up here ...
>
> Nice to know - why didn't anyone email me or report this to the
> list when the bug was first found? I mean, I wrote that code, I know
> what it is supposed to be doing and as a result should be able to
Only recently found and didn't think it was this easy to hit.
But no excuses ...
> help find and fix the bug. Can you please post what details you have
> about the problem (test case, stack traces, debugging info, etc)
> so I can try to find the problem.
See Tim's follow-up.
> This is a regression that is in the mainline kernel that is due to
> be released probably in the next couple of days. Having a little
> bit of time to try and find the bug would have been nice...
At this stage, I think it would be safest to back out the commit,
all the way to mainline. Tim, can you please work through that today
as a priority.
Thanks
--
Mark Goodwin markgw@sgi.com
Engineering Manager for XFS and PCP Phone: +61-3-99631937
SGI Australian Software Group Cell: +61-4-18969583
-------------------------------------------------------------
* Re: deadlocked xfs
2008-07-11 3:50 ` Mark Goodwin
@ 2008-07-11 4:02 ` Eric Sandeen
2008-07-11 4:05 ` Dave Chinner
2008-07-11 4:08 ` Timothy Shimmin
2 siblings, 0 replies; 17+ messages in thread
From: Eric Sandeen @ 2008-07-11 4:02 UTC (permalink / raw)
To: markgw; +Cc: xfs-oss
Mark Goodwin wrote:
> At this stage, I think it would be safest to back-out the commit,
> all the way to mainline. Tim, can you please work thru that today
> with priority.
how 'bout I test my testcase w/ the patch removed to at least be sure
that's it. :)
Then maybe we can see if Dave, having been alerted to the problem, has a
notion of how to fix it?
-Eric
* Re: deadlocked xfs
2008-07-11 3:21 ` Timothy Shimmin
@ 2008-07-11 4:04 ` Dave Chinner
2008-07-11 4:10 ` Eric Sandeen
` (2 more replies)
0 siblings, 3 replies; 17+ messages in thread
From: Dave Chinner @ 2008-07-11 4:04 UTC (permalink / raw)
To: Timothy Shimmin; +Cc: Eric Sandeen, markgw, xfs-oss
On Fri, Jul 11, 2008 at 01:21:39PM +1000, Timothy Shimmin wrote:
> > It made this change which moved the refcount 'decrement and test' operation
> > outside the l_icloglock spinlock and therefore can race.
Just FYI, this is not a 'decrement and test' operation - it's
a 'decrement and return locked if zero' operation. The best way to
explain it is this comment in lib/dec_and_lock.c:
/*
* This is an implementation of the notion of "decrement a
* reference count, and return locked if it decremented to zero".
*
* NOTE NOTE NOTE! This is _not_ equivalent to
*
* if (atomic_dec_and_test(&atomic)) {
* spin_lock(&lock);
* return 1;
* }
* return 0;
*
* because the spin-lock and the decrement must be
* "atomic".
*/
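[Editorial note: those semantics can be sketched in user-space C11, assuming the classic generic implementation of `_atomic_dec_and_lock()` (fast-path CAS, then re-check under the lock). The sketch below is illustrative, not the kernel's code.]

```c
#include <pthread.h>
#include <stdatomic.h>

/*
 * Sketch of dec_and_lock(): decrement the counter, and return 1 with
 * the lock held only when the count reached zero.  The fast path
 * (count > 1) avoids the lock entirely; the slow path decrements
 * under the lock, so "hit zero" and "hold the lock" are one event.
 */
static int dec_and_lock(atomic_int *cnt, pthread_mutex_t *lock)
{
    /* Fast path: drop a reference unless it might be the last one. */
    int old = atomic_load(cnt);
    while (old > 1) {
        if (atomic_compare_exchange_weak(cnt, &old, old - 1))
            return 0;            /* not the last ref, lock not taken */
        /* CAS failed: 'old' was reloaded, loop and retry */
    }

    /* Slow path: take the lock, then do the final decrement. */
    pthread_mutex_lock(lock);
    if (atomic_fetch_sub(cnt, 1) == 1)
        return 1;                /* decremented to zero; lock is held */
    pthread_mutex_unlock(lock);
    return 0;
}
```

The point of the NOTE in the kernel comment is visible here: a separate `dec_and_test` followed by `spin_lock` would open a window between hitting zero and taking the lock; folding the final decrement under the lock closes it.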
> > spin_lock(&log->l_icloglock);
> > atomic_inc(&iclog->ic_refcnt); /* prevents sync */
> >
> > ...
> >
> > if (iclog->ic_size - iclog->ic_offset < 2*sizeof(xlog_op_header_t)) {
> > xlog_state_switch_iclogs(log, iclog, iclog->ic_size);
> >
> > /* If I'm the only one writing to this iclog, sync it to disk */
> > if (atomic_read(&iclog->ic_refcnt) == 1) {
> > spin_unlock(&log->l_icloglock);
> > if ((error = xlog_state_release_iclog(log, iclog)))
> > return error;
> > } else {
> > atomic_dec(&iclog->ic_refcnt);
> > spin_unlock(&log->l_icloglock);
> > }
> > goto restart;
> > }
Ok, so all the increments of ic_refcnt are done under l_icloglock.
That means that through this section of code we cannot ever have the
reference count increase. It can decrease, but never to zero, because
that requires holding the l_icloglock.
> > Previously xlog_state_release_iclog() would wait for the spinlock to be
> > released above but will now do the atomic_dec_and_lock() while the above
> > code is executing. The above atomic_read() will return 2, the
> > atomic_dec_and_lock() in xlog_state_release_iclog() will return false and
> > return without syncing the iclog and then the above code will decrement the
> > atomic counter. The iclog never gets out of WANT_SYNC state and everything
> > gets stuck behind it.
So what that means is that we need the semantics here to be an
atomic 'decrement unless the count is 1', i.e.:
if (atomic_add_unless(&iclog->ic_refcnt, -1, 1)) {
spin_unlock(&log->l_icloglock);
if ((error = xlog_state_release_iclog(log, iclog)))
return error;
} else {
spin_unlock(&log->l_icloglock);
}
This means that the compare + decrement becomes atomic, thereby
closing the race condition where another unlocked thread can
decrement the reference count in between the compare and decrement
here.
Patch for you to test, Eric, below.
(I haven't been able to test it because UML panics on boot right now.
I need to hoover in the unit-at-a-time build patch that Jeff Dike
posted this morning.)
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
Fix reference counting race on log buffers
When we release the iclog, we do an atomic_dec_and_lock() to determine
if we hold the last reference, to enable update of the log headers and
writeout. However, in xlog_state_get_iclog_space() we need to check
if we have the last reference count there. If we do, we release the
log buffer; otherwise we decrement the reference count.
The issue is that the compare and decrement in
xlog_state_get_iclog_space() is not atomic, so both places can see a
reference count of 2 and neither will release the iclog. That leads
to a filesystem hang.
Close the hole by replacing the compare and decrement with
atomic_add_unless() to ensure that they are executed atomically.
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
fs/xfs/xfs_log.c | 14 ++++++++++----
1 files changed, 10 insertions(+), 4 deletions(-)
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 760d543..8033f8a 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -2425,13 +2425,19 @@ restart:
if (iclog->ic_size - iclog->ic_offset < 2*sizeof(xlog_op_header_t)) {
xlog_state_switch_iclogs(log, iclog, iclog->ic_size);
- /* If I'm the only one writing to this iclog, sync it to disk */
- if (atomic_read(&iclog->ic_refcnt) == 1) {
+ /*
+ * If I'm the only one writing to this iclog, sync it to disk.
+ * We need to do an atomic compare and decrement here to avoid
+ * racing with concurrent atomic_dec_and_lock() calls in
+ * xlog_state_release_iclog() when there is more than one
+ * reference to the iclog.
+ */
+ if (atomic_add_unless(&iclog->ic_refcnt, -1, 1)) {
spin_unlock(&log->l_icloglock);
- if ((error = xlog_state_release_iclog(log, iclog)))
+ error = xlog_state_release_iclog(log, iclog);
+ if (error)
return error;
} else {
- atomic_dec(&iclog->ic_refcnt);
spin_unlock(&log->l_icloglock);
}
goto restart;
* Re: deadlocked xfs
2008-07-11 3:50 ` Mark Goodwin
2008-07-11 4:02 ` Eric Sandeen
@ 2008-07-11 4:05 ` Dave Chinner
2008-07-11 4:08 ` Timothy Shimmin
2 siblings, 0 replies; 17+ messages in thread
From: Dave Chinner @ 2008-07-11 4:05 UTC (permalink / raw)
To: Mark Goodwin; +Cc: Eric Sandeen, xfs-oss
On Fri, Jul 11, 2008 at 01:50:10PM +1000, Mark Goodwin wrote:
>
>
> Dave Chinner wrote:
>> On Fri, Jul 11, 2008 at 12:48:11PM +1000, Mark Goodwin wrote:
>>> Thanks for the report Eric. This looks very similar to a
>>> deadlock Lachlan recently hit in the patch for
>>> "Use atomics for iclog reference counting"
>>> http://oss.sgi.com/archives/xfs/2008-02/msg00130.html
>>>
>>> It seems this patch can cause deadlocks under heavy log traffic.
>>> I don't think anyone has a fix yet ... Lachlan is out this week,
>>> but Tim can follow-up here ...
>>
>> Nice to know - why didn't anyone email me or report this to the
>> list when the bug was first found? I mean, I wrote that code, I know
>> what it is supposed to be doing and as a result should be able to
>
> Only recently found and didn't think it was this easy to hit.
> But no excuses ...
>
>> help find and fix the bug. Can you please post what details you have
>> about the problem (test case, stack traces, debugging info, etc)
>> so I can try to find the problem.
>
> See Tim's follow-up.
>
>> This is a regression that is in the mainline kernel that is due to
>> be released probably in the next couple of days. Having a little
>> bit of time to try and find the bug would have been nice...
>
> At this stage, I think it would be safest to back-out the commit,
> all the way to mainline. Tim, can you please work thru that today
> with priority.
No, do not back it out. I just posted the fix.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: deadlocked xfs
2008-07-11 3:50 ` Mark Goodwin
2008-07-11 4:02 ` Eric Sandeen
2008-07-11 4:05 ` Dave Chinner
@ 2008-07-11 4:08 ` Timothy Shimmin
2 siblings, 0 replies; 17+ messages in thread
From: Timothy Shimmin @ 2008-07-11 4:08 UTC (permalink / raw)
To: David Chinner; +Cc: markgw, Eric Sandeen, xfs-oss
Mark Goodwin wrote:
>
>
> Dave Chinner wrote:
>> On Fri, Jul 11, 2008 at 12:48:11PM +1000, Mark Goodwin wrote:
>>> Thanks for the report Eric. This looks very similar to a
>>> deadlock Lachlan recently hit in the patch for
>>> "Use atomics for iclog reference counting"
>>> http://oss.sgi.com/archives/xfs/2008-02/msg00130.html
>>>
>>> It seems this patch can cause deadlocks under heavy log traffic.
>>> I don't think anyone has a fix yet ... Lachlan is out this week,
>>> but Tim can follow-up here ...
>>
>> Nice to know - why didn't anyone email me or report this to the
>> list when the bug was first found? I mean, I wrote that code, I know
>> what it is supposed to be doing and as a result should be able to
>
> Only recently found and didn't think it was this easy to hit.
> But no excuses ...
>
>> help find and fix the bug. Can you please post what details you have
>> about the problem (test case, stack traces, debugging info, etc)
>> so I can try to find the problem.
>
> See Tim's follow-up.
>
Remaining info from bug....
> While stress testing I'm frequently hitting a deadlock in the log code where
> many threads are stuck in xlog_state_get_iclog_space().
>
> Stack traceback for pid 16570
> 0xffff8100421fa4c0 16570 4620 0 2 D 0xffff8100421fa810 rm
> rsp rip Function (args)
> 0xffff81003c41d908 0xffffffff8062be32 thread_return
> 0xffff81003c41d9a0 0xffffffff8062c205 schedule_timeout+0x1e (0x7fffffffffffffff)
> 0xffff81003c41da10 0xffffffff803954f0 xlog_state_get_iclog_space+0xed (0xffff81007d33d650, invalid, 0xffff81003c41db00, 0xffff81007c1d39c0, 0xffff81003c41db08, 0xffff81003c41db0c)
> 0xffff81003c41daa0 0xffffffff8039578d xlog_write+0x118 (0xffff81007c97d338, 0xffff81003c41db58, invalid, 0xffff81007c1d39c0, 0xffff81003c41dba0, 0xffff81003c41dce0, 0xffff810000000002)
> 0xffff81003c41db40 0xffffffff80395cb4 xlog_commit_record+0x6e (0xffff81007c97d338, 0xffff81007c1d39c0, 0xffff81003c41dce0, 0xffff81003c41dba0)
> 0xffff81003c41db90 0xffffffff80396c31 xfs_log_done+0x34 (invalid, 0xffff81007c1d39c0, invalid, invalid)
> 0xffff81003c41dbc0 0xffffffff803a1271 _xfs_trans_commit+0x24a (0xffff81007a99c780, invalid, 0x0)
> 0xffff81003c41dd20 0xffffffff8035eb2e xfs_bmap_finish+0xaa (0xffff81003c41de38, 0xffff81003c41ddc8, 0xffff81003c41dde0)
> 0xffff81003c41dd80 0xffffffff8038ae2b xfs_itruncate_finish+0x376 (0xffff81003c41de38, 0xffff81007b9c5670, invalid, invalid, 0x1)
> 0xffff81003c41de20 0xffffffff803ab720 xfs_inactive+0x27d (0xffff81007b9c5670)
> 0xffff81003c41de70 0xffffffff803b663e xfs_fs_clear_inode+0xbd (0xffff810073cf0438)
> 0xffff81003c41de90 0xffffffff802964af clear_inode+0x9c (0xffff810073cf0438)
> 0xffff81003c41dea0 0xffffffff80296598 generic_delete_inode+0x92 (0xffff810073cf0438)
> 0xffff81003c41dec0 0xffffffff8028d953 do_unlinkat+0xdd (invalid, invalid)
> 0xffff81003c41df80 0xffffffff8020ae4b system_call_after_swapgs+0x7b (invalid, invalid, invalid, invalid, invalid, invalid)
> Enter <q> to end, <cr> to continue:
> Stack traceback for pid 24088
> 0xffff81003b804a00 24088 4649 0 6 D 0xffff81003b804d50 xfs_io
> rsp rip Function (args)
> 0xffff81003e403288 0xffffffff8062be32 thread_return
> 0xffff81003e403320 0xffffffff8062c205 schedule_timeout+0x1e (0x7fffffffffffffff)
> 0xffff81003e403390 0xffffffff803954f0 xlog_state_get_iclog_space+0xed (0xffff81007d33d650, invalid, 0xffff81003e403480, 0xffff8100741499c0, 0xffff81003e403488, 0xffff81003e40348c)
> 0xffff81003e403420 0xffffffff8039578d xlog_write+0x118 (0xffff81007c97d338, 0xffff81003e4034f8, invalid, 0xffff8100741499c0, 0xffff81003e5b9858, 0x0, 0x0)
> 0xffff81003e4034c0 0xffffffff80395d13 xfs_log_write+0x33 (0xffff81007c97d338)
> 0xffff81003e4034e0 0xffffffff803a1256 _xfs_trans_commit+0x22f (0xffff81003e5b9810, invalid, 0x0)
> 0xffff81003e403640 0xffffffff8038fdba xfs_iomap_write_allocate+0x2eb (0xffff81007b9e7c90, invalid, invalid, 0xffff81003e403788, 0xffff81003e4037b4)
> 0xffff81003e403730 0xffffffff80390dd6 xfs_iomap+0x35a (0xffff81007b9e7c90, 0x19edd000, invalid, invalid, 0xffff81003e403898, 0xffff81003e403804)
> 0xffff81003e4037f0 0xffffffff803acf45 xfs_map_blocks+0x2c (invalid, invalid, invalid, invalid, invalid)
> 0xffff81003e403820 0xffffffff803ae04e xfs_page_state_convert+0x28b (0xffff81007b9e6c18, 0xffffe200028ac0f0, 0xffff81003e403b58, invalid, 0x100000000)
> 0xffff81003e403910 0xffffffff803ae61e xfs_vm_writepage+0xca (0xffffe200028ac0f0, 0xffff81003e403b58)
> 0xffff81003e403940 0xffffffff80260c16 __writepage+0xd (invalid, invalid, 0xffff81007b9e6e00)
> 0xffff81003e403950 0xffffffff8026112c write_cache_pages+0x19e (0xffff81007b9e6e00, 0xffff81003e403b58, 0xffffffff80260c09, 0xffff81007b9e6e00)
> 0xffff81003e403a50 0xffffffff802612a0 do_writepages+0x23 (invalid, 0xffff81003e403b58)
> 0xffff81003e403a60 0xffffffff8029ebca __writeback_single_inode+0x150 (0xffff81007b9e6c18, 0xffff81003e403b58)
> 0xffff81003e403ae0 0xffffffff8029f124 sync_sb_inodes+0x1b4 (0xffff81007c15d3f8, 0xffff81003e403b58)
> 0xffff81003e403b20 0xffffffff8029f406 writeback_inodes+0x81 (0xffff81003e403b58)
> 0xffff81003e403b40 0xffffffff802616f9 balance_dirty_pages_ratelimited_nr+0x15e (invalid, invalid)
> 0xffff81003e403bf0 0xffffffff8025bcb6 generic_file_buffered_write+0x20c (0xffff81003e403df8, 0xffff81003e403ee8, 0x1, 0x3c7c8000, 0xffff81003e403e78, invalid, 0x0)
> 0xffff81003e403cf0 0xffffffff803b56b4 xfs_write+0x68a (0xffff81007b58d360, 0xffff81003e403df8, 0xffff81003e403ee8, 0x1, 0xffff81003e403e78, invalid)
> 0xffff81003e403df0 0xffffffff8028411f do_sync_write+0xc9 (0xffff810079de9530, 0x51d000, 0x1000, 0xffff81003e403f48)
> 0xffff81003e403f10 0xffffffff8028485f vfs_write+0xad (0xffff810079de9530, 0x51d000, invalid, 0xffff81003e403f48)
> 0xffff81003e403f40 0xffffffff80284e87 sys_pwrite64+0x50 (invalid, 0x51d000, 0x1000, 0x3c7c8000)
> 0xffff81003e403f80 0xffffffff8020ae4b system_call_after_swapgs+0x7b (invalid, invalid, invalid, invalid, invalid, invalid)
>
> Taking a look at the log and iclog structures...
>
> [2]kdb> xlog 0xffff81007d33d650
> xlog at 0xffff81007d33d650
> &flush_wait: 0xffff81007d33d6d0 ICLOG: 0xffff81007c193710
> &icloglock: 0xffff81007d33d720 tail_lsn: [58:173319] last_sync_lsn: [58:173319]
> mp: 0xffff81007c97d338 xbuf: 0xffff81007c8e9e08 l_covered_state: need
> flags: log 0x0 <> logBBstart: 0 logsize: 134217728 logBBsize: 262144
> curr_cycle: 58 prev_cycle: 58 curr_block: 213344 prev_block: 213280
> iclog_bak: 0xffff81007d33d830 iclog_size: 0x8000 (32768) num iclogs: 8
> l_iclog_hsize 512 l_iclog_heads 1
> l_sectbb_log 0 l_sectbb_mask 0
> &grant_lock: 0xffff81007d33d7d0 resHeadQ: 0x0000000000000000 wrHeadQ: 0x0000000000000000
> GResCycle: 58 GResBytes: 110263292 GWrCycle: 58 GWrBytes: 110263292
> GResBlocks: 215357 GResRemain: 508 GWrBlocks: 215357 GWrRemain: 508
> trace: 0x0000000000000000 grant_trace: use xlog value
> [2]kdb>
> [2]kdb> xicall 0xffff81007c193710
> xlog_in_core/header at 0xffff81007c193710/0xffffc2001110d000
> magicno: feedbabe cycle: 58 version: 2 lsn: 0x3a00033f60
> tail_lsn: 0x3a0000d587 len: 32256 prev_block: 212768 num_ops: 92
> cycle_data: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> size: 32768
>
> --------------------------------------------------
> data: 0xffffc2001110d200 &forcesema: 0xffff81007c193710 next: 0xffff81007c1932f8 bp: 0xffff81007d102608
> log: 0xffff81007d33d650 callb: 0xffff810077915898 callb_tail: 0xffff81003e43cad8
> size: 32256 (OFFSET: 32240) trace: 0xffff81007aa78560 refcnt: 0 bwritecnt: 0 state: 0x2 <WANT_SYNC >
> =================================================
> xlog_in_core/header at 0xffff81007c1932f8/0xffffc20011116000
> magicno: feedbabe cycle: 58 version: 2 lsn: 0x3a00033fa0
> tail_lsn: 0x3a0000d587 len: 32256 prev_block: 212832 num_ops: 99
> cycle_data: 7c1d39c0 127c8 121 121 121 121 122 122 123 123 36120100 3412 0 0 0 0 0 0 0 0 69020000 5 41425443 1c84 4 16 155 60 0 1cd9400 1d05600 1d24600 1d52a00 1e3
> size: 32768
>
> --------------------------------------------------
> data: 0xffffc20011116200 &forcesema: 0xffff81007c1932f8 next: 0xffff81007d297610 bp: 0xffff81007d102408
> log: 0xffff81007d33d650 callb: 0xffff81007206b898 callb_tail: 0xffff8100715010f8
> size: 32256 (OFFSET: 32256) trace: 0xffff81007aa781d0 refcnt: 0 bwritecnt: 0 state: 0x10 <DO_CALLBACK >
> =================================================
> xlog_in_core/header at 0xffff81007d297610/0xffffc2001111f000
> magicno: feedbabe cycle: 58 version: 2 lsn: 0x3a00033fe0
> tail_lsn: 0x3a0000d587 len: 32256 prev_block: 212896 num_ops: 102
> cycle_data: 7c1d39c0 2dd 3d 547a 1f7d 0 24 2000000 2f6 1 11 1 4a 6730 72d9 631a 6d46 db00 0 6240000d 8400000b 9fa00025 d2e00010 c2c00001 3400001 3da00001 85600001 100000
> size: 32768
>
> --------------------------------------------------
> data: 0xffffc2001111f200 &forcesema: 0xffff81007d297610 next: 0xffff81007d296198 bp: 0xffff81007d102208
> log: 0xffff81007d33d650 callb: 0xffff81007aac3548 callb_tail: 0xffff81006cdf2788
> size: 32256 (OFFSET: 32256) trace: 0xffff81007a9ef860 refcnt: 0 bwritecnt: 0 state: 0x10 <DO_CALLBACK >
> =================================================
> xlog_in_core/header at 0xffff81007d296198/0xffffc20011128000
> magicno: feedbabe cycle: 58 version: 2 lsn: 0x3a00034020
> tail_lsn: 0x3a0000d587 len: 32256 prev_block: 212960 num_ops: 105
> cycle_data: 7aa7bcd8 0 7e00001 4de00001 4d000001 56600001 49400001 5ae00001 9f600001 4f800001 1f03000 0 1 3 5 a 1f 4347 55b4 5e62 644a 716a df1400 e59800 ea0600 efc000 f9
> size: 32768
>
> --------------------------------------------------
> data: 0xffffc20011128200 &forcesema: 0xffff81007d296198 next: 0xffff81007e59cd20 bp: 0xffff81007d102008
> log: 0xffff81007d33d650 callb: 0xffff81006cd5b798 callb_tail: 0xffff81006c4d10f8
> size: 32256 (OFFSET: 32256) trace: 0xffff81007aa78680 refcnt: 0 bwritecnt: 0 state: 0x10 <DO_CALLBACK >
> =================================================
> xlog_in_core/header at 0xffff81007e59cd20/0xffffc20000354000
> magicno: feedbabe cycle: 58 version: 2 lsn: 0x3a00034060
> tail_lsn: 0x3a0000d587 len: 32256 prev_block: 213024 num_ops: 93
> cycle_data: 7aa7bcd8 0 91400001 46600001 67200001 8800001 49400001 5ae00001 9f600001 4f800001 0 4fe00001 79c00001 99600001 c9c00001 b4e0000d ede0000d 31000017 7580003a 0 5f60
> size: 32768
>
> --------------------------------------------------
> data: 0xffffc20000354200 &forcesema: 0xffff81007e59cd20 next: 0xffff81007e59c4f0 bp: 0xffff81007d102e08
> log: 0xffff81007d33d650 callb: 0xffff81003fd86068 callb_tail: 0xffff81003e4fca58
> size: 32256 (OFFSET: 32256) trace: 0xffff81007a9efef0 refcnt: 0 bwritecnt: 0 state: 0x10 <DO_CALLBACK >
> =================================================
> xlog_in_core/header at 0xffff81007e59c4f0/0xffffc2000035d000
> magicno: feedbabe cycle: 58 version: 2 lsn: 0x3a000340a0
> tail_lsn: 0x3a0000d587 len: 32256 prev_block: 213088 num_ops: 88
> cycle_data: 7c1d39c0 127c8 121 121 121 121 122 122 123 123 36120100 5d10 6a72 1 3 2030000 ba00001 35600001 3a800001 d6800001 dd200001 e0800001 83000800 6127 6164 58b7 72d
> size: 32768
>
> --------------------------------------------------
> data: 0xffffc2000035d200 &forcesema: 0xffff81007e59c4f0 next: 0xffff81007d5084f0 bp: 0xffff81007d102c08
> log: 0xffff81007d33d650 callb: 0xffff81006a091818 callb_tail: 0xffff81007206b548
> size: 32256 (OFFSET: 32256) trace: 0xffff81007aa786e0 refcnt: 0 bwritecnt: 0 state: 0x10 <DO_CALLBACK >
> =================================================
> xlog_in_core/header at 0xffff81007d5084f0/0xffffc20000366000
> magicno: feedbabe cycle: 58 version: 2 lsn: 0x3a000340e0
> tail_lsn: 0x3a0000d587 len: 32256 prev_block: 213152 num_ops: 101
> cycle_data: 7aa7bcd8 5f75 6af1 1 3 e359cb09 4 7 11 143 1c 0 1ceba00 1d10800 1d29800 1d5f800 1e53000 1e97000 1ed1800 1f16400 7936448 35 de7a00 e4c800 e9dc00 ee3c00 f7a800
> size: 32768
>
> --------------------------------------------------
> data: 0xffffc20000366200 &forcesema: 0xffff81007d5084f0 next: 0xffff81007e59c908 bp: 0xffff81007d102a08
> log: 0xffff81007d33d650 callb: 0xffff8100778a6da8 callb_tail: 0xffff8100779144b8
> size: 32256 (OFFSET: 32256) trace: 0xffff81007a981b60 refcnt: 0 bwritecnt: 0 state: 0x10 <DO_CALLBACK >
> =================================================
> xlog_in_core/header at 0xffff81007e59c908/0xffffc2000036f000
> magicno: feedbabe cycle: 58 version: 2 lsn: 0x3a00034120
> tail_lsn: 0x3a0000d587 len: 32256 prev_block: 213216 num_ops: 118
> cycle_data: 7aa7bcd8 5f75 6af1 1 3 0 1f01 7b 79 c42c00 0 0 0 0 0 0 0 0 0 4ce00001 b174 80000000 80000000 80000000 80000000 0 0 0 0 68800010 72e6 2000000 31c 1 8c
> size: 32768
>
> --------------------------------------------------
> data: 0xffffc2000036f200 &forcesema: 0xffff81007e59c908 next: 0xffff81007c193710 bp: 0xffff81007d102808
> log: 0xffff81007d33d650 callb: 0xffff81006a1c8168 callb_tail: 0xffff81004714d0f8
> size: 32256 (OFFSET: 32256) trace: 0xffff81007aa78f80 refcnt: 0 bwritecnt: 0 state: 0x10 <DO_CALLBACK >
> =================================================
* Re: deadlocked xfs
2008-07-11 4:04 ` Dave Chinner
@ 2008-07-11 4:10 ` Eric Sandeen
2008-07-11 4:13 ` Timothy Shimmin
2008-07-11 4:17 ` Christoph Hellwig
2008-07-11 4:44 ` Timothy Shimmin
2 siblings, 1 reply; 17+ messages in thread
From: Eric Sandeen @ 2008-07-11 4:10 UTC (permalink / raw)
To: Timothy Shimmin, Eric Sandeen, markgw, xfs-oss
Dave Chinner wrote:
> Patch for you to test, Eric, below.
>
> (I haven't been able to test it because UML panics on boot right now.
> I need to hoover in the unit-at-atime build patch that Jeff Dike
> posted this morning)
>
> Cheers,
>
> Dave.
Dave, you frighten me.
Testing now, will let it run a while.
-Eric
* Re: deadlocked xfs
2008-07-11 4:10 ` Eric Sandeen
@ 2008-07-11 4:13 ` Timothy Shimmin
2008-07-11 4:18 ` Eric Sandeen
0 siblings, 1 reply; 17+ messages in thread
From: Timothy Shimmin @ 2008-07-11 4:13 UTC (permalink / raw)
To: Eric Sandeen; +Cc: markgw, xfs-oss
Eric Sandeen wrote:
> Dave Chinner wrote:
>
>> Patch for you to test, Eric, below.
>>
>> (I haven't been able to test it because UML panics on boot right now.
>> I need to hoover in the unit-at-atime build patch that Jeff Dike
>> posted this morning)
>>
>> Cheers,
>>
>> Dave.
>
> Dave, you frighten me.
>
> Testing now, will let it run a while.
>
Excellent.
And it happened pretty reliably before, right?
Thanks,
Tim.
* Re: deadlocked xfs
2008-07-11 4:04 ` Dave Chinner
2008-07-11 4:10 ` Eric Sandeen
@ 2008-07-11 4:17 ` Christoph Hellwig
2008-07-11 4:44 ` Timothy Shimmin
2 siblings, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2008-07-11 4:17 UTC (permalink / raw)
To: Timothy Shimmin, Eric Sandeen, markgw, xfs-oss
On Fri, Jul 11, 2008 at 02:04:47PM +1000, Dave Chinner wrote:
> xlog_state_switch_iclogs(log, iclog, iclog->ic_size);
>
> - /* If I'm the only one writing to this iclog, sync it to disk */
> - if (atomic_read(&iclog->ic_refcnt) == 1) {
> + /*
> + * If I'm the only one writing to this iclog, sync it to disk.
> + * We need to do an atomic compare and decrement here to avoid
> + * racing with concurrent atomic_dec_and_lock() calls in
> + * xlog_state_release_iclog() when there is more than one
> + * reference to the iclog.
> + */
> + if (atomic_add_unless(&iclog->ic_refcnt, -1, 1)) {
> spin_unlock(&log->l_icloglock);
> - if ((error = xlog_state_release_iclog(log, iclog)))
> + error = xlog_state_release_iclog(log, iclog);
> + if (error)
> return error;
> } else {
> - atomic_dec(&iclog->ic_refcnt);
> spin_unlock(&log->l_icloglock);
> }
> goto restart;
Independent of whether it actually fixes the bug (which I think it will),
this looks good. Doing anything with the return value from atomic_read
except for printing it is most likely bogus, and this one clearly is.
* Re: deadlocked xfs
2008-07-11 4:13 ` Timothy Shimmin
@ 2008-07-11 4:18 ` Eric Sandeen
2008-07-11 4:26 ` Eric Sandeen
2008-07-11 4:27 ` Dave Chinner
0 siblings, 2 replies; 17+ messages in thread
From: Eric Sandeen @ 2008-07-11 4:18 UTC (permalink / raw)
To: Timothy Shimmin; +Cc: markgw, xfs-oss
Timothy Shimmin wrote:
> Eric Sandeen wrote:
>> Dave Chinner wrote:
>>
>>> Patch for you to test, Eric, below.
>>>
>>> (I haven't been able to test it because UML panics on boot right now.
>>> I need to hoover in the unit-at-atime build patch that Jeff Dike
>>> posted this morning)
>>>
>>> Cheers,
>>>
>>> Dave.
>> Dave, you frighten me.
>>
>> Testing now, will let it run a while.
>>
> Excellent.
>
> And it happened pretty reliably before, right?
It did, yep, though sometimes took a while.
But, w/ the patch it actually deadlocked really quickly (< 2000 inodes
written)
-Eric
* Re: deadlocked xfs
2008-07-11 4:18 ` Eric Sandeen
@ 2008-07-11 4:26 ` Eric Sandeen
2008-07-11 4:27 ` Dave Chinner
1 sibling, 0 replies; 17+ messages in thread
From: Eric Sandeen @ 2008-07-11 4:26 UTC (permalink / raw)
To: Timothy Shimmin; +Cc: markgw, xfs-oss
Eric Sandeen wrote:
> But, w/ the patch it actually deadlocked really quickly (< 2000 inodes
> written)
ok testing a fixed up version ...
will let you know in my AM if it survived.
-Eric
* Re: deadlocked xfs
2008-07-11 4:18 ` Eric Sandeen
2008-07-11 4:26 ` Eric Sandeen
@ 2008-07-11 4:27 ` Dave Chinner
1 sibling, 0 replies; 17+ messages in thread
From: Dave Chinner @ 2008-07-11 4:27 UTC (permalink / raw)
To: Eric Sandeen; +Cc: Timothy Shimmin, markgw, xfs-oss
On Thu, Jul 10, 2008 at 11:18:31PM -0500, Eric Sandeen wrote:
> Timothy Shimmin wrote:
> > Eric Sandeen wrote:
> >> Dave Chinner wrote:
> >>
> >>> Patch for you to test, Eric, below.
> >>>
> >>> (I haven't been able to test it because UML panics on boot right now.
> >>> I need to hoover in the unit-at-atime build patch that Jeff Dike
> >>> posted this morning)
> >>>
> >>> Cheers,
> >>>
> >>> Dave.
> >> Dave, you frighten me.
> >>
> >> Testing now, will let it run a while.
> >>
> > Excellent.
> >
> > And it happened pretty reliably before, right?
>
> It did, yep, though sometimes took a while.
>
> But, w/ the patch it actually deadlocked really quickly (< 2000 inodes
> written)
Yeah, inverted the if/else logic, I think. Like I
said, untested.
I'll send out an updated patch as soon as I get uml running
again.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: deadlocked xfs
2008-07-11 4:04 ` Dave Chinner
2008-07-11 4:10 ` Eric Sandeen
2008-07-11 4:17 ` Christoph Hellwig
@ 2008-07-11 4:44 ` Timothy Shimmin
2 siblings, 0 replies; 17+ messages in thread
From: Timothy Shimmin @ 2008-07-11 4:44 UTC (permalink / raw)
To: Eric Sandeen, markgw, xfs-oss; +Cc: Timothy Shimmin
Dave Chinner wrote:
> On Fri, Jul 11, 2008 at 01:21:39PM +1000, Timothy Shimmin wrote:
>>> It made this change which moved the refcount 'decrement and test' operation
>>> outside the l_icloglock spinlock and therefore can race.
>
> Just FYI, this is not a 'decrement and test' operation - it's
> a 'decrement and return locked if zero' operation. The best way to
> explain it is this comment in lib/dec_and_lock.c:
>
Fine, although I didn't write the above :)
> /*
> * This is an implementation of the notion of "decrement a
> * reference count, and return locked if it decremented to zero".
> *
> * NOTE NOTE NOTE! This is _not_ equivalent to
> *
> * if (atomic_dec_and_test(&atomic)) {
> * spin_lock(&lock);
> * return 1;
> * }
> * return 0;
> *
> * because the spin-lock and the decrement must be
> * "atomic".
> */
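[For readers without a kernel tree handy, the semantics that comment describes can be modelled in plain C11. This is an illustrative userspace sketch following the shape the comment implies, not the kernel code itself; the spinlock here is a toy stand-in for the kernel's spinlock_t.]

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy spinlock standing in for the kernel's spinlock_t. */
typedef struct { atomic_flag locked; } spinlock_t;
#define SPINLOCK_INIT { ATOMIC_FLAG_INIT }

static void spin_lock(spinlock_t *l)   { while (atomic_flag_test_and_set(&l->locked)) ; }
static void spin_unlock(spinlock_t *l) { atomic_flag_clear(&l->locked); }

/*
 * "Decrement a reference count, and return locked if it decremented
 * to zero."  The decrement and the lock acquisition must be atomic
 * with respect to each other: a plain dec_and_test followed by
 * spin_lock leaves a window where another CPU can observe the zero
 * count before this CPU holds the lock.
 */
static int dec_and_lock(atomic_int *cnt, spinlock_t *lock)
{
    int old = atomic_load(cnt);

    /* Fast path: drop a reference lock-free while other refs remain. */
    while (old > 1) {
        if (atomic_compare_exchange_weak(cnt, &old, old - 1))
            return 0;           /* count still > 0, lock not taken */
    }

    /* Slow path: possibly the last reference - take the lock first. */
    spin_lock(lock);
    if (atomic_fetch_sub(cnt, 1) == 1)
        return 1;               /* hit zero: caller now holds the lock */
    spin_unlock(lock);
    return 0;
}
```

[The key point for the discussion below: only a caller that drove the count to zero comes back holding the lock, and nothing in between can see the count at zero while the lock is free.]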
>
>
>>> spin_lock(&log->l_icloglock);
>>> atomic_inc(&iclog->ic_refcnt); /* prevents sync */
>>>
>>> ...
>>>
>>> if (iclog->ic_size - iclog->ic_offset < 2*sizeof(xlog_op_header_t)) {
>>> xlog_state_switch_iclogs(log, iclog, iclog->ic_size);
>>>
>>> /* If I'm the only one writing to this iclog, sync it to disk */
>>> if (atomic_read(&iclog->ic_refcnt) == 1) {
>>> spin_unlock(&log->l_icloglock);
>>> if ((error = xlog_state_release_iclog(log, iclog)))
>>> return error;
>>> } else {
>>> atomic_dec(&iclog->ic_refcnt);
>>> spin_unlock(&log->l_icloglock);
>>> }
>>> goto restart;
>>> }
>
> Ok, so all the increments of ic_refcnt are done under l_icloglock.
> That means that through this section of code the reference count can
> never increase. It can decrease, but never to zero, because that
> requires holding the l_icloglock.
>
>>> Previously xlog_state_release_iclog() would wait for the spinlock to be
>>> released above but will now do the atomic_dec_and_lock() while the above
>>> code is executing. The above atomic_read() will return 2, the
>>> atomic_dec_and_lock() in xlog_state_release_iclog() will return false and
>>> return without syncing the iclog and then the above code will decrement the
>>> atomic counter. The iclog never gets out of WANT_SYNC state and everything
>>> gets stuck behind it.
>
> So what that means is that we need the semantics here to be an
> atomic 'decrement unless the count is 1', i.e.:
>
> if (atomic_add_unless(&iclog->ic_refcnt, -1, 1)) {
> spin_unlock(&log->l_icloglock);
> if ((error = xlog_state_release_iclog(log, iclog)))
> return error;
> } else {
> spin_unlock(&log->l_icloglock);
> }
>
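[The primitive being reached for here can be modelled in userspace C11 as well - an illustrative sketch of atomic_add_unless() semantics, not the kernel implementation. Called as add_unless(&refcnt, -1, 1), it is exactly the 'decrement unless I am the last reference' check-and-decrement fused into one step.]

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Model of atomic_add_unless(v, a, u): atomically add 'a' to *v
 * unless *v == u; return true iff the add was performed.  With
 * a == -1 and u == 1 this fuses the old "atomic_read() == 1" test
 * and the later atomic_dec() into one indivisible operation, so no
 * other thread can slip a decrement in between the two.
 */
static bool add_unless(atomic_int *v, int a, int u)
{
    int old = atomic_load(v);

    while (old != u) {
        /* CAS loop: retry until the add lands, or *v reaches 'u'. */
        if (atomic_compare_exchange_weak(v, &old, old + a))
            return true;
    }
    return false;       /* *v == u: left untouched */
}
```

[Note the branch sense: the function returns true when it did decrement, i.e. when other writers still hold references - the sync-to-disk path is the false branch.]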
Gee, they have an atomic_ almost everything :-)
When I was talking to Lachlan he was saying we would need to make the
atomic_read and the decrement atomic somehow (to close the window)
- I didn't realise such a function existed.
(And yeah, I hear you saying well, one needs to look...
he must have gotten on to one of the corruption fixes.. dunno...)
Just saw your email - yeah, I didn't pick up on the reversed logic there:
atomic decrement unless equal to one, yep (the if/else branches are swapped).
> This means that the compare + decrement becomes atomic, thereby
> closing the race condition where another unlocked thread can
> decrement the reference count in between the compare and decrement
> here.
>
Yep.
--Tim
end of thread, other threads:[~2008-07-11 4:44 UTC | newest]
Thread overview: 17+ messages
2008-07-11 2:33 deadlocked xfs Eric Sandeen
2008-07-11 2:48 ` Mark Goodwin
2008-07-11 2:56 ` Eric Sandeen
2008-07-11 3:21 ` Timothy Shimmin
2008-07-11 4:04 ` Dave Chinner
2008-07-11 4:10 ` Eric Sandeen
2008-07-11 4:13 ` Timothy Shimmin
2008-07-11 4:18 ` Eric Sandeen
2008-07-11 4:26 ` Eric Sandeen
2008-07-11 4:27 ` Dave Chinner
2008-07-11 4:17 ` Christoph Hellwig
2008-07-11 4:44 ` Timothy Shimmin
2008-07-11 3:22 ` Dave Chinner
2008-07-11 3:50 ` Mark Goodwin
2008-07-11 4:02 ` Eric Sandeen
2008-07-11 4:05 ` Dave Chinner
2008-07-11 4:08 ` Timothy Shimmin