public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
* deadlocked xfs
@ 2008-07-11  2:33 Eric Sandeen
  2008-07-11  2:48 ` Mark Goodwin
  0 siblings, 1 reply; 17+ messages in thread
From: Eric Sandeen @ 2008-07-11  2:33 UTC (permalink / raw)
  To: xfs-oss

while running fs_mark:

fs_mark -d /mnt/test -D 256 -n 100000 -t 4 -s 20480 -F -S 0

which runs 4 threads making 20k files; things got stuffed up in the log,
see sysrq-w output at:

http://sandeen.fedorapeople.org/xfs/deadxfs

2 threads down this path:

 [<ffffffffa03c89f5>] :xfs:xlog_state_get_iclog_space+0x83/0x187
 [<ffffffffa03c8c1a>] :xfs:xlog_write+0x121/0x486
 [<ffffffffa03c9043>] :xfs:xfs_log_write+0x2a/0x50

this is on a 4 cpu AMD.

I made the fs with:

mkfs.xfs -dsize=1t,agcount=32 -l lazy-count=1 -f /dev/sdb

and mounted with:

mount -o nobarrier,logbsize=256k,logbufs=8,inode64 /dev/sdb /mnt/test

has this been seen before?

Thanks,

-Eric

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: deadlocked xfs
  2008-07-11  2:33 deadlocked xfs Eric Sandeen
@ 2008-07-11  2:48 ` Mark Goodwin
  2008-07-11  2:56   ` Eric Sandeen
                     ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Mark Goodwin @ 2008-07-11  2:48 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs-oss

Thanks for the report Eric. This looks very similar to a
deadlock Lachlan recently hit in the patch for
"Use atomics for iclog reference counting"
http://oss.sgi.com/archives/xfs/2008-02/msg00130.html

It seems this patch can cause deadlocks under heavy log traffic.
I don't think anyone has a fix yet ... Lachlan is out this week,
but Tim can follow-up here ...

Cheers
-- Mark

Eric Sandeen wrote:
> while running fs_mark:
> 
> fs_mark -d /mnt/test -D 256 -n 100000 -t 4 -s 20480 -F -S 0
> 
> which runs 4 threads making 20k files; things got stuffed up in the log,
> see sysrq-w output at:
> 
> http://sandeen.fedorapeople.org/xfs/deadxfs
> 
> 2 threads down this path:
> 
>  [<ffffffffa03c89f5>] :xfs:xlog_state_get_iclog_space+0x83/0x187
>  [<ffffffffa03c8c1a>] :xfs:xlog_write+0x121/0x486
>  [<ffffffffa03c9043>] :xfs:xfs_log_write+0x2a/0x50
> 
> this is on a 4 cpu AMD.
> 
> I made the fs with:
> 
> mkfs.xfs -dsize=1t,agcount=32 -l lazy-count=1 -f /dev/sdb
> 
> and mounted with:
> 
> mount -o nobarrier,logbsize=256k,logbufs=8,inode64 /dev/sdb /mnt/test
> 
> has this been seen before?
> 
> Thanks,
> 
> -Eric
> 
> 

-- 

  Mark Goodwin                                  markgw@sgi.com
  Engineering Manager for XFS and PCP    Phone: +61-3-99631937
  SGI Australian Software Group           Cell: +61-4-18969583
-------------------------------------------------------------

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: deadlocked xfs
  2008-07-11  2:48 ` Mark Goodwin
@ 2008-07-11  2:56   ` Eric Sandeen
  2008-07-11  3:21   ` Timothy Shimmin
  2008-07-11  3:22   ` Dave Chinner
  2 siblings, 0 replies; 17+ messages in thread
From: Eric Sandeen @ 2008-07-11  2:56 UTC (permalink / raw)
  To: markgw; +Cc: xfs-oss

Mark Goodwin wrote:
> Thanks for the report Eric. This looks very similar to a
> deadlock Lachlan recently hit in the patch for
> "Use atomics for iclog reference counting"
> http://oss.sgi.com/archives/xfs/2008-02/msg00130.html
> 
> It seems this patch can cause deadlocks under heavy log traffic.
> I don't think anyone has a fix yet ... Lachlan is out this week,
> but Tim can follow-up here ...

thanks I'll look into that.

Forgot to mention this was on 2.6.26-rc2

-Eric

> Cheers
> -- Mark
> 
> Eric Sandeen wrote:
>> while running fs_mark:
>>
>> fs_mark -d /mnt/test -D 256 -n 100000 -t 4 -s 20480 -F -S 0
>>
>> which runs 4 threads making 20k files; things got stuffed up in the log,
>> see sysrq-w output at:
>>
>> http://sandeen.fedorapeople.org/xfs/deadxfs
>>
>> 2 threads down this path:
>>
>>  [<ffffffffa03c89f5>] :xfs:xlog_state_get_iclog_space+0x83/0x187
>>  [<ffffffffa03c8c1a>] :xfs:xlog_write+0x121/0x486
>>  [<ffffffffa03c9043>] :xfs:xfs_log_write+0x2a/0x50
>>
>> this is on a 4 cpu AMD.
>>
>> I made the fs with:
>>
>> mkfs.xfs -dsize=1t,agcount=32 -l lazy-count=1 -f /dev/sdb
>>
>> and mounted with:
>>
>> mount -o nobarrier,logbsize=256k,logbufs=8,inode64 /dev/sdb /mnt/test
>>
>> has this been seen before?
>>
>> Thanks,
>>
>> -Eric
>>
>>
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: deadlocked xfs
  2008-07-11  2:48 ` Mark Goodwin
  2008-07-11  2:56   ` Eric Sandeen
@ 2008-07-11  3:21   ` Timothy Shimmin
  2008-07-11  4:04     ` Dave Chinner
  2008-07-11  3:22   ` Dave Chinner
  2 siblings, 1 reply; 17+ messages in thread
From: Timothy Shimmin @ 2008-07-11  3:21 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: markgw, xfs-oss

Mark Goodwin wrote:
> Thanks for the report Eric. This looks very similar to a
> deadlock Lachlan recently hit in the patch for
> "Use atomics for iclog reference counting"
> http://oss.sgi.com/archives/xfs/2008-02/msg00130.html
> 
> It seems this patch can cause deadlocks under heavy log traffic.
> I don't think anyone has a fix yet ... Lachlan is out this week,
> but Tim can follow-up here ...
> 
> Cheers
> -- Mark
> 

Okay, what Mark is talking about is pv#983925.
Details below from the bug - from Lachlan and me.
I'm sorry that this info didn't go out sooner.
I'm not sure if Lachlan got any further with this but he's
been away.

--Tim

Lachlan wrote:
[....stuff deleted...]
> It's the same signature every time with the current iclog in WANT_SYNC state
> and the rest of the iclogs in DO_CALLBACK state.  The current iclog should
> transition from WANT_SYNC to SYNCING to DONE_SYNC and eventually to ACTIVE so
> the threads above can continue, but it's not happening.
> 
> I suspect this mod is the culprit:
> 
> mod xfs-linux-melb:xfs-kern:30505a header
> ==========================================
>  - SM_Location:  longdrop.melbourne.sgi.com:/isms/linux/2.6.x-xfs-melb
>  - Workarea:     chook.melbourne.sgi.com:/build/dgc/isms/2.6.x-xfs
>  - xfs-linux-melb:xfs-kern:30505a 02/15/08
>  - PV Incidents affected:        975671
>  - Inspected by:         tes@sgi.com
>  - Description:
>    Use atomics for iclog reference counting
> 
>    Now that we update the log tail LSN less frequently on
>    transaction completion, we pass the contention straight to
>    the global log state lock (l_iclog_lock) during transaction
>    completion.
>    
>    We currently have to take this lock to decrement the iclog
>    reference count. There is a reference count on each iclog,
>    so we need to take the global lock for all refcount changes.
>    
>    When large numbers of processes are all doing small transactions,
>    the iclog reference counts will be quite high, and the state change
>    that absolutely requires the l_iclog_lock is the exception rather than
>    the norm.
>    
>    Change the reference counting on the iclogs to use atomic_inc/dec
>    so that we can use atomic_dec_and_lock during transaction completion
>    and avoid the need for grabbing the l_iclog_lock for every reference
>    count decrement except the one that matters - the last.
>  - Files affected:  xfsidbg.c 1.343, xfs_log.c 1.347,
>    xfs_log_priv.h 1.125
>  - Author:       dgc
>  - xfsidbg.c:
>    use correct atomic accessor for the iclog refcount.
>  - xfs_log.c:
>    Reduce contention on the iclog state lock by using atomic
>    reference counters for the iclogs and only grabbing the
>    iclog lock on transaction completion when the last reference
>    to the iclog is being removed.
>  - xfs_log_priv.h:
>    change ic_refcount to an atomic_t.
> 
> 
> It made this change which moved the refcount 'decrement and test' operation
> outside the l_icloglock spinlock and therefore can race. 
> 
> @@ -2819,18 +2819,21 @@ xlog_state_release_iclog(
>  {
>         int             sync = 0;       /* do we sync? */
>  
> -       spin_lock(&log->l_icloglock);
> +       if (iclog->ic_state & XLOG_STATE_IOERROR)
> +               return XFS_ERROR(EIO);
> +
> +       ASSERT(atomic_read(&iclog->ic_refcnt) > 0);
> +       if (!atomic_dec_and_lock(&iclog->ic_refcnt, &log->l_icloglock))
> +               return 0;
> +
>         if (iclog->ic_state & XLOG_STATE_IOERROR) {
>                 spin_unlock(&log->l_icloglock);
>                 return XFS_ERROR(EIO);
>         }
> 
> The above code will race with this code in xlog_state_get_iclog_space():
> 
> 	spin_lock(&log->l_icloglock);
> 	atomic_inc(&iclog->ic_refcnt);	/* prevents sync */
> 
> 	...
> 
> 	if (iclog->ic_size - iclog->ic_offset < 2*sizeof(xlog_op_header_t)) {
> 		xlog_state_switch_iclogs(log, iclog, iclog->ic_size);
> 
> 		/* If I'm the only one writing to this iclog, sync it to disk */
> 		if (atomic_read(&iclog->ic_refcnt) == 1) {
> 			spin_unlock(&log->l_icloglock);
> 			if ((error = xlog_state_release_iclog(log, iclog)))
> 				return error;
> 		} else {
> 			atomic_dec(&iclog->ic_refcnt);
> 			spin_unlock(&log->l_icloglock);
> 		}
> 		goto restart;
> 	}
> 
> Previously xlog_state_release_iclog() would wait for the spinlock to be
> released above but will now do the atomic_dec_and_lock() while the above
> code is executing.  The above atomic_read() will return 2, the
> atomic_dec_and_lock() in xlog_state_release_iclog() will return false and
> return without syncing the iclog and then the above code will decrement the
> atomic counter.  The iclog never gets out of WANT_SYNC state and everything
> gets stuck behind it.
> 
> 
> ==========================
> ADDITIONAL INFORMATION (ADD)
> From: tes@sgi.com (BugWorks)
> Date: Jul 01 2008 09:59:49PM
> ==========================
> 
> Hi,
> 
> Okay, I've just been trying to understand what Lachlan was saying
> about the problem and Lachlan kindly gave me an explanation.
> 
> Based on that, below is my understanding so I remember what is going
> wrong here ;-) Correct me if I got this wrong.
> 
> 
> Bad scenario (current code)
> ----------------------------
> Process A has a reference on iclog
> and iclog-refcnt == 1
> 
> Process B tries to get some log space,
> and takes lock and bumps count to 2
> Process B tests to see if its refcnt is just 1
> but it isn't so it moves on, _up_ to the refcnt decrement
> 
> Process A decides to release its iclog
> and calls xlog_state_release_iclog
> It does a decrement WITHOUT taking lock
> The refcnt goes from 2 to 1 and so it just
> returns without sync'ing.
> 
> Process B now does its decrement and takes
> the refcnt from 1 to 0. and releases the lock.
> 
> 
> Good scenario (with previous code)
> -----------------------------------
> Process A has a reference on iclog
> and iclog-refcnt == 1
> 
> Process B tries to get some logspace,
> and takes lock and bumps count to 2.
> Process B tests to see if its refcnt is just 1
> but it isn't so it moves on, _up_ to the refcnt decrement.
> 
> Process A decides to release its iclog
> and calls xlog_state_release_iclog.
> However, it can't get the lock yet and so it waits.
> 
> Process B now decrements its refcnt from 2
> back to 1 and releases the lock.
> 
> Process A now can get its lock and decrements
> the refcnt, finds it goes from 1 to zero and
> can now do the sync'ing.
> -----------------------------------
> 
> --Tim
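
[Editor's note: the bad scenario above is deterministic, so it can be replayed
in a few lines of C. This is a single-threaded model of the interleaving, not
the kernel code; all names here are invented for the sketch.]

```c
#include <assert.h>
#include <stdbool.h>

/* Single-threaded replay of the broken interleaving above. refcnt
 * models iclog->ic_refcnt; "synced" records whether any path ever
 * got to sync the iclog. */
struct iclog_model { int refcnt; bool synced; };

/* Process A's xlog_state_release_iclog() under the atomic scheme:
 * an unlocked decrement that only syncs when it hits zero. */
static void release_unlocked(struct iclog_model *ic)
{
    if (--ic->refcnt == 0)
        ic->synced = true;
}

/* Replay the bad schedule: B increments and reads 2, A decrements
 * 2 -> 1 without syncing, then B decrements 1 -> 0 on the "not the
 * only writer" path, which never syncs. Returns whether the iclog
 * was ever synced. */
static bool replay_bad_schedule(void)
{
    struct iclog_model ic = { .refcnt = 1, .synced = false };

    ic.refcnt++;                /* B: atomic_inc under icloglock -> 2 */
    int seen = ic.refcnt;       /* B: atomic_read() sees 2, not 1     */
    release_unlocked(&ic);      /* A: 2 -> 1, no sync                 */
    if (seen != 1)
        ic.refcnt--;            /* B: atomic_dec, 1 -> 0, no sync     */

    return ic.synced;
}
```

The refcount reaches zero but no path syncs the iclog, so it stays in
WANT_SYNC forever, matching the hang signature above.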

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: deadlocked xfs
  2008-07-11  2:48 ` Mark Goodwin
  2008-07-11  2:56   ` Eric Sandeen
  2008-07-11  3:21   ` Timothy Shimmin
@ 2008-07-11  3:22   ` Dave Chinner
  2008-07-11  3:50     ` Mark Goodwin
  2 siblings, 1 reply; 17+ messages in thread
From: Dave Chinner @ 2008-07-11  3:22 UTC (permalink / raw)
  To: Mark Goodwin; +Cc: Eric Sandeen, xfs-oss

On Fri, Jul 11, 2008 at 12:48:11PM +1000, Mark Goodwin wrote:
> Thanks for the report Eric. This looks very similar to a
> deadlock Lachlan recently hit in the patch for
> "Use atomics for iclog reference counting"
> http://oss.sgi.com/archives/xfs/2008-02/msg00130.html
>
> It seems this patch can cause deadlocks under heavy log traffic.
> I don't think anyone has a fix yet ... Lachlan is out this week,
> but Tim can follow-up here ...

Nice to know - why didn't anyone email me or report this to the
list when the bug was first found? I mean, I wrote that code, I know
what it is supposed to be doing and as a result should be able to
help find and fix the bug. Can you please post what details you have
about the problem (test case, stack traces, debugging info, etc)
so I can try to find the problem.

This is a regression that is in the mainline kernel that is due to
be released probably in the next couple of days. Having a little
bit of time to try and find the bug would have been nice...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: deadlocked xfs
  2008-07-11  3:22   ` Dave Chinner
@ 2008-07-11  3:50     ` Mark Goodwin
  2008-07-11  4:02       ` Eric Sandeen
                         ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Mark Goodwin @ 2008-07-11  3:50 UTC (permalink / raw)
  To: Mark Goodwin, Eric Sandeen, xfs-oss



Dave Chinner wrote:
> On Fri, Jul 11, 2008 at 12:48:11PM +1000, Mark Goodwin wrote:
>> Thanks for the report Eric. This looks very similar to a
>> deadlock Lachlan recently hit in the patch for
>> "Use atomics for iclog reference counting"
>> http://oss.sgi.com/archives/xfs/2008-02/msg00130.html
>>
>> It seems this patch can cause deadlocks under heavy log traffic.
>> I don't think anyone has a fix yet ... Lachlan is out this week,
>> but Tim can follow-up here ...
> 
> Nice to know - why didn't anyone email me or report this to the
> list when the bug was first found? I mean, I wrote that code, I know
> what it is supposed to be doing and as a result should be able to

Only recently found and didn't think it was this easy to hit.
But no excuses ...

> help find and fix the bug. Can you please post what details you have
> about the problem (test case, stack traces, debugging info, etc)
> so I can try to find the problem.

See Tim's follow-up.

> This is a regression that is in the mainline kernel that is due to
> be released probably in the next couple of days. Having a little
> bit of time to try and find the bug would have been nice...

At this stage, I think it would be safest to back-out the commit,
all the way to mainline. Tim, can you please work thru that today
with priority.

Thanks

-- 

  Mark Goodwin                                  markgw@sgi.com
  Engineering Manager for XFS and PCP    Phone: +61-3-99631937
  SGI Australian Software Group           Cell: +61-4-18969583
-------------------------------------------------------------

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: deadlocked xfs
  2008-07-11  3:50     ` Mark Goodwin
@ 2008-07-11  4:02       ` Eric Sandeen
  2008-07-11  4:05       ` Dave Chinner
  2008-07-11  4:08       ` Timothy Shimmin
  2 siblings, 0 replies; 17+ messages in thread
From: Eric Sandeen @ 2008-07-11  4:02 UTC (permalink / raw)
  To: markgw; +Cc: xfs-oss

Mark Goodwin wrote:

> At this stage, I think it would be safest to back-out the commit,
> all the way to mainline. Tim, can you please work thru that today
> with priority.

how 'bout I test my testcase w/ the patch removed to at least be sure
that's it.  :)

Then maybe we can see if Dave, having been alerted to the problem, has a
notion of how to fix it?

-Eric

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: deadlocked xfs
  2008-07-11  3:21   ` Timothy Shimmin
@ 2008-07-11  4:04     ` Dave Chinner
  2008-07-11  4:10       ` Eric Sandeen
                         ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Dave Chinner @ 2008-07-11  4:04 UTC (permalink / raw)
  To: Timothy Shimmin; +Cc: Eric Sandeen, markgw, xfs-oss

On Fri, Jul 11, 2008 at 01:21:39PM +1000, Timothy Shimmin wrote:
> > It made this change which moved the refcount 'decrement and test' operation
> > outside the l_icloglock spinlock and therefore can race. 

Just FYI, this is not a 'decrement and test' operation - it's
a 'decrement and return locked if zero' operation. The best way to
explain it is this comment in lib/dec_and_lock.c:

/*
 * This is an implementation of the notion of "decrement a
 * reference count, and return locked if it decremented to zero".
 *
 * NOTE NOTE NOTE! This is _not_ equivalent to
 *
 *      if (atomic_dec_and_test(&atomic)) {
 *              spin_lock(&lock);
 *              return 1;
 *      }
 *      return 0;
 *
 * because the spin-lock and the decrement must be
 * "atomic".
 */
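
A userspace analogue may make those semantics concrete. The sketch below
uses C11 atomics and a toy spinlock in place of the kernel primitives
(illustrative only; the real implementation lives in lib/dec_and_lock.c):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* A trivial spinlock standing in for the kernel spinlock_t. */
typedef struct { atomic_flag held; } spin_t;
static void spin_lock(spin_t *l)   { while (atomic_flag_test_and_set(&l->held)) ; }
static void spin_unlock(spin_t *l) { atomic_flag_clear(&l->held); }

/* Userspace sketch of atomic_dec_and_lock(): decrement the count,
 * and return with the lock held iff it dropped to zero. */
static bool dec_and_lock(atomic_int *cnt, spin_t *lock)
{
    int old = atomic_load(cnt);

    /* Fast path: while the count is > 1, decrement with a CAS and
     * never touch the lock -- we were not the last reference. */
    while (old > 1)
        if (atomic_compare_exchange_weak(cnt, &old, old - 1))
            return false;

    /* Slow path: take the lock, then decrement. Holding the lock
     * makes "hit zero" and "return locked" one atomic step. */
    spin_lock(lock);
    if (atomic_fetch_sub(cnt, 1) == 1)
        return true;            /* last reference: caller holds lock */
    spin_unlock(lock);
    return false;
}
```

Note the decrement and the lock acquisition are never separated, which is
exactly the property the comment above insists on.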


> > 	spin_lock(&log->l_icloglock);
> > 	atomic_inc(&iclog->ic_refcnt);	/* prevents sync */
> > 
> > 	...
> > 
> > 	if (iclog->ic_size - iclog->ic_offset < 2*sizeof(xlog_op_header_t)) {
> > 		xlog_state_switch_iclogs(log, iclog, iclog->ic_size);
> > 
> > 		/* If I'm the only one writing to this iclog, sync it to disk */
> > 		if (atomic_read(&iclog->ic_refcnt) == 1) {
> > 			spin_unlock(&log->l_icloglock);
> > 			if ((error = xlog_state_release_iclog(log, iclog)))
> > 				return error;
> > 		} else {
> > 			atomic_dec(&iclog->ic_refcnt);
> > 			spin_unlock(&log->l_icloglock);
> > 		}
> > 		goto restart;
> > 	}

Ok, so all the increments of ic_refcnt are done under l_icloglock.
That means that through this section of code the reference count can
never increase. It can decrease, but never to zero, because that
requires holding the l_icloglock.

> > Previously xlog_state_release_iclog() would wait for the spinlock to be
> > released above but will now do the atomic_dec_and_lock() while the above
> > code is executing.  The above atomic_read() will return 2, the
> > atomic_dec_and_lock() in xlog_state_release_iclog() will return false and
> > return without syncing the iclog and then the above code will decrement the
> > atomic counter.  The iclog never gets out of WANT_SYNC state and everything
> > gets stuck behind it.

So what that means is that we need the semantics here to be an
atomic 'decrement unless the count is 1', i.e.:

	if (atomic_add_unless(&iclog->ic_refcnt, -1, 1)) {
		spin_unlock(&log->l_icloglock);
		if ((error = xlog_state_release_iclog(log, iclog)))
			return error;
	} else {
		spin_unlock(&log->l_icloglock);
	}

This means that the compare + decrement becomes atomic, thereby
closing the race condition where another unlocked thread can
decrement the reference count in between the compare and decrement
here.
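
To see why that closes the race, here is a userspace C11 sketch of the
atomic_add_unless() semantics (an analogue for illustration, not the kernel
primitive): the compare against the 'unless' value and the add commit in a
single CAS, so no other thread can slip a decrement in between.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace sketch of atomic_add_unless(v, a, u): atomically add 'a'
 * to *v unless *v == u; return true iff the add happened. */
static bool add_unless(atomic_int *v, int a, int u)
{
    int old = atomic_load(v);

    while (old != u) {
        /* The compare (old != u) and the add commit together in one
         * CAS; on failure 'old' is reloaded and we re-check. */
        if (atomic_compare_exchange_weak(v, &old, old + a))
            return true;
    }
    return false;   /* count was exactly 'u': caller takes the other path */
}
```

With a refcount of 2, add_unless(&refcnt, -1, 1) decrements and succeeds;
with a refcount of 1 it does nothing and fails, so the caller knows it is
the last reference and can release the iclog.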

Patch for you to test, Eric, below.

(I haven't been able to test it because UML panics on boot right now.
I need to hoover in the unit-at-a-time build patch that Jeff Dike
posted this morning)

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

Fix reference counting race on log buffers

When we release the iclog, we do an atomic_dec_and_lock to determine
if we are the last reference to enable update of log headers and
writeout. However, in xlog_state_get_iclog_space() we need to check
if we have the last reference count there. If we do, we release the
log buffer, otherwise we decrement the reference count.

The issue is that the compare and decrement in
xlog_state_get_iclog_space() is not atomic, so both places can see a
reference count of 2 and neither will release the iclog. That leads
to a filesystem hang.

Close the hole by replacing the compare and decrement with
atomic_add_unless() to ensure that they are executed atomically.

Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 fs/xfs/xfs_log.c |   14 ++++++++++----
 1 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 760d543..8033f8a 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -2425,13 +2425,19 @@ restart:
 	if (iclog->ic_size - iclog->ic_offset < 2*sizeof(xlog_op_header_t)) {
 		xlog_state_switch_iclogs(log, iclog, iclog->ic_size);
 
-		/* If I'm the only one writing to this iclog, sync it to disk */
-		if (atomic_read(&iclog->ic_refcnt) == 1) {
+		/*
+		 * If I'm the only one writing to this iclog, sync it to disk.
+		 * We need to do an atomic compare and decrement here to avoid
+		 * racing with concurrent atomic_dec_and_lock() calls in
+		 * xlog_state_release_iclog() when there is more than one
+		 * reference to the iclog.
+		 */
+		if (atomic_add_unless(&iclog->ic_refcnt, -1, 1)) {
 			spin_unlock(&log->l_icloglock);
-			if ((error = xlog_state_release_iclog(log, iclog)))
+			error = xlog_state_release_iclog(log, iclog);
+			if (error)
 				return error;
 		} else {
-			atomic_dec(&iclog->ic_refcnt);
 			spin_unlock(&log->l_icloglock);
 		}
 		goto restart;

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: deadlocked xfs
  2008-07-11  3:50     ` Mark Goodwin
  2008-07-11  4:02       ` Eric Sandeen
@ 2008-07-11  4:05       ` Dave Chinner
  2008-07-11  4:08       ` Timothy Shimmin
  2 siblings, 0 replies; 17+ messages in thread
From: Dave Chinner @ 2008-07-11  4:05 UTC (permalink / raw)
  To: Mark Goodwin; +Cc: Eric Sandeen, xfs-oss

On Fri, Jul 11, 2008 at 01:50:10PM +1000, Mark Goodwin wrote:
>
>
> Dave Chinner wrote:
>> On Fri, Jul 11, 2008 at 12:48:11PM +1000, Mark Goodwin wrote:
>>> Thanks for the report Eric. This looks very similar to a
>>> deadlock Lachlan recently hit in the patch for
>>> "Use atomics for iclog reference counting"
>>> http://oss.sgi.com/archives/xfs/2008-02/msg00130.html
>>>
>>> It seems this patch can cause deadlocks under heavy log traffic.
>>> I don't think anyone has a fix yet ... Lachlan is out this week,
>>> but Tim can follow-up here ...
>>
>> Nice to know - why didn't anyone email me or report this to the
>> list when the bug was first found? I mean, I wrote that code, I know
>> what it is supposed to be doing and as a result should be able to
>
> Only recently found and didn't think it was this easy to hit.
> But no excuses ...
>
>> help find and fix the bug. Can you please post what details you have
>> about the problem (test case, stack traces, debugging info, etc)
>> so I can try to find the problem.
>
> See Tim's follow-up.
>
>> This is a regression that is in the mainline kernel that is due to
>> be released probably in the next couple of days. Having a little
>> bit of time to try and find the bug would have been nice...
>
> At this stage, I think it would be safest to back-out the commit,
> all the way to mainline. Tim, can you please work thru that today
> with priority.

No, do not back it out. I just posted the fix.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: deadlocked xfs
  2008-07-11  3:50     ` Mark Goodwin
  2008-07-11  4:02       ` Eric Sandeen
  2008-07-11  4:05       ` Dave Chinner
@ 2008-07-11  4:08       ` Timothy Shimmin
  2 siblings, 0 replies; 17+ messages in thread
From: Timothy Shimmin @ 2008-07-11  4:08 UTC (permalink / raw)
  To: David Chinner; +Cc: markgw, Eric Sandeen, xfs-oss

Mark Goodwin wrote:
> 
> 
> Dave Chinner wrote:
>> On Fri, Jul 11, 2008 at 12:48:11PM +1000, Mark Goodwin wrote:
>>> Thanks for the report Eric. This looks very similar to a
>>> deadlock Lachlan recently hit in the patch for
>>> "Use atomics for iclog reference counting"
>>> http://oss.sgi.com/archives/xfs/2008-02/msg00130.html
>>>
>>> It seems this patch can cause deadlocks under heavy log traffic.
>>> I don't think anyone has a fix yet ... Lachlan is out this week,
>>> but Tim can follow-up here ...
>>
>> Nice to know - why didn't anyone email me or report this to the
>> list when the bug was first found? I mean, I wrote that code, I know
>> what it is supposed to be doing and as a result should be able to
> 
> Only recently found and didn't think it was this easy to hit.
> But no excuses ...
> 
>> help find and fix the bug. Can you please post what details you have
>> about the problem (test case, stack traces, debugging info, etc)
>> so I can try to find the problem.
> 
> See Tim's follow-up.
> 

Remaining info from bug....

> While stress testing I'm frequently hitting a deadlock in the log code where
> many threads are stuck in xlog_state_get_iclog_space().
> 
> Stack traceback for pid 16570
> 0xffff8100421fa4c0    16570     4620  0    2   D  0xffff8100421fa810  rm
> rsp                rip                Function (args)
> 0xffff81003c41d908 0xffffffff8062be32 thread_return
> 0xffff81003c41d9a0 0xffffffff8062c205 schedule_timeout+0x1e (0x7fffffffffffffff)
> 0xffff81003c41da10 0xffffffff803954f0 xlog_state_get_iclog_space+0xed (0xffff81007d33d650, invalid, 0xffff81003c41db00, 0xffff81007c1d39c0, 0xffff81003c41db08, 0xffff81003c41db0c)
> 0xffff81003c41daa0 0xffffffff8039578d xlog_write+0x118 (0xffff81007c97d338, 0xffff81003c41db58, invalid, 0xffff81007c1d39c0, 0xffff81003c41dba0, 0xffff81003c41dce0, 0xffff810000000002)
> 0xffff81003c41db40 0xffffffff80395cb4 xlog_commit_record+0x6e (0xffff81007c97d338, 0xffff81007c1d39c0, 0xffff81003c41dce0, 0xffff81003c41dba0)
> 0xffff81003c41db90 0xffffffff80396c31 xfs_log_done+0x34 (invalid, 0xffff81007c1d39c0, invalid, invalid)
> 0xffff81003c41dbc0 0xffffffff803a1271 _xfs_trans_commit+0x24a (0xffff81007a99c780, invalid, 0x0)
> 0xffff81003c41dd20 0xffffffff8035eb2e xfs_bmap_finish+0xaa (0xffff81003c41de38, 0xffff81003c41ddc8, 0xffff81003c41dde0)
> 0xffff81003c41dd80 0xffffffff8038ae2b xfs_itruncate_finish+0x376 (0xffff81003c41de38, 0xffff81007b9c5670, invalid, invalid, 0x1)
> 0xffff81003c41de20 0xffffffff803ab720 xfs_inactive+0x27d (0xffff81007b9c5670)
> 0xffff81003c41de70 0xffffffff803b663e xfs_fs_clear_inode+0xbd (0xffff810073cf0438)
> 0xffff81003c41de90 0xffffffff802964af clear_inode+0x9c (0xffff810073cf0438)
> 0xffff81003c41dea0 0xffffffff80296598 generic_delete_inode+0x92 (0xffff810073cf0438)
> 0xffff81003c41dec0 0xffffffff8028d953 do_unlinkat+0xdd (invalid, invalid)
> 0xffff81003c41df80 0xffffffff8020ae4b system_call_after_swapgs+0x7b (invalid, invalid, invalid, invalid, invalid, invalid)
> Enter <q> to end, <cr> to continue:
> Stack traceback for pid 24088
> 0xffff81003b804a00    24088     4649  0    6   D  0xffff81003b804d50  xfs_io
> rsp                rip                Function (args)
> 0xffff81003e403288 0xffffffff8062be32 thread_return
> 0xffff81003e403320 0xffffffff8062c205 schedule_timeout+0x1e (0x7fffffffffffffff)
> 0xffff81003e403390 0xffffffff803954f0 xlog_state_get_iclog_space+0xed (0xffff81007d33d650, invalid, 0xffff81003e403480, 0xffff8100741499c0, 0xffff81003e403488, 0xffff81003e40348c)
> 0xffff81003e403420 0xffffffff8039578d xlog_write+0x118 (0xffff81007c97d338, 0xffff81003e4034f8, invalid, 0xffff8100741499c0, 0xffff81003e5b9858, 0x0, 0x0)
> 0xffff81003e4034c0 0xffffffff80395d13 xfs_log_write+0x33 (0xffff81007c97d338)
> 0xffff81003e4034e0 0xffffffff803a1256 _xfs_trans_commit+0x22f (0xffff81003e5b9810, invalid, 0x0)
> 0xffff81003e403640 0xffffffff8038fdba xfs_iomap_write_allocate+0x2eb (0xffff81007b9e7c90, invalid, invalid, 0xffff81003e403788, 0xffff81003e4037b4)
> 0xffff81003e403730 0xffffffff80390dd6 xfs_iomap+0x35a (0xffff81007b9e7c90, 0x19edd000, invalid, invalid, 0xffff81003e403898, 0xffff81003e403804)
> 0xffff81003e4037f0 0xffffffff803acf45 xfs_map_blocks+0x2c (invalid, invalid, invalid, invalid, invalid)
> 0xffff81003e403820 0xffffffff803ae04e xfs_page_state_convert+0x28b (0xffff81007b9e6c18, 0xffffe200028ac0f0, 0xffff81003e403b58, invalid, 0x100000000)
> 0xffff81003e403910 0xffffffff803ae61e xfs_vm_writepage+0xca (0xffffe200028ac0f0, 0xffff81003e403b58)
> 0xffff81003e403940 0xffffffff80260c16 __writepage+0xd (invalid, invalid, 0xffff81007b9e6e00)
> 0xffff81003e403950 0xffffffff8026112c write_cache_pages+0x19e (0xffff81007b9e6e00, 0xffff81003e403b58, 0xffffffff80260c09, 0xffff81007b9e6e00)
> 0xffff81003e403a50 0xffffffff802612a0 do_writepages+0x23 (invalid, 0xffff81003e403b58)
> 0xffff81003e403a60 0xffffffff8029ebca __writeback_single_inode+0x150 (0xffff81007b9e6c18, 0xffff81003e403b58)
> 0xffff81003e403ae0 0xffffffff8029f124 sync_sb_inodes+0x1b4 (0xffff81007c15d3f8, 0xffff81003e403b58)
> 0xffff81003e403b20 0xffffffff8029f406 writeback_inodes+0x81 (0xffff81003e403b58)
> 0xffff81003e403b40 0xffffffff802616f9 balance_dirty_pages_ratelimited_nr+0x15e (invalid, invalid)
> 0xffff81003e403bf0 0xffffffff8025bcb6 generic_file_buffered_write+0x20c (0xffff81003e403df8, 0xffff81003e403ee8, 0x1, 0x3c7c8000, 0xffff81003e403e78, invalid, 0x0)
> 0xffff81003e403cf0 0xffffffff803b56b4 xfs_write+0x68a (0xffff81007b58d360, 0xffff81003e403df8, 0xffff81003e403ee8, 0x1, 0xffff81003e403e78, invalid)
> 0xffff81003e403df0 0xffffffff8028411f do_sync_write+0xc9 (0xffff810079de9530, 0x51d000, 0x1000, 0xffff81003e403f48)
> 0xffff81003e403f10 0xffffffff8028485f vfs_write+0xad (0xffff810079de9530, 0x51d000, invalid, 0xffff81003e403f48)
> 0xffff81003e403f40 0xffffffff80284e87 sys_pwrite64+0x50 (invalid, 0x51d000, 0x1000, 0x3c7c8000)
> 0xffff81003e403f80 0xffffffff8020ae4b system_call_after_swapgs+0x7b (invalid, invalid, invalid, invalid, invalid, invalid)
> 
> Taking a look at the log and iclog structures...
> 
> [2]kdb> xlog 0xffff81007d33d650
> xlog at 0xffff81007d33d650
> &flush_wait: 0xffff81007d33d6d0  ICLOG: 0xffff81007c193710  
> &icloglock: 0xffff81007d33d720  tail_lsn: [58:173319]  last_sync_lsn: [58:173319] 
> mp: 0xffff81007c97d338  xbuf: 0xffff81007c8e9e08  l_covered_state: need 
> flags: log 0x0 <>   logBBstart: 0 logsize: 134217728 logBBsize: 262144
> curr_cycle: 58  prev_cycle: 58  curr_block: 213344  prev_block: 213280
> iclog_bak: 0xffff81007d33d830  iclog_size: 0x8000 (32768)  num iclogs: 8
> l_iclog_hsize 512 l_iclog_heads 1
> l_sectbb_log 0 l_sectbb_mask 0
> &grant_lock: 0xffff81007d33d7d0  resHeadQ: 0x0000000000000000  wrHeadQ: 0x0000000000000000
> GResCycle: 58  GResBytes: 110263292  GWrCycle: 58  GWrBytes: 110263292
> GResBlocks: 215357 GResRemain: 508  GWrBlocks: 215357 GWrRemain: 508
> trace: 0x0000000000000000  grant_trace: use xlog value
> [2]kdb> 
> [2]kdb> xicall 0xffff81007c193710
> xlog_in_core/header at 0xffff81007c193710/0xffffc2001110d000
> magicno: feedbabe  cycle: 58  version: 2  lsn: 0x3a00033f60
> tail_lsn: 0x3a0000d587  len: 32256  prev_block: 212768  num_ops: 92
> cycle_data: 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 
> size: 32768
> 
> --------------------------------------------------
> data: 0xffffc2001110d200  &forcesema: 0xffff81007c193710  next: 0xffff81007c1932f8 bp: 0xffff81007d102608
> log: 0xffff81007d33d650  callb: 0xffff810077915898  callb_tail: 0xffff81003e43cad8
> size: 32256 (OFFSET: 32240) trace: 0xffff81007aa78560 refcnt: 0 bwritecnt: 0 state: 0x2 <WANT_SYNC > 
> =================================================
> xlog_in_core/header at 0xffff81007c1932f8/0xffffc20011116000
> magicno: feedbabe  cycle: 58  version: 2  lsn: 0x3a00033fa0
> tail_lsn: 0x3a0000d587  len: 32256  prev_block: 212832  num_ops: 99
> cycle_data: 7c1d39c0  127c8  121  121  121  121  122  122  123  123  36120100  3412  0  0  0  0  0  0  0  0  69020000  5  41425443  1c84  4  16  155  60  0  1cd9400  1d05600  1d24600  1d52a00  1e3 
> size: 32768
> 
> --------------------------------------------------
> data: 0xffffc20011116200  &forcesema: 0xffff81007c1932f8  next: 0xffff81007d297610 bp: 0xffff81007d102408
> log: 0xffff81007d33d650  callb: 0xffff81007206b898  callb_tail: 0xffff8100715010f8
> size: 32256 (OFFSET: 32256) trace: 0xffff81007aa781d0 refcnt: 0 bwritecnt: 0 state: 0x10 <DO_CALLBACK > 
> =================================================
> xlog_in_core/header at 0xffff81007d297610/0xffffc2001111f000
> magicno: feedbabe  cycle: 58  version: 2  lsn: 0x3a00033fe0
> tail_lsn: 0x3a0000d587  len: 32256  prev_block: 212896  num_ops: 102
> cycle_data: 7c1d39c0  2dd  3d  547a  1f7d  0  24  2000000  2f6  1  11  1  4a  6730  72d9  631a  6d46  db00  0  6240000d  8400000b  9fa00025  d2e00010  c2c00001  3400001  3da00001  85600001  100000 
> size: 32768
> 
> --------------------------------------------------
> data: 0xffffc2001111f200  &forcesema: 0xffff81007d297610  next: 0xffff81007d296198 bp: 0xffff81007d102208
> log: 0xffff81007d33d650  callb: 0xffff81007aac3548  callb_tail: 0xffff81006cdf2788
> size: 32256 (OFFSET: 32256) trace: 0xffff81007a9ef860 refcnt: 0 bwritecnt: 0 state: 0x10 <DO_CALLBACK > 
> =================================================
> xlog_in_core/header at 0xffff81007d296198/0xffffc20011128000
> magicno: feedbabe  cycle: 58  version: 2  lsn: 0x3a00034020
> tail_lsn: 0x3a0000d587  len: 32256  prev_block: 212960  num_ops: 105
> cycle_data: 7aa7bcd8  0  7e00001  4de00001  4d000001  56600001  49400001  5ae00001  9f600001  4f800001  1f03000  0  1  3  5  a  1f  4347  55b4  5e62  644a  716a  df1400  e59800  ea0600  efc000  f9 
> size: 32768
> 
> --------------------------------------------------
> data: 0xffffc20011128200  &forcesema: 0xffff81007d296198  next: 0xffff81007e59cd20 bp: 0xffff81007d102008
> log: 0xffff81007d33d650  callb: 0xffff81006cd5b798  callb_tail: 0xffff81006c4d10f8
> size: 32256 (OFFSET: 32256) trace: 0xffff81007aa78680 refcnt: 0 bwritecnt: 0 state: 0x10 <DO_CALLBACK > 
> =================================================
> xlog_in_core/header at 0xffff81007e59cd20/0xffffc20000354000
> magicno: feedbabe  cycle: 58  version: 2  lsn: 0x3a00034060
> tail_lsn: 0x3a0000d587  len: 32256  prev_block: 213024  num_ops: 93
> cycle_data: 7aa7bcd8  0  91400001  46600001  67200001  8800001  49400001  5ae00001  9f600001  4f800001  0  4fe00001  79c00001  99600001  c9c00001  b4e0000d  ede0000d  31000017  7580003a  0  5f60   
> size: 32768
> 
> --------------------------------------------------
> data: 0xffffc20000354200  &forcesema: 0xffff81007e59cd20  next: 0xffff81007e59c4f0 bp: 0xffff81007d102e08
> log: 0xffff81007d33d650  callb: 0xffff81003fd86068  callb_tail: 0xffff81003e4fca58
> size: 32256 (OFFSET: 32256) trace: 0xffff81007a9efef0 refcnt: 0 bwritecnt: 0 state: 0x10 <DO_CALLBACK > 
> =================================================
> xlog_in_core/header at 0xffff81007e59c4f0/0xffffc2000035d000
> magicno: feedbabe  cycle: 58  version: 2  lsn: 0x3a000340a0
> tail_lsn: 0x3a0000d587  len: 32256  prev_block: 213088  num_ops: 88
> cycle_data: 7c1d39c0  127c8  121  121  121  121  122  122  123  123  36120100  5d10  6a72  1  3  2030000  ba00001  35600001  3a800001  d6800001  dd200001  e0800001  83000800  6127  6164  58b7  72d 
> size: 32768
> 
> --------------------------------------------------
> data: 0xffffc2000035d200  &forcesema: 0xffff81007e59c4f0  next: 0xffff81007d5084f0 bp: 0xffff81007d102c08
> log: 0xffff81007d33d650  callb: 0xffff81006a091818  callb_tail: 0xffff81007206b548
> size: 32256 (OFFSET: 32256) trace: 0xffff81007aa786e0 refcnt: 0 bwritecnt: 0 state: 0x10 <DO_CALLBACK > 
> =================================================
> xlog_in_core/header at 0xffff81007d5084f0/0xffffc20000366000
> magicno: feedbabe  cycle: 58  version: 2  lsn: 0x3a000340e0
> tail_lsn: 0x3a0000d587  len: 32256  prev_block: 213152  num_ops: 101
> cycle_data: 7aa7bcd8  5f75  6af1  1  3  e359cb09  4  7  11  143  1c  0  1ceba00  1d10800  1d29800  1d5f800  1e53000  1e97000  1ed1800  1f16400  7936448  35  de7a00  e4c800  e9dc00  ee3c00  f7a800  
> size: 32768
> 
> --------------------------------------------------
> data: 0xffffc20000366200  &forcesema: 0xffff81007d5084f0  next: 0xffff81007e59c908 bp: 0xffff81007d102a08
> log: 0xffff81007d33d650  callb: 0xffff8100778a6da8  callb_tail: 0xffff8100779144b8
> size: 32256 (OFFSET: 32256) trace: 0xffff81007a981b60 refcnt: 0 bwritecnt: 0 state: 0x10 <DO_CALLBACK > 
> =================================================
> xlog_in_core/header at 0xffff81007e59c908/0xffffc2000036f000
> magicno: feedbabe  cycle: 58  version: 2  lsn: 0x3a00034120
> tail_lsn: 0x3a0000d587  len: 32256  prev_block: 213216  num_ops: 118
> cycle_data: 7aa7bcd8  5f75  6af1  1  3  0  1f01  7b  79  c42c00  0  0  0  0  0  0  0  0  0  4ce00001  b174  80000000  80000000  80000000  80000000  0  0  0  0  68800010  72e6  2000000  31c  1  8c  
> size: 32768
> 
> --------------------------------------------------
> data: 0xffffc2000036f200  &forcesema: 0xffff81007e59c908  next: 0xffff81007c193710 bp: 0xffff81007d102808
> log: 0xffff81007d33d650  callb: 0xffff81006a1c8168  callb_tail: 0xffff81004714d0f8
> size: 32256 (OFFSET: 32256) trace: 0xffff81007aa78f80 refcnt: 0 bwritecnt: 0 state: 0x10 <DO_CALLBACK > 
> =================================================

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: deadlocked xfs
  2008-07-11  4:04     ` Dave Chinner
@ 2008-07-11  4:10       ` Eric Sandeen
  2008-07-11  4:13         ` Timothy Shimmin
  2008-07-11  4:17       ` Christoph Hellwig
  2008-07-11  4:44       ` Timothy Shimmin
  2 siblings, 1 reply; 17+ messages in thread
From: Eric Sandeen @ 2008-07-11  4:10 UTC (permalink / raw)
  To: Timothy Shimmin, Eric Sandeen, markgw, xfs-oss

Dave Chinner wrote:

> Patch for you to test, Eric, below.
> 
> (I haven't been able to test it because UML panics on boot right now.
> I need to hoover in the unit-at-a-time build patch that Jeff Dike
> posted this morning)
> 
> Cheers,
> 
> Dave.

Dave, you frighten me.

Testing now, will let it run a while.

-Eric

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: deadlocked xfs
  2008-07-11  4:10       ` Eric Sandeen
@ 2008-07-11  4:13         ` Timothy Shimmin
  2008-07-11  4:18           ` Eric Sandeen
  0 siblings, 1 reply; 17+ messages in thread
From: Timothy Shimmin @ 2008-07-11  4:13 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: markgw, xfs-oss

Eric Sandeen wrote:
> Dave Chinner wrote:
> 
>> Patch for you to test, Eric, below.
>>
>> (I haven't been able to test it because UML panics on boot right now.
>> I need to hoover in the unit-at-a-time build patch that Jeff Dike
>> posted this morning)
>>
>> Cheers,
>>
>> Dave.
> 
> Dave, you frighten me.
> 
> Testing now, will let it run a while.
> 
Excellent.

And it happened pretty reliably before, right?

Thanks,
Tim.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: deadlocked xfs
  2008-07-11  4:04     ` Dave Chinner
  2008-07-11  4:10       ` Eric Sandeen
@ 2008-07-11  4:17       ` Christoph Hellwig
  2008-07-11  4:44       ` Timothy Shimmin
  2 siblings, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2008-07-11  4:17 UTC (permalink / raw)
  To: Timothy Shimmin, Eric Sandeen, markgw, xfs-oss

On Fri, Jul 11, 2008 at 02:04:47PM +1000, Dave Chinner wrote:
>  		xlog_state_switch_iclogs(log, iclog, iclog->ic_size);
>  
> -		/* If I'm the only one writing to this iclog, sync it to disk */
> -		if (atomic_read(&iclog->ic_refcnt) == 1) {
> +		/*
> +		 * If I'm the only one writing to this iclog, sync it to disk.
> +		 * We need to do an atomic compare and decrement here to avoid
> +		 * racing with concurrent atomic_dec_and_lock() calls in
> +		 * xlog_state_release_iclog() when there is more than one
> +		 * reference to the iclog.
> +		 */
> +		if (atomic_add_unless(&iclog->ic_refcnt, -1, 1)) {
>  			spin_unlock(&log->l_icloglock);
> -			if ((error = xlog_state_release_iclog(log, iclog)))
> +			error = xlog_state_release_iclog(log, iclog);
> +			if (error)
>  				return error;
>  		} else {
> -			atomic_dec(&iclog->ic_refcnt);
>  			spin_unlock(&log->l_icloglock);
>  		}
>  		goto restart;

Independent of whether it actually fixes the bug (which I think it will)
this looks good.  Doing anything with the return value from atomic_read
except for printing it is most likely bogus, and this one clearly is.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: deadlocked xfs
  2008-07-11  4:13         ` Timothy Shimmin
@ 2008-07-11  4:18           ` Eric Sandeen
  2008-07-11  4:26             ` Eric Sandeen
  2008-07-11  4:27             ` Dave Chinner
  0 siblings, 2 replies; 17+ messages in thread
From: Eric Sandeen @ 2008-07-11  4:18 UTC (permalink / raw)
  To: Timothy Shimmin; +Cc: markgw, xfs-oss

Timothy Shimmin wrote:
> Eric Sandeen wrote:
>> Dave Chinner wrote:
>>
>>> Patch for you to test, Eric, below.
>>>
>>> (I haven't been able to test it because UML panics on boot right now.
>>> I need to hoover in the unit-at-a-time build patch that Jeff Dike
>>> posted this morning)
>>>
>>> Cheers,
>>>
>>> Dave.
>> Dave, you frighten me.
>>
>> Testing now, will let it run a while.
>>
> Excellent.
> 
> And it happened pretty reliably before, right?

It did, yep, though sometimes took a while.

But, w/ the patch it actually deadlocked really quickly (< 2000 inodes
written)

-Eric

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: deadlocked xfs
  2008-07-11  4:18           ` Eric Sandeen
@ 2008-07-11  4:26             ` Eric Sandeen
  2008-07-11  4:27             ` Dave Chinner
  1 sibling, 0 replies; 17+ messages in thread
From: Eric Sandeen @ 2008-07-11  4:26 UTC (permalink / raw)
  To: Timothy Shimmin; +Cc: markgw, xfs-oss

Eric Sandeen wrote:

> But, w/ the patch it actually deadlocked really quickly (< 2000 inodes
> written)

ok testing a fixed up version ...

will let you know in my AM if it survived.

-Eric

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: deadlocked xfs
  2008-07-11  4:18           ` Eric Sandeen
  2008-07-11  4:26             ` Eric Sandeen
@ 2008-07-11  4:27             ` Dave Chinner
  1 sibling, 0 replies; 17+ messages in thread
From: Dave Chinner @ 2008-07-11  4:27 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Timothy Shimmin, markgw, xfs-oss

On Thu, Jul 10, 2008 at 11:18:31PM -0500, Eric Sandeen wrote:
> Timothy Shimmin wrote:
> > Eric Sandeen wrote:
> >> Dave Chinner wrote:
> >>
> >>> Patch for you to test, Eric, below.
> >>>
> >>> (I haven't been able to test it because UML panics on boot right now.
> >>> I need to hoover in the unit-at-a-time build patch that Jeff Dike
> >>> posted this morning)
> >>>
> >>> Cheers,
> >>>
> >>> Dave.
> >> Dave, you frighten me.
> >>
> >> Testing now, will let it run a while.
> >>
> > Excellent.
> > 
> > And it happened pretty reliably before, right?
> 
> It did, yep, though sometimes took a while.
> 
> But, w/ the patch it actually deadlocked really quickly (< 2000 inodes
> written)

Yeah, inverted the if/else logic, I think. Like I
said, untested.

I'll send out an updated patch as soon as I get uml running
again.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: deadlocked xfs
  2008-07-11  4:04     ` Dave Chinner
  2008-07-11  4:10       ` Eric Sandeen
  2008-07-11  4:17       ` Christoph Hellwig
@ 2008-07-11  4:44       ` Timothy Shimmin
  2 siblings, 0 replies; 17+ messages in thread
From: Timothy Shimmin @ 2008-07-11  4:44 UTC (permalink / raw)
  To: Eric Sandeen, markgw, xfs-oss; +Cc: Timothy Shimmin

Dave Chinner wrote:
> On Fri, Jul 11, 2008 at 01:21:39PM +1000, Timothy Shimmin wrote:
>>> It made this change which moved the refcount 'decrement and test' operation
>>> outside the l_icloglock spinlock and therefore can race. 
> 
> Just FYI, this is not a 'decrement and test' operation - it's
> a 'decrement and return locked if zero' operation. The best way to
> explain it is this comment in lib/dec_and_lock.c:
> 
Fine, although I didn't write the above :)

> /*
>  * This is an implementation of the notion of "decrement a
>  * reference count, and return locked if it decremented to zero".
>  *
>  * NOTE NOTE NOTE! This is _not_ equivalent to
>  *
>  *      if (atomic_dec_and_test(&atomic)) {
>  *              spin_lock(&lock);
>  *              return 1;
>  *      }
>  *      return 0;
>  *
>  * because the spin-lock and the decrement must be
>  * "atomic".
>  */
> 
> 
>>> 	spin_lock(&log->l_icloglock);
>>> 	atomic_inc(&iclog->ic_refcnt);	/* prevents sync */
>>>
>>> 	...
>>>
>>> 	if (iclog->ic_size - iclog->ic_offset < 2*sizeof(xlog_op_header_t)) {
>>> 		xlog_state_switch_iclogs(log, iclog, iclog->ic_size);
>>>
>>> 		/* If I'm the only one writing to this iclog, sync it to disk */
>>> 		if (atomic_read(&iclog->ic_refcnt) == 1) {
>>> 			spin_unlock(&log->l_icloglock);
>>> 			if ((error = xlog_state_release_iclog(log, iclog)))
>>> 				return error;
>>> 		} else {
>>> 			atomic_dec(&iclog->ic_refcnt);
>>> 			spin_unlock(&log->l_icloglock);
>>> 		}
>>> 		goto restart;
>>> 	}
> 
> Ok, so all the increments of ic_refcnt are done under l_icloglock.
> That means that through this section of code we cannot ever have the reference
> count increase. It can decrease, but never to zero because that
> requires holding the l_icloglock.
> 
>>> Previously xlog_state_release_iclog() would wait for the spinlock to be
>>> released above but will now do the atomic_dec_and_lock() while the above
>>> code is executing.  The above atomic_read() will return 2, the
>>> atomic_dec_and_lock() in xlog_state_release_iclog() will return false and
>>> return without syncing the iclog and then the above code will decrement the
>>> atomic counter.  The iclog never gets out of WANT_SYNC state and everything
>>> gets stuck behind it.
> 
> So what that means is that we need the semantics here to be an
> atomic 'decrement unless the count is 1', i.e.:
> 
> 	if (atomic_add_unless(&iclog->ic_refcnt, -1, 1)) {
> 		spin_unlock(&log->l_icloglock);
> 		if ((error = xlog_state_release_iclog(log, iclog)))
> 			return error;
> 	} else {
> 		spin_unlock(&log->l_icloglock);
> 	}
> 
Gee, they have an atomic_ for almost everything :-)
When I was talking to Lachlan he was saying we would need to make the
atomic_read and the dec somehow atomic (close the window)
- didn't realise such a func existed. 
(And yeah, I hear you saying well one needs to look...
he must have gotten on to one of the corruption fixes..dunno...)

Just saw the email - yeah, I didn't pick up on the reversed logic there:
atomic decrement unless equal to one, yep (reversed if/else logic).

> This means that the compare + decrement becomes atomic, thereby
> closing the race condition where another unlocked thread can
> decrement the reference count in between the compare and decrement
> here.
> 
Yep.

--Tim

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2008-07-11  4:44 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-07-11  2:33 deadlocked xfs Eric Sandeen
2008-07-11  2:48 ` Mark Goodwin
2008-07-11  2:56   ` Eric Sandeen
2008-07-11  3:21   ` Timothy Shimmin
2008-07-11  4:04     ` Dave Chinner
2008-07-11  4:10       ` Eric Sandeen
2008-07-11  4:13         ` Timothy Shimmin
2008-07-11  4:18           ` Eric Sandeen
2008-07-11  4:26             ` Eric Sandeen
2008-07-11  4:27             ` Dave Chinner
2008-07-11  4:17       ` Christoph Hellwig
2008-07-11  4:44       ` Timothy Shimmin
2008-07-11  3:22   ` Dave Chinner
2008-07-11  3:50     ` Mark Goodwin
2008-07-11  4:02       ` Eric Sandeen
2008-07-11  4:05       ` Dave Chinner
2008-07-11  4:08       ` Timothy Shimmin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox