* [PATCH RFC 0/2] fix spinlock recursion on xa_lock in xfs_buf_item_push
@ 2013-01-29 20:42 Brian Foster
2013-01-29 20:42 ` [PATCH RFC 1/2] xfs: conditionally force log on trylock failure of pinned/stale buf Brian Foster
` (3 more replies)
0 siblings, 4 replies; 10+ messages in thread
From: Brian Foster @ 2013-01-29 20:42 UTC (permalink / raw)
To: xfs
Hi all,
This patchset fixes a spinlock recursion we initially reproduced on RHEL
kernels[1]. The problem is that we issue an xfs_log_force() via
xfs_buf_trylock() with the xa_lock held and ultimately drive down into
xlog_assign_tail_lsn(), which attempts to reacquire xa_lock[2].
Note that this lockup was difficult to reproduce and I was not able to
reproduce it on an upstream kernel without a hack to comment out the pinned buf
check in xfs_buf_item_push() (presumably because the log force itself only
happens when the buf is pinned, so the window here is tight).
This patchset is what I'm testing to avoid the lockup, but I'm posting this RFC
to get some early thoughts:
- Patch 1 - Creates a flag to conditionally force the log in xfs_buf_trylock().
The alternative I considered is to pull out the check and log force and
sprinkle that code amongst the trylock callers.
- Patch 2 - Utilizes the flag created in patch 1 and duplicates the log force
in xfs_buf_item_push() after dropping xa_lock.
The change in patch 2 makes me wonder how important the immediate flush is in
the context of xfsaild_push(), where we already pend up a flush if the item is
pinned. IOWs, I wonder if replacing what I have now with something like the
following would be acceptable and cleaner:
	if (!__xfs_buf_trylock(bp, false)) {
		if (xfs_buf_ispinned(bp))
			return XFS_ITEM_PINNED;
		return XFS_ITEM_LOCKED;
	}
Thoughts appreciated.
Brian
[1] - http://bugzilla.redhat.com/show_bug.cgi?id=896224
[2] - stacktrace:
BUG: spinlock recursion on CPU#5, xfsaild/dm-3/2690
Pid: 2690, comm: xfsaild/dm-3 Not tainted 3.8.0-rc1+ #46
Call Trace:
[<ffffffff8163211c>] spin_dump+0x8a/0x8f
[<ffffffff81632142>] spin_bug+0x21/0x26
[<ffffffff812f66a1>] do_raw_spin_lock+0x101/0x150
[<ffffffff816378ce>] _raw_spin_lock+0xe/0x10
[<ffffffffa0522c85>] xlog_assign_tail_lsn+0x25/0x50 [xfs]
[<ffffffffa0523286>] xlog_state_release_iclog+0x86/0xd0 [xfs]
[<ffffffffa0523c89>] xlog_write+0x569/0x710 [xfs]
[<ffffffffa052555c>] xlog_cil_push+0x29c/0x3c0 [xfs]
[<ffffffffa04cbfe2>] ? xfs_buf_get_map+0xf2/0x1b0 [xfs]
[<ffffffffa0525d17>] xlog_cil_force_lsn+0x157/0x160 [xfs]
[<ffffffffa04cced1>] ? xfs_buf_read_map+0x31/0x130 [xfs]
[<ffffffffa0529e99>] ? xfs_trans_read_buf_map+0x279/0x4b0 [xfs]
[<ffffffff8117e45d>] ? __kmalloc+0x15d/0x1b0
[<ffffffffa0523f7d>] _xfs_log_force+0x6d/0x290 [xfs]
[<ffffffffa051450f>] ? xfs_iflush_cluster+0x25f/0x3d0 [xfs]
[<ffffffffa05241d9>] xfs_log_force+0x39/0xc0 [xfs]
[<ffffffffa04cbaa0>] xfs_buf_trylock+0xd0/0xe0 [xfs]
[<ffffffffa0526369>] xfs_buf_item_push+0x39/0xd0 [xfs]
[<ffffffffa0527bdf>] ? xfs_inode_item_push+0x8f/0x140 [xfs]
[<ffffffffa0528c01>] xfsaild+0x2e1/0x6e0 [xfs]
[<ffffffff8108aa08>] ? __wake_up_common+0x58/0x90
[<ffffffffa0528920>] ? xfs_trans_ail_cursor_first+0xc0/0xc0 [xfs]
[<ffffffff81081708>] kthread+0xd8/0xe0
[<ffffffff81081630>] ? flush_kthread_work+0x150/0x150
[<ffffffff816400ac>] ret_from_fork+0x7c/0xb0
[<ffffffff81081630>] ? flush_kthread_work+0x150/0x150
Brian Foster (2):
xfs: conditionally force log on trylock failure of pinned/stale buf
xfs: drop xa_lock around log force in xfs_buf_item push
fs/xfs/xfs_buf.c | 8 +++++---
fs/xfs/xfs_buf.h | 3 ++-
fs/xfs/xfs_buf_item.c | 10 +++++++++-
3 files changed, 16 insertions(+), 5 deletions(-)
--
1.7.7.6
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* [PATCH RFC 1/2] xfs: conditionally force log on trylock failure of pinned/stale buf
  2013-01-29 20:42 [PATCH RFC 0/2] fix spinlock recursion on xa_lock in xfs_buf_item_push Brian Foster
@ 2013-01-29 20:42 ` Brian Foster
  2013-01-29 20:42 ` [PATCH RFC 2/2] xfs: drop xa_lock around log force in xfs_buf_item push Brian Foster
  ` (2 subsequent siblings)
  3 siblings, 0 replies; 10+ messages in thread
From: Brian Foster @ 2013-01-29 20:42 UTC (permalink / raw)
To: xfs

xfs_log_force() is not safe from all contexts. Add a flag parameter to
xfs_buf_trylock() to specify when the force is appropriate and create a
macro to preserve current behavior.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_buf.c | 8 +++++---
 fs/xfs/xfs_buf.h | 3 ++-
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index fbbb9eb..2e04a44 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -943,15 +943,17 @@ xfs_buf_rele(
  * to push on stale inode buffers.
  */
 int
-xfs_buf_trylock(
-	struct xfs_buf		*bp)
+__xfs_buf_trylock(
+	struct xfs_buf		*bp,
+	bool			force_log)
 {
 	int			locked;
 
 	locked = down_trylock(&bp->b_sema) == 0;
 	if (locked)
 		XB_SET_OWNER(bp);
-	else if (atomic_read(&bp->b_pin_count) && (bp->b_flags & XBF_STALE))
+	else if (force_log &&
+		 atomic_read(&bp->b_pin_count) && (bp->b_flags & XBF_STALE))
 		xfs_log_force(bp->b_target->bt_mount, 0);
 
 	trace_xfs_buf_trylock(bp, _RET_IP_);

diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index 433a12e..667b723 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -258,7 +258,8 @@ extern void xfs_buf_free(xfs_buf_t *);
 extern void xfs_buf_rele(xfs_buf_t *);
 
 /* Locking and Unlocking Buffers */
-extern int xfs_buf_trylock(xfs_buf_t *);
+#define xfs_buf_trylock(bp)	__xfs_buf_trylock(bp, true)
+extern int __xfs_buf_trylock(xfs_buf_t *, bool);
 extern void xfs_buf_lock(xfs_buf_t *);
 extern void xfs_buf_unlock(xfs_buf_t *);
 #define xfs_buf_islocked(bp) \
--
1.7.7.6
* [PATCH RFC 2/2] xfs: drop xa_lock around log force in xfs_buf_item push
  2013-01-29 20:42 [PATCH RFC 0/2] fix spinlock recursion on xa_lock in xfs_buf_item_push Brian Foster
  2013-01-29 20:42 ` [PATCH RFC 1/2] xfs: conditionally force log on trylock failure of pinned/stale buf Brian Foster
@ 2013-01-29 20:42 ` Brian Foster
  2013-01-29 22:41 ` [PATCH RFC 0/2] fix spinlock recursion on xa_lock in xfs_buf_item_push Mark Tinguely
  2013-01-30  6:05 ` Dave Chinner
  3 siblings, 0 replies; 10+ messages in thread
From: Brian Foster @ 2013-01-29 20:42 UTC (permalink / raw)
To: xfs

If the trylock fails and the buffer is pinned and stale, the resulting
xfs_log_force() can lead to lock recursion on the ailp xa_lock. Call
__xfs_buf_trylock() to attempt the buf lock with xa_lock held, but to
avoid the log force. We subsequently pull the check and log force up
into xfs_buf_item_push() where we can drop and reacquire xa_lock
appropriately.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_buf_item.c | 10 +++++++++-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index 9c4c050..7b6e98a 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -469,8 +469,16 @@ xfs_buf_item_push(
 	if (xfs_buf_ispinned(bp))
 		return XFS_ITEM_PINNED;
-	if (!xfs_buf_trylock(bp))
+
+	if (!__xfs_buf_trylock(bp, false)) {
+		if (atomic_read(&bp->b_pin_count) &&
+		    (bp->b_flags & XBF_STALE)) {
+			spin_unlock(&lip->li_ailp->xa_lock);
+			xfs_log_force(bp->b_target->bt_mount, 0);
+			spin_lock(&lip->li_ailp->xa_lock);
+		}
 		return XFS_ITEM_LOCKED;
+	}
 
 	ASSERT(!(bip->bli_flags & XFS_BLI_STALE));
--
1.7.7.6
* Re: [PATCH RFC 0/2] fix spinlock recursion on xa_lock in xfs_buf_item_push
@ 2013-01-29 22:41 ` Mark Tinguely
  2013-01-30  6:05 ` Dave Chinner
  3 siblings, 0 replies; 10+ messages in thread
From: Mark Tinguely @ 2013-01-29 22:41 UTC (permalink / raw)
To: Brian Foster; +Cc: xfs

On 01/29/13 14:42, Brian Foster wrote:
> Hi all,
>
> This patchset fixes a spinlock recursion we've reproduced initially on RHEL
> kernels[1]. The problem is that we issue an xfs_log_force() via
> xfs_buf_trylock() with the xa_lock held and ultimately drive down into
> xlog_assign_tail_lsn(), which attempts to reacquire xa_lock[2].
>
> Note that this lockup was difficult to reproduce and I was not able to
> reproduce on an upstream kernel without a hack to comment out the pinned buf
> check in xfs_buf_item_push() (presumably because the log force itself only
> happens when the buf is pinned, so the window here is tight).

Interesting. So this is a result of a change of the b_pin_count between
the xfs_buf_ispinned() and xfs_buf_trylock() tests in xfs_buf_item_push()?

> [...]
>
> IOWs, I wonder if replacing what I have now with something like the
> following would be acceptable and cleaner:
>
> 	if (!__xfs_buf_trylock(bp, false)) {
> 		if (xfs_buf_ispinned(bp))
> 			return XFS_ITEM_PINNED;
> 		return XFS_ITEM_LOCKED;

This looks better to me than patch 2.

> 	}
>
> Thoughts appreciated.
>
> Brian
>
> [stack trace and diffstat trimmed]

--Mark.
* Re: [PATCH RFC 0/2] fix spinlock recursion on xa_lock in xfs_buf_item_push
@ 2013-01-30  6:05 ` Dave Chinner
  [not found] ` <5109291E.6090303@sgi.com>
  3 siblings, 1 reply; 10+ messages in thread
From: Dave Chinner @ 2013-01-30  6:05 UTC (permalink / raw)
To: Brian Foster; +Cc: xfs

On Tue, Jan 29, 2013 at 03:42:35PM -0500, Brian Foster wrote:
> Hi all,
>
> This patchset fixes a spinlock recursion we've reproduced initially on RHEL
> kernels[1]. The problem is that we issue an xfs_log_force() via
> xfs_buf_trylock() with the xa_lock held and ultimately drive down into
> xlog_assign_tail_lsn(), which attempts to reacquire xa_lock[2].
>
> [...]
>
> Thoughts appreciated.

I haven't had time to really think about this, but I'll say up front
that I don't really like the fix. The log force in xfs_buf_trylock()
was added here:

commit 90810b9e82a36c3c57c1aeb8b2918b242a130b26
Author: Dave Chinner <dchinner@redhat.com>
Date:   Tue Nov 30 15:16:16 2010 +1100

    xfs: push stale, pinned buffers on trylock failures

    As reported by Nick Piggin, XFS is suffering from long pauses under
    highly concurrent workloads when hosted on ramdisks. The problem is
    that an inode buffer is stuck in the pinned state in memory and as a
    result either the inode buffer or one of the inodes within the
    buffer is stopping the tail of the log from being moved forward. The
    system remains in this state until a periodic log force issued by
    xfssyncd causes the buffer to be unpinned.

    The main problem is that these are stale buffers, and are hence held
    locked until the transaction/checkpoint that marked them stale has
    been committed to disk. When the filesystem gets into this state,
    only the xfssyncd can cause the async transactions to be committed
    to disk and hence unpin the inode buffer.

    This problem was encountered when scaling the busy extent list, but
    only the blocking lock interface was fixed to solve the problem.
    Extend the same fix to the buffer trylock operations - if we fail to
    lock a pinned, stale buffer, then force the log immediately so that
    when the next attempt to lock it comes around, it will have been
    unpinned.

    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>

And the initial commit that introduced this dependency was this:

commit ed3b4d6cdc81e8feefdbfa3c584614be301b6d39
Author: Dave Chinner <david@fromorbit.com>
Date:   Fri May 21 12:07:08 2010 +1000

    xfs: Improve scalability of busy extent tracking

    ....

    The only problem with this approach is that when a metadata buffer
    is marked stale (e.g. a directory block is removed), then the buffer
    remains pinned and locked until the log goes to disk. The issue here
    is that if that stale buffer is reallocated in a subsequent
    transaction, the attempt to lock that buffer in the transaction will
    hang waiting for the log to go to disk to unlock and unpin the
    buffer. Hence if someone tries to lock a pinned, stale, locked
    buffer we need to push on the log to get it unlocked ASAP.
    Effectively we are trading off a guaranteed log force for a much
    less common trigger for log force to occur.

    ....

So essentially what is happening here is that we are trying to lock a
stale buffer in the AIL to flush it. Well, we can't flush it from the
AIL, and indeed the next line of code is this:

	if (!xfs_buf_trylock(bp))
		return XFS_ITEM_LOCKED;

>>>>>	ASSERT(!(bip->bli_flags & XFS_BLI_STALE));

The only reason this ASSERT is not firing is that we are failing to
lock stale buffers. Hence we are relying on the failed lock to force
the log, instead of detecting that we need to force the log after we
drop the AIL lock and letting the caller handle it.

So, wouldn't a better solution be to simply do something like:

+	if (bp->b_flags & XBF_STALE)
+		return XFS_ITEM_PINNED;
+
	if (!xfs_buf_trylock(bp))
		return XFS_ITEM_LOCKED;

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
[parent not found: <5109291E.6090303@sgi.com>]
* Re: [PATCH RFC 0/2] fix spinlock recursion on xa_lock in xfs_buf_item_push
  [not found] ` <5109291E.6090303@sgi.com>
@ 2013-01-30 16:02 ` Brian Foster
  2013-01-30 21:59 ` Dave Chinner
  0 siblings, 1 reply; 10+ messages in thread
From: Brian Foster @ 2013-01-30 16:02 UTC (permalink / raw)
To: Mark Tinguely; +Cc: xfs

(added Dave and the list back on CC)

On 01/30/2013 09:07 AM, Mark Tinguely wrote:
> On 01/30/13 00:05, Dave Chinner wrote:
>> On Tue, Jan 29, 2013 at 03:42:35PM -0500, Brian Foster wrote:
...
>
>> So essentially what is happening here is that we are trying to lock
>> a stale buffer in the AIL to flush it. Well, we can't flush it from
>> the AIL, and indeed the next line of code is this:
>>
>> 	if (!xfs_buf_trylock(bp))
>> 		return XFS_ITEM_LOCKED;
>>
>>>>>>> 	ASSERT(!(bip->bli_flags & XFS_BLI_STALE));
>>
>> The only reason this ASSERT is not firing is that we are failing to
>> lock stale buffers. Hence we are relying on the failed lock to force
>> the log, instead of detecting that we need to force the log after we
>> drop the AIL lock and letting the caller handle it.
>>
>> So, wouldn't a better solution be to do something like:
>>
>> +	if (bp->b_flags & XBF_STALE)
>> +		return XFS_ITEM_PINNED;
>> +
>> 	if (!xfs_buf_trylock(bp))
>> 		return XFS_ITEM_LOCKED;
...
>

Thanks guys. This certainly looks nicer than messing with the lock
wrapper, but is it susceptible to the same problem? In other words, does
this fix the problem or just tighten the window?

I'm going to go back to my original reproduction case and enable some
select tracepoints to try and get a specific sequence of events, but
given the code as it is, the problem seems to be that the buffer goes
from !pinned to pinned between the time we actually check for pinned and
try the buf lock.

So if the buf lock covers the pinned state (e.g., buffer gets locked,
added to a transaction, the transaction gets committed and pins and
unlocks the buffer, IIUC) and the stale state (buf gets locked, added to
a new transaction and inval'd before the original transaction was
written?), but we don't hold the buf lock in xfs_buf_item_push(), how
can we guarantee the state of either doesn't change between the time we
check the flags and the time the lock fails?

> Makes sense. It would prevent the lock recursion. The more that I think
> about it, we do not want to release xa_lock during an AIL scan.

FWIW, the other log item abstractions appear to use this model (e.g.,
xfs_inode_item_push()), where it appears safe to drop xa_lock once the
actual object lock/ref is acquired and reacquire xa_lock before
returning. It looks like this behavior was introduced in:

43ff2122 xfs: on-stack delayed write buffer lists

Brian

> We would still want to see if the buffer is re-pinned (and not STALE)
> in the AIL.
>
> --Mark.
* Re: [PATCH RFC 0/2] fix spinlock recursion on xa_lock in xfs_buf_item_push
  2013-01-30 16:02 ` Brian Foster
@ 2013-01-30 21:59 ` Dave Chinner
  2013-01-31 17:45 ` Brian Foster
  0 siblings, 1 reply; 10+ messages in thread
From: Dave Chinner @ 2013-01-30 21:59 UTC (permalink / raw)
To: Brian Foster; +Cc: Mark Tinguely, xfs

On Wed, Jan 30, 2013 at 11:02:43AM -0500, Brian Foster wrote:
> (added Dave and the list back on CC)
>
> [...]
>
> Thanks guys. This certainly looks nicer than messing with the lock
> wrapper, but is it susceptible to the same problem? In other words, does
> this fix the problem or just tighten the window?

That's what I need to think about more - the only difference here is
that we are checking the flag before the down_trylock() instead of
after....

> So if the buf lock covers the pinned state (e.g., buffer gets locked,
> added to a transaction, the transaction gets committed and pins and
> unlocks the buffer, IIUC) and the stale state (buf gets locked, added to
> a new transaction and inval'd before the original transaction was
> written?), but we don't hold the buf lock in xfs_buf_item_push(), how
> can we guarantee the state of either doesn't change between the time we
> check the flags and the time the lock fails?

... but the order of them being set and checked may be significant
and hence checking the stale flag first might be sufficient to avoid
the pin count race and hence the log force. Hence this might just
need a pair of memory barriers - one here and one in xfs_buf_stale()
- to ensure that we always see the XBF_STALE flag without needing to
lock the buffer first.

>> Makes sense. It would prevent the lock recursion. The more that I think
>> about it, we do not want to release xa_lock during an AIL scan.
>
> FWIW, the other log item abstractions appear to use this model (e.g.,
> xfs_inode_item_push()), where it appears safe to drop xa_lock once the
> actual object lock/ref is acquired and reacquire xa_lock before
> returning.

Right. We can't hold the AIL lock while doing blocking lock operations
because the AIL lock is an innermost lock. That's fundamentally the
cause of this problem - the trylock is doing more work than we can
allow in the context that it is being called in...

We also do not hold the AIL lock for the entire traversal, as long
holds of the AIL lock result in serialisation across the entire
transaction subsystem. A successful IOP_PUSH() typically drops and
reacquires the AIL lock, but a failure doesn't. Hence we limit the
number of failed pushes under a single traversal so that we don't
spend a long time holding the lock doing no work and instead back
off and try again later....

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [PATCH RFC 0/2] fix spinlock recursion on xa_lock in xfs_buf_item_push
  2013-01-30 21:59 ` Dave Chinner
@ 2013-01-31 17:45 ` Brian Foster
  2013-02-01  1:26 ` Dave Chinner
  0 siblings, 1 reply; 10+ messages in thread
From: Brian Foster @ 2013-01-31 17:45 UTC (permalink / raw)
To: Mark Tinguely, xfs, david

On 01/30/2013 04:59 PM, Dave Chinner wrote:
> [...]
>
> ... but the order of them being set and checked may be significant
> and hence checking the stale flag first might be sufficient to avoid
> the pin count race and hence the log force. Hence this might just
> need a pair of memory barriers - one here and one in xfs_buf_stale()
> - to ensure that we always see the XBF_STALE flag without needing to
> lock the buffer first.

I _think_ I follow your train of thought. If we're racing on the pin
check, presumably the lock holder is committing the transaction and we
should either already see the buffer being stale, being pinned, or we
should get the lock (assuming the order is: stale, pinned, unlocked).

That aside for a moment, here's some specific tracepoint (some of which
I've hacked in) data for when the recursion occurs:

xfsalloc/5-3220 [005] 4304.223158: xfs_buf_item_stale: dev 253:3 bno 0x847feb28 len 0x1000 hold 2 pincount 0 lock 0 flags |MAPPED|ASYNC|DONE|STALE|PAGES recur 0 refcount 1 bliflags |DIRTY|STALE lidesc 0xffff88067ada5610 liflags IN_AIL
smallfile_cli.p-3702 [007] 4304.223209: xfs_buf_item_format_stale: dev 253:3 bno 0x847feb28 len 0x1000 hold 2 pincount 0 lock 0 flags |MAPPED|ASYNC|DONE|STALE|PAGES recur 0 refcount 1 bliflags |STALE lidesc 0xffff88067ada5610 liflags IN_AIL
...
xfsaild/dm-3-3695 [002] 4304.223217: xfs_buf_item_trylock: dev 253:3 bno 0x847feb28 len 0x1000 hold 2 pincount 1 lock 0 flags |MAPPED|ASYNC|DONE|STALE|PAGES recur 0 refcount 1 bliflags |STALE lidesc 0xffff88067ada5610 liflags IN_AIL
smallfile_cli.p-3702 [007] 4304.223217: xfs_buf_item_pin: dev 253:3 bno 0x847feb28 len 0x1000 hold 2 pincount 0 lock 0 flags |MAPPED|ASYNC|DONE|STALE|PAGES recur 0 refcount 1 bliflags |STALE lidesc 0xffff88067ada5610 liflags IN_AIL
...
xfsaild/dm-3-3695 [002] 4304.223219: xfs_buf_cond_lock_log_force: dev 253:3 bno 0x847feb28 len 0x1000 hold 2 pincount 1 lock 0 flags MAPPED|ASYNC|DONE|STALE|PAGES caller xfs_buf_item_trylock

[NOTES:
- I moved xfs_buf_item_trylock up to after the pinned check but before
  the trylock.
- xfs_buf_item_stale() is in xfs_trans_binval(). This was an oversight,
  as I was looking at bli_flags, but it still illustrates the sequence.]

... so as expected, the buffer is marked stale, we attempt the trylock,
the buf is pinned, we run the log force and we're dead.

From the looks of the trace, I'd expect an additional stale check to
eliminate the ability to reproduce this, but that doesn't necessarily
make it correct of course. Regardless, I'm putting that to the test now
and letting it run for a bit while we get this sorted out.

I also need to stare at the code some more. My pending questions are:

- Is it always reasonable to assume/consider a stale buf as pinned in
  the context of xfsaild?

- If we currently reproduce the following sequence:

	A			xfsaild
	stale
				(!pinned) ==> trylock()
	pinned
				(!trylock && pinned && stale)
					==> xfs_log_force() (boom)

  ... what prevents the following sequence from occurring sometime in
  the future or with some alternate high-level sequence of events?

	A			xfsaild
	locked
				(!pinned && !stale) ==> trylock()
	pinned
	stale
				(!trylock && pinned && stale)
					==> xfs_log_force()

  Granted the window is seriously tight, but is there some crazy
  sequence where we could race with the pin count and the stale state?
  E.g., I notice some error or forced shutdown sequences mark buffers
  as stale, etc.

Brian

...
> Cheers,
>
> Dave.
* Re: [PATCH RFC 0/2] fix spinlock recursion on xa_lock in xfs_buf_item_push 2013-01-31 17:45 ` Brian Foster @ 2013-02-01 1:26 ` Dave Chinner 2013-02-01 16:01 ` Brian Foster 0 siblings, 1 reply; 10+ messages in thread From: Dave Chinner @ 2013-02-01 1:26 UTC (permalink / raw) To: Brian Foster; +Cc: Mark Tinguely, xfs On Thu, Jan 31, 2013 at 12:45:41PM -0500, Brian Foster wrote: > On 01/30/2013 04:59 PM, Dave Chinner wrote: > > On Wed, Jan 30, 2013 at 11:02:43AM -0500, Brian Foster wrote: > >> (added Dave and the list back on CC) > >> > >> On 01/30/2013 09:07 AM, Mark Tinguely wrote: > >>> On 01/30/13 00:05, Dave Chinner wrote: > >>>> On Tue, Jan 29, 2013 at 03:42:35PM -0500, Brian Foster wrote: > ... > >> > >> Thanks guys. This certainly looks nicer than messing with the lock > >> wrapper, but is it susceptible to the same problem? In other words, does > >> this fix the problem or just tighten the window? > > > > That's what I need to think about more - the only difference here is > > that we are checking the flag before the down_trylock() instead of > > after.... > > > >> So if the buf lock covers the pinned state (e.g., buffer gets locked, > >> added to a transaction, the transaction gets committed and pins and > >> unlocks the buffer, IIUC) and the stale state (buf gets locked, added to > >> a new transaction and inval'd before the original transaction was > >> written ?), but we don't hold the buf lock in xfs_buf_item_push(), how > >> can we guarantee the state of either doesn't change between the time we > >> check the flags and the time the lock fails? > > > > ... but the order of them being set and checked may be significant > > and hence checking the stale flag first might be sufficient to avoid > > the pin count race and hence the log force. Hence this might just > > need a pair of memory barriers - one here and one in xfs_buf_stale() > > - to ensure that we always see the XBF_STALE flag without needing to > > lock the buffer first. 
> > > > I _think_ I follow your train of thought. If we're racing on the pin > check, presumably the lock holder is committing the transaction and we > should either already see the buffer being stale, being pinned or we > should get the lock (assuming the order is: stale, pinned, unlocked). Right, that is the order in which operations occur. Marking a buffer stale happens in the transaction body, the pin and subsequent unlock occur some time later during the transaction commit. > That aside for a moment, here's some specific tracepoint (some of which > I've hacked in) data for when the recursion occurs: .... > ... so as expected, the buffer is marked stale, we attempt the trylock, > the buf is pinned, we run the log force and we're dead. *nod* > From the looks of the trace, I'd expect an additional stale check to > eliminate the ability to reproduce this, but that doesn't necessarily > make it correct of course. Regardless, I'm putting that to the test now > and letting it run for a bit while we get this sorted out. > > I also need to stare at the code some more. My pending questions are: > > - Is it always reasonable to to assume/consider a stale buf as pinned in > the context of xfsaild? A stale buffer in the AIL is either pinned (transaction committed) or on it's way to being pinned (transaction that marked it stale is currently in progress). When the transaction commit completes, the IOP_UNPIN() call will remove the item from the AIL if it is the last reference to the item. Hence we generally do not get unpinned, stale buffers in the AIL because the last reference to the stale buffer is usually the transaction that marked it stale.... > - If we currently reproduce the following sequence: > > A xfsaild > stale > (!pinned) ==> trylock() > pinned > (!trylock && pinned && stale) > ==> xfs_log_force() (boom) > > ... what prevents the following sequence from occurring sometime in the > future or with some alternate high-level sequence of events? 
>
>   A                     xfsaild
>   locked
>   (!pinned && !stale) ==> trylock()
>   pinned
>   stale
>                         (!trylock && pinned && stale)
>                           ==> xfs_log_force()

You can't get that order and trigger the race. If the item is pinned
before it is marked stale, that means we will always see the pin count
because it was pinned by a previous transaction commit.

Pinning occurs on the commit of the first transaction that inserts
the item into the CIL. It doesn't get unpinned until the CIL is
checkpointed and inserted into the AIL. Hence the order of operations
that marks an uncommitted buffer stale should always be
lock->stale->pinned->unlock.

However, if the buffer has been previously modified and is in the
CIL, you'll see:

lock->pinned->CIL insert->unlock.....->lock->stale->unlock

If the buffer was modified a while back and the CIL is committed, the
buffer will be in the AIL but not the CIL. If the buffer was modified
a while back, and then again recently, it will be pinned in both the
CIL and the AIL. Neither of these cases can trigger this problem
because the pinned check will fire reliably.

Hence the specific case here is that the buffer has been previously
modified, the CIL committed so it's in the AIL, and we are marking the
buffer stale as the first modification of the buffer after it was
added to the AIL. IOWs, the order that is of concern is this, while
the item is in the AIL:

lock->stale->pin->CIL insert->unlock

So in terms of the xfsaild racing and causing problems, the only case
it will occur in is this:

Thread 1                xfsaild
lock                    xfs_buf_item_push
....                    not pinned
stale                   trylock
<delay>
commit
pin
                        if (pinned && stale)
                          <splat>
unlock

So what I was asking is whether we can do the checks in the order of:

	smp_mb();
	if (stale)
		return
	if (pinned)
		return

What I've just realised is that we really don't care if we race in
xfs_buf_item_push(). The race that matters is in xfs_buf_trylock(),
and at that point it is too late to avoid it.
So I think your original patch is on the right path, but having the
xfsaild handle the log force gets rid of most of the nasty cruft it
had.... ie. we're going to have to tell xfs_buf_trylock() that we
should not do a log force if the lock fails, and return
XFS_ITEM_PINNED rather than XFS_ITEM_LOCKED in xfs_buf_item_push()....

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [PATCH RFC 0/2] fix spinlock recursion on xa_lock in xfs_buf_item_push
  2013-02-01  1:26   ` Dave Chinner
@ 2013-02-01 16:01     ` Brian Foster
  0 siblings, 0 replies; 10+ messages in thread
From: Brian Foster @ 2013-02-01 16:01 UTC (permalink / raw)
To: Mark Tinguely, xfs, david

On 01/31/2013 08:26 PM, Dave Chinner wrote:
> On Thu, Jan 31, 2013 at 12:45:41PM -0500, Brian Foster wrote:
>> On 01/30/2013 04:59 PM, Dave Chinner wrote:
>>> On Wed, Jan 30, 2013 at 11:02:43AM -0500, Brian Foster wrote:
>>>> (added Dave and the list back on CC)
>>>>
>>>> On 01/30/2013 09:07 AM, Mark Tinguely wrote:
>>>>> On 01/30/13 00:05, Dave Chinner wrote:
>>>>>> On Tue, Jan 29, 2013 at 03:42:35PM -0500, Brian Foster wrote:
...
>> ... what prevents the following sequence from occurring sometime in the
>> future or with some alternate high-level sequence of events?
>>
>>   A                     xfsaild
>>   locked
>>   (!pinned && !stale) ==> trylock()
>>   pinned
>>   stale
>>                         (!trylock && pinned && stale)
>>                           ==> xfs_log_force()
>
> You can't get that order and trigger the race. If the item is pinned
> before it is marked stale, that means we will always see the pin count
> because it was pinned by a previous transaction commit.
>

Ok, so if we're seeing the pin count race, we have a buffer that is
obviously in the AIL and we're racing with another transaction commit
on the buffer. This means the buf is either (!pinned && locked),
(pinned && locked) or (pinned && !locked). The first two cases imply
that if the buf is stale, it is marked stale by this transaction, or as
you point out, by a previous transaction (because it would have
required the lock).

If we suppose it is possible for the buf to go from (!pinned && locked)
to (pinned && !locked) after the trylock fails but before the log force
check, what I'm not quite following is what would prevent some third
thread (I guess my diagram was wrong to only include xfsaild and thread
A) from squeezing in there and setting the buf stale?
For example, I was looking at xfs_buf_iodone_callbacks() and the forced
shutdown check (is the bp always locked here?).

> Pinning occurs on the commit of the first transaction that inserts
> the item into the CIL. It doesn't get unpinned until the CIL is
> checkpointed and inserted into the AIL. Hence the order of operations
> that marks an uncommitted buffer stale should always be
> lock->stale->pinned->unlock.
>
> However, if the buffer has been previously modified and is in the
> CIL, you'll see:
>
> lock->pinned->CIL insert->unlock.....->lock->stale->unlock
>
> If the buffer was modified a while back and the CIL is committed, the
> buffer will be in the AIL but not the CIL. If the buffer was modified
> a while back, and then again recently, it will be pinned in both the
> CIL and the AIL. Neither of these cases can trigger this problem
> because the pinned check will fire reliably.
>
> Hence the specific case here is that the buffer has been previously
> modified, the CIL committed so it's in the AIL, and we are marking
> the buffer stale as the first modification of the buffer after it
> was added to the AIL. IOWs, the order that is of concern is this,
> while the item is in the AIL:
>
> lock->stale->pin->CIL insert->unlock
>
> So in terms of the xfsaild racing and causing problems, the only case
> it will occur in is this:
>
> Thread 1                xfsaild
> lock                    xfs_buf_item_push
> ....                    not pinned
> stale                   trylock
> <delay>
> commit
> pin
>                         if (pinned && stale)
>                           <splat>
> unlock
>

Ok, thanks for the context.

> So what I was asking is whether we can do the checks in the order of:
>
> 	smp_mb();
> 	if (stale)
> 		return
> 	if (pinned)
> 		return
>

Right...

> What I've just realised is that we really don't care if we race in
> xfs_buf_item_push(). The race that matters is in xfs_buf_trylock(),
> and at that point it is too late to avoid it.
>
> So I think your original patch is on the right path, but having the
> xfsaild handle the log force gets rid of most of the nasty cruft it
> had.... ie. we're going to have to tell xfs_buf_trylock() that we
> should not do a log force if the lock fails, and return
> XFS_ITEM_PINNED rather than XFS_ITEM_LOCKED in
> xfs_buf_item_push()....
>

... and that's the bit I wasn't so sure about. If it's OK to pend up
the log force as such, the xfs_buf_item_push() part can be made cleaner
(xfs_buf_trylock() is still uglified with the new parameter). I'll
start testing something like that and follow up with a new set if it
addresses the problem and survives some sanity testing. Thanks.

Brian

> Cheers,
>
> Dave.

^ permalink raw reply	[flat|nested] 10+ messages in thread
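[Editor's note] The approach agreed on here can be sketched roughly as below. This is an illustration of the idea only, not the eventual upstream patch: the two-argument `__xfs_buf_trylock(bp, force_log)` form is the hypothetical variant from the original RFC, and the elided body stands in for the rest of the real function.

```
/* sketch only -- __xfs_buf_trylock(bp, force_log) is hypothetical */
STATIC uint
xfs_buf_item_push(
	struct xfs_log_item	*lip)
{
	struct xfs_buf_log_item	*bip = BUF_ITEM(lip);
	struct xfs_buf		*bp = bip->bli_buf;

	if (xfs_buf_ispinned(bp))
		return XFS_ITEM_PINNED;

	/* never force the log here: the caller holds xa_lock */
	if (!__xfs_buf_trylock(bp, false)) {
		/*
		 * If a racing commit pinned the buffer after the check
		 * above, report PINNED so xfsaild issues its batched
		 * log force after dropping xa_lock.
		 */
		if (xfs_buf_ispinned(bp))
			return XFS_ITEM_PINNED;
		return XFS_ITEM_LOCKED;
	}
	...
}
```

The design point is that xfsaild already pends up a log force for pinned items and issues it outside xa_lock, so deferring the force costs little relative to the immediate force that caused the recursion.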
Thread overview: 10+ messages
2013-01-29 20:42 [PATCH RFC 0/2] fix spinlock recursion on xa_lock in xfs_buf_item_push Brian Foster
2013-01-29 20:42 ` [PATCH RFC 1/2] xfs: conditionally force log on trylock failure of pinned/stale buf Brian Foster
2013-01-29 20:42 ` [PATCH RFC 2/2] xfs: drop xa_lock around log force in xfs_buf_item push Brian Foster
2013-01-29 22:41 ` [PATCH RFC 0/2] fix spinlock recursion on xa_lock in xfs_buf_item_push Mark Tinguely
2013-01-30 6:05 ` Dave Chinner
[not found] ` <5109291E.6090303@sgi.com>
2013-01-30 16:02 ` Brian Foster
2013-01-30 21:59 ` Dave Chinner
2013-01-31 17:45 ` Brian Foster
2013-02-01 1:26 ` Dave Chinner
2013-02-01 16:01 ` Brian Foster