* [PATCH 2/4] xfs: always releasing EFD's reference to EFI in xfs_efd_item_committed @ 2013-12-24 12:48 Jeff Liu 2013-12-28 16:19 ` Mark Tinguely 0 siblings, 1 reply; 3+ messages in thread From: Jeff Liu @ 2013-12-24 12:48 UTC (permalink / raw) To: xfs@oss.sgi.com From: Jie Liu <jeff.liu@oracle.com> With fsstress+godown test I observed an XFS hang up during umount which yielding a backtrace like below: [20876.193635] INFO: task umount:9853 blocked for more than 120 seconds. [20876.193641] Tainted: PF O 3.13.0-rc2+ #8 [20876.193643] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [20876.193645] umount D ffff88026f294440 0 9853 9372 <snip> [20876.193663] Call Trace: [20876.193672] [<ffffffff816f3829>] schedule+0x29/0x70 [20876.193701] [<ffffffffa0c4a3e9>] xfs_ail_push_all_sync+0xa9/0xe0 [xfs] [20876.193707] [<ffffffff810a83f0>] ? prepare_to_wait_event+0x100/0x100 [20876.193726] [<ffffffffa0be9df1>] xfs_unmountfs+0x61/0x150 [xfs] [20876.193746] [<ffffffffa0becd41>] xfs_fs_put_super+0x21/0x60 [xfs] [20876.193751] [<ffffffff811bbf62>] generic_shutdown_super+0x72/0xf0 [20876.193754] [<ffffffff811bc217>] kill_block_super+0x27/0x70 [20876.193757] [<ffffffff811bc4fd>] deactivate_locked_super+0x3d/0x60 [20876.193761] [<ffffffff811bcab6>] deactivate_super+0x46/0x60 [20876.193765] [<ffffffff811d9146>] mntput_no_expire+0xd6/0x170 [20876.193769] [<ffffffff811da64e>] SyS_umount+0x8e/0x100 [20876.193774] [<ffffffff816ffd6d>] system_call_fastpath+0x1a/0x1f As per above backtraces, the umount process is already scheduled out in xfs_ail_push_all_sync() because it should push out all of pending changes in AIL and wait until the AIL is empty. Then it will wake up xfsaild thread to do the actual flushing business. However, I found that the AIL does not became empty in some situations because of some EFI are still being on it, but in EFI's iop_push operation, we always returning XFS_ITEM_PINNED which leads to the xfsaild thread suffering into an infinite loop. Since EFI items have no locking or pushing, they are pulled from the AIL when their corresponding EFDs are committed to disk, and we have guaranteed that the EFI should not be freed until it has been unppined and the EFD has been committed in commit 666d644cd7, this is done via an EFI reference count by initializing it to 2 in xfs_efi_init() -- one is it's own count which is not released until it is unpinned, the other one is taken by its corresponding EFD which will be released during EFD commit operation. IMHO we should always releasing it's reference to the corresponding EFI item once the EFD item is committed to disk regardless of the log item is marked with XFS_LI_ABORTED flag or not. Signed-off-by: Jie Liu <jeff.liu@oracle.com> --- fs/xfs/xfs_extfree_item.c | 8 +------- 1 file changed, 1 insertion(+), 7 deletions(-) diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c index 3680d04..16c0396 100644 --- a/fs/xfs/xfs_extfree_item.c +++ b/fs/xfs/xfs_extfree_item.c @@ -437,13 +437,7 @@ xfs_efd_item_committed( { struct xfs_efd_log_item *efdp = EFD_ITEM(lip); - /* - * If we got a log I/O error, it's always the case that the LR with the - * EFI got unpinned and freed before the EFD got aborted. - */ - if (!(lip->li_flags & XFS_LI_ABORTED)) - xfs_efi_release(efdp->efd_efip, efdp->efd_format.efd_nextents); - + xfs_efi_release(efdp->efd_efip, efdp->efd_format.efd_nextents); xfs_efd_item_free(efdp); return (xfs_lsn_t)-1; } -- 1.8.3.2 _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH 2/4] xfs: always releasing EFD's reference to EFI in xfs_efd_item_committed 2013-12-24 12:48 [PATCH 2/4] xfs: always releasing EFD's reference to EFI in xfs_efd_item_committed Jeff Liu @ 2013-12-28 16:19 ` Mark Tinguely 2013-12-29 13:19 ` Jeff Liu 0 siblings, 1 reply; 3+ messages in thread From: Mark Tinguely @ 2013-12-28 16:19 UTC (permalink / raw) To: Jeff Liu; +Cc: xfs@oss.sgi.com On 12/24/13 06:48, Jeff Liu wrote: > From: Jie Liu<jeff.liu@oracle.com> > > With fsstress+godown test I observed an XFS hang up during umount which > yielding a backtrace like below: > > [20876.193635] INFO: task umount:9853 blocked for more than 120 seconds. > [20876.193641] Tainted: PF O 3.13.0-rc2+ #8 > [20876.193643] "echo 0> /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > [20876.193645] umount D ffff88026f294440 0 9853 9372 > <snip> > [20876.193663] Call Trace: > [20876.193672] [<ffffffff816f3829>] schedule+0x29/0x70 > [20876.193701] [<ffffffffa0c4a3e9>] xfs_ail_push_all_sync+0xa9/0xe0 [xfs] > [20876.193707] [<ffffffff810a83f0>] ? prepare_to_wait_event+0x100/0x100 > [20876.193726] [<ffffffffa0be9df1>] xfs_unmountfs+0x61/0x150 [xfs] > [20876.193746] [<ffffffffa0becd41>] xfs_fs_put_super+0x21/0x60 [xfs] > [20876.193751] [<ffffffff811bbf62>] generic_shutdown_super+0x72/0xf0 > [20876.193754] [<ffffffff811bc217>] kill_block_super+0x27/0x70 > [20876.193757] [<ffffffff811bc4fd>] deactivate_locked_super+0x3d/0x60 > [20876.193761] [<ffffffff811bcab6>] deactivate_super+0x46/0x60 > [20876.193765] [<ffffffff811d9146>] mntput_no_expire+0xd6/0x170 > [20876.193769] [<ffffffff811da64e>] SyS_umount+0x8e/0x100 > [20876.193774] [<ffffffff816ffd6d>] system_call_fastpath+0x1a/0x1f > > As per above backtraces, the umount process is already scheduled out > in xfs_ail_push_all_sync() because it should push out all of pending > changes in AIL and wait until the AIL is empty. Then it will wake up > xfsaild thread to do the actual flushing business. However, I found > that the AIL does not became empty in some situations because of some > EFI are still being on it, but in EFI's iop_push operation, we always > returning XFS_ITEM_PINNED which leads to the xfsaild thread suffering > into an infinite loop. > > Since EFI items have no locking or pushing, they are pulled from the > AIL when their corresponding EFDs are committed to disk, and we have > guaranteed that the EFI should not be freed until it has been unppined > and the EFD has been committed in commit 666d644cd7, this is done via > an EFI reference count by initializing it to 2 in xfs_efi_init() -- one > is it's own count which is not released until it is unpinned, the other > one is taken by its corresponding EFD which will be released during EFD > commit operation. > > IMHO we should always releasing it's reference to the corresponding EFI > item once the EFD item is committed to disk regardless of the log item > is marked with XFS_LI_ABORTED flag or not. > > Signed-off-by: Jie Liu<jeff.liu@oracle.com> > --- > fs/xfs/xfs_extfree_item.c | 8 +------- > 1 file changed, 1 insertion(+), 7 deletions(-) > > diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c > index 3680d04..16c0396 100644 > --- a/fs/xfs/xfs_extfree_item.c > +++ b/fs/xfs/xfs_extfree_item.c > @@ -437,13 +437,7 @@ xfs_efd_item_committed( > { > struct xfs_efd_log_item *efdp = EFD_ITEM(lip); > > - /* > - * If we got a log I/O error, it's always the case that the LR with the > - * EFI got unpinned and freed before the EFD got aborted. > - */ > - if (!(lip->li_flags& XFS_LI_ABORTED)) > - xfs_efi_release(efdp->efd_efip, efdp->efd_format.efd_nextents); > - > + xfs_efi_release(efdp->efd_efip, efdp->efd_format.efd_nextents); > xfs_efd_item_free(efdp); > return (xfs_lsn_t)-1; > } Hi Jeff. This would work if the forced shutdown happened after both the EFI and EFD transaction were committed and successfully placed on the CIL. If the sequence went EFI commit, CIL push (EFI is now in the AIL), forced shutdown, and then EFD commit. In this sequence, the EFD item would not be placed on the CIL and therefore the iop.committed would not be called. In this patch only iop_committing and iop_unlock would be run on the EFD item. Thanks, --Mark. _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH 2/4] xfs: always releasing EFD's reference to EFI in xfs_efd_item_committed 2013-12-28 16:19 ` Mark Tinguely @ 2013-12-29 13:19 ` Jeff Liu 0 siblings, 0 replies; 3+ messages in thread From: Jeff Liu @ 2013-12-29 13:19 UTC (permalink / raw) To: Mark Tinguely; +Cc: xfs@oss.sgi.com On 2013年12月29日 00:19, Mark Tinguely wrote: > On 12/24/13 06:48, Jeff Liu wrote: >> From: Jie Liu<jeff.liu@oracle.com> >> >> With fsstress+godown test I observed an XFS hang up during umount which >> yielding a backtrace like below: >> >> [20876.193635] INFO: task umount:9853 blocked for more than 120 seconds. >> [20876.193641] Tainted: PF O 3.13.0-rc2+ #8 >> [20876.193643] "echo 0> /proc/sys/kernel/hung_task_timeout_secs" >> disables this message. >> [20876.193645] umount D ffff88026f294440 0 9853 9372 >> <snip> >> [20876.193663] Call Trace: >> [20876.193672] [<ffffffff816f3829>] schedule+0x29/0x70 >> [20876.193701] [<ffffffffa0c4a3e9>] xfs_ail_push_all_sync+0xa9/0xe0 >> [xfs] >> [20876.193707] [<ffffffff810a83f0>] ? prepare_to_wait_event+0x100/0x100 >> [20876.193726] [<ffffffffa0be9df1>] xfs_unmountfs+0x61/0x150 [xfs] >> [20876.193746] [<ffffffffa0becd41>] xfs_fs_put_super+0x21/0x60 [xfs] >> [20876.193751] [<ffffffff811bbf62>] generic_shutdown_super+0x72/0xf0 >> [20876.193754] [<ffffffff811bc217>] kill_block_super+0x27/0x70 >> [20876.193757] [<ffffffff811bc4fd>] deactivate_locked_super+0x3d/0x60 >> [20876.193761] [<ffffffff811bcab6>] deactivate_super+0x46/0x60 >> [20876.193765] [<ffffffff811d9146>] mntput_no_expire+0xd6/0x170 >> [20876.193769] [<ffffffff811da64e>] SyS_umount+0x8e/0x100 >> [20876.193774] [<ffffffff816ffd6d>] system_call_fastpath+0x1a/0x1f >> >> As per above backtraces, the umount process is already scheduled out >> in xfs_ail_push_all_sync() because it should push out all of pending >> changes in AIL and wait until the AIL is empty. Then it will wake up >> xfsaild thread to do the actual flushing business. However, I found >> that the AIL does not became empty in some situations because of some >> EFI are still being on it, but in EFI's iop_push operation, we always >> returning XFS_ITEM_PINNED which leads to the xfsaild thread suffering >> into an infinite loop. >> >> Since EFI items have no locking or pushing, they are pulled from the >> AIL when their corresponding EFDs are committed to disk, and we have >> guaranteed that the EFI should not be freed until it has been unppined >> and the EFD has been committed in commit 666d644cd7, this is done via >> an EFI reference count by initializing it to 2 in xfs_efi_init() -- one >> is it's own count which is not released until it is unpinned, the other >> one is taken by its corresponding EFD which will be released during EFD >> commit operation. >> >> IMHO we should always releasing it's reference to the corresponding EFI >> item once the EFD item is committed to disk regardless of the log item >> is marked with XFS_LI_ABORTED flag or not. >> >> Signed-off-by: Jie Liu<jeff.liu@oracle.com> >> --- >> fs/xfs/xfs_extfree_item.c | 8 +------- >> 1 file changed, 1 insertion(+), 7 deletions(-) >> >> diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c >> index 3680d04..16c0396 100644 >> --- a/fs/xfs/xfs_extfree_item.c >> +++ b/fs/xfs/xfs_extfree_item.c >> @@ -437,13 +437,7 @@ xfs_efd_item_committed( >> { >> struct xfs_efd_log_item *efdp = EFD_ITEM(lip); >> >> - /* >> - * If we got a log I/O error, it's always the case that the LR >> with the >> - * EFI got unpinned and freed before the EFD got aborted. >> - */ >> - if (!(lip->li_flags& XFS_LI_ABORTED)) >> - xfs_efi_release(efdp->efd_efip, efdp->efd_format.efd_nextents); >> - >> + xfs_efi_release(efdp->efd_efip, efdp->efd_format.efd_nextents); >> xfs_efd_item_free(efdp); >> return (xfs_lsn_t)-1; >> } > Hi Mark, > Hi Jeff. > > This would work if the forced shutdown happened after both the EFI and > EFD transaction were committed and successfully placed on the CIL. Yep. > If the sequence went EFI commit, CIL push (EFI is now in the AIL), > forced shutdown, and then EFD commit. In this sequence, the EFD item > would not be placed on the CIL and therefore the iop.committed would not > be called. In this patch only iop_committing and iop_unlock would be run > on the EFD item. Agree, it seems we have to release EFI reference count in xfs_efd_item_unlock() if XFS_LI_ABORTED is detected. However, the story was not yet completed, I just thought another possible order between EFI/EFD. If EFD is committed into CIL and just before calling xfs_efd_item_committed(), the EFI transaction commit is aborted and thus the EFI item is released in it's iop_unlock() callback, hence my current fix also has problem, i.e, in iop_committed(), it will try to drop an EFI reference count which is already released. This problem seems a bit complicated than I thought before, let me think it over once back from vacations until 02, Jan -- no development environment on hand... Thanks, -Jeff _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2013-12-29 13:19 UTC | newest] Thread overview: 3+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2013-12-24 12:48 [PATCH 2/4] xfs: always releasing EFD's reference to EFI in xfs_efd_item_committed Jeff Liu 2013-12-28 16:19 ` Mark Tinguely 2013-12-29 13:19 ` Jeff Liu
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox