* [PATCH] xfs: copy li_lsn before dropping AIL lock
@ 2009-11-16 19:51 Nathaniel W. Turner
2009-11-17 15:13 ` Christoph Hellwig
0 siblings, 1 reply; 3+ messages in thread
From: Nathaniel W. Turner @ 2009-11-16 19:51 UTC (permalink / raw)
To: xfs
Access to log items on the AIL is generally protected by m_ail_lock;
this is particularly needed when we're getting or setting the 64-bit
li_lsn on a 32-bit platform. This patch fixes a couple places where we
were accessing the log item after dropping the AIL lock on 32-bit
machines.
This can result in a partially-zeroed log->l_tail_lsn if
xfs_trans_ail_delete is racing with xfs_trans_ail_update, and in at
least some cases, this can leave the l_tail_lsn with a zero cycle
number, which means xlog_space_left will think the log is full (unless
CONFIG_XFS_DEBUG is set, in which case we'll trip an ASSERT), leading to
processes stuck forever in xlog_grant_log_space.
Thanks to Adrian VanderSpek for first spotting the race potential and to
Dave Chinner for debug assistance.
Signed-off-by: Nathaniel W. Turner <nate@houseofnate.net>
---
fs/xfs/xfs_trans_ail.c | 23 ++++++++++++++++++++---
1 files changed, 20 insertions(+), 3 deletions(-)
diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index f31271c..2ffc570 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -467,6 +467,7 @@ xfs_trans_ail_update(
 {
 	xfs_log_item_t		*dlip = NULL;
 	xfs_log_item_t		*mlip;	/* ptr to minimum lip */
+	xfs_lsn_t		tail_lsn;
 	mlip = xfs_ail_min(ailp);
@@ -483,8 +484,16 @@ xfs_trans_ail_update(
 	if (mlip == dlip) {
 		mlip = xfs_ail_min(ailp);
+		/*
+		 * It is not safe to access mlip after the AIL lock is
+		 * dropped, so we must get a copy of li_lsn before we do
+		 * so. This is especially important on 32-bit platforms
+		 * where accessing and updating 64-bit values like li_lsn
+		 * is not atomic.
+		 */
+		tail_lsn = mlip->li_lsn;
 		spin_unlock(&ailp->xa_lock);
-		xfs_log_move_tail(ailp->xa_mount, mlip->li_lsn);
+		xfs_log_move_tail(ailp->xa_mount, tail_lsn);
 	} else {
 		spin_unlock(&ailp->xa_lock);
 	}
@@ -514,6 +523,7 @@ xfs_trans_ail_delete(
 {
 	xfs_log_item_t		*dlip;
 	xfs_log_item_t		*mlip;
+	xfs_lsn_t		tail_lsn;
 	if (lip->li_flags & XFS_LI_IN_AIL) {
 		mlip = xfs_ail_min(ailp);
@@ -527,9 +537,16 @@ xfs_trans_ail_delete(
 		if (mlip == dlip) {
 			mlip = xfs_ail_min(ailp);
+			/*
+			 * It is not safe to access mlip after the AIL lock
+			 * is dropped, so we must get a copy of li_lsn
+			 * before we do so. This is especially important
+			 * on 32-bit platforms where accessing and updating
+			 * 64-bit values like li_lsn is not atomic.
+			 */
+			tail_lsn = mlip ? mlip->li_lsn : 0;
 			spin_unlock(&ailp->xa_lock);
-			xfs_log_move_tail(ailp->xa_mount,
-						(mlip ? mlip->li_lsn : 0));
+			xfs_log_move_tail(ailp->xa_mount, tail_lsn);
 		} else {
 			spin_unlock(&ailp->xa_lock);
 		}
--
Nathaniel W. Turner
http://houseofnate.net/
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
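[Editorial illustration, not part of the original message.] The "zero cycle number" failure described in the patch can be modelled in userspace. The sketch below assumes the usual XFS layout of an LSN (cycle number in the upper 32 bits, block number in the lower 32 bits) and assumes the racing side is moving li_lsn between zero and a real value. The helpers make_lsn(), torn_read() and demo_torn_read() are hypothetical names invented for this example; torn_read() only simulates what a 32-bit CPU might observe when it loads the two halves of a 64-bit value at different times.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

typedef int64_t xfs_lsn_t;

/* Assumed LSN layout: cycle in the high word, block in the low word. */
#define CYCLE_LSN(lsn)	((uint32_t)((uint64_t)(lsn) >> 32))
#define BLOCK_LSN(lsn)	((uint32_t)(uint64_t)(lsn))

/* Hypothetical helper: build an LSN from its two 32-bit halves. */
static xfs_lsn_t make_lsn(uint32_t cycle, uint32_t block)
{
	return (xfs_lsn_t)(((uint64_t)cycle << 32) | block);
}

/*
 * Hypothetical helper: simulate a non-atomic 64-bit load on a 32-bit
 * machine. The high word is taken from one value of li_lsn and the
 * low word from another, as could happen if a writer changes li_lsn
 * between the two 32-bit loads.
 */
static xfs_lsn_t torn_read(xfs_lsn_t hi_from, xfs_lsn_t lo_from)
{
	uint64_t hi = (uint64_t)hi_from & 0xffffffff00000000ULL;
	uint64_t lo = (uint64_t)lo_from & 0x00000000ffffffffULL;

	return (xfs_lsn_t)(hi | lo);
}

/* Show that a torn read can keep the block but lose the cycle. */
void demo_torn_read(void)
{
	xfs_lsn_t fresh = make_lsn(5, 1234);	/* arbitrary example LSN */
	xfs_lsn_t torn = torn_read(0, fresh);	/* high word seen as zero */

	assert(CYCLE_LSN(torn) == 0);
	printf("cycle=%u block=%u\n",
	       (unsigned)CYCLE_LSN(torn), (unsigned)BLOCK_LSN(torn));
}
```

Calling demo_torn_read() shows a value whose block number looks plausible but whose cycle number is zero, which is the kind of tail LSN the patch description says makes xlog_space_left think the log is full. Copying li_lsn into a local while the AIL lock is still held, as the patch does, rules out this interleaving.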
* Re: [PATCH] xfs: copy li_lsn before dropping AIL lock
2009-11-16 19:51 [PATCH] xfs: copy li_lsn before dropping AIL lock Nathaniel W. Turner
@ 2009-11-17 15:13 ` Christoph Hellwig
2009-11-17 21:48 ` Dave Chinner
0 siblings, 1 reply; 3+ messages in thread
From: Christoph Hellwig @ 2009-11-17 15:13 UTC (permalink / raw)
To: Nathaniel W. Turner; +Cc: xfs
On Mon, Nov 16, 2009 at 02:51:48PM -0500, Nathaniel W. Turner wrote:
> Access to log items on the AIL is generally protected by m_ail_lock;
> this is particularly needed when we're getting or setting the 64-bit
> li_lsn on a 32-bit platform. This patch fixes a couple places where we
> were accessing the log item after dropping the AIL lock on 32-bit
> machines.
>
> This can result in a partially-zeroed log->l_tail_lsn if
> xfs_trans_ail_delete is racing with xfs_trans_ail_update, and in at
> least some cases, this can leave the l_tail_lsn with a zero cycle
> number, which means xlog_space_left will think the log is full (unless
> CONFIG_XFS_DEBUG is set, in which case we'll trip an ASSERT), leading to
> processes stuck forever in xlog_grant_log_space.
Might this also cause this oops?
http://www.kerneloops.org/raw.php?rawid=944396&msgid=
It's been showing up a few times recently.
Anyway, the patch looks good to me, but I wonder if we should abstract
the li_lsn handling a bit more to avoid easily running into this kind of
problem.
Reviewed-by: Christoph Hellwig <hch@lst.de>
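[Editorial illustration, not part of the original message.] One possible shape for the abstraction suggested above is an accessor that never hands out the log item pointer and only returns a copy of the LSN taken under the lock. The following is a minimal userspace sketch under stated assumptions, not kernel code: struct demo_log_item, struct demo_ail and demo_ail_min_lsn() are hypothetical names, and a pthread mutex stands in for the AIL spinlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

typedef int64_t xfs_lsn_t;

/* Hypothetical stand-in for a log item; only the LSN matters here. */
struct demo_log_item {
	xfs_lsn_t	li_lsn;
};

/* Hypothetical stand-in for the AIL: a lock plus the minimum item. */
struct demo_ail {
	pthread_mutex_t		lock;	/* plays the role of the AIL lock */
	struct demo_log_item	*min;	/* NULL when the list is empty */
};

/*
 * Return a copy of the minimum item's LSN, or 0 if the list is empty.
 * The 64-bit load happens entirely under the lock, so callers never
 * dereference the item after the lock has been dropped.
 */
static xfs_lsn_t demo_ail_min_lsn(struct demo_ail *ailp)
{
	xfs_lsn_t lsn;

	assert(ailp != NULL);
	pthread_mutex_lock(&ailp->lock);
	lsn = ailp->min ? ailp->min->li_lsn : 0;
	pthread_mutex_unlock(&ailp->lock);
	return lsn;
}
```

With a helper of this kind, call sites would pass the returned value on (for example to a tail-moving function) instead of reading mlip->li_lsn themselves, so the unlocked access fixed by this patch could not be reintroduced by accident. Whether this fits the real locking in xfs_trans_ail_update and xfs_trans_ail_delete, which already hold the lock at that point, is left open here.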
* Re: [PATCH] xfs: copy li_lsn before dropping AIL lock
2009-11-17 15:13 ` Christoph Hellwig
@ 2009-11-17 21:48 ` Dave Chinner
0 siblings, 0 replies; 3+ messages in thread
From: Dave Chinner @ 2009-11-17 21:48 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Nathaniel W. Turner, xfs
On Tue, Nov 17, 2009 at 10:13:18AM -0500, Christoph Hellwig wrote:
> On Mon, Nov 16, 2009 at 02:51:48PM -0500, Nathaniel W. Turner wrote:
> > Access to log items on the AIL is generally protected by m_ail_lock;
> > this is particularly needed when we're getting or setting the 64-bit
> > li_lsn on a 32-bit platform. This patch fixes a couple places where we
> > were accessing the log item after dropping the AIL lock on 32-bit
> > machines.
> >
> > This can result in a partially-zeroed log->l_tail_lsn if
> > xfs_trans_ail_delete is racing with xfs_trans_ail_update, and in at
> > least some cases, this can leave the l_tail_lsn with a zero cycle
> > number, which means xlog_space_left will think the log is full (unless
> > CONFIG_XFS_DEBUG is set, in which case we'll trip an ASSERT), leading to
> > processes stuck forever in xlog_grant_log_space.
>
> Might this also cause this oops?
>
> http://www.kerneloops.org/raw.php?rawid=944396&msgid=
>
> > It's been showing up a few times recently.
I don't think so. That trace is in xfs_close_devices(), which is
called after xfs_unmountfs->xfs_log_unmount->xlog_dealloc, which
tends to imply that mp->m_log is NULL.
i.e. we have IO being flushed when no IO should be pending. Given
that there is a xfs->bdstrat_cb->xfs_bioerror trace in there, it
looks like the buffer could not be written because the system is in
a forced shutdown situation. I have no idea what that buffer might
be or why it wasn't issued and completed during the other flushes
and waits during the unmount process....
> Anyway, the patch looks good to me, but I wonder if we should abstract
> the li_lsn handling a bit more to avoid easily running into this kind of
> problem.
Worth considering, IMO.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com