public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] xfs: only update the last_sync_lsn when a transaction completes
@ 2012-09-28  4:37 Dave Chinner
  2012-09-28 12:40 ` Christoph Hellwig
  2012-09-28 17:46 ` Mark Tinguely
  0 siblings, 2 replies; 5+ messages in thread
From: Dave Chinner @ 2012-09-28  4:37 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

The log write code stamps each iclog with the current tail LSN in
the iclog header so that recovery knows where to find the tail of
thelog once it has found the head. Normally this is taken from the
first item on the AIL - the log item that corresponds to the oldest
active item in the log.

The problem is that when the AIL is empty, the tail lsn is dervied
from the the l_last_sync_lsn, which is the LSN of the last iclog to
be written to the log. In most cases this doesn't happen, because
the AIL is rarely empty on an active filesystem. However, when it
does, it opens up an interesting case when the transaction being
committed to the iclog spans multiple iclogs.

That is, the first iclog is stamped with the l_last_sync_lsn, and IO
is issued. Then the next iclog is setup, the changes copied into the
iclog (takes some time), and then the l_last_sync_lsn is stamped
into the header and IO is issued. This is still the same
transaction, so the tail lsn of both iclogs must be the same for log
recovery to find the entire transaction to be able to replay it.

The problem arises in that the iclog buffer IO completion updates
the l_last_sync_lsn with it's own LSN. Therefore, If the first iclog
completes it's IO before the second iclog is filled and has the tail
lsn stamped in it, it will stamp the LSN of the first iclog into
it's tail lsn field. If the system fails at this point, log recovery
will not see a complete transaction, so the transaction will no be
replayed.

The fix is simple - the l_last_sync_lsn is updated when a iclog
buffer IO completes, and this is incorrect. The l_last_sync_lsn
shoul dbe updated when a transaction is completed by a iclog buffer
IO. That is, only iclog buffers that have transaction commit
callbacks attached to them should update the l_last_sync_lsn. This
means that the last_sync_lsn will only move forward when a commit
record it written, not in the middle of a large transaction that is
rolling through multiple iclog buffers.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_log.c |   19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 67d63f6..2ec2e0b 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -2459,14 +2459,27 @@ xlog_state_do_callback(
 
 
 				/*
-				 * update the last_sync_lsn before we drop the
+				 * Completion of a iclog IO does not imply that
+				 * a transaction has completed, as transactions
+				 * can be large enough to span many iclogs. We
+				 * cannot change the tail of the log half way
+				 * through a transaction as this may be the only
+				 * transaction in the log and moving th etail to
+				 * point to the middle of it will prevent
+				 * recovery from finding the start of the
+				 * transaction. Hence we should only update the
+				 * last_sync_lsn if this iclog contains
+				 * transaction completion callbacks on it.
+				 *
+				 * We have to do this before we drop the
 				 * icloglock to ensure we are the only one that
 				 * can update it.
 				 */
 				ASSERT(XFS_LSN_CMP(atomic64_read(&log->l_last_sync_lsn),
 					be64_to_cpu(iclog->ic_header.h_lsn)) <= 0);
-				atomic64_set(&log->l_last_sync_lsn,
-					be64_to_cpu(iclog->ic_header.h_lsn));
+				if (iclog->ic_callback)
+					atomic64_set(&log->l_last_sync_lsn,
+						be64_to_cpu(iclog->ic_header.h_lsn));
 
 			} else
 				ioerrors++;
-- 
1.7.10

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] xfs: only update the last_sync_lsn when a transaction completes
  2012-09-28  4:37 [PATCH] xfs: only update the last_sync_lsn when a transaction completes Dave Chinner
@ 2012-09-28 12:40 ` Christoph Hellwig
  2012-09-28 23:37   ` Dave Chinner
  2012-09-28 17:46 ` Mark Tinguely
  1 sibling, 1 reply; 5+ messages in thread
From: Christoph Hellwig @ 2012-09-28 12:40 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs


Looks good.

Do you have any testcase for this?  It doesn't look like an issue found
by code inspection.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] xfs: only update the last_sync_lsn when a transaction completes
  2012-09-28  4:37 [PATCH] xfs: only update the last_sync_lsn when a transaction completes Dave Chinner
  2012-09-28 12:40 ` Christoph Hellwig
@ 2012-09-28 17:46 ` Mark Tinguely
  1 sibling, 0 replies; 5+ messages in thread
From: Mark Tinguely @ 2012-09-28 17:46 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On 09/27/12 23:37, Dave Chinner wrote:
> From: Dave Chinner<dchinner@redhat.com>
>
> The log write code stamps each iclog with the current tail LSN in
> the iclog header so that recovery knows where to find the tail of
> thelog once it has found the head. Normally this is taken from the
> first item on the AIL - the log item that corresponds to the oldest
> active item in the log.
>
> The problem is that when the AIL is empty, the tail lsn is dervied
> from the the l_last_sync_lsn, which is the LSN of the last iclog to
> be written to the log. In most cases this doesn't happen, because
> the AIL is rarely empty on an active filesystem. However, when it
> does, it opens up an interesting case when the transaction being
> committed to the iclog spans multiple iclogs.
>
> That is, the first iclog is stamped with the l_last_sync_lsn, and IO
> is issued. Then the next iclog is setup, the changes copied into the
> iclog (takes some time), and then the l_last_sync_lsn is stamped
> into the header and IO is issued. This is still the same
> transaction, so the tail lsn of both iclogs must be the same for log
> recovery to find the entire transaction to be able to replay it.
>
> The problem arises in that the iclog buffer IO completion updates
> the l_last_sync_lsn with it's own LSN. Therefore, If the first iclog
> completes it's IO before the second iclog is filled and has the tail
> lsn stamped in it, it will stamp the LSN of the first iclog into
> it's tail lsn field. If the system fails at this point, log recovery
> will not see a complete transaction, so the transaction will no be
> replayed.
>
> The fix is simple - the l_last_sync_lsn is updated when a iclog
> buffer IO completes, and this is incorrect. The l_last_sync_lsn
> shoul dbe updated when a transaction is completed by a iclog buffer
> IO. That is, only iclog buffers that have transaction commit
> callbacks attached to them should update the l_last_sync_lsn. This
> means that the last_sync_lsn will only move forward when a commit
> record it written, not in the middle of a large transaction that is
> rolling through multiple iclog buffers.
>
> Signed-off-by: Dave Chinner<dchinner@redhat.com>
> ---

Makes a lot of sense. Seems to clean up wrap warnings
and hangs that I started to see in xfstest 273.

Reviewed-by: Mark Tinguely <tinguely@sgi.com>

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] xfs: only update the last_sync_lsn when a transaction completes
  2012-09-28 12:40 ` Christoph Hellwig
@ 2012-09-28 23:37   ` Dave Chinner
  2012-09-30  2:22     ` Dave Chinner
  0 siblings, 1 reply; 5+ messages in thread
From: Dave Chinner @ 2012-09-28 23:37 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Fri, Sep 28, 2012 at 08:40:38AM -0400, Christoph Hellwig wrote:
> 
> Looks good.
> 
> Do you have any testcase for this?  It doesn't look like an issue found
> by code inspection.

Yeah, test 182 with my xfssyncd-die-die-die patchset triggers it
100% reliably.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] xfs: only update the last_sync_lsn when a transaction completes
  2012-09-28 23:37   ` Dave Chinner
@ 2012-09-30  2:22     ` Dave Chinner
  0 siblings, 0 replies; 5+ messages in thread
From: Dave Chinner @ 2012-09-30  2:22 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Sat, Sep 29, 2012 at 09:37:52AM +1000, Dave Chinner wrote:
> On Fri, Sep 28, 2012 at 08:40:38AM -0400, Christoph Hellwig wrote:
> > 
> > Looks good.
> > 
> > Do you have any testcase for this?  It doesn't look like an issue found
> > by code inspection.
> 
> Yeah, test 182 with my xfssyncd-die-die-die patchset triggers it
> 100% reliably.

FWIW, I've occasionally seen unreproducable recovery issues that
could have been caused by this for many years now. This is
effectively correcting a zero-day bug in the log behaviour. Here's
the commit that introduced this behaviour:

http://oss.sgi.com/cgi-bin/gitweb.cgi?p=archive/xfs-import.git;a=commitdiff;h=99b58644923246c710c6dc9f7b47bcae66ed3c63

Dated 4th May, 1994.

It's definitely a contender for the oldest bug we've ever found in
the XFS code base...

Cheers,

Dave.

-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2012-09-30  2:21 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-09-28  4:37 [PATCH] xfs: only update the last_sync_lsn when a transaction completes Dave Chinner
2012-09-28 12:40 ` Christoph Hellwig
2012-09-28 23:37   ` Dave Chinner
2012-09-30  2:22     ` Dave Chinner
2012-09-28 17:46 ` Mark Tinguely

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox