Date: Wed, 10 Oct 2012 09:28:41 -0500
From: Ben Myers
To: Dave Chinner
Cc: xfs@oss.sgi.com
Subject: Re: [PATCH 14/14] xfs: only update the last_sync_lsn when a transaction completes
Message-ID: <20121010142841.GV13214@sgi.com>
References: <1349693772-8064-1-git-send-email-david@fromorbit.com> <1349693772-8064-15-git-send-email-david@fromorbit.com>
In-Reply-To: <1349693772-8064-15-git-send-email-david@fromorbit.com>

Hey Dave,

On Mon, Oct 08, 2012 at 09:56:12PM +1100, Dave Chinner wrote:
> From: Dave Chinner
>
> The log write code stamps each iclog with the current tail LSN in
> the iclog header so that recovery knows where to find the tail of
> the log once it has found the head. Normally this is taken from the
> first item on the AIL - the log item that corresponds to the oldest
> active item in the log.
>
> The problem is that when the AIL is empty, the tail lsn is derived
> from the l_last_sync_lsn, which is the LSN of the last iclog to
> be written to the log. In most cases this doesn't happen, because
> the AIL is rarely empty on an active filesystem. However, when it
> does, it opens up an interesting case when the transaction being
> committed to the iclog spans multiple iclogs.
>
> That is, the first iclog is stamped with the l_last_sync_lsn, and IO
> is issued. Then the next iclog is set up, the changes copied into the
> iclog (takes some time), and then the l_last_sync_lsn is stamped
> into the header and IO is issued. This is still the same
> transaction, so the tail lsn of both iclogs must be the same for log
> recovery to find the entire transaction to be able to replay it.
>
> The problem arises in that the iclog buffer IO completion updates
> the l_last_sync_lsn with its own LSN. Therefore, if the first iclog
> completes its IO before the second iclog is filled and has the tail
> lsn stamped in it, the second iclog will have the LSN of the first
> iclog stamped into its tail lsn field. If the system fails at this
> point, log recovery will not see a complete transaction, so the
> transaction will not be replayed.
>
> The fix is simple - the l_last_sync_lsn is updated when an iclog
> buffer IO completes, and this is incorrect. The l_last_sync_lsn
> should be updated when a transaction is completed by an iclog buffer
> IO. That is, only iclog buffers that have transaction commit
> callbacks attached to them should update the l_last_sync_lsn. This
> means that the last_sync_lsn will only move forward when a commit
> record is written, not in the middle of a large transaction that is
> rolling through multiple iclog buffers.
>
> Signed-off-by: Dave Chinner
> Reviewed-by: Mark Tinguely

I think this one is appropriate for the 3.7 release. What say you?

-Ben

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
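
For readers following the race described in the quoted commit message, here
is a minimal user-space sketch in C. It is not the kernel code and does not
come from the patch: the toy_log and toy_iclog structures, the toy_*
functions, and the LSN values 10, 20 and 30 are all hypothetical stand-ins.
It assumes an empty AIL, so the tail is always taken from last_sync_lsn, and
it models one transaction that spans two iclogs where the first iclog's IO
completes before the second iclog is stamped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t lsn_t;

/* Hypothetical stand-in for the in-core log; not the kernel's struct xlog. */
struct toy_log {
    lsn_t last_sync_lsn;    /* used as the tail LSN while the AIL is empty */
};

/* Hypothetical stand-in for an in-core log buffer (iclog). */
struct toy_iclog {
    lsn_t lsn;              /* LSN of this iclog */
    lsn_t tail_lsn;         /* tail LSN stamped into the iclog header */
    bool  has_commit_cb;    /* true if a commit callback is attached */
};

/* Stamp the header just before IO is issued (AIL assumed empty). */
static void toy_stamp_tail(const struct toy_log *log, struct toy_iclog *ic)
{
    ic->tail_lsn = log->last_sync_lsn;
}

/*
 * IO completion. With only_on_commit == false every completion advances
 * last_sync_lsn (the behaviour the commit message calls incorrect). With
 * only_on_commit == true only iclogs carrying a commit callback do.
 */
static void toy_io_done(struct toy_log *log, const struct toy_iclog *ic,
                        bool only_on_commit)
{
    if (!only_on_commit || ic->has_commit_cb)
        log->last_sync_lsn = ic->lsn;
}

/*
 * One transaction spanning two iclogs. The first iclog's IO completes
 * before the second is stamped. Returns true if both iclogs end up with
 * the same tail LSN, which is what recovery needs to see the whole
 * transaction.
 */
bool toy_tails_match(bool only_on_commit)
{
    struct toy_log log = { .last_sync_lsn = 10 };
    struct toy_iclog first  = { .lsn = 20, .has_commit_cb = false };
    struct toy_iclog second = { .lsn = 30, .has_commit_cb = true };

    toy_stamp_tail(&log, &first);
    toy_io_done(&log, &first, only_on_commit);  /* completes early */
    toy_stamp_tail(&log, &second);
    toy_io_done(&log, &second, only_on_commit);

    return first.tail_lsn == second.tail_lsn;
}
```

In this sketch toy_tails_match(false) returns false, because the early
completion moves last_sync_lsn to 20 before the second iclog is stamped, while
toy_tails_match(true) returns true, because last_sync_lsn stays at 10 until
the iclog with the commit callback completes.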