From: Mark Tinguely
To: Dave Chinner
Cc: xfs@oss.sgi.com
Subject: Re: Metadata CRC error upon unclean unmount
Date: Fri, 27 Jun 2014 09:26:35 -0500
Message-ID: <53AD7F1B.500@sgi.com>
In-Reply-To: <20140626224727.GS9508@dastard>

On 06/26/14 17:47, Dave Chinner wrote:
> On Thu, Jun 26, 2014 at 03:03:53PM -0500, Mark Tinguely wrote:
>> Could an out of order CIL push cause this?
>
> I don't think so - the issue appears to be that a CRC is not being
> recalculated on a buffer before IO has been issued to disk, not that
> there is incorrect metadata in the buffer. Regardless of how we
> modify the buffer, the CRC should always match the contents of the
> block on disk because we calculate it with the buffer locked and
> just prior to it being written.
>
>> SGI saw sequence 2 (and sometimes 3/4) of the cil push get in front
>> of cil push sequence 1. Looks like the setting of
>> log->l_cilp->xc_ctx->commit_lsn in xlog_cil_init_post_recovery()
>> lets this happen.
>
> I don't think can actually happen - the CIL is not used until after
> xlog_cil_init_post_recovery() is completed and transactions start
> during EFI recovery. Any attempt to use it prior to that call will
> oops on the null ctx_ticket.
>
> As for the ordering issue, I'm pretty sure that was fixed in
> commit f876e44 ("xfs: always do log forces via the workqueue").

The problem will be with the first CIL push *after*
xlog_cil_init_post_recovery(), especially if the first ctx has a large
vector list and the following ones have small ones.

It looks to me like the problem is still in the CIL push worker.

--Mark.