From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
    id S965912AbXC2Cet (ORCPT ); Wed, 28 Mar 2007 22:34:49 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
    id S965913AbXC2Cet (ORCPT ); Wed, 28 Mar 2007 22:34:49 -0400
Received: from ishtar.tlinx.org ([64.81.245.74]:56557 "EHLO ishtar.tlinx.org"
    rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
    id S965912AbXC2Ces (ORCPT ); Wed, 28 Mar 2007 22:34:48 -0400
Message-ID: <460B25BE.3050808@tlinx.org>
Date: Wed, 28 Mar 2007 19:34:38 -0700
From: Linda Walsh
User-Agent: Thunderbird 1.5.0.10 (Windows/20070221)
MIME-Version: 1.0
To: Linda Walsh
CC: Oliver Joa, Eric Sandeen, David Chinner, linux-kernel@vger.kernel.org, xfs-oss
Subject: Re: Corrupt XFS -Filesystems on new Hardware and Kernel
References: <46094344.4090007@j-o-a.de> <20070328113141.GQ32597093@melbourne.sgi.com>
    <460A6298.4040702@j-o-a.de> <460A821B.4080308@sandeen.net>
    <460AC857.6040305@j-o-a.de> <460B068C.6060903@tlinx.org>
In-Reply-To: <460B068C.6060903@tlinx.org>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

Oliver Joa wrote:
>> eason or another, xfs has detected a corrupted on-disk inode format
>> which it cannot recognize, and shuts down.
----
Oh, one other thing that may not apply in your case, but may.

Does your SATA disk support write caching? Does it support something called a barrier function? (I'm not really clear on all the ways this can go wrong, but I believe barriers are supposed to guarantee that previous data has been fixed on disk, not just held in the write cache.)

If the SATA controller issues a reset, it may very well purge the write cache. Theoretically, I can think of a _possibility_ that the reset disk would purge the write cache and the barrier indicator would still tell xfs to resume writing.
Judging from a recent thread on the xfs list, this could be a "bad" thing (like crossing the streams a la "Ghostbusters", but in a data-integrity context). Just a "shot in the dark" -- absent knowing anything specific about your hardware or situation...

If that's the case, you might want to turn off write caching, since when xfs thinks "barriers" work, it turns off some "protection", which can enable a significant speedup in some situations.

As an aside, some disks, I gather, may "claim" to support barriers but really don't. Xfs tries to verify the barrier claim, but I don't know that a reset issued to the disk will have deterministic behavior across all manufacturers' disks.

A bunch of "coulds" and "maybes", but just thinking off the top of my head...

Linda
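
On the write-caching question above: a minimal sketch of one way to check what the kernel believes about a drive's cache, assuming a kernel that exposes the `queue/write_cache` attribute under `/sys/block` (kernels from the time of this thread do not have it). The function names and the `sda` device name are illustrative only, not anything from the thread.

```python
from pathlib import Path
from typing import Optional


def write_cache_mode(device: str, sysfs_root: str = "/sys/block") -> Optional[str]:
    """Return the kernel's view of the device's cache mode, or None if unknown.

    Reads <sysfs_root>/<device>/queue/write_cache. On kernels that provide
    this attribute it holds "write back" (a volatile write cache is in use)
    or "write through" (no volatile write cache is assumed).
    """
    path = Path(sysfs_root) / device / "queue" / "write_cache"
    try:
        return path.read_text().strip()
    except OSError:
        # Attribute missing (older kernel) or no such device.
        return None


def cache_is_volatile(device: str, sysfs_root: str = "/sys/block") -> Optional[bool]:
    """True if writes may sit in a volatile cache, None if it cannot be determined."""
    mode = write_cache_mode(device, sysfs_root)
    if mode is None:
        return None
    return mode == "write back"


if __name__ == "__main__":
    # "sda" is only an example device name; substitute your own.
    print(write_cache_mode("sda"))
```

This only reports what the kernel has been told; as noted above, a drive may claim things it does not actually do. Turning the cache off is typically done on the drive itself, for example with `hdparm -W0` on the device in question.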