Date: Wed, 06 Mar 2013 11:47:39 -0500
From: Ric Wheeler
Subject: Re: XFS filesystem corruption
Message-ID: <5137732B.3010703@redhat.com>
References: <20130306161519.2c28d911@galadriel.home>
To: Julien FERRERO
Cc: xfs@oss.sgi.com

On 03/06/2013 11:16 AM, Julien FERRERO wrote:
> Hi Emmanuel
>
> 2013/3/6 Emmanuel Florac :
>> On Wed, 6 Mar 2013 16:08:59 +0100 you wrote:
>>
>>> I am totally stuck and I really don't know how to duplicate the
>>> corruption. I only know that units tend to be power cycled by
>>> operators while the fs is still mounted (no proper shutdown / reboot).
>>> My guess is the fs journal should handle this case and avoid such
>>> corruption.
>>
>> Wrong guess. It may work or not, depending on a long list of
>> parameters, but basically not turning it off properly is asking for
>> problems and corruption. The problem will be tragically aggravated if
>> your hardware RAID doesn't have a battery-backed cache.
>>
> OK, but our server spends 95% of its time reading data and 5% writing
> data. We have a case of a server that did not write anything at the
> time of failure (or at any point during its uptime).
> Moreover, the failures affect files that were opened read-only or
> weren't accessed at all at the time of failure. I don't think the H/W
> RAID is the issue, since we see the same corruption on another setup
> without H/W RAID.
>
> Does the "ls" output with "???" look like fs corruption?

Caching can hold dirty data in volatile cache for a very long time. Even if you open a file in "read-only" mode, you still do a fair amount of writes to storage. You can use blktrace or a similar tool to see just how much data is written.

As mentioned earlier, you should always unmount cleanly as a best practice. An operator who powers off boxes with mounted file systems needs to be educated or let go :)

Ric

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
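[A side note on the "dirty data in volatile cache" point above: applications that need their writes to survive a power cut must flush explicitly rather than rely on the page cache. The sketch below is a minimal illustration, not part of the thread; the file name `example.dat` and helper name `durable_write` are made up for the example.]

```python
import os

def durable_write(path, data):
    """Write data and push it past the volatile caches toward stable storage."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()              # drain Python's userspace buffer into the kernel
        os.fsync(f.fileno())   # ask the kernel to commit the file's data/metadata
    # fsync the parent directory so the directory entry itself is durable too
    dirfd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dirfd)
    finally:
        os.close(dirfd)

durable_write("example.dat", b"important bytes")
```

Without the fsync calls, the data can sit in the page cache (and in a non-battery-backed RAID cache) for a long time, which is exactly the window in which a hard power-off loses or corrupts it.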