Re: XFS filesystem corruption

From: Stan Hoeppner <stan@hardwarefreak.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Julien FERRERO <jferrero06@gmail.com>,
	Ric Wheeler <rwheeler@redhat.com>,
	xfs@oss.sgi.com
Subject: Re: XFS filesystem corruption
Date: Sun, 10 Mar 2013 18:54:57 -0500
Message-ID: <513D1D51.7010905@hardwarefreak.com>
In-Reply-To: <20130310224536.GK23616@dastard>

On 3/10/2013 5:45 PM, Dave Chinner wrote:
> On Sat, Mar 09, 2013 at 12:51:25PM -0600, Stan Hoeppner wrote:
>> On 3/9/2013 3:11 AM, Dave Chinner wrote:
>>> On Fri, Mar 08, 2013 at 12:59:22PM -0600, Stan Hoeppner wrote:
>>>> On 3/8/2013 6:20 AM, Ric Wheeler wrote:
>>>>>> Something that none of us mentioned WRT write barriers is that while the
>>>>>> filesystem structure may avoid corruption when the power is cut, files
>>>>>> may still be corrupted, in conditions such as any/all of these:
>>>>
>>>> I made it very clear I was discussing file corruption here, not
>>>> filesystem corruption.  You already covered that base.  I was
>>>> specifically addressing the fact that XFS performs barriers on metadata
>>>> writes but not file data writes.
>>>
>>> Actually, you're not correct there, either, Stan. ;)
>>
>> With "either" you're implying I was incorrect twice, and I wasn't, not
>> in whole anyway, maybe in part. ;)
> 
> The "either" was in reference to you correcting someone else...

I wasn't attempting to correct Ric on the technicals, as that's simply
not really possible, me being a user talking to a dev.  That would be
really presumptuous on my part, not to mention dumb.  I had made a point
about file data corruption, and he replied talking about metadata
corruption.  My "correction" was simply to clarify I was talking about
file data not metadata.

>>> XFS only issues cache flushes/FUA writes for log IO. Metadata IO is
>>> done exactly the same way that data IO is done - without barriers.
>>> It's because metadata lost in drive caches at the time of a crash is
>>> rewritten by journal replay that filesystem corruption does not
>>> occur.
>>
>> Technical semantics.  Geeze, give the non dev a break now and then.  ;)
> 
> It's the technical semantics that matter when it comes to behaviour
> at power loss.  That's why I pick on "technical semantics" - it
> makes your analysis and understanding of problems better, and that
> means there's less for me to do in future ;)

I do my best to grab the low hanging fruit when I can so you guys can
concentrate on more important stuff.

>>  Does everyone remember the transitive property of equality from math
>> class decades ago?  It states "If A=B and B=C then A=C".  Thus if
>> barrier writes to the journal protect the journal, and the journal
>> protects metadata, then barrier writes to the journal protect metadata.
> 
> Yup, but the devil is in the detail - we don't protect individual
> metadata writes at all and that difference is significant enough to
> comment on.... :P

Elaborate on this a bit, if you have time.  I was under the impression
that all directory updates were journaled first.
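
To make the distinction above concrete, here is a toy model of the
scheme as Dave describes it: log writes are made durable, metadata
writes are not, and journal replay rewrites whatever the drive cache
lost.  It is only a sketch.  The class and method names (ToyDisk,
log_write, metadata_write, and so on) are made up for illustration and
have nothing to do with the actual XFS code.

```python
# Toy model only: every name here is hypothetical, and this is not how
# XFS is implemented. It just illustrates "durable log, volatile metadata".

class ToyDisk:
    def __init__(self):
        self.platter = {}   # contents that survive a power loss
        self.cache = {}     # volatile drive write cache
        self.journal = []   # log records already forced to stable storage

    def log_write(self, key, value):
        # Log IO is issued with a cache flush/FUA, so treat it as durable.
        self.journal.append((key, value))

    def metadata_write(self, key, value):
        # Metadata IO carries no barrier; it may sit in the volatile cache.
        self.cache[key] = value

    def power_loss(self):
        # Anything still in the drive cache is gone.
        self.cache.clear()

    def replay(self):
        # Journal replay rewrites every logged change, in order.
        for key, value in self.journal:
            self.platter[key] = value


disk = ToyDisk()
disk.log_write("dir inode 42", "entry added")        # journaled first
disk.metadata_write("dir inode 42", "entry added")   # unprotected write
disk.power_loss()                                    # cached write is lost
disk.replay()                                        # log puts it back
print(disk.platter)
```

In this model the individual metadata write never reaches the platter on
its own; only the replayed log record does.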

>>  I had a detail incorrect, but not the big picture.  And I'd bet the OP
>> is more interested in the big picture.  So surely I'd get a B or a C
>> here, but certainly not an F.
> 
> Certainly a B+ - like I said, I'm being picky because you seem to
> understand the details once explained... :)

Usually. ;)  Sometimes it takes a couple of sessions before it fully
sinks in.  I must say I've learned a tremendous amount from the devs on
this list, and I'm grateful that you specifically, Dave, have taken the
time to 'tutor' me, and others, over the last couple of years.

>>> As it is, if the application uses direct IO (likely, as it
>>> sounds like video capture/editing/playout here) then log IO
>>> will also ensure that the data written by the app is on disk (i.e.
>>> that's the mechanism by which fsync works).
>>
>> So this would be an interesting upside down case for XFS, as the file
>> data may be intact, but the filesystem gets corrupted, the opposite of
>> the design point.
> 
> Well, if barriers are working correctly, then there won't be any
> filesystem corruption, either...
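
For reference, this is roughly what "the app asks for its data to be on
disk" looks like from userspace.  It is a minimal sketch that assumes
plain buffered IO plus fsync rather than the direct IO Dave mentions,
since O_DIRECT brings alignment requirements that would only clutter
the example.  The helper name durable_write and the file name are made
up.

```python
import os
import tempfile


def durable_write(path, payload):
    """Write payload to path and return only after fsync() has completed."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(payload)
        while view:
            # os.write may write fewer bytes than asked; loop until done.
            written = os.write(fd, view)
            view = view[written:]
        # Ask the kernel to push the file's data and metadata to stable
        # storage before we continue.
        os.fsync(fd)
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "frame.bin")   # hypothetical capture file
    durable_write(target, b"\x00" * 4096)
    size = os.path.getsize(target)

print(size)
```

Whether the data actually survives a power cut still depends on the
drive honouring the cache flush, which is the whole point of the
barrier discussion.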

Ok, see, this is the odd part here.  The OP didn't seem to have this
metadata corruption issue with the old 2.6.18 kernel, at least I think
that's the one he mentioned.  Then he switched to 2.6.35.  IIRC there
were a number of commits around that time and some regressions.  I also
recall 2.6.35 is not a long term stable kernel.  I'd guess there were
reasons for that.  So, I'm wondering if there was a bug/regression
relating to XFS metadata in 2.6.35 corrected in .36 or later and simply
not backported.  Seems to ring a bell, vaguely.  I have no idea
where/how to search for such information.

>>>>> Also, if there are active writers, this is inherently racy. A better
>>>>> script would unmount the file systems :)
>>>>
>>>> Yes, a umount would be even better.
>>>
>>> Change the BIOS so that the power button does not cause a power down
>>> so the OS can capture the button event and trigger an orderly
>>> shutdown.
>>
>> Dare I say "Dave you're incorrect". ;)
> 
> Heh.  Not so much incorrect as "unaware of the entire scope". I
> browsed the thread and didn't pick up on this little detail...

I know.  That was a bit of a cheap shot, hence the judicious use of
quotes and winkies. ;)  I knew you'd missed it or you'd not have
mentioned the ACPI soft power switch option.

-- 
Stan
