Temporary drive failure leads to massive data corruption?

Linux XFS filesystem development

* Temporary drive failure leads to massive data corruption?
@ 2018-05-25 17:02 Patrick J. LoPresti
  2018-05-25 17:28 ` Eric Sandeen
  0 siblings, 1 reply; 4+ messages in thread
From: Patrick J. LoPresti @ 2018-05-25 17:02 UTC
  To: linux-xfs

Hi. We are using XFS on a hardware RAID6 container with around 100
terabytes of data in 500K files. (Actually, we have four such
containers per server and around a dozen servers.)

Anyway, we had a power event a couple of nights ago that took several
of the drives -- and thus the container -- offline.

We got the drives, and thus the hardware RAID6, back online, but when
we tried to mount the file system the message said it was corrupted
and we should run xfs_repair. Running xfs_repair complained that there
were uncommitted entries in the transaction log and we should try to
mount the file system.

Ultimately, we had to use "xfs_repair -L" to get the file system to mount.

Now, I understand that any files or directories being modified during
the event could be corrupted. But we are seeing something completely
different; namely...

Tens of thousands of our files -- each 100-ish megabytes -- appear to
have had large sections replaced with zeroes. (We are still evaluating
the damage.) None of these files were being modified at the time; in
fact, the majority were written years ago and are never changed.

Is this an expected failure mode for XFS? I understand we may have
corrupted a few disk blocks, but should we expect that to corrupt a
significant fraction of our at-rest data?

This is using Red Hat Enterprise Linux 6.6 (kernel
2.6.32-504.16.2.el6.x86_64 of Tue Apr 21 10:35:19 CDT 2015), if it
makes a difference.

Thanks!

 - Pat


* Re: Temporary drive failure leads to massive data corruption?
  2018-05-25 17:02 Temporary drive failure leads to massive data corruption? Patrick J. LoPresti
@ 2018-05-25 17:28 ` Eric Sandeen
  2018-05-29 16:51   ` Patrick J. LoPresti
  0 siblings, 1 reply; 4+ messages in thread
From: Eric Sandeen @ 2018-05-25 17:28 UTC
  To: Patrick J. LoPresti, linux-xfs

On 5/25/18 12:02 PM, Patrick J. LoPresti wrote:
> Hi. We are using XFS on a hardware RAID6 container with around 100
> terabytes of data in 500K files. (Actually, we have four such
> containers per server and around a dozen servers.)
> 
> Anyway, we had a power event a couple of nights ago that took several
> of the drives -- and thus the container -- offline.
> 
> We got the drives, and thus the hardware RAID6, back online, but when
> we tried to mount the file system the message said it was corrupted
> and we should run xfs_repair. Running xfs_repair complained that there
> were uncommitted entries in the transaction log and we should try to
> mount the file system.
> 
> Ultimately, we had to use "xfs_repair -L" to get the file system to mount.
> 
> Now, I understand that any files or directories being modified during
> the event could be corrupted. But we are seeing something completely
> different; namely...
> 
> Tens of thousands of our files -- each 100-ish megabytes -- appear to
> have had large sections replaced with zeroes. (We are still evaluating
> the damage.) None of these files were being modified at the time; in
> fact, the majority were written years ago and are never changed.
> 
> Is this an expected failure mode for XFS? I understand we may have
> corrupted a few disk blocks, but should we expect that to corrupt a
> significant fraction of our at-rest data?
> 
> This is using Red Hat Enterprise Linux 6.6 (kernel
> 2.6.32-504.16.2.el6.x86_64 of Tue Apr 21 10:35:19 CDT 2015), if it
> makes a difference.

I'm sure you won't like this answer, and I can't base it on empirical
evidence, but my first hunch would be that your controller did a poor job
of recovering from the error, and damaged the storage beneath the
filesystem.

I'd at least take a good look at controller logs (if any?) and see what
it did.  In general, you absolutely must make sure that the storage is
in proper shape before running the higher level fs repair tools.

On a more concrete note, it would be interesting to run xfs_bmap -vv
on some of those files with zeros and see what extents, if any,
cover the zeroed ranges.  i.e. are they holes, allocated, unwritten, etc.
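To know which byte ranges to compare against the xfs_bmap output, the
zeroed regions have to be located first. A minimal sketch in Python,
assuming the damage shows up as runs of all-zero 4096-byte blocks (the
function name find_zero_runs and the block size are illustrative choices,
not anything from xfsprogs):

```python
BLOCK = 4096  # assumed granularity; adjust if the damage aligns differently


def find_zero_runs(path, block_size=BLOCK):
    """Return [(start, end)] byte ranges (end exclusive) of all-zero blocks."""
    zero = bytes(block_size)
    runs = []
    run_start = None
    offset = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            # The last chunk may be short, so compare against a same-length slice.
            if chunk == zero[:len(chunk)]:
                if run_start is None:
                    run_start = offset
            elif run_start is not None:
                runs.append((run_start, offset))
                run_start = None
            offset += len(chunk)
    if run_start is not None:
        runs.append((run_start, offset))
    return runs
```

Note that a file can legitimately contain zero-filled blocks, so a run
reported here is only a candidate until it is checked against a known-good
copy or checksum.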

-Eric


* Re: Temporary drive failure leads to massive data corruption?
  2018-05-25 17:28 ` Eric Sandeen
@ 2018-05-29 16:51   ` Patrick J. LoPresti
  2018-05-29 17:00     ` Eric Sandeen
  0 siblings, 1 reply; 4+ messages in thread
From: Patrick J. LoPresti @ 2018-05-29 16:51 UTC
  To: Eric Sandeen; +Cc: linux-xfs

Eric Sandeen <sandeen@sandeen.net> writes:

> I'm sure you won't like this answer,

Hi, Eric. I know enough about XFS to recognize your name, and it is not
like I am paying for support... So actually I am just grateful for your
reply.

> and I can't base it on empirical evidence, but my first hunch would be
> that your controller did a poor job of recovering from the error, and
> damaged the storage beneath the filesystem.

I admit this is possible, but... We have two RAID containers inside each
JBOD. Each JBOD has a single SAS cable to the hardware RAID card. Only
one of the RAID containers suffered damage; the other container in the
same JBOD is fine.

I can believe the RAID card did not recover particularly gracefully, but
I do not think we lost more than a few blocks on the file system. For
one thing, there wasn't enough time.

Until we ran xfs_repair, that is.

> On a more concrete note, it would be interesting to run xfs_bmap -vv
> on some of those files with zeros and see what extents, if any, cover
> the zeroed ranges.  i.e. are they holes, allocated, unwritten, etc.

I tried this on a few of the damaged files. Here is a typical output:

# xfs_bmap -p -v xxx
    xxx:
   EXT: FILE-OFFSET      BLOCK-RANGE                 AG  AG-OFFSET  TOTAL FLAGS
     0: [0..16255]:      195467240568..195467256823  91  (46229328..46245583)  16256 00000
     1: [16256..715959]: 195477629880..195478329583  91  (56618640..57318343) 699704 00000

Looking at the "zeroed" data ranges (there are several), none of them
is near the beginning or the end of either extent.

None of the files I looked at had FLAGS other than 00000.

All of the zeroed ranges I checked are page-aligned (4K multiple).
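The extent check described above can be sketched in Python as follows.
xfs_bmap reports file offsets and disk blocks in 512-byte units, so a byte
offset inside a zeroed range has to be divided by 512 before it is compared
with the FILE-OFFSET column. The names parse_extents and locate are
hypothetical helpers, and the regular expression assumes the column layout
shown in the output above; holes and any other layout are simply skipped.

```python
import re

SECTOR = 512  # xfs_bmap reports offsets in 512-byte units

# Matches lines like:
#   0: [0..16255]: 195467240568..195467256823 91 (46229328..46245583) 16256 00000
EXTENT_RE = re.compile(
    r"^\s*(\d+):\s*\[(\d+)\.\.(\d+)\]:\s*(\d+)\.\.(\d+)\s+(\d+)\s+"
    r"\((\d+)\.\.(\d+)\)\s+(\d+)\s+(\d+)\s*$"
)


def parse_extents(bmap_text):
    """Parse extent lines from 'xfs_bmap -v' output into a list of dicts."""
    extents = []
    for line in bmap_text.splitlines():
        m = EXTENT_RE.match(line)
        if m is None:
            continue  # file name, column header, holes: not handled here
        ext, f0, f1, b0, b1, ag, _a0, _a1, total, flags = m.groups()
        extents.append({
            "ext": int(ext),
            "file_start": int(f0), "file_end": int(f1),
            "disk_start": int(b0), "disk_end": int(b1),
            "ag": int(ag), "total": int(total),
            "flags": flags,  # kept as a string so leading zeros survive
        })
    return extents


def locate(extents, byte_offset):
    """Find the extent covering byte_offset and how far it is from each edge."""
    sector = byte_offset // SECTOR
    for e in extents:
        if e["file_start"] <= sector <= e["file_end"]:
            return {
                "ext": e["ext"],
                "disk_sector": e["disk_start"] + (sector - e["file_start"]),
                "sectors_from_start": sector - e["file_start"],
                "sectors_to_end": e["file_end"] - sector,
                "flags": e["flags"],
            }
    return None  # offset falls in a hole or past the end of the file
```

The disk_sector value is the position on the block device, which is the
number to hand to whoever is examining the RAID container for the same
damage underneath the filesystem.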

It really feels like some small amount of damage in one area of the file
system got amplified into corruption across many files' contents by
xfs_repair.

I do not know much about XFS internals, so forgive me if the following
is stupid... I imagine there are global data structures recording the
free/in-use blocks, as well as local data structures recording the
extents used by each file. Is it possible xfs_repair decided to "trust"
some corrupted global data structure instead of the local extents
associated with each file, and responded by wiping parts of the latter?

In general, could anything cause xfs_repair to zero out whole ranges of
blocks allocated to many files?

Thanks again.

 - Pat


* Re: Temporary drive failure leads to massive data corruption?
  2018-05-29 16:51   ` Patrick J. LoPresti
@ 2018-05-29 17:00     ` Eric Sandeen
  0 siblings, 0 replies; 4+ messages in thread
From: Eric Sandeen @ 2018-05-29 17:00 UTC
  To: Patrick J. LoPresti; +Cc: linux-xfs

On 5/29/18 11:51 AM, Patrick J. LoPresti wrote:
>> On a more concrete note, it would be interesting to run xfs_bmap -vv
>> on some of those files with zeros and see what extents, if any, cover
>> the zeroed ranges.  i.e. are they holes, allocated, unwritten, etc.
> I tried this on a few of the damaged files. Here is a typical output:
> 
> # xfs_bmap -p -v xxx
>      xxx:
>     EXT: FILE-OFFSET      BLOCK-RANGE                 AG  AG-OFFSET  TOTAL FLAGS
>       0: [0..16255]:      195467240568..195467256823  91  (46229328..46245583)  16256 00000
>       1: [16256..715959]: 195477629880..195478329583  91  (56618640..57318343) 699704 00000
> 
> Looking at the "zeroed" data ranges (there are several), none of them
> are near the beginning nor end of either extent.
> 
> None of the files I looked at had FLAGS other than 00000.

OK, so FLAGS of 00000 means "this is a normal, allocated, written extent"
and nothing fancy like preallocated/unwritten - and they aren't holes either.
  
> All of the zeroed ranges I checked are page-aligned (4K multiple).
> 
> It really feels like some small amount of damage in one area of the file
> system got amplified into corruption across many files' contents by
> xfs_repair.
> 
> I do not know much about XFS internals, so forgive me if the following
> is stupid... I imagine there are global data structures recording the
> free/in-use blocks, as well as local data structures recording the
> extents used by each file. Is it possible xfs_repair decided to "trust"
> some corrupted global data structure instead of the local extents
> associated with each file, and responded by wiping parts of the latter?
> 
> In general, could anything cause xfs_repair to zero out whole ranges of
> blocks allocated to many files?

Others may think of a scenario I'm missing, but xfs_repair simply does
not touch the contents of file data blocks.  It might truncate some away,
or remove entire extents from a file, or even junk an inode that looks
irredeemable, but it will never go in and zero out data in file blocks.
That's what leads me to think that there's something else happening on
the storage side of things.

Did you keep the xfs_repair output?  It would be interesting to correlate
the inodes w/ missing data to anything repair might have touched, I guess.
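If the repair output was saved, that correlation could be sketched roughly
as below. This is an assumption-heavy illustration: it treats any
"inode <number>" text in the saved log as an inode that repair mentioned,
which may not cover every message format, and the function names are
hypothetical.

```python
import os
import re

# Assumption: messages of interest mention the inode as "inode <decimal number>".
INODE_RE = re.compile(r"\binode (\d+)\b")


def inodes_in_log(log_text):
    """Collect every inode number mentioned in the saved repair output."""
    return {int(n) for n in INODE_RE.findall(log_text)}


def damaged_files_touched(log_text, damaged_paths):
    """Return the damaged paths whose inode number appears in the log."""
    touched = inodes_in_log(log_text)
    return sorted(p for p in damaged_paths if os.stat(p).st_ino in touched)
```

An empty result for files known to contain zeroed ranges would be
consistent with repair never having touched them; it would not by itself
prove where the damage came from.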

-Eric

