From: lopresti@gmail.com (Patrick J. LoPresti)
To: Eric Sandeen <sandeen@sandeen.net>
Cc: linux-xfs@vger.kernel.org
Subject: Re: Temporary drive failure leads to massive data corruption?
Date: Tue, 29 May 2018 09:51:27 -0700
Message-ID: <8636yav20g.fsf@self-evident.org>
In-Reply-To: <1b397e88-0e1f-5f33-7def-a38a4a1e484a@sandeen.net> (Eric Sandeen's message of "Fri, 25 May 2018 12:28:18 -0500")
Eric Sandeen <sandeen@sandeen.net> writes:
> I'm sure you won't like this answer,
Hi, Eric. I know enough about XFS to recognize your name, and it is not
like I am paying for support... So actually I am just grateful for your
reply.
> and I can't base it on empirical evidence, but my first hunch would be
> that your controller did a poor job of recovering from the error, and
> damaged the storage beneath the filesystem.
I admit this is possible, but... We have two RAID containers inside each
JBOD. Each JBOD has a single SAS cable to the hardware RAID card. Only
one of the RAID containers suffered damage; the other container in the
same JBOD is fine.
I can believe the RAID card did not recover particularly gracefully, but
I do not think we lost more than a few blocks on the file system. For
one thing, there wasn't enough time.
Until we ran xfs_repair, that is.
> On a more concrete note, it would be interesting to run xfs_bmap -vv
> on some of those files with zeros and see what extents, if any, cover
> the zeroed ranges. i.e. are they holes, allocated, unwritten, etc.
I tried this on a few of the damaged files. Here is a typical output:
# xfs_bmap -p -v xxx
xxx:
 EXT: FILE-OFFSET        BLOCK-RANGE                 AG AG-OFFSET              TOTAL FLAGS
   0: [0..16255]:        195467240568..195467256823  91 (46229328..46245583)   16256 00000
   1: [16256..715959]:   195477629880..195478329583  91 (56618640..57318343)  699704 00000
Looking at the "zeroed" data ranges (there are several), none of them
is near the beginning or end of either extent.
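
For anyone who wants to repeat the "how far from the extent edges" check,
here is a rough Python sketch. It assumes the xfs_bmap -v output has been
saved as text and that all offsets are in 512-byte units, as the xfs_bmap
man page states. The names parse_extents and locate are made up for this
illustration and are not part of xfsprogs.

```python
import re

# One extent row of `xfs_bmap -v` output, e.g.
# "0: [0..16255]: 195467240568..195467256823 91 (46229328..46245583) 16256 00000"
# Header rows and "hole" rows do not match this pattern and are skipped.
EXTENT_RE = re.compile(r"^\s*(\d+):\s*\[(\d+)\.\.(\d+)\]:\s*(\d+)\.\.(\d+)\s")

SECTOR = 512  # xfs_bmap reports file offsets and disk blocks in 512-byte units


def parse_extents(text):
    """Return a list of (index, file_start, file_end, disk_start) tuples."""
    extents = []
    for line in text.splitlines():
        m = EXTENT_RE.match(line)
        if m:
            idx, f_start, f_end, d_start, _d_end = map(int, m.groups())
            extents.append((idx, f_start, f_end, d_start))
    return extents


def locate(extents, byte_offset):
    """Map a byte offset within the file to its extent.

    Returns (extent index, 512-byte units from extent start,
    512-byte units to extent end, absolute disk block), or None if the
    offset falls in a hole or past the last extent.
    """
    sector = byte_offset // SECTOR
    for idx, f_start, f_end, d_start in extents:
        if f_start <= sector <= f_end:
            return idx, sector - f_start, f_end - sector, d_start + (sector - f_start)
    return None
```

Feeding the start and end byte offsets of a zeroed range to locate() shows
which extent it falls in and how far it sits from either end of that extent.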
None of the files I looked at had FLAGS other than 00000.
All of the zeroed ranges I checked are page-aligned (4K multiple).
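
The zeroed ranges can be located with something like the sketch below. It
assumes a 4096-byte page size and reports runs of pages that are entirely
zero. The function name zero_page_runs is hypothetical, and a zero-filled
page is not proof of damage, since a file may legitimately contain zeros.

```python
PAGE = 4096  # assumed page size; adjust if the system uses something else


def zero_page_runs(path, page=PAGE):
    """Yield (start_byte, end_byte) for each run of all-zero pages.

    end_byte is exclusive. Only page-aligned chunks are examined, so a
    zero run that starts mid-page is reported from the next full zero page.
    """
    zero = bytes(page)
    run_start = None
    offset = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(page)
            if not chunk:
                break
            # The final chunk may be shorter than a full page.
            if chunk == zero[:len(chunk)]:
                if run_start is None:
                    run_start = offset
            elif run_start is not None:
                yield run_start, offset
                run_start = None
            offset += len(chunk)
    if run_start is not None:
        yield run_start, offset
```

Dividing the reported byte offsets by 512 gives values that can be compared
directly against the FILE-OFFSET column of xfs_bmap.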
It really feels like some small amount of damage in one area of the file
system got amplified into corruption across many files' contents by
xfs_repair.
I do not know much about XFS internals, so forgive me if the following
is stupid... I imagine there are global data structures recording the
free/in-use blocks, as well as local data structures recording the
extents used by each file. Is it possible xfs_repair decided to "trust"
some corrupted global data structure instead of the local extents
associated with each file, and responded by wiping parts of the latter?
In general, could anything cause xfs_repair to zero out whole ranges of
blocks allocated to many files?
Thanks again.
- Pat