From: Chris Dunlop <chris@onthe.net.au>
To: Brian Foster <bfoster@redhat.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: file corruptions, 2nd half of 512b block
Date: Thu, 29 Mar 2018 11:15:59 +1100
Message-ID: <20180329001559.GA21914@onthe.net.au>
In-Reply-To: <20180328180916.GC37735@bfoster.bfoster>
On Wed, Mar 28, 2018 at 02:09:16PM -0400, Brian Foster wrote:
> On Wed, Mar 28, 2018 at 09:33:10AM +1100, Chris Dunlop wrote:
>> On Thu, Mar 22, 2018 at 02:03:28PM -0400, Brian Foster wrote:
>>> On Fri, Mar 23, 2018 at 02:02:26AM +1100, Chris Dunlop wrote:
>>>> Hi,
>>>>
>>>> I'm experiencing 256-byte corruptions in files on XFS on 4.9.76.
>>> FWIW, the patterns that you have shown so far do seem to suggest
>>> something higher level than a physical storage problem. Otherwise, I'd
>>> expect these instances wouldn't always necessarily land in file data.
>>> Have you run 'xfs_repair -n' on the fs to confirm there aren't any other
>>> problems?
>>
>> I haven't tried xfs_repair yet. With 181T used and a high but as-yet-unknown
>> number of dirs and files, I imagine it will take quite a while, and the
>> filesystem shouldn't really be unavailable for more than a few hours. I can
>> use an LVM snapshot to do the 'xfs_repair -n', but I'd need to add enough
>> spare capacity to hold the data that arrives (at 0.5-1TB/day) during the
>> life of the check / snapshot. That might take a bit of fiddling because the
>> system is getting short on drive bays.
>>
>> Is it possible to work out approximately how long the check might take?
>
> It will probably depend more on the amount of metadata than the size of
> the fs. That said, it's not critical if downtime is an issue. It's more
> something to check when convenient just to be sure there aren't other
> issues in play.
It's not looking too good in terms of how much metadata: I've had
"dircnt" (https://github.com/ChristopherSchultz/fast-file-count) running
for over 24 hours now and it's still going... (unfortunately it doesn't
support SIGUSR1 to report current stats, à la dd). I guess a simple
directory scan like that is going to be significantly quicker than the
'xfs_repair -n' - unless 'xfs_repair' uses optimisations not available
to a simple directory scan?
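In the meantime, a rough substitute for dircnt that does report progress might look like this (a sketch, assuming GNU find's -printf; the million-entry threshold is arbitrary):

```shell
# scan_counts DIR: count dirs and files under DIR, printing a progress
# line to stderr every million entries and a final tally. Relies on
# GNU find's -printf '%y' emitting one type character per entry.
scan_counts() {
    find "$1" -xdev -printf '%y\n' | awk '
        { n[$1]++; total++ }
        total % 1000000 == 0 {
            printf "scanned %d entries\n", total > "/dev/stderr"
        }
        END { printf "dirs=%d files=%d total=%d\n", n["d"], n["f"], total }'
}
```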
>> I have a number of instances where it definitely looks like the file has
>> made it to the filesystem (but not necessarily disk) and checked ok, only to
>> later fail the md5 check, e.g.:
>>
>> 2018-03-12 07:36:56 created
>> 2018-03-12 07:50:05 check ok
>> 2018-03-26 19:02:14 check bad
>>
>> 2018-03-13 08:13:10 created
>> 2018-03-13 08:36:56 check ok
>> 2018-03-26 14:58:39 check bad
>>
>> 2018-03-13 21:06:34 created
>> 2018-03-13 21:11:18 check ok
>> 2018-03-26 19:24:24 check bad
>
> How much is known about possible events related to the file between the
> time the check passes and when the md5 goes bad? For example, do we know
> for certain nothing read or otherwise acted on the file in that time?
>
> If so, it certainly seems like the difference between check ok and check
> bad could be due to cache effects.
At least some of the files were read between the ok and bad checks. In
at least one case the reader complained about a decompression error - in
fact, that was what started me looking into this in detail.
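One way to probe the cache-effects theory is to compare the md5 of a suspect file before and after evicting its pages, along these lines (a sketch; vmtouch is only invoked if installed):

```shell
# check_uncached FILE: md5 the file as currently cached, evict its
# pages from the page cache (via vmtouch, if available), md5 again,
# and report whether the cached and on-disk contents agree.
check_uncached() {
    f=$1
    cached=$(md5sum "$f" | awk '{print $1}')
    if command -v vmtouch >/dev/null 2>&1; then
        vmtouch -e "$f"            # evict the file's pages
    fi
    evicted=$(md5sum "$f" | awk '{print $1}')
    if [ "$cached" = "$evicted" ]; then
        echo "match $f"
    else
        echo "MISMATCH $f"
    fi
}
```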
>> ... Most of the time, 'vmtouch -e' clears the
>> file from buffers immediately, but sometimes it leaves a single page
>> resident, even in the face of repeated calls. ...
>>
>> Any idea what that impressively persistent page is about?
>
> Hm, not sure. I see that behavior on one file that was recently cached
> in my dev tree. A local copy of the same file shows the same thing. If I
> copy to a separate fs on another vm (with a newer kernel), I don't see
> that behavior. I'm not sure off hand what the difference is, perhaps it
> has something to do with the kernel. But this is all debug logic so I
> wouldn't worry too much about doing excessive numbers of loops and
> whatnot unless this behavior proves to be somehow relevant to the
> problem.
>
> FWIW, 'vmtouch -v' shows a little table of which pages are actually
> present in the file. In my test, the tail page is the one that persists.
> More importantly, it might be useful to use 'vmtouch -v' in your checks
> above. That way we actually have a record of whether the particular
> corrupted page was cached between a 'check ok' -> 'check bad'
> transition.
Tks, I'll add that to the check script.
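Something along these lines, presumably (the manifest format and log handling here are assumptions, not the actual script):

```shell
# check_files MANIFEST LOG: read "md5  path" pairs from MANIFEST,
# logging page residency (via vmtouch -v, if installed) and the
# OK/BAD result for each file to LOG, so a later 'check bad' can be
# matched against whether the corrupted page was cached.
check_files() {
    manifest=$1; log=$2
    while read -r expected path; do
        if command -v vmtouch >/dev/null 2>&1; then
            vmtouch -v "$path" >> "$log"   # residency map, per file
        fi
        actual=$(md5sum "$path" | awk '{print $1}')
        if [ "$actual" = "$expected" ]; then
            echo "$(date -Is) OK  $path" >> "$log"
        else
            echo "$(date -Is) BAD $path" >> "$log"
        fi
    done < "$manifest"
}
```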
>>>> "cmp -l badfile goodfile" shows there are 256 bytes differing, in the
>>>> 2nd half of (512b) block 53906431.
>>>
>>> FWIW, that's the last (512b) sector of the associated (4k) page. Does
>>> that happen to be consistent across whatever other instances you have a
>>> record of?
>>
>> Huh, I should have noticed that! Yes, all corruptions are the last 256b of a
>> 4k page. And in fact all are the last 256b in the first 4k page of an 8k
>> block. That's odd as well!
>
> Ok, that's potentially interesting. But what exactly do you mean by an
> 8k block? This is a 4k block filesystem, correct? Are you just saying
> that the pages that contain the corruption all happen to be at 8k
> aligned offsets?
Yes, I meant 8k aligned offsets. But it turns out I was wrong, they're
not consistently placed within 8k aligned offsets - sorry for the false
alarm. See also the file/source/corrupt table in email to Dave.
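For anyone wanting to reproduce the alignment check, it boils down to bucketing the differing bytes by their offset within the enclosing 4k page (a sketch; note cmp -l prints 1-based decimal byte offsets):

```shell
# page_offsets BAD GOOD: for each differing byte reported by
# 'cmp -l', compute its offset within the enclosing 4k page and
# tally how many differences land at each in-page offset.
page_offsets() {
    cmp -l "$1" "$2" | awk '
        { off = $1 - 1              # cmp offsets are 1-based
          count[off % 4096]++ }
        END { for (o in count)
                  printf "offset-in-page %d: %d bytes\n", o, count[o] }'
}
```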
> Brian
Chris