From: "Edwin Török" <edvin.torok@citrix.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>,
Dave Chinner <david@fromorbit.com>
Cc: fstests@vger.kernel.org, Mark Syms <Mark.Syms@citrix.com>,
Tim Smith <Tim.Smith@citrix.com>,
Ross Lagerwall <Ross.Lagerwall@citrix.com>
Subject: Re: [PATCH] Add new tests/generic/536: intermittent I/O errors must not corrupt a filesystem
Date: Fri, 22 Mar 2019 14:42:27 +0000
Message-ID: <4d660c49-e8ba-2dfc-2300-9d9d648e213f@citrix.com>
In-Reply-To: <20190321202348.GA1180@magnolia>

On 21/03/2019 20:23, Darrick J. Wong wrote:
> On Thu, Mar 21, 2019 at 10:30:46AM +0000, Edwin Török wrote:
>> Based on tests/generic/347.
>>
>> In our lab we've found that if multiple iSCSI connection errors are
>> detected (without completely loosing the iSCSI connection) then the GFS2
>> filesystem becomes corrupt due to differences in filesystem and device blocksizes.
>> Add a test that explicitly checks for this by simulating I/O errors
>> deterministically with dm-thin.
>
> How is this different from generic/475? Is there something specific to
> thin pools here (vs. using dm-error to simulate the errors)?
When I tried generic/475 it hung during unmount and never reached the data corruption part.
Thanks for the suggestion: dm-error would be better than dm-thin; see below.
On 21/03/2019 21:26, Dave Chinner wrote:
> On Thu, Mar 21, 2019 at 10:30:46AM +0000, Edwin Török wrote:
>> Based on tests/generic/347.
>>
>> In our lab we've found that if multiple iSCSI connection errors are
>> detected (without completely loosing the iSCSI connection) then the GFS2
>> filesystem becomes corrupt due to differences in filesystem and device blocksizes.
>> Add a test that explicitly checks for this by simulating I/O errors
>> deterministically with dm-thin.
>
> Exactly what IO errors is dm-thinp generating here? If you run it
> out of space, then it triggers ENOSPC, not EIO. That's very, very
> different to iSCSI throwing random EIO errors..
I agree that dm-error would be a better starting point than dm-thin for this test.
I'll try to modify the test to use it, and see whether I can get it to finish without hanging and still reproduce the corruption issue.
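For reference, switching a device between passing I/O through and failing it can be sketched with plain dmsetup tables. This is only an illustration under assumptions: the function names below are made up for this sketch and are not the common/dmerror helpers, and the device name, backing device and size would come from the test environment.

```shell
#!/bin/bash
# Sketch only: linear_table, error_table and switch_table are hypothetical
# names for illustration, not fstests helpers.

# Print a dm table that passes all I/O through to the backing device.
# $1 = backing block device, $2 = size in 512-byte sectors
linear_table() {
    echo "0 $2 linear $1 0"
}

# Print a dm table whose "error" target fails every I/O.
# $1 = size in 512-byte sectors
error_table() {
    echo "0 $1 error"
}

# Swap the live table of an existing dm device (needs root; only defined
# here, not called). $1 = dm device name, $2 = new table
# Note: a plain suspend may try to freeze a mounted filesystem first;
# dmsetup has a --nolockfs option to skip that.
switch_table() {
    dmsetup suspend "$1" &&
    dmsetup load "$1" --table "$2" &&
    dmsetup resume "$1"
}
```

The size in sectors can be read with `blockdev --getsz` on the backing device. Alternating between the two tables while a workload runs would approximate intermittent EIO rather than the ENOSPC that an exhausted thin pool produces.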
On 21/03/2019 21:26, Dave Chinner wrote:
> On Thu, Mar 21, 2019 at 10:30:46AM +0000, Edwin Török wrote:
>> +# now remount the filesystem without triggering IO errors,
>> +# and check that the filesystem is not corrupt
>> +_dmthin_cycle_mount
>> +# ls --color makes ls stat each file, which finds the corruption
>
> Not sure it always does - ISTR that in the past if the dtype
> returned indicated the type of file, then it ls would omit the stat
> just for the purposes of coloring....
>
> And, realistically, the way we find /filesystem/ corruption is to
> run fsck/repair, not iterate the directory structure.
I don't disagree. However, GFS2's fsck is very noisy and complains about inconsistencies
even on a filesystem where I can otherwise list and read each entry correctly.
I wanted to make a clear distinction between that noise and the actual corruption observed, so that the two bugs
can be fixed independently.
Perhaps the test should first do an 'ls/stat' pass and, if that succeeds, unmount and run the filesystem check as usual.
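A rough sketch of that ordering follows. It assumes GNU find and stat, and stat_tree is a hypothetical helper name, not something from the patch or from fstests. Calling stat explicitly avoids depending on whether ls decides to stat each entry.

```shell
#!/bin/bash
# Sketch only: stat_tree is a hypothetical helper for illustration.

# Explicitly stat every entry under $1 instead of relying on ls to do it.
# Returns non-zero if the directory walk or any stat call fails.
stat_tree() {
    find "$1" -exec stat -- {} + >/dev/null
}

# In the test this might be used roughly as follows (the exact unmount
# and check helpers depend on which dm wrapper the test ends up using):
#   stat_tree "$SCRATCH_MNT" || _fail "stat failed after remount"
#   _scratch_unmount
#   _check_scratch_fs
```

With this split, a failure in the stat pass points at the corruption seen through the mounted filesystem, while a failure only in the later check points at what fsck reports.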
> If we are
> looking for missing files, then we dump the directory structure to
> the golden output file or dump it before/after errors and compare
> that they are the same.
>
>> +ls --color=always $SCRATCH_MNT/ >/dev/null || _fail "Failed to list filesystem after remount"
>> +ls --color=always $SCRATCH_MNT/ >/dev/null || _fail "Failed to list filesystem after remount"
>> +ls --color=always $SCRATCH_MNT/ >/dev/null || _fail "Failed to list filesystem after remount"
>
> If corruption is not found on the first pass, why would the next 2
> passes find anything different?
Indeed, I'll drop them.
Thanks,
--Edwin