Re: feature re-quest for "re-write"

From: Eyal Lebedinsky <eyal@eyal.emu.id.au>
Cc: list linux-raid <linux-raid@vger.kernel.org>
Subject: Re: feature re-quest for "re-write"
Date: Tue, 25 Feb 2014 22:08:29 +1100
Message-ID: <530C79AD.2020400@eyal.emu.id.au>
In-Reply-To: <20140225193501.080a8e61@notabene.brown>

This is helpful, Neil.

I am running blktrace/blkparse and trying to understand what it is telling
me.

If I read it correctly, a check of md127 (from the start) begins reading
with this entry:

8,129  6      327     0.992307218 20259  D   R 264200 + 504 [md127_resync]

which suggests that the real data starts rather further into the device than
I expected. In fact it starts beyond the bad block: sector 259648 of sdi1 lies
before the first read operation. Then again, I am not even sure that the 264200
reported by blkparse is in sectors and not in 1KB or 4KB blocks.
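
To make the comparison above concrete, here is a minimal sketch that parses
one line of default blkparse output and tests whether a given sector falls
inside the request. It assumes the default field layout (device, cpu,
sequence, time, pid, action, rwbs, start + count, process) and that the
numbers are 512-byte sectors. Whether the start is relative to the partition
or to the whole disk is not settled here. The names parse_line and covers are
made up for this illustration.

```python
import re

# Assumed default blkparse layout:
#   maj,min cpu seq time pid action rwbs sector + count [process]
LINE_RE = re.compile(
    r"^\s*(\d+),(\d+)\s+(\d+)\s+(\d+)\s+([\d.]+)\s+(\d+)"
    r"\s+(\S+)\s+(\S+)\s+(\d+)\s+\+\s+(\d+)\s+\[(.*)\]\s*$"
)

SECTOR_BYTES = 512  # assumption: blkparse counts in 512-byte sectors


def parse_line(line):
    """Return the fields of one blkparse line as a dict, or None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    major, minor, cpu, seq, ts, pid, action, rwbs, sector, count, proc = m.groups()
    return {
        "major": int(major), "minor": int(minor), "cpu": int(cpu),
        "seq": int(seq), "time": float(ts), "pid": int(pid),
        "action": action, "rwbs": rwbs,
        "sector": int(sector), "count": int(count), "process": proc,
    }


def covers(event, sector):
    """True if 'sector' lies inside the range [start, start + count) of the request."""
    return event["sector"] <= sector < event["sector"] + event["count"]


line = "8,129  6      327     0.992307218 20259  D   R 264200 + 504 [md127_resync]"
ev = parse_line(line)
print(ev["sector"] * SECTOR_BYTES)  # byte offset of the request: 135270400
print(covers(ev, 259648))           # False: the bad sector precedes this read
```

Running every 'D' line of a trace through covers() would show whether any
request issued during the check touches sector 259648 at all.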

Following is some speculation.

Does md127 store a header before it starts striping the data? Might this
be why it rarely needs to read parts of this header?
(I thought that superblocks and the like are stored at the far end.)

If so, then the content of this sector is not part of the redundant data and may
not be trivial to recover. Then again, I expect important data is recorded more
than once.

If this is the case then the calculation to correlate the bad sector to the fs
block (which I need to do whenever I find a bad sector in order to investigate
my data loss) is more complicated than I assumed.

Final thought: if this sector is in an important header, then when it *does*
need to be read (and the read fails), how bad a reaction should I expect?

Eyal

On 02/25/14 19:35, NeilBrown wrote:
> On Tue, 25 Feb 2014 18:58:16 +1100 Eyal Lebedinsky <eyal@eyal.emu.id.au>
> wrote:
>
>> BTW, Is there a monitoring tool to trace all i/o to a device? I could then
>> log activity to /dev/sd[c-i]1 during a (short) 'check' and see if all sectors
>> are really read. Or does md have a debug facility for this?
>
> blktrace will collect a trace, blkparse will print it out for you.
> You need to trace the 'whole' device.
>
> So something like
>
>    blktrace /dev/sd[c-i]
>    # run the test
>    ctrl-C
>    blkparse sd[c-i]*
>
> blktrace creates several files, I think one for each device on each CPU.
>
>
> NeilBrown
>
>>
>> Eyal
>>
>> On 02/25/14 14:16, NeilBrown wrote:
>>> On Tue, 25 Feb 2014 07:39:14 +1100 Eyal Lebedinsky <eyal@eyal.emu.id.au>
>>> wrote:
>>>
>>>> My main interest is to understand why 'check' does not actually check.
>>>> I already know how to fix the problem, by writing to the location I
>>>> can force the pending reallocation to happen, but then I will not have
>>>> the test case anymore.
>>>>
>>>> The OP asks for a specific solution, but I think that the 'check' action
>>>> should already correctly rewrite failed (i/o error) sectors. It does not
>>>> always know which sector to rewrite when it finds a raid6 mismatch
>>>> without an i/o error (with raid5 it never knows).
>>>>
>>>
>>> I cannot reproduce the problem.  In my testing a read error is fixed by
>>> 'check'.  For you it clearly isn't.  I wonder what is different.
>>>
>>> During normal 'check' or 'repair' etc the read requests are allowed to be
>>> combined by the io scheduler so when we get a read error, it could be one
>>> error for a megabyte or more of the address space.
>>> So the first thing raid5.c does is arrange to read all the blocks again but
>>> to prohibit the merging of requests.  This time any read error will be for a
>>> single 4K block.
>>>
>>> Once we have that reliable read error the data is constructed from the other
>>> blocks and the new block is written out.
>>>
>>> This suggests that when there is a read error you should see e.g.
>>>
>>> [  714.808494] end_request: I/O error, dev sds, sector 8141872
>>>
>>> then shortly after that another similar error, possibly with a slightly
>>> different sector number (at most a few thousand sectors later).
>>>
>>> Then something like
>>>
>>> md/raid:md0: read error corrected (8 sectors at 8141872 on sds)
>>>
>>>
>>> However in the log Mikael Abrahamsson posted on 16 Jan 2014
>>> (Subject: Re: read errors not corrected when doing check on RAID6)
>>>
>>> we only see that first 'end_request' message.  No second one and no "read
>>> error corrected".
>>>
>>> This seems to suggest that the second read succeeded, which is odd (to say
>>> the least).
>>>
>>> In your log posted 21 Feb 2014
>>> (Subject: raid 'check' does not provoke expected i/o error)
>>> there aren't even any read errors during 'check'.
>>> The drive sometimes reports a read error and sometimes doesn't?
>>> Does reading the drive with 'dd' always report an error, and with 'check'
>>> never report an error?
>>>
>>>
>>>
>>> So I'm a bit stumped.  It looks like md is doing the right thing, but maybe
>>> the drive is getting confused.
>>> Are all the people who report this using the same sort of drive??
>>>
>>> NeilBrown
>>>
>>
>

-- 
Eyal Lebedinsky (eyal@eyal.emu.id.au)
