Re: nonzero mismatch_cnt with no earlier error

From: Eyal Lebedinsky <eyal@eyal.emu.id.au>
To: Neil Brown <neilb@suse.de>
Cc: linux-raid list <linux-raid@vger.kernel.org>,
	list linux-ide <linux-ide@vger.kernel.org>
Subject: Re: nonzero mismatch_cnt with no earlier error
Date: Mon, 26 Feb 2007 19:18:45 +1100
Message-ID: <45E297E5.50303@eyal.emu.id.au>
In-Reply-To: <17890.25524.465403.130119@notabene.brown>

I CC'ed linux-ide to see if they think the reported error was really innocent:

Question: does this error report suggest that a disk could be corrupted?

This SATA disk is part of an md raid and no error was reported by md.

[937567.332751] ata3.00: exception Emask 0x10 SAct 0x0 SErr 0x4190002 action 0x2
[937567.354094] ata3.00: cmd b0/d5:01:09:4f:c2/00:00:00:00:00/00 tag 0 cdb 0x0 data 512 in
[937567.354096]          res 51/04:83:45:00:00/00:00:00:00:00/a0 Emask 0x10 (ATA bus error)
[937568.120783] ata3: soft resetting port
[937568.282450] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[937568.306693] ata3.00: configured for UDMA/100
[937568.319733] ata3: EH complete
[937568.361223] SCSI device sdc: 625142448 512-byte hdwr sectors (320073 MB)
[937568.397207] sdc: Write Protect is off
[937568.408620] sdc: Mode Sense: 00 3a 00 00
[937568.453522] SCSI device sdc: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
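
To make the "cmd" and "res" lines above easier to read, here is a minimal Python sketch that splits them into taskfile registers. The field layout in the comment is my understanding of how libata prints these lines, and the helper names (parse_taskfile, decode_bits, is_smart_command) are made up for this example. Only the well-known status and error bits are named; anything else is ignored.

```python
import re

# Assumed layout of libata's "cmd" and "res" lines (hex bytes):
#   cmd command/feature:nsect:lbal:lbam:lbah/<hob fields>/device
#   res status/error:nsect:lbal:lbam:lbah/<hob fields>/device
TF_RE = re.compile(r"\b(cmd|res) ([0-9a-f]{2})/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})/")

# Commonly documented ATA status and error register bits.
STATUS_BITS = [(0x80, "BSY"), (0x40, "DRDY"), (0x20, "DF"), (0x08, "DRQ"), (0x01, "ERR")]
ERROR_BITS = [(0x80, "ICRC"), (0x40, "UNC"), (0x10, "IDNF"), (0x04, "ABRT")]


def parse_taskfile(line):
    """Return the first six taskfile bytes of a cmd/res line, or None."""
    m = TF_RE.search(line)
    if m is None:
        return None
    kind, first, rest = m.groups()
    second, nsect, lbal, lbam, lbah = (int(x, 16) for x in rest.split(":"))
    # The first two bytes mean different things on "cmd" and "res" lines.
    names = ("command", "feature") if kind == "cmd" else ("status", "error")
    return {"kind": kind, names[0]: int(first, 16), names[1]: second,
            "nsect": nsect, "lbal": lbal, "lbam": lbam, "lbah": lbah}


def decode_bits(value, table):
    """List the names of the bits from `table` that are set in `value`."""
    return [name for mask, name in table if value & mask]


def is_smart_command(tf):
    """True for ATA SMART (0xB0) with the usual 0x4F/0xC2 signature."""
    return (tf["kind"] == "cmd" and tf["command"] == 0xB0
            and tf["lbam"] == 0x4F and tf["lbah"] == 0xC2)


if __name__ == "__main__":
    cmd = parse_taskfile("ata3.00: cmd b0/d5:01:09:4f:c2/00:00:00:00:00/00 tag 0")
    res = parse_taskfile("res 51/04:83:45:00:00/00:00:00:00:00/a0 Emask 0x10")
    print(is_smart_command(cmd), decode_bits(res["status"], STATUS_BITS),
          decode_bits(res["error"], ERROR_BITS))
```

If this reading is right, the command that failed was a SMART request (feature 0xd5, which I believe is SMART READ LOG) that the drive aborted, rather than a data read or write issued by md. I would still like confirmation from the linux-ide side.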

Neil Brown wrote:
> On Saturday February 24, eyal@eyal.emu.id.au wrote:
> 
>>But is this not a good opportunity to repair the bad stripe for a very
>>low cost (no complete resync required)?
> 
> 
> In this case, 'md' knew nothing about an error.  The SCSI layer
> detected something and thought it had fixed it itself.  Nothing for md
> to do.

I expected this. So either the scsi layer incorrectly held back the error
report, or the mismatch_cnt is due to something unrelated to the disk
i/o failure.

>>At time of error we actually know which disk failed and can re-write
>>it, something we do not know at resync time, so I assume we always
>>write to the parity disk.

Again, as I expected, resync cannot tell which block in a stripe is
wrong, so it effectively "blames" the parity block. To know which block
to correct one needs a higher level parity code (can raid6 correct
single bit/disk read errors?).
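
To illustrate why a single XOR parity cannot say which block is bad, here is a small Python sketch. It is a toy model with made-up two-byte blocks, not md's code: corrupting a data block and corrupting the parity block in the same way leave exactly the same evidence behind.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks column by column (RAID5-style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def syndrome(data, parity):
    """XOR of all data blocks and the parity; all zero when consistent."""
    return xor_blocks(list(data) + [parity])


def flip(block, index, mask):
    """Return a copy of `block` with `mask` XORed into byte `index`."""
    out = bytearray(block)
    out[index] ^= mask
    return bytes(out)


# Toy stripe: three data blocks plus their parity (values are arbitrary).
data = [b"\x11\x22", b"\x33\x44", b"\x55\x66"]
parity = xor_blocks(data)

# Case A: a data block is silently corrupted.
bad_data = [data[0], flip(data[1], 0, 0x01), data[2]]
# Case B: the parity block is silently corrupted in the same way.
bad_parity = flip(parity, 0, 0x01)

syn_a = syndrome(bad_data, parity)
syn_b = syndrome(data, bad_parity)

# Both cases report a mismatch, and the mismatch looks identical.
print(syn_a == syn_b, any(syn_a))
```

With only this one syndrome there is no way to choose between the two cases, so rewriting the parity is as good a guess as any. A second, independent syndrome (as in raid6's P and Q) is what would in principle let one locate a single bad block; whether md makes use of that is the question above.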

> md only knows of a 'problem' if the lower level driver reports one.
> If it reports a problem for a write request, md will fail the device.
> If it reports a problem for a read request, md will try to over-write
> correct data on the failed block. 
> But if the driver doesn't report the failure, there is nothing md can
> do.
> 
> When performing a check/repair md looks for inconsistencies and fixes
> them 'arbitrarily'.  For raid5/6, it just 'corrects' the parity.  For
> raid1/10, it chooses one block and over-writes the other(s) with it.
> 
> Mapping these corrections back to blocks in files in the filesystem is
> extremely non-trivial.
> 
> NeilBrown
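
For anyone following along, the counter and the check action discussed here are exposed through sysfs. A minimal Python sketch of reading and triggering them follows; the array name "md0" is an assumption (substitute your own), the function names are made up for this example, and writing to sync_action needs root.

```python
from pathlib import Path


def md_attr(md_name, attr, sysfs_root="/sys/block"):
    """Path of an md sysfs attribute, e.g. /sys/block/md0/md/mismatch_cnt."""
    return Path(sysfs_root) / md_name / "md" / attr


def read_mismatch_cnt(md_name="md0", sysfs_root="/sys/block"):
    """Return the mismatch count recorded by the last check/repair pass."""
    return int(md_attr(md_name, "mismatch_cnt", sysfs_root).read_text().strip())


def request_sync_action(action, md_name="md0", sysfs_root="/sys/block"):
    """Ask md to start a 'check' or 'repair' pass (requires root)."""
    if action not in ("check", "repair"):
        raise ValueError("expected 'check' or 'repair'")
    md_attr(md_name, "sync_action", sysfs_root).write_text(action + "\n")


if __name__ == "__main__":
    # "md0" is a placeholder array name.
    print("mismatch_cnt:", read_mismatch_cnt("md0"))
```

The sysfs_root parameter exists only so the functions can be tried against a scratch directory instead of a live array.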

-- 
Eyal Lebedinsky (eyal@eyal.emu.id.au) <http://samba.org/eyal/>
	attach .zip as .dat
