Should we be trying re-write on write errors?

linux-raid.vger.kernel.org archive mirror

* Should we be trying re-write on write errors?
@ 2008-11-14 21:30 greg
  2008-11-14 21:58 ` Neil Brown
  0 siblings, 1 reply; 5+ messages in thread
From: greg @ 2008-11-14 21:30 UTC
  To: neilb; +Cc: linux-raid

Hi Neil, hope the week is ending well for you and the rest of the
denizens on the linux-raid list.

Somewhat of a Gedanken question for you.

We currently attempt a re-write on read error for volumes which have
redundancy, ie. RAID[156] etc, on the bet that we can force a bad
sector remap.  Should we be attempting that (or do we) on a write
error as well?

We ran into the following on one of our many Linux storage boxes this
week:

/var/log/messages:
Nov 13 01:47:36 MACHINE kernel: sd 1:0:1:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
Nov 13 01:47:44 MACHINE kernel: sd 1:0:1:0: [sdb] Sense Key : Hardware Error [current]
Nov 13 01:47:44 MACHINE kernel: sd 1:0:1:0: [sdb] Add. Sense: Defect list error

/var/log/syslog:
Nov 13 01:47:44 MACHINE kernel: end_request: I/O error, dev sdb, sector 3484469
Nov 13 01:47:44 MACHINE kernel: raid1: Disk failure on sdb3, disabling device.
Nov 13 01:47:44 MACHINE kernel: ^IOperation continuing on 1 devices
Nov 13 01:47:44 MACHINE kernel: RAID1 conf printout:
etc....
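
For anyone wanting to pull the failing device and sector out of log
lines like the ones above, a minimal Python sketch follows.  The names
BlockIoError and parse_io_error are hypothetical, and the pattern only
covers the "end_request: I/O error" format seen in this excerpt; other
kernel versions may word the message differently.

```python
import re
from typing import NamedTuple, Optional


class BlockIoError(NamedTuple):
    """Device name and sector taken from one block-layer error line."""
    device: str
    sector: int


# Assumption: the message looks exactly like the syslog excerpt above.
_IO_ERROR_RE = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")


def parse_io_error(line: str) -> Optional[BlockIoError]:
    """Return the device and sector if the line is an I/O error, else None."""
    match = _IO_ERROR_RE.search(line)
    if match is None:
        return None
    return BlockIoError(device=match.group(1), sector=int(match.group(2)))


sample = ("Nov 13 01:47:44 MACHINE kernel: end_request: "
          "I/O error, dev sdb, sector 3484469")
print(parse_io_error(sample))
```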

The sdb device is a high-end SCSI SCA drive.  I gave the machine a
thorough go over before certifying it back into service.

SMART reports that two sectors have been added to the defect list on
that drive.  Otherwise things are normal, usual collection of ECC
corrected errors etc.

I forced a full physical read of the drive without provoking any
problems.  That was followed by a CHECK run on the MD devices based on
that disk and no issues were noted.  I added the drive back into its
MD devices, resynchronization went without event and things have been
trundling along fine since then.
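
The verification steps above (a full read of the disk, then a check
pass on the MD devices) could be scripted roughly as follows.  This is
a sketch under assumptions, not necessarily how the checks in this
report were run: the helper names are hypothetical, the caller supplies
the device path and array name, and it relies on the md sysfs files
sync_action and mismatch_cnt under /sys/block/<md>/md/.

```python
from pathlib import Path


def full_read(path: str, chunk_size: int = 1 << 20) -> int:
    """Read a file or block device end to end; return the bytes read.

    An OSError propagates if the kernel reports a read error.  Note that
    this goes through the page cache, so it is only an approximation of
    a forced physical read.
    """
    total = 0
    with open(path, "rb", buffering=0) as dev:
        while True:
            chunk = dev.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


def request_md_check(md_name: str, sysfs_root: str = "/sys/block") -> None:
    """Ask md to start a 'check' pass on the array (needs root)."""
    action = Path(sysfs_root) / md_name / "md" / "sync_action"
    action.write_text("check\n")


def read_mismatch_count(md_name: str, sysfs_root: str = "/sys/block") -> int:
    """Return the mismatch count md reports for the array."""
    counter = Path(sysfs_root) / md_name / "md" / "mismatch_cnt"
    return int(counter.read_text().strip())


# Example use on a real system (device and array names are placeholders):
#   full_read("/dev/sdb")
#   request_md_check("md0")
#   ... wait for the check to finish, then:
#   read_mismatch_count("md0")
```

The sysfs_root parameter exists only so the helpers can be pointed at a
scratch directory for testing; on a live machine the default applies.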

My analysis of this is that the drive spit a write error back to the
RAID1 driver which kicked the device after following up with a
successful write to its sibling.  The drive's firmware picked up on
the bad write and re-mapped the sector to one of its spares in the
badblock pool on the drive.

In fact the:

Nov 13 01:47:44 MACHINE kernel: sd 1:0:1:0: [sdb] Add. Sense: Defect list error

Would seem to indicate that the device driver (aic79xx) even knew
what ended up happening on the drive.

It seems to me the RAID1 code could have attempted and probably
succeeded with a sector re-write thus avoiding a situation of dropping
the RAID1 device from full redundancy levels.

Correct analysis or are the realities of the block driver/MD interface
such that this makes a good story with little hope of implementation?

Could the fly in all this be the fact the above error message isn't
telling us about the addition of the defect but rather a problem
adding the block to the remap list?  But if that were the case I would
assume the drive would be problematic and we would not be able to get it
back into service.

Thanks much for any enlightenment you can toss to the list on this.

BTW much thanks for the existing re-write code.  Countless mornings
I have said 'gee that Neil Brown was clever' when I see that one of
our machines cleaned up a potential problem before it became a bigger
one.

Best wishes for a pleasant weekend.

As always,
Dr. G.W. Wettstein, Ph.D.   Enjellic Systems Development, LLC.
4206 N. 19th Ave.           Specializing in information infra-structure
Fargo, ND  58102            development.
PH: 701-281-1686
FAX: 701-281-3949           EMAIL: greg@enjellic.com
------------------------------------------------------------------------------
"If you think nobody cares if you're alive, try missing a couple of car
 payments."
                                -- Earl Wilson


* Re: Should we be trying re-write on write errors?
  2008-11-14 21:30 Should we be trying re-write on write errors? greg
@ 2008-11-14 21:58 ` Neil Brown
  2008-11-15  0:47   ` Keld Jørn Simonsen
  0 siblings, 1 reply; 5+ messages in thread
From: Neil Brown @ 2008-11-14 21:58 UTC
  To: greg; +Cc: linux-raid

On Friday November 14, greg@enjellic.com wrote:
> Hi Neil, hope the week is ending well for you and the rest of the
> denizens on the linux-raid list.
> 
> Somewhat of a Gedanken question for you.
> 
> We currently attempt a re-write on read error for volumes which have
> redundancy, ie. RAID[156] etc, on the bet that we can force a bad
> sector remap.  Should we be attempting that (or do we) on a write
> error as well?

I don't think so.
By the time md/raid gets an error status, lower levels (whether driver
or firmware) should have retried as much as is appropriate.  Doing
further retries at the md level should be pointless.

For reads, we do retry.  But the purpose is to find out exactly which
block failed so that we can just re-write that block.  There is no
expectation that a block which previously failed a read will now
succeed.

Similarly there is no reason to expect that a block which previously
failed a write will now succeed.
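
The read-side behaviour described above can be illustrated with a toy
model.  This is only a sketch, not the md code: ToyDisk and
read_with_repair are hypothetical names, the disks are in-memory lists,
and the model assumes that re-writing a block makes it readable again
(the "bet" on a sector remap mentioned in the original question).

```python
from typing import Iterable, List, Optional


class ToyDisk:
    """In-memory stand-in for one mirror leg; not a real block device."""

    def __init__(self, blocks: List[bytes],
                 bad_reads: Optional[Iterable[int]] = None) -> None:
        self.blocks = list(blocks)
        self.bad_reads = set(bad_reads or ())

    def read(self, start: int, count: int) -> List[bytes]:
        # A multi-block read fails as a whole if any block in it is bad.
        for index in range(start, start + count):
            if index in self.bad_reads:
                raise OSError(f"read error at block {index}")
        return self.blocks[start:start + count]

    def write(self, index: int, data: bytes) -> None:
        # Assumption: the write lets the drive remap the block, so
        # later reads of it succeed.
        self.blocks[index] = data
        self.bad_reads.discard(index)


def read_with_repair(primary: ToyDisk, mirror: ToyDisk,
                     start: int, count: int) -> List[bytes]:
    """Read a range; on error, locate the bad blocks and re-write them."""
    try:
        return primary.read(start, count)
    except OSError:
        pass  # fall through to the block-by-block retry

    result: List[bytes] = []
    for index in range(start, start + count):
        try:
            # The retry is there to find exactly which block failed.
            result.extend(primary.read(index, 1))
        except OSError:
            good = mirror.read(index, 1)[0]  # take data from redundancy
            primary.write(index, good)       # re-write just that block
            result.append(good)
    return result
```

Note that the retry here narrows the failure down to a single block; it
does not assume the failed read would succeed on a second attempt.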

I suggest that you might like to discuss your particular case with the
author of the driver for the device.  Maybe the driver should be
retrying.  Maybe the firmware is doing the wrong thing.

After all, you wouldn't expect every different filesystem to retry all
failed writes, would you?

> 
> BTW much thanks for the existing re-write code.  Countless mornings
> I have said 'gee that Neil Brown was clever' when I see that one of
> our machines cleaned up a potential problem before it became a bigger
> one.
:-)
To be honest, that code was largely because people kept complaining
about read errors being too fatal and wanted something done.  The only
way to stop the flood of complaints was to fix something :-)

> 
> Best wishes for a pleasant weekend.

And for you!

NeilBrown


* Re: Should we be trying re-write on write errors?
  2008-11-14 21:58 ` Neil Brown
@ 2008-11-15  0:47   ` Keld Jørn Simonsen
  2008-11-15  0:55     ` Greg Freemyer
  0 siblings, 1 reply; 5+ messages in thread
From: Keld Jørn Simonsen @ 2008-11-15  0:47 UTC
  To: Neil Brown; +Cc: greg, linux-raid

I would like to write something about this for the wiki.
What exactly is done, and is it general for all of Linux md raid?

best regards
keld

On Sat, Nov 15, 2008 at 08:58:46AM +1100, Neil Brown wrote:
> On Friday November 14, greg@enjellic.com wrote:
> > Hi Neil, hope the week is ending well for you and the rest of the
> > denizens on the linux-raid list.
> > 
> > Somewhat of a Gedanken question for you.
> > 
> > We currently attempt a re-write on read error for volumes which have
> > redundancy, ie. RAID[156] etc, on the bet that we can force a bad
> > sector remap.  Should we be attempting that (or do we) on a write
> > error as well?
> 
> I don't think so.
> By the time md/raid gets an error status, lower levels (whether driver
> or firmware) should have retried as much as is appropriate.  Doing
> further retries at the md level should be pointless.
> 
> For reads, we do retry.  But the purpose is to find out exactly which
> block failed so that we can just re-write that block.  There is no
> expectation that a block which previously failed a read will now
> succeed.
> 
> Similarly there is no reason to expect that a block which previously
> failed a write will now succeed.
> 
> I suggest that you might like to discuss your particular case with the
> author of the driver for the device.  Maybe the driver should be
> retrying.  Maybe the firmware is doing the wrong thing.
> 
> After all, you wouldn't expect every different filesystem to retry all
> failed writes, would you?
> 
> 
> > 
> > BTW much thanks for the existing re-write code.  Countless mornings
> > I have said 'gee that Neil Brown was clever' when I see that one of
> > our machines cleaned up a potential problem before it became a bigger
> > one.
> :-)
> To be honest, that code was largely because people kept complaining
> about read errors being too fatal and wanted something done.  The only
> way to stop the flood of complaints was to fix something :-)
> 
> > 
> > Best wishes for a pleasant weekend.
> 
> And for you!
> 
> NeilBrown
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Should we be trying re-write on write errors?
  2008-11-15  0:47   ` Keld Jørn Simonsen
@ 2008-11-15  0:55     ` Greg Freemyer
  2008-11-17  4:31       ` Ric Wheeler
  0 siblings, 1 reply; 5+ messages in thread
From: Greg Freemyer @ 2008-11-15  0:55 UTC
  To: Keld Jørn Simonsen; +Cc: Neil Brown, greg, linux-raid

> On Sat, Nov 15, 2008 at 08:58:46AM +1100, Neil Brown wrote:
>> On Friday November 14, greg@enjellic.com wrote:
>> > Hi Neil, hope the week is ending well for you and the rest of the
>> > denizens on the linux-raid list.
>> >
>> > Somewhat of a Gedanken question for you.
>> >
>> > We currently attempt a re-write on read error for volumes which have
>> > redundancy, ie. RAID[156] etc, on the bet that we can force a bad
>> > sector remap.  Should we be attempting that (or do we) on a write
>> > error as well?
>>
>> I don't think so.
>> By the time md/raid gets an error status, lower levels (whether driver
>> or firmware) should have retried as much as is appropriate.  Doing
>> further retries at the md level should be pointless.
>>
>> For reads, we do retry.  But the purpose is to find out exactly which
>> block failed so that we can just re-write that block.  There is no
>> expectation that a block which previously failed a read will now
>> succeed.
>>
>> Similarly there is no reason to expect that a block which previously
>> failed a write will now succeed.
>>
>> I suggest that you might like to discuss your particular case with the
>> author of the driver for the device.  Maybe the driver should be
>> retrying.  Maybe the firmware is doing the wrong thing.
>>
>> After all, you wouldn't expect every different filesystem to retry all
>> failed writes, would you?
>>
>>
>> >
>> > BTW much thanks for the existing re-write code.  Countless mornings
>> > I have said 'gee that Neil Brown was clever' when I see that one of
>> > our machines cleaned up a potential problem before it became a bigger
>> > one.
>> :-)
>> To be honest, that code was largely because people kept complaining
>> about read errors being too fatal and wanted something done.  The only
>> way to stop the flood of complaints was to fix something :-)
>>
>> >
>> > Best wishes for a pleasant weekend.
>>
>> And for you!
>>
>> NeilBrown

<<Moved from the top post to a bottom post>>

On Fri, Nov 14, 2008 at 7:47 PM, Keld Jørn Simonsen <keld@dkuug.dk> wrote:
> I would like to write something about this for the wiki.
> What exactly is done, and is it general for all of Linux md raid?
>
> best regards
> keld
>

If you are going to document this in a wiki, please document when a
write error can occur because I totally don't understand how this one
occurred.

I thought they could only occur:

1) With bad media on the platter and the reallocatable sectors section
was already 100% utilized

2) Due to a CRC error on the comm path.  (flaky cable / power / etc.)

As I read the below errors, neither of those occurred.  And as Neil
said I believe the retries related to CRC errors should be handled
below the MD level.

Greg
-- 
Greg Freemyer
Litigation Triage Solutions Specialist
http://www.linkedin.com/in/gregfreemyer
First 99 Days Litigation White Paper -
http://www.norcrossgroup.com/forms/whitepapers/99%20Days%20whitepaper.pdf

The Norcross Group
The Intersection of Evidence & Technology
http://www.norcrossgroup.com

* Re: Should we be trying re-write on write errors?
  2008-11-15  0:55     ` Greg Freemyer
@ 2008-11-17  4:31       ` Ric Wheeler
  0 siblings, 0 replies; 5+ messages in thread
From: Ric Wheeler @ 2008-11-17  4:31 UTC
  To: Greg Freemyer; +Cc: Keld Jørn Simonsen, Neil Brown, greg, linux-raid

Greg Freemyer wrote:
>> On Sat, Nov 15, 2008 at 08:58:46AM +1100, Neil Brown wrote:
>>     
>>> On Friday November 14, greg@enjellic.com wrote:
>>>       
>>>> Hi Neil, hope the week is ending well for you and the rest of the
>>>> denizens on the linux-raid list.
>>>>
>>>> Somewhat of a Gedanken question for you.
>>>>
>>>> We currently attempt a re-write on read error for volumes which have
>>>> redundancy, ie. RAID[156] etc, on the bet that we can force a bad
>>>> sector remap.  Should we be attempting that (or do we) on a write
>>>> error as well?
>>>>         
>>> I don't think so.
>>> By the time md/raid gets an error status, lower levels (whether driver
>>> or firmware) should have retried as much as is appropriate.  Doing
>>> further retries at the md level should be pointless.
>>>
>>> For reads, we do retry.  But the purpose is to find out exactly which
>>> block failed so that we can just re-write that block.  There is no
>>> expectation that a block which previously failed a read will now
>>> succeed.
>>>
>>> Similarly there is no reason to expect that a block which previously
>>> failed a write will now succeed.
>>>
>>> I suggest that you might like to discuss your particular case with the
>>> author of the driver for the device.  Maybe the driver should be
>>> retrying.  Maybe the firmware is doing the wrong thing.
>>>
>>> After all, you wouldn't expect every different filesystem to retry all
>>> failed writes, would you?
>>>
>>>
>>>       
>>>> BTW much thanks for the existing re-write code.  Countless mornings
>>>> I have said 'gee that Neil Brown was clever' when I see that one of
>>>> our machines cleaned up a potential problem before it became a bigger
>>>> one.
>>>>         
>>> :-)
>>> To be honest, that code was largely because people kept complaining
>>> about read errors being too fatal and wanted something done.  The only
>>> way to stop the flood of complaints was to fix something :-)
>>>
>>>       
>>>> Best wishes for a pleasant weekend.
>>>>         
>>> And for you!
>>>
>>> NeilBrown
>>>       
>
> <<Moved from the top post to a bottom post>>
>
> On Fri, Nov 14, 2008 at 7:47 PM, Keld Jørn Simonsen <keld@dkuug.dk> wrote:
>   
>> I would like to write something about this for the wiki.
>> What exactly is done, and is it general for all of Linux md raid?
>>
>> best regards
>> keld
>>
>>     
>
> If you are going to document this in a wiki, please document when a
> write error can occur because I totally don't understand how this one
> occurred.
>
> I thought they could only occur:
>
> 1) With bad media on the platter and the reallocatable sectors section
> was already 100% utilized
>
> 2) Due to a CRC error on the comm path.  (flaky cable / power / etc.)
>
> As I read the below errors, neither of those occurred.  And as Neil
> said I believe the retries related to CRC errors should be handled
> below the MD level.
>
> Greg
>   

Most of the common write errors you see should not be retried, but you 
might see some writes fail due to transient conditions.

One possible condition would be vibrations, for example as you wheel a 
rack around in your data center or you bang into the computer.

If you are using a SAN, you might also have transient link errors that 
will go away once the switch rights itself or someone plugs back in a 
new cable...

Ric

