Re: Is there a drive error "retry" parameter?

From: Carlos Knowlton <cknowlton@update.fsix.com>
To: linux-raid@vger.kernel.org
Subject: Re: Is there a drive error "retry" parameter?
Date: Tue, 14 Jun 2005 16:53:17 -0500
Message-ID: <42AF51CD.7050102@update.fsix.com>
In-Reply-To: <429F3ED5.4020005@tls.msk.ru>

Hi Michael,

Michael Tokarev wrote:

>Carlos Knowlton wrote:
>
>>I want to understand exactly what is going on in the Software RAID 5
>>code when a drive is marked "dirty", and booted from the array.  Based
>>on what I've read so far, it seems that this happens any time the RAID
>>software runs into a read or write error that might have been corrected
>>by fsck (if it had been there first).  Is this true?
>
>You're mixing up two very different things here.
>
>Fsck has nothing to do with raid, per se.  Fsck checks the filesystem,
>which sits on top of a block device (be it a raid array, a disk, a
>loopback device, whatever).  It does not know what "raid" is at all.
>As far as raid is concerned, the filesystem is upper-level stuff: the
>raid code knows nothing about filesystems or any data it stores.  Likewise,
>the filesystem does not know about the underlying components of the
>raid array where it resides -- so fsck can NOT "fix" whatever
>error happened two layers down the stack (fs, raid, underlying devices).
>
>On the other hand, the raid code ensures (or tries to, anyway) that any
>errors in the underlying component devices do not propagate to the
>upper level (be it a filesystem, database or anything else - raid does
>not care what data it stores).  It is there to "hide" whatever errors
>may happen on the physical device (disk drive).  Currently, if enough
>drives fail, the raid array will be "shut down" so that the upper level
>(eg filesystem) can't access the raid array at all.  Until that
>happens, no errors should be propagated to the filesystem layer:
>all such errors will be corrected by the raid code, ensuring that the
>data read back is the same as the data that was written.
>
Thanks, that is good to know!  I must have gotten the wrong impression
from a discussion I read on this list a few months ago:
<http://marc.theaimsgroup.com/?l=linux-raid&m=108852478803297&w=2>.
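
As an illustration of the point quoted above (a read error on one component
is hidden from the upper layer), here is a minimal sketch of RAID 5 style
XOR parity reconstruction. It is not the md driver's code; the function
names and the use of None to stand for an unreadable block are assumptions
made for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Return the data blocks followed by one parity block (hypothetical layout)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def read_block(stripe, index):
    """Read one block; None models a component that returned a read error."""
    block = stripe[index]
    if block is not None:
        return block
    survivors = [b for i, b in enumerate(stripe) if i != index]
    if any(b is None for b in survivors):
        # Two failures in one stripe: parity alone cannot recover the data.
        raise IOError("more than one component failed in this stripe")
    # XOR of all surviving blocks (data + parity) yields the missing block.
    return xor_blocks(survivors)
```

With one unreadable block per stripe, the caller still gets the data that was
written; only a second failure in the same stripe surfaces as an error.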

Maybe you can help me clear up some other misconceptions I have.  For
instance, I have heard that when most modern hard disks run into a bad
sector, they map around it and copy the data to another place on the
disk.  Do you know if this is true?  If so, how does this affect RAID?
(i.e., does RAID benefit from this, or does it override it?)


>>Is there a "retry" parameter that can be set in the kernel parameters,
>>or else in the code itself to prolong the existence of a drive in an
>>array before it is considered dirty?
>
>There's no such parameter currently.  But there have been several
>discussions about how to make the raid code more robust - in particular,
>in case of a read error, the raid code could keep the errored drive in
>the array and mark it dirty only in case of a write error.
>
That would be nice.  Do you know if anyone has done any work toward such 
a fix?
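
To make the proposed behaviour concrete, here is a rough sketch of the
policy described in the quote above: a read error leaves the drive in the
array, and only a write error kicks it out. This is not how the md code is
written; the Member class, handle_io_error, and the optional
max_read_errors threshold (the kind of "retry" parameter asked about
earlier) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Member:
    """One component drive of an array (hypothetical model)."""
    name: str
    faulty: bool = False
    read_errors: int = 0


def handle_io_error(member, op, max_read_errors=None):
    """Apply the policy to one failed I/O; return True if the drive is now faulty."""
    if op == "write":
        # A failed write means the drive no longer holds current data.
        member.faulty = True
    elif op == "read":
        # Keep the drive; the data can be rebuilt from the other members.
        member.read_errors += 1
        if max_read_errors is not None and member.read_errors > max_read_errors:
            member.faulty = True
    else:
        raise ValueError(f"unknown operation: {op!r}")
    return member.faulty
```

Leaving max_read_errors unset gives exactly the behaviour in the quote; a
finite value would act as the retry limit.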

>>If so, I would like to increase it in my environment, because it seems
>>like I'm losing drives in my array that are often still quite stable.
>
>I think you have to provide some more information.  Kernel logging gives
>a lot of detail about what exactly is happening and what the raid code is
>doing as a result.
>
Unfortunately, I don't have the logs handy, but I'll post something the
next time I see it.  I built several RAID servers for some customers over
a year ago, and they have reported drive failures.  We replaced those
drives, and when we tested the old ones they were still in fairly good
condition.  So for the last little while I have just re-added the failed
drive to the array, and it usually doesn't cause any trouble again
(though occasionally a different drive will fail).  If there is a way to
keep a drive in the array a little longer when a read error is detected,
it would really help!


Thanks!
Carlos Knowlton
