From: Douglas Gilbert <dougg@torque.net>
To: ric@emc.com
Cc: Jeff Garzik <jeff@garzik.org>, Mark Lord <liml@rtr.ca>,
"Eric D. Mudama" <edmudama@gmail.com>,
James Bottomley <James.Bottomley@hansenpartnership.com>,
linux-kernel@vger.kernel.org,
IDE/ATA development list <linux-ide@vger.kernel.org>,
linux-scsi <linux-scsi@vger.kernel.org>
Subject: Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR
Date: Wed, 31 Jan 2007 10:28:33 -0500
Message-ID: <45C0B5A1.8000506@torque.net>
In-Reply-To: <45C0A985.7010402@emc.com>
Ric Wheeler wrote:
>
>
> Jeff Garzik wrote:
>> Mark Lord wrote:
>>> Eric D. Mudama wrote:
>>>>
>>>> Actually, it's possibly worse, since each failure in libata will
>>>> generate 3-4 retries. With existing ATA error recovery in the
>>>> drives, that's about 3 seconds per retry on average, or 12 seconds
>>>> per failure. Multiply that by the number of blocks past the error
>>>> to complete the request..
>>>
>>> It really beats the alternative of a forced reboot
>>> due to, say, superblock I/O failing because it happened
>>> to get merged with an unrelated I/O which then failed..
>>> Etc..
>>
>>
>> FWIW -- speaking generally -- I think there are inevitable areas where
>> libata error handling combined with SCSI error handling results in
>> suboptimal error handling.
>>
>> Just creating a list of "<this behavior> should be handled <this way>,
>> but in reality is handled in <this silly way>" would be very helpful.
>
> I agree - Tejun has done a great job at giving us a great base. Next
> step is to get clarity on what the types of errors are and how to
> differentiate between them (and maybe how that would change by class of
> device?).
>
>>
>> Error handling is tough to get right, because the code is exercised so
>> infrequently. Tejun has actually done an above-average job here, by
>> making device probe, hotplug and other "exceptions" go through the
>> libata EH code, thereby exercising the EH code more than one might
>> normally assume.
>>
>> Some errors in libata probably should not be retried more than once,
>> when we have a definitive diagnosis. Suggestions for improvements are
>> welcome.
>>
>> Jeff
>
> One thing that we find really useful is to inject real errors into
> devices. Mark has some patches that let us inject media errors, we also
> bring back failed drives and run them through testing and occasionally
> get to use analyzers, etc to inject odd ball errors.

Ric,

Both ATA (ATA8-ACS) and SCSI (SBC-3) have recently added
command support to flag a block as "uncorrectable". There
is no need to send bad "long" data to it and suppress the
disk's automatic re-allocation logic.

In the case of ATA it is the WRITE UNCORRECTABLE command.
In the case of SCSI it is the WR_UNCOR bit in the WRITE
LONG command.
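
For the SCSI case, a minimal sketch of what such a command block
might look like. The helper name is made up for illustration and
is not from any driver. The opcode (0x3f) and the WR_UNCOR position
(byte 1, bit 6) are as I read the SBC-3 drafts and should be checked
against the current revision. Leaving the byte transfer length at
zero is likewise an assumption, on the basis that no "long" data
is sent when WR_UNCOR is set.

```c
#include <stdint.h>
#include <string.h>

#define WRITE_LONG_10_OPCODE 0x3f  /* WRITE LONG (10), per SBC-3 (assumed) */
#define WR_UNCOR_BIT         0x40  /* byte 1, bit 6, per SBC-3 (assumed) */

/*
 * Hypothetical helper: fill a 10-byte CDB that asks the device to
 * mark the block at 'lba' as uncorrectable. No data-out buffer is
 * needed, so the byte transfer length (bytes 7-8) is left at zero.
 */
static void build_write_long10_wr_uncor(uint8_t cdb[10], uint32_t lba)
{
    memset(cdb, 0, 10);
    cdb[0] = WRITE_LONG_10_OPCODE;
    cdb[1] = WR_UNCOR_BIT;
    cdb[2] = (uint8_t)(lba >> 24);   /* LBA, big-endian, bytes 2-5 */
    cdb[3] = (uint8_t)(lba >> 16);
    cdb[4] = (uint8_t)(lba >> 8);
    cdb[5] = (uint8_t)lba;
    /* cdb[6] reserved, cdb[7..8] transfer length = 0, cdb[9] control = 0 */
}
```

A CDB built this way could then be passed through SG_IO with no
data transfer; a later READ of that LBA would be expected to fail
with MEDIUM ERROR until the block is rewritten.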

It seems that due to SAT any useful capability in the ATA
command set will soon appear in the corresponding SCSI
command set, if it is not already there.

Doug Gilbert