* non-zero result with NO_SENSE
@ 2006-11-02 14:28 Laurie Costello
2006-11-02 16:10 ` Douglas Gilbert
0 siblings, 1 reply; 4+ messages in thread
From: Laurie Costello @ 2006-11-02 14:28 UTC (permalink / raw)
To: linux-scsi
I'm researching a problem with data corruption on Linux 2.6.9.
I'm seeing results of scsi_print_sense() in the system log, which brought me
to this chunk of code. Is it correct to process a non-zero result with
NO_SENSE as if an error didn't occur or was recovered?
Laurie Costello
Oct 9 08:16:03 localhost kernel: Info fld=0x0, Current sdv: sense key No Sense
    /* An error occurred */
    if (driver_byte(result) != 0 &&     /* An error occurred */
        (SCpnt->sense_buffer[0] & 0x7f) == 0x70) { /* Sense current */
        switch (SCpnt->sense_buffer[2]) {
        ......deleted some case statement code
        case RECOVERED_ERROR: /* an error occurred, but it recovered */
        case NO_SENSE: /* LLDD got sense data */
            /*
             * Inform the user, but make sure that it's not treated
             * as a hard error.
             */
            scsi_print_sense("sd", SCpnt);
            SCpnt->result = 0;
            SCpnt->sense_buffer[0] = 0x0;
            good_bytes = this_count;
            break;
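
For readers following the byte tests in the fragment above, here is a minimal userspace sketch of how the response code, sense key and additional sense codes are laid out in fixed-format sense data. The struct and function names (fixed_sense, decode_fixed_sense) are hypothetical and not taken from sd.c. Unlike the quoted fragment, the sketch masks byte 2 with 0x0f, because in the fixed format the upper bits of that byte carry other flags.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical container for the fields of interest. */
struct fixed_sense {
    unsigned char key;   /* sense key: low nibble of byte 2 */
    unsigned char asc;   /* additional sense code: byte 12 */
    unsigned char ascq;  /* additional sense code qualifier: byte 13 */
};

/*
 * Decode current, fixed-format sense data (response code 0x70).
 * Returns false if the buffer is too short or holds another response code.
 */
bool decode_fixed_sense(const unsigned char *buf, size_t len,
                        struct fixed_sense *out)
{
    /* Bit 7 of byte 0 is the VALID bit, so mask it off before comparing. */
    if (len < 3 || (buf[0] & 0x7f) != 0x70)
        return false;

    out->key = buf[2] & 0x0f;
    /* asc/ascq are only present if the device returned enough bytes. */
    out->asc = len > 12 ? buf[12] : 0;
    out->ascq = len > 13 ? buf[13] : 0;
    return true;
}
```

A sense key of 0x0 corresponds to NO_SENSE and 0x1 to RECOVERED_ERROR, the two cases the quoted switch statement groups together.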
* Re: non-zero result with NO_SENSE
2006-11-02 14:28 non-zero result with NO_SENSE Laurie Costello
@ 2006-11-02 16:10 ` Douglas Gilbert
0 siblings, 0 replies; 4+ messages in thread
From: Douglas Gilbert @ 2006-11-02 16:10 UTC (permalink / raw)
To: Laurie Costello; +Cc: linux-scsi
Laurie Costello wrote:
> I'm researching a problem with data corruption on linux 2.6.9.
> I'm seeing results of scsi_print_sense() in the system log, which brought me
> to this chunk of code. Is it correct to process a non-zero result with
> NO_SENSE as if an error didn't occur or was recovered?
Laurie,
Well it isn't ideal obviously. The problem is that sd
sits between a SCSI direct access device and the SCSI
block subsystem. SCSI devices try and be helpful and
tell the application client things like: the data was
read but required 3 retries and ECC. The block layer
is only interested in errors that impact the current
IO, so sd has no slot to file these warnings in, apart
from the system log. There was some proposal about
logging such errors and warnings (but where to store
the log :-) ), and disks already do a fair amount of
logging themselves.
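
The split described above, between conditions that only deserve a log entry and conditions that fail the current IO, can be sketched roughly as follows. This is an illustration of the policy under discussion, not the sd driver's code: the enum and function names are hypothetical, and the default branch simplifies away the other cases that were elided from the quoted switch statement.

```c
/* Sense key values as defined by the SCSI standards (SPC). */
enum {
    SK_NO_SENSE        = 0x0,
    SK_RECOVERED_ERROR = 0x1
};

/* Hypothetical outcome reported upward to the block layer. */
enum io_outcome {
    IO_OK_LOG_ONLY,  /* complete the request normally, note it in the log */
    IO_FAILED        /* the current IO is affected */
};

/* Classify a sense key the way the quoted fragment treats it. */
enum io_outcome classify_sense_key(unsigned char sense_key)
{
    switch (sense_key & 0x0f) {
    case SK_NO_SENSE:
    case SK_RECOVERED_ERROR:
        /* Informational only: the block layer has no slot for it. */
        return IO_OK_LOG_ONLY;
    default:
        /* Simplification: every other key is treated as a failure here. */
        return IO_FAILED;
    }
}
```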
Even more worrying are deferred errors. Something like
a cached write that at some later time gets an IO error
when the disk tries to write its cache to the media.
This should become a more interesting area as disks get
larger non-volatile caches.
There are tools such as smartmontools that can monitor
the health of disks.
Doug Gilbert
* Re: non-zero result with NO_SENSE
@ 2006-11-02 16:58 Laurie Costello
2006-11-02 19:06 ` Douglas Gilbert
0 siblings, 1 reply; 4+ messages in thread
From: Laurie Costello @ 2006-11-02 16:58 UTC (permalink / raw)
To: dougg; +Cc: linux-scsi
I understand this for the case of RECOVERED_ERROR. But for the
case of NO_SENSE the driver didn't supply any extra information with
the failure. Why does this code assume the request completed okay?
good_bytes is set to 0 based on the non-zero result in the scsi_cmnd
result field. Then it's reset based on NO_SENSE. Not returning more
sense information would be poor driver behavior, but changing result to 0
seems dangerous.
Laurie
* Re: non-zero result with NO_SENSE
2006-11-02 16:58 Laurie Costello
@ 2006-11-02 19:06 ` Douglas Gilbert
0 siblings, 0 replies; 4+ messages in thread
From: Douglas Gilbert @ 2006-11-02 19:06 UTC (permalink / raw)
To: Laurie Costello; +Cc: linux-scsi
Laurie Costello wrote:
> I understand this for the case of RECOVERED_ERROR. But for the
> case of NO_SENSE the driver didn't supply any extra information with
> the failure. Why does this code assume the request completed okay?
Laurie,
By my reading of that logic, if the response code is
0x70, 0x71, 0x72 or 0x73 then the sense buffer will
be printed to the log in the case when the sense_key
is RECOVERED_ERROR or NO_SENSE.
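
As a rough illustration of the four response codes mentioned above: per the SCSI Primary Commands standard, 0x70 and 0x71 are fixed-format sense data (current and deferred), while 0x72 and 0x73 are descriptor-format (current and deferred), and the sense key sits at a different offset in each format. The names below (sense_summary, summarize_sense) are hypothetical; this is a userspace sketch, not the kernel's implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical format-independent view of a sense buffer. */
struct sense_summary {
    unsigned char response_code;  /* 0x70..0x73 */
    bool deferred;                /* 0x71 or 0x73 */
    bool descriptor;              /* 0x72 or 0x73 */
    unsigned char key, asc, ascq;
};

/* Returns false if the response code is not one of 0x70..0x73. */
bool summarize_sense(const unsigned char *buf, size_t len,
                     struct sense_summary *out)
{
    if (len < 1)
        return false;

    unsigned char rc = buf[0] & 0x7f;
    if (rc < 0x70 || rc > 0x73)
        return false;

    memset(out, 0, sizeof *out);
    out->response_code = rc;
    out->deferred = (rc & 0x01) != 0;
    out->descriptor = rc >= 0x72;

    if (out->descriptor) {
        /* Descriptor format: key in byte 1, asc/ascq in bytes 2 and 3. */
        if (len > 1) out->key = buf[1] & 0x0f;
        if (len > 2) out->asc = buf[2];
        if (len > 3) out->ascq = buf[3];
    } else {
        /* Fixed format: key in byte 2, asc/ascq in bytes 12 and 13. */
        if (len > 2) out->key = buf[2] & 0x0f;
        if (len > 12) out->asc = buf[12];
        if (len > 13) out->ascq = buf[13];
    }
    return true;
}
```

The deferred flag is what distinguishes the delayed cached-write failures mentioned earlier in the thread from errors that belong to the command that just completed.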
The sort of sense messages that come through with
a valid response code and a sense_key of NO_SENSE
are informational exceptions (when MRIE=5 or 6) **.
The block layer isn't interested (but the user might
be). A properly set up smartd (from smartmontools)
should be sending automatically generated emails
to system administrators in such cases.
** REQUEST SENSE is the command that receives
a sense_key of NO_SENSE in its data-in buffer.
The interesting cases are when the asc/ascq codes
are non-zero. I'm not sure the sd driver ever issues
a REQUEST SENSE command.
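
The "interesting" case above, a sense key of NO_SENSE accompanied by non-zero asc/ascq, could be tested for along these lines. Both helper names are hypothetical. The one concrete code used, ASC 0x5D, is the additional sense code the SCSI standards assign to "failure prediction threshold exceeded"; it is shown only as one example of an informational exception.

```c
#include <stdbool.h>

/* True when the sense key is NO SENSE but asc/ascq still say something. */
bool no_sense_carries_info(unsigned char key, unsigned char asc,
                           unsigned char ascq)
{
    return (key & 0x0f) == 0x0 && (asc != 0 || ascq != 0);
}

/* ASC 0x5D: failure prediction threshold exceeded (any ascq). */
bool is_failure_prediction(unsigned char asc)
{
    return asc == 0x5d;
}
```

A monitoring tool would presumably care about the first predicate returning true, even though the block layer, as noted above, does not.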
BTW There is another (lower) level of sense data
checking in scsi_decide_disposition() found in
the scsi_error.c file and belonging to the SCSI
mid level code.
> good_bytes is set to 0 based on the non-zero result in the scsi_cmnd
> result field. Then its reset based on NO_SENSE. Not returning more
> sense information would be poor driver behavior but changing result to 0
> seems dangerous.
As far as I can tell the SG_IO ioctl applied to a
scsi disk device node (e.g. /dev/sda) does _not_
take the code path that you showed. That leaves
mounted file systems and commands like dd using
that code via the block layer.
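
For comparison, a minimal sketch of the SG_IO path mentioned above: it sends a REQUEST SENSE (opcode 0x03) through the sg v3 interface and hands the raw data back to the caller, with none of the sd-level rewriting shown earlier in the thread. The function name request_sense_via_sg_io, the 5 second timeout, and the O_RDONLY | O_NONBLOCK open flags are assumptions for illustration; whether a given kernel permits this command on a read-only descriptor is not something this thread establishes.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <scsi/sg.h>

/*
 * Issue REQUEST SENSE via the SG_IO ioctl and store the returned
 * sense data in buf. Returns the number of bytes received, or -1.
 */
int request_sense_via_sg_io(const char *dev, unsigned char *buf,
                            unsigned char len)
{
    unsigned char cdb[6] = { 0x03, 0, 0, 0, len, 0 };  /* REQUEST SENSE */
    unsigned char sense[32];  /* sense for the REQUEST SENSE command itself */
    sg_io_hdr_t hdr;

    int fd = open(dev, O_RDONLY | O_NONBLOCK);
    if (fd < 0)
        return -1;

    memset(&hdr, 0, sizeof hdr);
    hdr.interface_id = 'S';                  /* sg v3 interface */
    hdr.cmdp = cdb;
    hdr.cmd_len = sizeof cdb;
    hdr.dxfer_direction = SG_DXFER_FROM_DEV; /* data-in transfer */
    hdr.dxferp = buf;
    hdr.dxfer_len = len;
    hdr.sbp = sense;
    hdr.mx_sb_len = sizeof sense;
    hdr.timeout = 5000;                      /* milliseconds (assumed value) */

    int rc = ioctl(fd, SG_IO, &hdr);
    close(fd);
    if (rc < 0)
        return -1;

    /* resid is the number of requested bytes that were not transferred. */
    return (int)len - hdr.resid;
}
```

A caller might pass "/dev/sda" and an 18-byte buffer, then inspect the response code and sense key in the returned bytes directly.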
Do you have a suggestion of what you would prefer
to happen?
Doug Gilbert