public inbox for linux-kernel@vger.kernel.org
From: Mark Lord <lkml@rtr.ca>
To: Johan Groth <johan.groth@linux-grotto.org.uk>
Cc: linux-kernel@vger.kernel.org
Subject: Re: Scsi errors with Megaraid 300-8x
Date: Wed, 23 Aug 2006 11:27:14 -0400
Message-ID: <44EC73D2.9090302@rtr.ca>
In-Reply-To: <44EB1875.3020403@linux-grotto.org.uk>

Johan Groth wrote:
> Hi,
> Ever since I upgraded my server from a dual Opteron 244 (mobo Tyan 2885)
> system to a dual dual-core Opteron 285 (mobo Tyan 2895) system, I'm
> getting read errors that freeze the system, which causes my disk-based
> backup software (faubackup) to stop working. I think it is faubackup
> that triggers the bug.
> 
> I get these errors in the log:
> Aug 20 06:35:08 jaguar kernel: sd 2:1:0:0: SCSI error: return code = 0x40001
> Aug 20 06:35:56 jaguar kernel: end_request: I/O error, dev sda, sector 616924530
> Aug 20 06:36:03 jaguar kernel: sd 2:1:0:0: SCSI error: return code = 0x40001
> Aug 20 06:36:03 jaguar kernel: end_request: I/O error, dev sda, sector 616924538
..
> Aug 20 06:36:07 jaguar kernel: sd 2:1:0:0: SCSI error: return code = 0x40001
> Aug 20 06:36:07 jaguar kernel: end_request: I/O error, dev sda, sector 616924538
> 
> The last sector is repeated until I reboot the machine. The only
> difference I've made to the raid configuration is that sdc is now 2x250
> MB instead of 4x120 MB, but that array is the target, not the source (sda).
> The raid HW is an LSI Megaraid 300-8x with the following configuration:
..

That looks like the classic SCSI bad-sector non-recovery bug.
The code in scsi_lib.c, scsi_error.c, and sd.c is currently a
bit of a mess here.

Basically, given an I/O request for 200 sectors with a bad sector
in the middle at number 100, what SCSI will often do is fail sectors
1 through 100 one at a time, retrying the entire remainder of the
request after each attempt.  This takes hours, and results in no
data for the first 99 good sectors.

What it needs to do *instead* is retry each sector individually,
rather than the entire request.  This would result in sectors 1..99
and 101..200 succeeding, and retries/failure only for sector 100.

A slight optimization would be to fail the whole bio containing
sector 100, rather than just the one sector.

I've got patches that do exactly this, and they work quite well.
But they're probably not "pretty enough" for inclusion.

Cheers



Thread overview: 11+ messages
2006-08-22 14:45 Scsi errors with Megaraid 300-8x Johan Groth
2006-08-23 15:27 ` Mark Lord [this message]
2006-08-23 15:42   ` Johan Groth
2006-08-23 15:45     ` Justin Piszcz
2006-08-23 15:48       ` Johan Groth
2006-08-23 15:53         ` Justin Piszcz
2006-08-23 15:57           ` Johan Groth
2006-08-23 15:59             ` Justin Piszcz
2006-08-24 14:48         ` Mark Lord
2006-08-24 15:09           ` Johan Groth
2006-08-24 16:57             ` Mark Lord