public inbox for linux-scsi@vger.kernel.org
* Proposal to add a new sysfs attribute to SCSI device
@ 2005-03-22  3:05 Zhao, Forrest
From: Zhao, Forrest @ 2005-03-22  3:05 UTC (permalink / raw)
  To: linux-scsi

Hi, list

Background: as we know, when a SCSI disk is in a failure state (for
example, bad sectors appear on the disk or the disk is surprise-removed),
the SCSI mid-layer will endlessly retry the failed I/O requests unless
the LLDD notifies it to stop. Unfortunately, *not all LLDDs* currently
notify the mid-layer to stop retrying when a SCSI disk is in a failure
state.

Here is what we observed when testing in our lab:
1 we installed kernel 2.6.11.2 on a Tiger4 platform
2 there are two SCSI disks: sda holds the root fs, and sdb is mounted
on /mnt
3 we executed "cp -r /usr/src/linux-2.6.11.2 /mnt"
4 while the copy was in progress, we surprise-removed sdb
5 the system then became very busy and froze; users could not even log
in, either locally or remotely
6 the error output on the screen showed that the SCSI mid-layer was
endlessly retrying the failed I/O requests

To overcome this pathological behavior, I propose adding a new sysfs
attribute to the SCSI device.
Attribute name: stop_retry_threshold
Description: the user sets a threshold value through this interface;
after the SCSI mid-layer has retried "threshold" times, it automatically
stops further retries so that the system calms down and remains usable
to other users.
Usage example: the user executes "echo 100 >
/sys/block/sdb/device/stop_retry_threshold" to tell the SCSI mid-layer to
automatically stop further retries after it has retried 100 times.
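
A minimal user-space sketch of the intended semantics, not the actual
patch and not real mid-layer code. The struct retry_policy type and the
should_retry() and count_retries() helpers are hypothetical names used
only for illustration. Treating a threshold of 0 as "no limit" (the
current behavior) is also an assumption; the proposal above does not
define it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device retry state; not an actual kernel structure. */
struct retry_policy {
    unsigned int stop_retry_threshold; /* assumed: 0 means "no limit" */
    unsigned int retries;              /* retries granted so far */
};

/*
 * Decide whether a failed I/O request may be retried once more.
 * Returns true and counts the retry while under the threshold,
 * false once the threshold has been reached.
 */
static bool should_retry(struct retry_policy *p)
{
    assert(p != NULL);

    if (p->stop_retry_threshold != 0 &&
        p->retries >= p->stop_retry_threshold)
        return false;

    p->retries++;
    return true;
}

/*
 * Simulate a device that fails every request: ask for up to
 * max_attempts retries and return how many were granted.
 */
static unsigned int count_retries(struct retry_policy *p,
                                  unsigned int max_attempts)
{
    unsigned int granted = 0;

    for (unsigned int i = 0; i < max_attempts; i++) {
        if (!should_retry(p))
            break;
        granted++;
    }
    return granted;
}
```

With stop_retry_threshold set to 100, as in the usage example, a device
that fails every request is granted 100 retries and then refused; with
the assumed default of 0, every requested retry is granted.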

What do you think of this proposal? If there are no objections, I'll
send out a patch soon.

Thanks,
Forrest


* Re: Proposal to add a new sysfs attribute to SCSI device
@ 2005-03-23 16:36 ` James Bottomley
From: James Bottomley @ 2005-03-23 16:36 UTC (permalink / raw)
  To: Zhao, Forrest; +Cc: SCSI Mailing List

On Tue, 2005-03-22 at 11:05 +0800, Zhao, Forrest wrote:
> Here is what we observed when testing in our lab:
> 1 we installed kernel 2.6.11.2 on a Tiger4 platform
> 2 there are two SCSI disks: sda holds the root fs, and sdb is mounted
> on /mnt
> 3 we executed "cp -r /usr/src/linux-2.6.11.2 /mnt"
> 4 while the copy was in progress, we surprise-removed sdb
> 5 the system then became very busy and froze; users could not even log
> in, either locally or remotely
> 6 the error output on the screen showed that the SCSI mid-layer was
> endlessly retrying the failed I/O requests

Which HBA driver is this?  As long as the HBA can recognise the device
is missing, the retries should go very fast because it's simply a bounce
between the eh thread and the driver.

James


