* Proposal to add a new sysfs attribute to SCSI device
From: Zhao, Forrest @ 2005-03-22 3:05 UTC
To: linux-scsi
Hi list,
Background: As we know, when a SCSI disk is in a failure state (for
example, bad sectors appear on the disk or the disk is surprise-removed),
the SCSI mid-layer will endlessly retry the failed I/O requests if it
does not get a notification from the LLDD to stop retrying.
Unfortunately, *not all LLDDs* currently notify the mid-layer to stop
retrying when a SCSI disk is in a failure state.
Here is what we observed while testing in our lab:
1 We installed kernel 2.6.11.2 on a Tiger4 platform.
2 There are two SCSI disks: sda holds the root fs and sdb is mounted on
/mnt.
3 We executed "cp -r /usr/src/linux-2.6.11.2 /mnt".
4 During the copy, we surprise-removed sdb.
5 The system then became very busy and froze; users could not even log
in, either locally or remotely.
6 The error output on the screen showed that the SCSI mid-layer was
endlessly retrying the failed I/O requests.
To overcome this pathological behavior, I propose adding a new sysfs
attribute to the SCSI device.
Attribute name: stop_retry_threshold
Description: The user sets a threshold value through this interface.
After the SCSI mid-layer has retried "threshold" times, it automatically
stops further retries, so that the system calms down and remains usable
for other users.
Usage example: The user executes "echo 100 >
/sys/block/sdb/device/stop_retry_threshold" to tell the SCSI mid-layer to
automatically stop further retries after it has retried 100 times.
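The intended semantics can be sketched in user space as follows. This is
only an illustration, not the patch itself: struct sim_device,
sim_may_retry() and sim_retry_failed_request() are hypothetical names
that do not exist in the kernel, and treating a threshold of 0 as "no
limit" is an assumption of this sketch.

```c
#include <stdbool.h>

/* Hypothetical per-device state; NOT the kernel's struct scsi_device. */
struct sim_device {
    unsigned int stop_retry_threshold; /* assumed: 0 means "no limit" */
    unsigned int retries;              /* retries issued so far */
};

/* Returns true and counts the retry if one more retry is allowed. */
static bool sim_may_retry(struct sim_device *dev)
{
    if (dev->stop_retry_threshold != 0 &&
        dev->retries >= dev->stop_retry_threshold)
        return false;   /* threshold reached: give up on the request */
    dev->retries++;
    return true;
}

/*
 * Models a request that fails on every attempt (e.g. the disk is gone).
 * Returns how many retries were issued before the threshold stopped them.
 * With a threshold of 0 this loops forever, like the behavior described
 * above, so callers of this sketch should set a non-zero threshold.
 */
static unsigned int sim_retry_failed_request(struct sim_device *dev)
{
    unsigned int issued = 0;

    while (sim_may_retry(dev))
        issued++;
    return issued;
}
```

With stop_retry_threshold set to 100, sim_retry_failed_request() issues
100 retries and then stops, which is the behavior the attribute is
meant to provide.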
What are your comments on this proposal? If there are no objections, I'll
send out the patch soon.
Thanks,
Forrest
* Re: Proposal to add a new sysfs attribute to SCSI device
From: James Bottomley @ 2005-03-23 16:36 UTC
To: Zhao, Forrest; +Cc: SCSI Mailing List
On Tue, 2005-03-22 at 11:05 +0800, Zhao, Forrest wrote:
> Here is what we observed while testing in our lab:
> 1 We installed kernel 2.6.11.2 on a Tiger4 platform.
> 2 There are two SCSI disks: sda holds the root fs and sdb is mounted on
> /mnt.
> 3 We executed "cp -r /usr/src/linux-2.6.11.2 /mnt".
> 4 During the copy, we surprise-removed sdb.
> 5 The system then became very busy and froze; users could not even log
> in, either locally or remotely.
> 6 The error output on the screen showed that the SCSI mid-layer was
> endlessly retrying the failed I/O requests.
Which HBA driver is this? As long as the HBA can recognise the device
is missing, the retries should go very fast because it's simply a bounce
between the eh thread and the driver.
James