* SCSI/Block boot problems
@ 2003-01-18 16:57 Russell King
From: Russell King @ 2003-01-18 16:57 UTC
To: LKML; +Cc: James Bottomley
Continuing the theme of testing 2.5.59, there appears to be some fairly
bad error handling somewhere in this area at the moment. So far, my
debugging shows the following:
1. A permanent error on _1_ SCSI target on a SCSI bus causes the
SCSI error handling to go completely gaga, and it eventually ends up
oopsing the kernel. Full kernel messages with SCSI debugging enabled
have been forwarded to James Bottomley.
2. SCSI appears to attempt to spin up a non-present disk in a removable
SCSI drive _3_ times during boot. This is new behaviour for 2.5,
which neither 2.4, 2.2 nor 2.0 showed.
Since each spinup takes around 2 minutes to time out and the drive
obviously isn't going to spin up without media present, it produces
some very long test cycles, and is a source of continual annoyance.
3. SCSI goes completely gaga after a SCSI disk IO error. I haven't
got much to say about this other than to supply the kernel messages
(with some extra ones added to try to track down the problem.)
At this point, we are trying to read the partition table on the
aforementioned empty SCSI removable drive:
sda:submitting buffer 0 of 1 (cc3fa580) page c026e3c0
submission done
prep_rq_fn: device sda ret = 1
scsi_prep_fn() returns BLKPREP_KILL, and we end the request:
__end_that_request_first: req c0427dc0 uptodate 0 nrbytes 4096
end_request: I/O error, dev sda, sector 0
end_buffer_async_read: bh cc3fa580 page c026e3c0 uptodate 0
Buffer I/O error on device sd(8,0), logical block 0
unlocking page: all buffers unlocked
unlocking page c026e3c0 waitqueue c0003228: flags 00001006
all done
The partition code attempts to read another page:
submitting buffer 0 of 1 (cc3fa580) page c026e3c0
submission done
wait_on_page_bit: task c0441040 page c026e3c0 bit 0 waitqueue c0003228
prep_rq_fn: device sda ret = 2
sleeping on page c026e3c0: flags 00021007
prep_rq_fn: device sda ret = 2
prep_rq_fn: device sda ret = 2
prep_rq_fn: device sda ret = 2
prep_rq_fn: device sda ret = 2
prep_rq_fn: device sda ret = 2
This time, scsi_prep_fn() continually returns BLKPREP_DEFER and we
don't make any further progress. (I assume 20 minutes of waiting is
probably long enough! 8))
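For anyone reading the traces above, the "ret =" values line up with the
BLKPREP_* return codes that the block layer defines for prep functions in
2.5. The following is only a minimal sketch for decoding such a line; the
helper names prep_ret_name() and decode_prep_line() are made up for
illustration and are not kernel functions, and the "prep_rq_fn:" message
format is simply the extra debug output shown in the log.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return codes for a request prep function (2.5-era include/linux/blkdev.h). */
#define BLKPREP_OK    0  /* serve the request */
#define BLKPREP_KILL  1  /* fatal error, end the request */
#define BLKPREP_DEFER 2  /* leave the request on the queue and retry later */

/* Hypothetical helper: map a prep return code to its symbolic name. */
static const char *prep_ret_name(int ret)
{
    switch (ret) {
    case BLKPREP_OK:    return "BLKPREP_OK";
    case BLKPREP_KILL:  return "BLKPREP_KILL";
    case BLKPREP_DEFER: return "BLKPREP_DEFER";
    default:            return "unknown";
    }
}

/*
 * Hypothetical helper: decode one debug line of the form
 * "prep_rq_fn: device sda ret = N". Returns the symbolic name,
 * or NULL if the line does not carry a "ret = N" field.
 */
static const char *decode_prep_line(const char *line)
{
    const char *field;
    int ret;

    assert(line != NULL);
    field = strstr(line, "ret = ");
    if (field == NULL || sscanf(field, "ret = %d", &ret) != 1)
        return NULL;
    return prep_ret_name(ret);
}
```

Read this way, the first trace is a single kill (ret = 1) followed by a
normal I/O error completion, while the second is an endless run of
deferrals (ret = 2) with nothing ever completing the request.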
--
Russell King (rmk@arm.linux.org.uk) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html
* Re: SCSI/Block boot problems
@ 2003-01-18 18:58 ` Russell King
From: Russell King @ 2003-01-18 18:58 UTC
To: LKML; +Cc: James Bottomley
On Sat, Jan 18, 2003 at 04:57:02PM +0000, Russell King wrote:
> 3. SCSI goes completely gaga after a SCSI disk IO error. I haven't
> got much to say about this other than to supply the kernel messages
> (with some extra ones added to try to track down the problem.)
>
> At this point, we are trying to read the partition table on the
> aforementioned empty SCSI removable drive:
>
> sda:submitting buffer 0 of 1 (cc3fa580) page c026e3c0
> submission done
> prep_rq_fn: device sda ret = 1
Additional debugging shows that the above is due to a suspected media
change - we are dropping out at 2.5.59 drivers/scsi/sd.c:238
(sd_init_command(), with sdp->changed true).
It would appear that when we return to scsi_prep_fn(), we release
any buffers allocated to the command structure (via scsi_release_buffers)
but we don't actually free the SCSI command structure which was allocated
via scsi_allocate_device().
This means that we drop one SCSI command structure on the floor each time
we detect the media has changed in a removable media device, which then
causes us to run out of SCSI command structures, eventually bringing the
device to a complete halt.
Unfortunately, SCSI command structures can come from req->special, and
it is unclear to me at present whether these should be freed as well.
Therefore, someone more knowledgeable about the implementation in this
area needs to review this.
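To make the suspected leak concrete, here is a toy userspace model. It is
not kernel code and not a proposed patch: struct cmd_pool, toy_prep() and
kills_before_stall() are invented for illustration, the pool size is an
arbitrary parameter, and the model assumes that a failed command
allocation is what makes the prep path return BLKPREP_DEFER. The
free_on_kill flag only shows what would change if the command structure
were handed back on the kill path; whether that is correct for commands
that come from req->special is exactly the open question above.

```c
#include <assert.h>
#include <stdbool.h>

#define BLKPREP_OK    0  /* serve the request */
#define BLKPREP_KILL  1  /* fatal error, end the request */
#define BLKPREP_DEFER 2  /* leave the request on the queue */

/* Hypothetical stand-in for a device's supply of command structures. */
struct cmd_pool {
    int free_cmds;  /* command structures still available */
};

/*
 * Toy model of the prep path for one request.
 * media_changed models sdp->changed being true in sd_init_command().
 * free_on_kill selects whether the command is returned on the kill path.
 */
static int toy_prep(struct cmd_pool *pool, bool media_changed, bool free_on_kill)
{
    if (pool->free_cmds == 0)
        return BLKPREP_DEFER;   /* assumed: no command available, so defer */

    pool->free_cmds--;          /* "allocate" a command structure */

    if (media_changed) {
        /* Buffers are released here; the command itself is the issue. */
        if (free_on_kill)
            pool->free_cmds++;  /* hand the command back to the pool */
        return BLKPREP_KILL;
    }
    return BLKPREP_OK;          /* command stays in use until completion */
}

/* Count how many requests get killed before the device starts deferring. */
static int kills_before_stall(int pool_size, bool free_on_kill, int max_requests)
{
    struct cmd_pool pool = { pool_size };
    int kills = 0;

    assert(pool_size >= 0);
    for (int i = 0; i < max_requests; i++) {
        if (toy_prep(&pool, true, free_on_kill) == BLKPREP_DEFER)
            break;              /* pool exhausted: every later request stalls */
        kills++;
    }
    return kills;
}
```

With a pool of one command and no free on the kill path, the model gives
one BLKPREP_KILL followed by nothing but BLKPREP_DEFER, which is the same
shape as the trace (ret = 1 once, then ret = 2 forever). With the free
enabled, every request is killed cleanly and the pool never drains.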
--
Russell King (rmk@arm.linux.org.uk) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html