* SCSI/Block boot problems
From: Russell King @ 2003-01-18 16:57 UTC (permalink / raw)
  To: LKML; +Cc: James Bottomley

Continuing the theme of testing 2.5.59, there appears to be some fairly
bad error handling somewhere in this area at the moment.  So far, my
debugging shows the following:

1. A permanent error with a _1_ SCSI target on a SCSI bus causes the
   SCSI error handling to go completely gaga and eventually ends up
   oopsing the kernel.  Full kernel messages with SCSI debugging enabled
   have been forwarded to James Bottomley.

2. SCSI appears to attempt to spin up a non-present disk in removable
   SCSI drive _3_ times during boot.  This is new behaviour for 2.5,
   which neither 2.4, 2.2 nor 2.0 showed.

   Since each spinup takes around 2 minutes to time out and the drive
   obviously isn't going to spin up without media present, it produces
   some very long test cycles, and is a source of continual annoyance.

3. SCSI goes completely gaga after a SCSI disk IO error.  I haven't
   got much to say about this other than to supply the kernel messages
   (with some extra ones added to try to track down the problem.)

   At this point, we are trying to read the partition table on the
   aforementioned empty SCSI removable drive:

	 sda:submitting buffer 0 of 1 (cc3fa580) page c026e3c0
	submission done
	prep_rq_fn: device sda ret = 1

   scsi_prep_fn() returns BLKPREP_KILL, and we end the request:

	__end_that_request_first: req c0427dc0 uptodate 0 nrbytes 4096
	end_request: I/O error, dev sda, sector 0
	end_buffer_async_read: bh cc3fa580 page c026e3c0 uptodate 0
	Buffer I/O error on device sd(8,0), logical block 0
	unlocking page: all buffers unlocked
	unlocking page c026e3c0 waitqueue c0003228: flags 00001006
	all done

   The partition code attempts to read another page:

	submitting buffer 0 of 1 (cc3fa580) page c026e3c0
	submission done
	wait_on_page_bit: task c0441040 page c026e3c0 bit 0 waitqueue c0003228
	prep_rq_fn: device sda ret = 2
	sleeping on page c026e3c0: flags 00021007
	prep_rq_fn: device sda ret = 2
	prep_rq_fn: device sda ret = 2
	prep_rq_fn: device sda ret = 2
	prep_rq_fn: device sda ret = 2
	prep_rq_fn: device sda ret = 2

   This time, scsi_prep_fn() continually returns BLKPREP_DEFER and we
   don't make any further progress.  (I assume 20 minutes of waiting is
   probably long enough! 8))
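
   For readers following the log, here is a minimal sketch of how a
   queue runner might treat the three prep results.  It is not the 2.5.59
   block layer code: run_queue(), the prep_* helpers and the outcome enum
   are hypothetical names made up for illustration.  Only the numeric
   values 1 (kill) and 2 (defer) are taken from the "ret =" lines above.

```c
#include <assert.h>
#include <stddef.h>

/* 1 and 2 match "ret = 1" (kill) and "ret = 2" (defer) in the log. */
enum { PREP_OK = 0, PREP_KILL = 1, PREP_DEFER = 2 };

/* Hypothetical stand-in for a driver's prep function. */
typedef int (*prep_fn)(void *ctx);

enum outcome { DISPATCHED, KILLED, STUCK };

/*
 * Toy queue runner: try to prepare the request at the head of the queue.
 * A deferred request stays at the head and is retried on the next run,
 * so a prep function that defers forever never lets the request finish.
 */
static enum outcome run_queue(prep_fn prep, void *ctx, int max_runs, int *runs)
{
	assert(max_runs > 0);

	for (*runs = 1; *runs <= max_runs; (*runs)++) {
		switch (prep(ctx)) {
		case PREP_OK:
			return DISPATCHED;	/* handed to the driver */
		case PREP_KILL:
			return KILLED;		/* completed with an I/O error */
		default:
			break;			/* PREP_DEFER: retry later */
		}
	}
	*runs = max_runs;
	return STUCK;				/* waiter sleeps indefinitely */
}

/* Fails the request outright, as in the first read above. */
static int prep_kill(void *ctx)
{
	(void)ctx;
	return PREP_KILL;
}

/* Defers while a resource is busy; *ctx counts the remaining busy runs. */
static int prep_defer_until_free(void *ctx)
{
	int *busy = ctx;

	if (*busy > 0) {
		(*busy)--;
		return PREP_DEFER;
	}
	return PREP_OK;
}

/* Defers unconditionally, as in the second read above. */
static int prep_always_defer(void *ctx)
{
	(void)ctx;
	return PREP_DEFER;
}
```

   In this model a defer is harmless as long as whatever the prep
   function is waiting for eventually becomes available.  The second
   read in the log corresponds to the prep_always_defer() case.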

--
Russell King (rmk@arm.linux.org.uk)                The developer of ARM Linux
             http://www.arm.linux.org.uk/personal/aboutme.html



* Re: SCSI/Block boot problems
From: Russell King @ 2003-01-18 18:58 UTC (permalink / raw)
  To: LKML; +Cc: James Bottomley

On Sat, Jan 18, 2003 at 04:57:02PM +0000, Russell King wrote:
> 3. SCSI goes completely gaga after a SCSI disk IO error.  I haven't
>    got much to say about this other than to supply the kernel messages
>    (with some extra ones added to try to track down the problem.)
> 
>    At this point, we are trying to read the partition table on the
>    aforementioned empty SCSI removable drive:
> 
> 	 sda:submitting buffer 0 of 1 (cc3fa580) page c026e3c0
> 	submission done
> 	prep_rq_fn: device sda ret = 1

Additional debugging shows that the above is due to a suspected media
change - we are dropping out of 2.5.59 drivers/scsi/sd.c:238
(sd_init_command(), sdp->changed true).

It would appear that when we return to scsi_prep_fn(), we release
any buffers allocated to the command structure (via scsi_release_buffers)
but we don't actually free the SCSI command structure which was allocated
via scsi_allocate_device().

This means that we drop one SCSI command structure on the floor each time
we detect the media has changed in a removable media device, which then
causes us to run out of SCSI command structures, eventually bringing the
device to a complete halt.

Unfortunately, SCSI command structures can come from req->special, and
it is unclear to me at present whether these should be freed as well.
Therefore, someone more knowledgeable about the implementation in this
area needs to review this.
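
To illustrate the suspected leak, here is a small user-space model.  It
is only a sketch: struct cmd_pool, pool_get(), pool_put() and POOL_SIZE
are hypothetical names, and the pool depth of 4 is an arbitrary
assumption, not a value taken from the SCSI layer.  The req->special
case is not modelled, and whether freeing on the kill path is the
correct fix is exactly the open question above.

```c
#include <assert.h>
#include <stdbool.h>

#define POOL_SIZE 4	/* assumed depth, chosen only for illustration */

/* Hypothetical fixed pool of command structures for one device. */
struct cmd_pool {
	int in_use;
};

static bool pool_get(struct cmd_pool *p)
{
	if (p->in_use == POOL_SIZE)
		return false;		/* nothing left to allocate */
	p->in_use++;
	return true;
}

static void pool_put(struct cmd_pool *p)
{
	assert(p->in_use > 0);
	p->in_use--;
}

/*
 * One prep attempt while the media-changed condition is set.
 * Returns 1 (kill) or 2 (defer), matching the "ret =" values in the log.
 */
static int prep_media_changed(struct cmd_pool *p, bool free_on_kill)
{
	if (!pool_get(p))
		return 2;		/* no command available: defer */

	/* Command setup fails here because the media has changed. */
	if (free_on_kill)
		pool_put(p);		/* candidate fix: give it back */
	return 1;			/* kill the request */
}

/* Count how many requests are killed before the device starts deferring. */
static int kills_before_stall(bool free_on_kill, int attempts)
{
	struct cmd_pool pool = { 0 };
	int kills = 0;

	for (int i = 0; i < attempts; i++) {
		if (prep_media_changed(&pool, free_on_kill) == 2)
			return kills;	/* stalled: every retry defers now */
		kills++;
	}
	return kills;
}
```

Without the free, the model kills POOL_SIZE requests and then defers
forever.  With the free, every attempt is killed cleanly and the pool
never runs dry.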

-- 
Russell King (rmk@arm.linux.org.uk)                The developer of ARM Linux
             http://www.arm.linux.org.uk/personal/aboutme.html

