public inbox for linux-scsi@vger.kernel.org
* aic7xxx deadlock in 2.6.0-test10
@ 2003-11-24  7:08 Andrew Morton
From: Andrew Morton @ 2003-11-24  7:08 UTC
  To: Justin T. Gibbs, linux-scsi


One of my less-used boxes is taking an NMI watchdog hit on boot in test10:

(gdb) bt
#0  0xc02dc427 in .text.lock.scsi () at drivers/scsi/scsi.c:1049
#1  0xf70e5df8 in ?? ()
#2  0xc02df24a in scsi_report_bus_reset (shost=0xf70f6e48, channel=-150053384) at drivers/scsi/scsi_error.c:1625
#3  0xc02fbc96 in ahc_send_async (ahc=0xf70e5df8, channel=65 'A', target=4294967295, lun=4144983544, code=-134635520, arg=0x0) at drivers/scsi/aic7xxx/aic7xxx_osm.c:4075
#4  0xc02ee031 in ahc_reset_channel (ahc=0xf70f7df8, channel=65 'A', initiate_reset=1) at drivers/scsi/aic7xxx/aic7xxx_core.c:6077
#5  0xc02f7040 in ahc_linux_initialize_scsi_bus (ahc=0xf70f7df8) at drivers/scsi/aic7xxx/aic7xxx_osm.c:1846
#6  0xc02f6dc3 in ahc_linux_register_host (ahc=0xf70f7df8, template=0xf70f6df8) at drivers/scsi/aic7xxx/aic7xxx_osm.c:1737
#7  0xc02f5cc8 in ahc_linux_detect (template=0xc04a4ea0) at drivers/scsi/aic7xxx/aic7xxx_osm.c:908
#8  0xc05c47d5 in ahc_linux_init () at drivers/scsi/aic7xxx/aic7xxx_osm.c:5054
#9  0xc05ac865 in do_initcalls () at init/main.c:497
#10 0xc05ac8dc in do_basic_setup () at init/main.c:537
#11 0xc010512d in init (unused=0x0) at init/main.c:579

ahc_linux_register_host() is calling ahc_linux_initialize_scsi_bus() under
ahc_lock(ahc, &s).  But scsi_report_bus_reset() is taking host->host_lock
inside shost_for_each_device(), and that is the same (already held) lock.
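A minimal userspace sketch of the self-deadlock described above, for illustration only. It is not kernel code: `fake_host`, `report_bus_reset()` and `register_host_locked()` are hypothetical stand-ins for the Scsi_Host, scsi_report_bus_reset() and the pre-patch registration path. In the kernel the lock is a spinlock, so the second acquisition simply spins until the NMI watchdog fires. The sketch swaps in a POSIX error-checking mutex so the same mistake is reported as EDEADLK instead of hanging.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for a Scsi_Host: it owns one non-recursive lock. */
struct fake_host {
    pthread_mutex_t host_lock;
};

/* Make the lock error-checking, so re-locking it from the owning thread
 * fails with EDEADLK instead of hanging the way a kernel spinlock would. */
static void fake_host_init(struct fake_host *h)
{
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&h->host_lock, &attr);
    pthread_mutexattr_destroy(&attr);
}

/* Analogue of the reporting routine: its device-list walk takes
 * host_lock itself, as shost_for_each_device() does. */
static int report_bus_reset(struct fake_host *h)
{
    int rc = pthread_mutex_lock(&h->host_lock);
    if (rc != 0)
        return rc;              /* EDEADLK: the caller already holds it */
    /* ... walk devices, note that each one saw a bus reset ... */
    pthread_mutex_unlock(&h->host_lock);
    return 0;
}

/* Analogue of the pre-patch registration path: bus initialization,
 * and therefore the reset report, runs with the lock already held. */
static int register_host_locked(struct fake_host *h)
{
    pthread_mutex_lock(&h->host_lock);
    int rc = report_bus_reset(h);   /* second acquisition of the same lock */
    pthread_mutex_unlock(&h->host_lock);
    return rc;
}
```

On a freshly initialized host, `register_host_locked()` returns EDEADLK. The outer unlock still runs, so a later direct call to `report_bus_reset()` succeeds.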


This patch gets things going again of course.  Could someone please take a
look at fixing this for real?

I'm not sure why this has suddenly started happening actually.

--- 25/drivers/scsi/aic7xxx/aic7xxx_osm.c~aic7xxx-deadlock-fix	2003-11-23 23:03:52.000000000 -0800
+++ 25-akpm/drivers/scsi/aic7xxx/aic7xxx_osm.c	2003-11-23 23:03:59.000000000 -0800
@@ -1734,8 +1734,8 @@ ahc_linux_register_host(struct ahc_softc
     LINUX_VERSION_CODE  < KERNEL_VERSION(2,5,0)
 	scsi_set_pci_device(host, ahc->dev_softc);
 #endif
-	ahc_linux_initialize_scsi_bus(ahc);
 	ahc_unlock(ahc, &s);
+	ahc_linux_initialize_scsi_bus(ahc);
 	ahc->platform_data->dv_pid = kernel_thread(ahc_linux_dv_thread, ahc, 0);
 	ahc_lock(ahc, &s);
 	if (ahc->platform_data->dv_pid < 0) {

_
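The patch only moves the ahc_linux_initialize_scsi_bus() call below ahc_unlock(), so the reset report can take the lock itself. Below is a sketch of that ordering in the same userspace terms (hypothetical names, with an error-checking mutex standing in for the spinlock). The ordering is only safe on this path if, as argued later in the thread, nothing else can reach the host while it is being registered.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for a Scsi_Host. */
struct fake_host {
    pthread_mutex_t host_lock;
    int resets_reported;
};

static void fake_host_init(struct fake_host *h)
{
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    /* Error-checking: a self-deadlock returns EDEADLK rather than hanging. */
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&h->host_lock, &attr);
    pthread_mutexattr_destroy(&attr);
    h->resets_reported = 0;
}

/* Reporting routine that takes the lock itself (locked iterator). */
static int report_bus_reset(struct fake_host *h)
{
    int rc = pthread_mutex_lock(&h->host_lock);
    if (rc != 0)
        return rc;
    h->resets_reported++;
    pthread_mutex_unlock(&h->host_lock);
    return 0;
}

/* Post-patch ordering: release the lock, initialize the bus (which
 * reports the reset), then re-take the lock for the rest of setup. */
static int register_host_patched(struct fake_host *h)
{
    pthread_mutex_lock(&h->host_lock);
    /* ... setup that does need the lock ... */
    pthread_mutex_unlock(&h->host_lock);
    int rc = report_bus_reset(h);       /* lock is free: no deadlock */
    pthread_mutex_lock(&h->host_lock);
    /* ... remaining setup, e.g. starting the DV thread ... */
    pthread_mutex_unlock(&h->host_lock);
    return rc;
}
```

With this ordering, registration completes and the reset is reported exactly once.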



* Re: aic7xxx deadlock in 2.6.0-test10
@ 2003-11-24  8:13 ` Christoph Hellwig
From: Christoph Hellwig @ 2003-11-24  8:13 UTC
  To: Andrew Morton; +Cc: Justin T. Gibbs, linux-scsi

On Sun, Nov 23, 2003 at 11:08:53PM -0800, Andrew Morton wrote:
> ahc_linux_register_host() is calling ahc_linux_initialize_scsi_bus() under
> ahc_lock(ahc, &s).  But scsi_report_bus_reset() is taking host->host_lock
> inside shost_for_each_device(), and that is the same (already held) lock.
> 
> 
> This patch gets things going again of course.  Could someone please take a
> look at fixing this for real?
> 
> I'm not sure why this has suddenly started happening actually.

jejb just posted another fix for it on linux-scsi, but your fix actually
looks better to me: ahc_initialize doesn't need the host_lock, because
ahc can't be reached from another thread yet.



* Re: aic7xxx deadlock in 2.6.0-test10
@ 2003-11-24 15:08   ` James Bottomley
From: James Bottomley @ 2003-11-24 15:08 UTC
  To: Christoph Hellwig; +Cc: Andrew Morton, Justin T. Gibbs, SCSI Mailing List

On Mon, 2003-11-24 at 02:13, Christoph Hellwig wrote:
> jejb just posted another fix for it on linux-scsi, but your fix actually
> looks better to me: ahc_initialize doesn't need the host_lock, because
> ahc can't be reached from another thread yet.

I don't think so.  This only fixes the boot case for aic7xxx (the boot
problem is also present in aic79xx).  This routine (scsi_report_bus_reset())
was also designed to be called when a driver detects a reset it didn't
initiate (aic7xxx_old is probably the only driver that actually does
this).  In that case the report is made from the ISR, which runs with the
host lock held, so you don't want to drop the lock.

The fix, I think, is to require that the lock be held when the bus/device
reset reporting callbacks are called, and to have them use the unlocked
iterator.
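A minimal userspace sketch of the convention proposed here, under stated assumptions: every name is hypothetical, a plain loop stands in for the unlocked device iterator, and a `lock_held` flag stands in for whatever lock assertion the kernel would use. Because the reporting routine never takes the lock, both the boot path (registration holding the lock) and the ISR path can call it without deadlocking and without dropping the lock.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define NDEV 4

/* Hypothetical host: a lock, a debug "is it held" flag, and devices. */
struct fake_host {
    pthread_mutex_t host_lock;
    bool lock_held;               /* stand-in for a lock-held assertion */
    bool reset_seen[NDEV];        /* per-device "saw a bus reset" marker */
};

static void lock_host(struct fake_host *h)
{
    pthread_mutex_lock(&h->host_lock);
    h->lock_held = true;
}

static void unlock_host(struct fake_host *h)
{
    h->lock_held = false;
    pthread_mutex_unlock(&h->host_lock);
}

/* Proposed rule: the caller must already hold host_lock, so the routine
 * walks the device list with a plain (unlocked) loop and never tries to
 * take the lock itself. */
static void report_bus_reset_locked(struct fake_host *h)
{
    assert(h->lock_held);         /* enforce "caller holds the lock" */
    for (int i = 0; i < NDEV; i++)
        h->reset_seen[i] = true;
}

/* Boot path: registration holds the lock, as it did before the patch. */
static void register_host(struct fake_host *h)
{
    lock_host(h);
    report_bus_reset_locked(h);
    unlock_host(h);
}

/* ISR path: a driver that notices a reset it didn't start reports it
 * from its interrupt handler, which already runs under the lock. */
static void isr_saw_reset(struct fake_host *h)
{
    lock_host(h);
    report_bus_reset_locked(h);
    unlock_host(h);
}
```

The design choice is that the lock state becomes a documented precondition of the callee rather than something it manages itself, which lets one routine serve every caller that already holds the lock.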

James



