Core scsi layer crashes in 2.6.8.1

linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Core scsi layer crashes in 2.6.8.1
@ 2004-09-28 20:03 Alan Cox
  2004-09-29 14:11 ` James Bottomley
  0 siblings, 1 reply; 19+ messages in thread
From: Alan Cox @ 2004-09-28 20:03 UTC (permalink / raw)
  To: Linux Kernel Mailing List, linux-scsi

I've been doing some stress testing for Fedora Core 3 and lets say the
SCSI layer came apart under stress. 

First problem was a device with 256 byte sector sizes. When it probes I
get a chain of errors. If I then try and mount it then it hangs the
mount forever. If you remove the USB scsi device to try and unjam it you
get errors logged about Invalid State 256 in USB reset and it doesn't
recover.

Second problem is with the scsi handling logic for errors. If you rmmod
a scsi driver while it is error handling you get a chain of errors
starting with

	Illegal transition Cancel->Offline
	Badness is scsi_device_set_state
	path:
	scsi_device_set_state
	scsi_unjam_host
	scsi_error_handler

This is followed by a series of further errors including kobject errors
and oopses. Then the machine dies.

The set up is fairly simple. Its a a disk and two scsi cd multichangers
configured so that I can also badly terminate them. In that situation
identify works but other commands tend to fail which allows good error
stressing.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Core scsi layer crashes in 2.6.8.1
  2004-09-29 14:11 ` James Bottomley
@ 2004-09-29 13:24   ` Alan Cox
  2004-09-29 14:34     ` James Bottomley
  2004-09-29 14:27   ` Matthew Wilcox
  2004-10-05 11:49   ` Anton Blanchard
  2 siblings, 1 reply; 19+ messages in thread
From: Alan Cox @ 2004-09-29 13:24 UTC (permalink / raw)
  To: James Bottomley; +Cc: Linux Kernel Mailing List, SCSI Mailing List

On Mer, 2004-09-29 at 15:11, James Bottomley wrote:
> What was the oops?
> 
> I have a theory that we should be taking a device reference before
> waking up the error handler, otherwise host removal can race with error
> handling.

The sequence I scribbled down from the console was

        Illegal state transition Cancel->Offline
        Badness in scsi_device_set_state
                scsi_device_set_state
                scsi_unjam_host
                scsi_error_handler
 
        badness in kref_get
                kobject_get, get_device, scsi_request_fn
                blk_insert_request, scsi_queue_insert
                scsi_eh_flush_done_q, scsi_unjam_host
                scsi_error_handler
 
        OOPS scsi_device_dev_release
                device_release
                kobject_cleanup
                kobject_release
                kref_put
                scsi_request_fn

Alan


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Core scsi layer crashes in 2.6.8.1
  2004-09-28 20:03 Core scsi layer crashes in 2.6.8.1 Alan Cox
@ 2004-09-29 14:11 ` James Bottomley
  2004-09-29 13:24   ` Alan Cox
                     ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: James Bottomley @ 2004-09-29 14:11 UTC (permalink / raw)
  To: Alan Cox; +Cc: Linux Kernel Mailing List, SCSI Mailing List

On Tue, 2004-09-28 at 16:03, Alan Cox wrote:
> Second problem is with the scsi handling logic for errors. If you rmmod
> a scsi driver while it is error handling you get a chain of errors
> starting with
> 
> 	Illegal transition Cancel->Offline
> 	Badness is scsi_device_set_state
> 	path:
> 	scsi_device_set_state
> 	scsi_unjam_host
> 	scsi_error_handler
> 

These state transition warnings are currently expected in this code
(they're basically verbose warnings).

What was the oops?

I have a theory that we should be taking a device reference before
waking up the error handler, otherwise host removal can race with error
handling.

James



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Core scsi layer crashes in 2.6.8.1
  2004-09-29 14:11 ` James Bottomley
  2004-09-29 13:24   ` Alan Cox
@ 2004-09-29 14:27   ` Matthew Wilcox
  2004-10-05 11:49   ` Anton Blanchard
  2 siblings, 0 replies; 19+ messages in thread
From: Matthew Wilcox @ 2004-09-29 14:27 UTC (permalink / raw)
  To: James Bottomley; +Cc: Alan Cox, Linux Kernel Mailing List, SCSI Mailing List

On Wed, Sep 29, 2004 at 10:11:59AM -0400, James Bottomley wrote:
> I have a theory that we should be taking a device reference before
> waking up the error handler, otherwise host removal can race with error
> handling.

That would certainly mesh with what's happening with Anton's sick ppc64
box ...

-- 
"Next the statesmen will invent cheap lies, putting the blame upon 
the nation that is attacked, and every man will be glad of those
conscience-soothing falsities, and will diligently study them, and refuse
to examine any refutations of them; and thus he will by and by convince 
himself that the war is just, and will thank God for the better sleep 
he enjoys after this process of grotesque self-deception." -- Mark Twain

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Core scsi layer crashes in 2.6.8.1
  2004-09-29 13:24   ` Alan Cox
@ 2004-09-29 14:34     ` James Bottomley
  0 siblings, 0 replies; 19+ messages in thread
From: James Bottomley @ 2004-09-29 14:34 UTC (permalink / raw)
  To: Alan Cox; +Cc: Linux Kernel Mailing List, SCSI Mailing List

On Wed, 2004-09-29 at 09:24, Alan Cox wrote:
>         badness in kref_get
>                 kobject_get, get_device, scsi_request_fn
>                 blk_insert_request, scsi_queue_insert
>                 scsi_eh_flush_done_q, scsi_unjam_host
>                 scsi_error_handler

Yes .. this is the significant one ... we're trying to get a reference
to a device that has already been released

>         OOPS scsi_device_dev_release
>                 device_release
>                 kobject_cleanup
>                 kobject_release
>                 kref_put
>                 scsi_request_fn

James



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Core scsi layer crashes in 2.6.8.1
  2004-09-29 14:11 ` James Bottomley
  2004-09-29 13:24   ` Alan Cox
  2004-09-29 14:27   ` Matthew Wilcox
@ 2004-10-05 11:49   ` Anton Blanchard
  2004-10-05 13:56     ` James Bottomley
  2 siblings, 1 reply; 19+ messages in thread
From: Anton Blanchard @ 2004-10-05 11:49 UTC (permalink / raw)
  To: James Bottomley; +Cc: Alan Cox, Linux Kernel Mailing List, SCSI Mailing List


Hi James,

> These state transition warnings are currently expected in this code
> (they're basically verbose warnings).
> 
> What was the oops?
> 
> I have a theory that we should be taking a device reference before
> waking up the error handler, otherwise host removal can race with error
> handling.

Did this get sorted out? Here is an oops from a few week old BK tree.
FYI I just noticed I have disabled host reset in the sym2 driver (it
was locking up at the time and I never went back to work out why).
However, even with a host reset this could happen right?

Below we get a WARN_ON then an oops (the bit starting with NIP, the
address we tried to access was 0x100510.

Anton

sym0: <1010-66> rev 0x1 at pci 0004:03:01.0 irq 87
sym.0004:03:01.0: No NVRAM, ID 7, Fast-80, LVD, parity checking
xics_enable_irq 47 buid 4 gqirm 255
sym.0004:03:01.0: SCSI BUS has been reset.
scsi0 : sym-2.1.18j
Using anticipatory io scheduler
sym.0004:03:01.0:10: FAST-40 WIDE SCSI 80.0 MB/s ST (25.0 ns, offset 31)
  Vendor: IBM       Model: IC35L036UCDY10-0  Rev: S25M
  Type:   Direct-Access                      ANSI SCSI revision: 03
sym.0004:03:01.0:10:0: tagged command queuing enabled, command queue depth 16.
scsi(0:0:10:0): Beginning Domain Validation
sym.0004:03:01.0:10: asynchronous.
sym.0004:03:01.0:10: wide asynchronous.
sym.0004:03:01.0:10: FAST-80 WIDE SCSI 160.0 MB/s DT (12.5 ns, offset 31)
scsi(0:0:10:0): Ending Domain Validation
sym.0004:03:01.0:11:0:phase change 2-7 6@01050368 resid=5.
sym.0004:03:01.0:11:0:phase change 2-3 6@01050368 resid=5.
sym.0004:03:01.0:11: FAST-40 WIDE SCSI 80.0 MB/s ST (25.0 ns, offset 31)
sym.0004:03:01.0:11:control msgout: c.
sym.0004:03:01.0: TARGET 11 has been reset.
sym.0004:03:01.0:11:0: ABORT operation started.
sym.0004:03:01.0:11:0: ABORT operation complete.
sym.0004:03:01.0:11:0: DEVICE RESET operation started.
sym.0004:03:01.0:11:0: DEVICE RESET operation complete.
sym.0004:03:01.0:11:control msgout: c.
sym.0004:03:01.0: TARGET 11 has been reset.
sym.0004:03:01.0:11:0: ABORT operation started.
sym.0004:03:01.0:11:0: ABORT operation complete.
sym.0004:03:01.0:11:0: BUS RESET operation started.
sym.0004:03:01.0:11:0: BUS RESET operation complete.
sym.0004:03:01.0: SCSI BUS reset detected.
sym.0004:03:01.0: SCSI BUS has been reset.
scsi: Device offlined - not ready after error recovery: host 0 channel 0 id 11 lun 0
Badness in kref_get at lib/kref.c:32
Call Trace:
[c0000025fe1b3bd0] [c0000030262bf4b0] 0xc0000030262bf4b0 (unreliable)
[c0000025fe1b3c50] [c00000000021f5b8] .get_device+0x20/0x3c
[c0000025fe1b3cc0] [c000000000294c60] .scsi_device_get+0x38/0xe4
[c0000025fe1b3d40] [c000000000294e30] .__scsi_iterate_devices+0x60/0xfc
[c0000025fe1b3de0] [c000000000299bf8] .scsi_run_host_queues+0x34/0x58
[c0000025fe1b3e60] [c0000000002989f8] .scsi_error_handler+0x268/0xaa0
[c0000025fe1b3f90] [c000000000017aac] .kernel_thread+0x4c/0x68
sym.0004:03:01.0:11:control msgout: c.
NIP: C000000000294C48 XER: 0000000020000000 LR: C000000000294E30
REGS: c0000025fe1b3a40 TRAP: 0300   Not tainted  (2.6.9-rc2-bml)
MSR: 9000000000001032 EE: 0 PR: 0 FP: 0 ME: 1 IR/DR: 11
DAR: 0000000000100510, DSISR: 0000000040000000
TASK: c000001dfe1b72c0[1467] 'scsi_eh_0' THREAD: c0000025fe1b0000 CPU: 3
GPR00: FFFFFFFFFFFFFFFA C0000025FE1B3CC0 C0000000007297B8 00000000001000F0
GPR04: C000000FFE185000 0000000000000001 0000000000000000 0000000000000000
GPR08: 0000000000000000 0000000000100100 C000000000B96228 9000000000009032
GPR12: 0000000024FFFF22 C000000000542700 0000000000000000 0000000000000000
GPR16: 0000000000000000 C00000000040D188 C000000000587058 C0000025FE1B3ED0
GPR20: 00000000000000FC C00000000040D188 C000000000587058 C0000025FE1B3F00
GPR24: C0000025FE1B3EF0 0000040100000000 C000002FFE128D30 C000000FFE185000
GPR28: 9000000000009032 C0000007FFFCF800 00000000001002D8 00000000001000F0
NIP [c000000000294c48] .scsi_device_get+0x20/0xe4
LR [c000000000294e30] .__scsi_iterate_devices+0x60/0xfc
Call Trace:
[c0000025fe1b3cc0] [c000000000294da8] .scsi_device_put+0x9c/0xc4 (unreliable)
[c0000025fe1b3d40] [c000000000294e30] .__scsi_iterate_devices+0x60/0xfc
[c0000025fe1b3de0] [c000000000299bf8] .scsi_run_host_queues+0x34/0x58
[c0000025fe1b3e60] [c0000000002989f8] .scsi_error_handler+0x268/0xaa0
[c0000025fe1b3f90] [c000000000017aac] .kernel_thread+0x4c/0x68

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Core scsi layer crashes in 2.6.8.1
  2004-10-05 11:49   ` Anton Blanchard
@ 2004-10-05 13:56     ` James Bottomley
  2004-10-05 14:44       ` Mark Lord
  0 siblings, 1 reply; 19+ messages in thread
From: James Bottomley @ 2004-10-05 13:56 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: Alan Cox, Linux Kernel Mailing List, SCSI Mailing List

On Tue, 2004-10-05 at 06:49, Anton Blanchard wrote:
> 
> Hi James,
> 
> > These state transition warnings are currently expected in this code
> > (they're basically verbose warnings).
> > 
> > What was the oops?
> > 
> > I have a theory that we should be taking a device reference before
> > waking up the error handler, otherwise host removal can race with error
> > handling.
> 
> Did this get sorted out? Here is an oops from a few week old BK tree.
> FYI I just noticed I have disabled host reset in the sym2 driver (it
> was locking up at the time and I never went back to work out why).
> However, even with a host reset this could happen right?

Well, the theoretical hole is fixed ... If you test the current tree
we'll find out if this is indeed your problem.

James



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Core scsi layer crashes in 2.6.8.1
  2004-10-05 13:56     ` James Bottomley
@ 2004-10-05 14:44       ` Mark Lord
  2004-10-05 14:56         ` James Bottomley
  0 siblings, 1 reply; 19+ messages in thread
From: Mark Lord @ 2004-10-05 14:44 UTC (permalink / raw)
  To: James Bottomley
  Cc: Anton Blanchard, Alan Cox, Linux Kernel Mailing List,
	SCSI Mailing List

There seem to be other holes/races in this and related code.

The QStor driver implements hot insertion/removal of drives.

One thing it has to cope with at present is, after notifying
the mid-layer that a drive has been removed, the mid-layer calls
back with a synchronize-cache command for that drive..

Cheers
-- 
Mark Lord
(hdparm keeper & the original "Linux IDE Guy")

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Core scsi layer crashes in 2.6.8.1
  2004-10-05 14:44       ` Mark Lord
@ 2004-10-05 14:56         ` James Bottomley
  2004-10-05 15:46           ` Mark Lord
  2004-10-05 15:49           ` Oliver Neukum
  0 siblings, 2 replies; 19+ messages in thread
From: James Bottomley @ 2004-10-05 14:56 UTC (permalink / raw)
  To: Mark Lord
  Cc: Anton Blanchard, Alan Cox, Linux Kernel Mailing List,
	SCSI Mailing List

On Tue, 2004-10-05 at 09:44, Mark Lord wrote:
> There seem to be other holes/races in this and related code.
> 
> The QStor driver implements hot insertion/removal of drives.
> 
> One thing it has to cope with at present is, after notifying
> the mid-layer that a drive has been removed, the mid-layer calls
> back with a synchronize-cache command for that drive..

This is expected behaviour.  For orderly removal an cache sync command
must be sent to drives with a writeback cache before they're powered
down.  For forced ejection, the driver has to error the command.

James



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Core scsi layer crashes in 2.6.8.1
  2004-10-05 14:56         ` James Bottomley
@ 2004-10-05 15:46           ` Mark Lord
  2004-10-05 15:49           ` Oliver Neukum
  1 sibling, 0 replies; 19+ messages in thread
From: Mark Lord @ 2004-10-05 15:46 UTC (permalink / raw)
  To: James Bottomley
  Cc: Mark Lord, Anton Blanchard, Alan Cox, Linux Kernel Mailing List,
	SCSI Mailing List

James Bottomley wrote:
>
> This is expected behaviour.  For orderly removal an cache sync command
> must be sent to drives with a writeback cache before they're powered
> down.  For forced ejection, the driver has to error the command.

Yup, that's how it has to be done at present.

Another weirdness I ran into at one point, was that the mid-layer
could be made to segfault if a LLD asked it to remove a drive that
had previously been set "offline" -- it complains about an illegal
state transition during the removal, and then dies.  This sequence
no longer occurs in the QStor driver, but it might resurface soon
as more drivers begin to support hot insertion/removal.

Cheers
-- 
Mark Lord
(hdparm keeper & the original "Linux IDE Guy")

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Core scsi layer crashes in 2.6.8.1
  2004-10-05 14:56         ` James Bottomley
  2004-10-05 15:46           ` Mark Lord
@ 2004-10-05 15:49           ` Oliver Neukum
  2004-10-05 15:54             ` James Bottomley
  1 sibling, 1 reply; 19+ messages in thread
From: Oliver Neukum @ 2004-10-05 15:49 UTC (permalink / raw)
  To: James Bottomley
  Cc: Mark Lord, Anton Blanchard, Alan Cox, Linux Kernel Mailing List,
	SCSI Mailing List

Am Dienstag, 5. Oktober 2004 16:56 schrieb James Bottomley:
> On Tue, 2004-10-05 at 09:44, Mark Lord wrote:
> > There seem to be other holes/races in this and related code.
> > 
> > The QStor driver implements hot insertion/removal of drives.
> > 
> > One thing it has to cope with at present is, after notifying
> > the mid-layer that a drive has been removed, the mid-layer calls
> > back with a synchronize-cache command for that drive..
> 
> This is expected behaviour.  For orderly removal an cache sync command
> must be sent to drives with a writeback cache before they're powered
> down.  For forced ejection, the driver has to error the command.

Then let the driver tell the upper layers whether the device is still
connected or not.

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Core scsi layer crashes in 2.6.8.1
  2004-10-05 15:49           ` Oliver Neukum
@ 2004-10-05 15:54             ` James Bottomley
  2004-10-05 15:57               ` Mark Lord
  2004-10-05 16:01               ` Oliver Neukum
  0 siblings, 2 replies; 19+ messages in thread
From: James Bottomley @ 2004-10-05 15:54 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Mark Lord, Anton Blanchard, Alan Cox, Linux Kernel Mailing List,
	SCSI Mailing List

On Tue, 2004-10-05 at 10:49, Oliver Neukum wrote:
> Then let the driver tell the upper layers whether the device is still
> connected or not.

Do we have to go over this again?

It would add quite a bit of complexity to the reference counted
aynchronous model to try and force synchronicity between queuecommand
and scsi_remove_host in the mid-layer.  Therefore it's much easier to
let the LLD decide what to do with the command.

James



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Core scsi layer crashes in 2.6.8.1
  2004-10-05 15:54             ` James Bottomley
@ 2004-10-05 15:57               ` Mark Lord
  2004-10-05 16:01                 ` James Bottomley
  2004-10-05 16:53                 ` Jeff Garzik
  2004-10-05 16:01               ` Oliver Neukum
  1 sibling, 2 replies; 19+ messages in thread
From: Mark Lord @ 2004-10-05 15:57 UTC (permalink / raw)
  To: James Bottomley
  Cc: Oliver Neukum, Anton Blanchard, Alan Cox,
	Linux Kernel Mailing List, SCSI Mailing List

James Bottomley wrote:
>
> It would add quite a bit of complexity to the reference counted
> aynchronous model to try and force synchronicity between queuecommand
> and scsi_remove_host in the mid-layer.  Therefore it's much easier to
> let the LLD decide what to do with the command.

Presumably the same is also true for scsi_remove_device() ?

Cheers
-- 
Mark Lord
(hdparm keeper & the original "Linux IDE Guy")

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Core scsi layer crashes in 2.6.8.1
  2004-10-05 15:57               ` Mark Lord
@ 2004-10-05 16:01                 ` James Bottomley
  2004-10-05 16:53                 ` Jeff Garzik
  1 sibling, 0 replies; 19+ messages in thread
From: James Bottomley @ 2004-10-05 16:01 UTC (permalink / raw)
  To: Mark Lord
  Cc: Oliver Neukum, Anton Blanchard, Alan Cox,
	Linux Kernel Mailing List, SCSI Mailing List

On Tue, 2004-10-05 at 10:57, Mark Lord wrote:
> James Bottomley wrote:
> >
> > It would add quite a bit of complexity to the reference counted
> > aynchronous model to try and force synchronicity between queuecommand
> > and scsi_remove_host in the mid-layer.  Therefore it's much easier to
> > let the LLD decide what to do with the command.
> 
> Presumably the same is also true for scsi_remove_device() ?

Yes.

James



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Core scsi layer crashes in 2.6.8.1
  2004-10-05 15:54             ` James Bottomley
  2004-10-05 15:57               ` Mark Lord
@ 2004-10-05 16:01               ` Oliver Neukum
  2004-10-05 16:07                 ` James Bottomley
  1 sibling, 1 reply; 19+ messages in thread
From: Oliver Neukum @ 2004-10-05 16:01 UTC (permalink / raw)
  To: James Bottomley
  Cc: Mark Lord, Anton Blanchard, Alan Cox, Linux Kernel Mailing List,
	SCSI Mailing List

Am Dienstag, 5. Oktober 2004 17:54 schrieb James Bottomley:
> On Tue, 2004-10-05 at 10:49, Oliver Neukum wrote:
> > Then let the driver tell the upper layers whether the device is still
> > connected or not.
> 
> Do we have to go over this again?
> 
> It would add quite a bit of complexity to the reference counted
> aynchronous model to try and force synchronicity between queuecommand
> and scsi_remove_host in the mid-layer.  Therefore it's much easier to
> let the LLD decide what to do with the command.

Why is it in any way difficult to decide whether to issue a command in the
first place? The command is generated upon being notified by the lower layer.
There is no issue of synchronisation here. It is simply stupid to give
commands that are bound to fail, if the information is already available.

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Core scsi layer crashes in 2.6.8.1
  2004-10-05 16:01               ` Oliver Neukum
@ 2004-10-05 16:07                 ` James Bottomley
  2004-10-05 16:26                   ` Oliver Neukum
  0 siblings, 1 reply; 19+ messages in thread
From: James Bottomley @ 2004-10-05 16:07 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Mark Lord, Anton Blanchard, Alan Cox, Linux Kernel Mailing List,
	SCSI Mailing List

On Tue, 2004-10-05 at 11:01, Oliver Neukum wrote:
> Why is it in any way difficult to decide whether to issue a command in the
> first place? The command is generated upon being notified by the lower layer.
> There is no issue of synchronisation here. It is simply stupid to give
> commands that are bound to fail, if the information is already available.

a) we don't know that they are ... for notified ejection they will
succeed.

b) The scsi bus is a scanned model ... drivers must be prepared to
accept commands for non-existent devices. How does the removal case
differ from the never present case?

James



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Core scsi layer crashes in 2.6.8.1
  2004-10-05 16:07                 ` James Bottomley
@ 2004-10-05 16:26                   ` Oliver Neukum
  2004-10-05 16:38                     ` James Bottomley
  0 siblings, 1 reply; 19+ messages in thread
From: Oliver Neukum @ 2004-10-05 16:26 UTC (permalink / raw)
  To: James Bottomley
  Cc: Mark Lord, Anton Blanchard, Alan Cox, Linux Kernel Mailing List,
	SCSI Mailing List

Am Dienstag, 5. Oktober 2004 18:07 schrieb James Bottomley:
> On Tue, 2004-10-05 at 11:01, Oliver Neukum wrote:
> > Why is it in any way difficult to decide whether to issue a command in the
> > first place? The command is generated upon being notified by the lower layer.
> > There is no issue of synchronisation here. It is simply stupid to give
> > commands that are bound to fail, if the information is already available.
> 
> a) we don't know that they are ... for notified ejection they will
> succeed.

Yes, that is why I proposed that you let the lower levels tell you whether
the devices in questions are still operative or not.

> b) The scsi bus is a scanned model ... drivers must be prepared to
> accept commands for non-existent devices. How does the removal case
> differ from the never present case?

It doesn't. But that doesn't explain why you want to issue the command
in all cases, even if we coule easily tell you whether it makes sense or
not? It makes no sense to me to throw away information you already have.

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Core scsi layer crashes in 2.6.8.1
  2004-10-05 16:26                   ` Oliver Neukum
@ 2004-10-05 16:38                     ` James Bottomley
  0 siblings, 0 replies; 19+ messages in thread
From: James Bottomley @ 2004-10-05 16:38 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Mark Lord, Anton Blanchard, Alan Cox, Linux Kernel Mailing List,
	SCSI Mailing List

On Tue, 2004-10-05 at 11:26, Oliver Neukum wrote:
> > b) The scsi bus is a scanned model ... drivers must be prepared to
> > accept commands for non-existent devices. How does the removal case
> > differ from the never present case?
> 
> It doesn't. But that doesn't explain why you want to issue the command
> in all cases, even if we coule easily tell you whether it makes sense or
> not? It makes no sense to me to throw away information you already have.

I'm lazy ... I don't see any point in going to a huge engineering effort
to avoid behaviour that the driver must cope correctly with anyway.  If
it isn't broken, don't fix it.

James



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Core scsi layer crashes in 2.6.8.1
  2004-10-05 15:57               ` Mark Lord
  2004-10-05 16:01                 ` James Bottomley
@ 2004-10-05 16:53                 ` Jeff Garzik
  1 sibling, 0 replies; 19+ messages in thread
From: Jeff Garzik @ 2004-10-05 16:53 UTC (permalink / raw)
  To: Mark Lord
  Cc: James Bottomley, Oliver Neukum, Anton Blanchard, Alan Cox,
	Linux Kernel Mailing List, SCSI Mailing List

On Tue, Oct 05, 2004 at 11:57:40AM -0400, Mark Lord wrote:
> James Bottomley wrote:
> >
> >It would add quite a bit of complexity to the reference counted
> >aynchronous model to try and force synchronicity between queuecommand
> >and scsi_remove_host in the mid-layer.  Therefore it's much easier to
> >let the LLD decide what to do with the command.
> 
> Presumably the same is also true for scsi_remove_device() ?

What I do in my local hotplug code is assume that the SCSI layer will be
stupid and send commands after I call scsi_remove_device(), for an
indeterminant but short period of time.

	Jeff




^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2004-10-05 16:59 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-09-28 20:03 Core scsi layer crashes in 2.6.8.1 Alan Cox
2004-09-29 14:11 ` James Bottomley
2004-09-29 13:24   ` Alan Cox
2004-09-29 14:34     ` James Bottomley
2004-09-29 14:27   ` Matthew Wilcox
2004-10-05 11:49   ` Anton Blanchard
2004-10-05 13:56     ` James Bottomley
2004-10-05 14:44       ` Mark Lord
2004-10-05 14:56         ` James Bottomley
2004-10-05 15:46           ` Mark Lord
2004-10-05 15:49           ` Oliver Neukum
2004-10-05 15:54             ` James Bottomley
2004-10-05 15:57               ` Mark Lord
2004-10-05 16:01                 ` James Bottomley
2004-10-05 16:53                 ` Jeff Garzik
2004-10-05 16:01               ` Oliver Neukum
2004-10-05 16:07                 ` James Bottomley
2004-10-05 16:26                   ` Oliver Neukum
2004-10-05 16:38                     ` James Bottomley

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).