linux-scsi.vger.kernel.org archive mirror
* Is this a known problem with the SCSI mid layer?
@ 2010-01-14 16:56 scameron
  2010-01-14 18:24 ` Douglas Gilbert
  0 siblings, 1 reply; 13+ messages in thread
From: scameron @ 2010-01-14 16:56 UTC (permalink / raw)
  To: linux-scsi; +Cc: James.Bottomley, mikem, dab, hch, axboe


I'm seeing a problem which I think is a problem in the SCSI mid layer.

Check this out:

I can rmmod and insmod hpsa (a modified version from 
what's currently in the mainline tree, but I don't think
that matters.)

I have one logical drive present

[root@slicer ~]# rmmod hpsa
[root@slicer ~]# insmod /usr/src/linux-2.6.27.42/drivers/scsi/hpsa.ko
[root@slicer ~]# cat /proc/scsi/scsi
Attached devices:
Host: scsi1 Channel: 00 Id: 00 Lun: 00
  Vendor: HP       Model: 1210m            Rev: 0150
  Type:   RAID                             ANSI  SCSI revision: 05
Host: scsi1 Channel: 00 Id: 00 Lun: 01
  Vendor: HP       Model: 1210m VOLUME     Rev: 0150
  Type:   Direct-Access                    ANSI  SCSI revision: 05
[root@slicer ~]# rmmod hpsa
[root@slicer ~]# insmod /usr/src/linux-2.6.27.42/drivers/scsi/hpsa.ko
[root@slicer ~]# cat /proc/scsi/scsi
Attached devices:
Host: scsi2 Channel: 00 Id: 00 Lun: 00
  Vendor: HP       Model: 1210m            Rev: 0150
  Type:   RAID                             ANSI  SCSI revision: 05
Host: scsi2 Channel: 00 Id: 00 Lun: 01
  Vendor: HP       Model: 1210m VOLUME     Rev: 0150
  Type:   Direct-Access                    ANSI  SCSI revision: 05
[root@slicer ~]# lsscsi -g
[2:0:0:0]    storage HP       1210m            0150  -         /dev/sg0
[2:0:0:1]    disk    HP       1210m VOLUME     0150  /dev/sda  /dev/sg1

So far, so good.

Now, watch this.  Remove the device while something has it open:

[root@slicer ~]# sleep 10 < /dev/sg1 & ( sleep 1 && echo scsi remove-single-device 2 0 0 1 > /proc/scsi/scsi )
[1] 6077
[root@slicer ~]# 
[1]+  Done                    sleep 10 < /dev/sg1
[root@slicer ~]# lsof /dev/sg1
lsof: status error on /dev/sg1: No such file or directory
lsof 4.78
 latest revision: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/
 latest FAQ: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/FAQ
 latest man page: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/lsof_man
 usage: [-?abhlnNoOPRstUvVX] [+|-c c] [+|-d s] [+D D] [+|-f]
 [-F [f]] [-g [s]] [-i [i]] [+|-L [l]] [+m [m]] [+|-M] [-o [o]]
 [-p s] [+|-r [t]] [-S [t]] [-T [t]] [-u s] [+|-w] [-x [fl]] [-Z [Z]] [--] [names]
Use the ``-h'' option to get more help information.
[root@slicer ~]# rmmod hpsa
ERROR: Module hpsa is in use
[root@slicer ~]#

Hmm, that's not cool.

Maybe it's my driver.  Let me try with USB.

[root@slicer ~]# cat /proc/scsi/scsi
Attached devices:
Host: scsi1 Channel: 00 Id: 00 Lun: 00
  Vendor: USB      Model: DISK 2.0         Rev: 0403
  Type:   Direct-Access                    ANSI  SCSI revision: 00
[root@slicer ~]# lsmod | grep sd
sd_mod                 59592  0 
scsi_mod              189304  9 usb_storage,ib_iser,iscsi_tcp,libiscsi,scsi_transport_iscsi,scsi_dh,sg,cciss,sd_mod
[root@slicer ~]# rmmod usb_storage
[root@slicer ~]# cat /proc/scsi/scsi
Attached devices:
[root@slicer ~]# modprobe usb_storage
[root@slicer ~]# cat /proc/scsi/scsi
Attached devices:
[root@slicer ~]# echo scsi add-single-device 1 0 0 0 > /proc/scsi/scsi
-bash: echo: write error: No such device or address

Oh yeah, the host number increments, forgot about that...

[root@slicer ~]# echo scsi add-single-device 2 0 0 0 > /proc/scsi/scsi
[root@slicer ~]# cat /proc/scsi/scsi
Attached devices:
Host: scsi2 Channel: 00 Id: 00 Lun: 00
  Vendor: USB      Model: DISK 2.0         Rev: 0403
  Type:   Direct-Access                    ANSI  SCSI revision: 00
[root@slicer ~]# lsscsi -g
[2:0:0:0]    disk    USB      DISK 2.0         0403  /dev/sda  /dev/sg0
[root@slicer ~]# sleep 10 < /dev/sg0 & ( sleep 1 && echo scsi remove-single-device 2 0 0 0 > /proc/scsi/scsi )
[1] 6073
[root@slicer ~]# 
[root@slicer ~]# 
[1]+  Done                    sleep 10 < /dev/sg0
[root@slicer ~]# rmmod usb_storage
ERROR: Module usb_storage is in use
[root@slicer ~]#

Hmm, same thing.

Any thoughts?  (other than "don't do that."  Our array configuration
utility for smart arrays is causing similar trouble, as it rapidly creates
and deletes logical drives, etc. so it would be nice if this didn't happen.)

-- steve


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Is this a known problem with the SCSI mid layer?
  2010-01-14 16:56 scameron
@ 2010-01-14 18:24 ` Douglas Gilbert
  2010-01-14 19:00   ` scameron
  2010-01-14 20:07   ` scameron
  0 siblings, 2 replies; 13+ messages in thread
From: Douglas Gilbert @ 2010-01-14 18:24 UTC (permalink / raw)
  To: scameron
  Cc: linux-scsi, James.Bottomley, mikem, dab, hch, axboe,
	FUJITA Tomonori

scameron@beardog.cce.hp.com wrote:
> I'm seeing a problem which I think is a problem in the SCSI mid layer.
> 
> Check this out:
> 
> I can rmmod and insmod hpsa (a modified version from 
> what's currently in the mainline tree, but I don't think
> that matters.)
> 
> I have one logical drive present
> 
> [root@slicer ~]# rmmod hpsa
> [root@slicer ~]# insmod /usr/src/linux-2.6.27.42/drivers/scsi/hpsa.ko
> [root@slicer ~]# cat /proc/scsi/scsi
> Attached devices:
> Host: scsi1 Channel: 00 Id: 00 Lun: 00
>   Vendor: HP       Model: 1210m            Rev: 0150
>   Type:   RAID                             ANSI  SCSI revision: 05
> Host: scsi1 Channel: 00 Id: 00 Lun: 01
>   Vendor: HP       Model: 1210m VOLUME     Rev: 0150
>   Type:   Direct-Access                    ANSI  SCSI revision: 05
> [root@slicer ~]# rmmod hpsa
> [root@slicer ~]# insmod /usr/src/linux-2.6.27.42/drivers/scsi/hpsa.ko
> [root@slicer ~]# cat /proc/scsi/scsi
> Attached devices:
> Host: scsi2 Channel: 00 Id: 00 Lun: 00
>   Vendor: HP       Model: 1210m            Rev: 0150
>   Type:   RAID                             ANSI  SCSI revision: 05
> Host: scsi2 Channel: 00 Id: 00 Lun: 01
>   Vendor: HP       Model: 1210m VOLUME     Rev: 0150
>   Type:   Direct-Access                    ANSI  SCSI revision: 05
> [root@slicer ~]# lsscsi -g
> [2:0:0:0]    storage HP       1210m            0150  -         /dev/sg0
> [2:0:0:1]    disk    HP       1210m VOLUME     0150  /dev/sda  /dev/sg1
> 
> So far, so good.
> 
> Now, watch this.  Remove the device while something has it open:
> 
> [root@slicer ~]# sleep 10 < /dev/sg1 & ( sleep 1 && echo scsi remove-single-device 2 0 0 1 > /proc/scsi/scsi )
> [1] 6077
> [root@slicer ~]# 
> [1]+  Done                    sleep 10 < /dev/sg1
> [root@slicer ~]# lsof /dev/sg1
> lsof: status error on /dev/sg1: No such file or directory
> lsof 4.78
>  latest revision: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/
>  latest FAQ: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/FAQ
>  latest man page: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/lsof_man
>  usage: [-?abhlnNoOPRstUvVX] [+|-c c] [+|-d s] [+D D] [+|-f]
>  [-F [f]] [-g [s]] [-i [i]] [+|-L [l]] [+m [m]] [+|-M] [-o [o]]
>  [-p s] [+|-r [t]] [-S [t]] [-T [t]] [-u s] [+|-w] [-x [fl]] [-Z [Z]] [--] [names]
> Use the ``-h'' option to get more help information.
> [root@slicer ~]# rmmod hpsa
> ERROR: Module hpsa is in use
> [root@slicer ~]#
> 
> Hmm, that's not cool.

Steve,
That 'sleep 10 < /dev/sg1' worries me. The purpose of a
read() on a sg device is to fetch the response of a SCSI
command sent by a preceding write(). So with nothing to
read and a blocking sg device file descriptor the read()
probably hangs. IMO the valid use of the sg driver should
not have a read() hanging for a SCSI command that was
never sent. While that is happening you remove the
device.

That may be a valid torture test for the sg driver but
isn't something that should be encouraged from the
user space.

On an Ubuntu kernel 2.6.31-17-generic using a virtual
device owned by the scsi_debug driver and the same
torture test, I don't have a problem with 'rmmod scsi_debug'

IMO the usb-storage driver is not a good yardstick.

Doug Gilbert




* Re: Is this a known problem with the SCSI mid layer?
  2010-01-14 18:24 ` Douglas Gilbert
@ 2010-01-14 19:00   ` scameron
  2010-01-14 20:07   ` scameron
  1 sibling, 0 replies; 13+ messages in thread
From: scameron @ 2010-01-14 19:00 UTC (permalink / raw)
  To: Douglas Gilbert
  Cc: linux-scsi, James.Bottomley, mikem, dab, hch, axboe,
	FUJITA Tomonori

On Thu, Jan 14, 2010 at 01:24:59PM -0500, Douglas Gilbert wrote:
> scameron@beardog.cce.hp.com wrote:
> >I'm seeing a problem which I think is a problem in the SCSI mid layer.
> >
> >Check this out:
> >
> >I can rmmod and insmod hpsa (a modified version from 
> >what's currently in the mainline tree, but I don't think
> >that matters.)
> >
> >I have one logical drive present
> >
> >[root@slicer ~]# rmmod hpsa
> >[root@slicer ~]# insmod /usr/src/linux-2.6.27.42/drivers/scsi/hpsa.ko
> >[root@slicer ~]# cat /proc/scsi/scsi
> >Attached devices:
> >Host: scsi1 Channel: 00 Id: 00 Lun: 00
> >  Vendor: HP       Model: 1210m            Rev: 0150
> >  Type:   RAID                             ANSI  SCSI revision: 05
> >Host: scsi1 Channel: 00 Id: 00 Lun: 01
> >  Vendor: HP       Model: 1210m VOLUME     Rev: 0150
> >  Type:   Direct-Access                    ANSI  SCSI revision: 05
> >[root@slicer ~]# rmmod hpsa
> >[root@slicer ~]# insmod /usr/src/linux-2.6.27.42/drivers/scsi/hpsa.ko
> >[root@slicer ~]# cat /proc/scsi/scsi
> >Attached devices:
> >Host: scsi2 Channel: 00 Id: 00 Lun: 00
> >  Vendor: HP       Model: 1210m            Rev: 0150
> >  Type:   RAID                             ANSI  SCSI revision: 05
> >Host: scsi2 Channel: 00 Id: 00 Lun: 01
> >  Vendor: HP       Model: 1210m VOLUME     Rev: 0150
> >  Type:   Direct-Access                    ANSI  SCSI revision: 05
> >[root@slicer ~]# lsscsi -g
> >[2:0:0:0]    storage HP       1210m            0150  -         /dev/sg0
> >[2:0:0:1]    disk    HP       1210m VOLUME     0150  /dev/sda  /dev/sg1
> >
> >So far, so good.
> >
> >Now, watch this.  Remove the device while something has it open:
> >
> >[root@slicer ~]# sleep 10 < /dev/sg1 & ( sleep 1 && echo scsi remove-single-device 2 0 0 1 > /proc/scsi/scsi )
> >[1] 6077
> >[root@slicer ~]# 
> >[1]+  Done                    sleep 10 < /dev/sg1
> >[root@slicer ~]# lsof /dev/sg1
> >lsof: status error on /dev/sg1: No such file or directory
> >lsof 4.78
> > latest revision: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/
> > latest FAQ: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/FAQ
> > latest man page: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/lsof_man
> > usage: [-?abhlnNoOPRstUvVX] [+|-c c] [+|-d s] [+D D] [+|-f]
> > [-F [f]] [-g [s]] [-i [i]] [+|-L [l]] [+m [m]] [+|-M] [-o [o]]
> > [-p s] [+|-r [t]] [-S [t]] [-T [t]] [-u s] [+|-w] [-x [fl]] [-Z [Z]] [--] [names]
> >Use the ``-h'' option to get more help information.
> >[root@slicer ~]# rmmod hpsa
> >ERROR: Module hpsa is in use
> >[root@slicer ~]#
> >
> >Hmm, that's not cool.
> 
> Steve,
> That 'sleep 10 < /dev/sg1' worries me. The purpose of a
> read() on a sg device is to fetch the response of a SCSI
> command sent by a preceding write(). So with nothing to
> read and a blocking sg device file descriptor the read()
> probably hangs. IMO the valid use of the sg driver should
> not have a read() hanging for a SCSI command that was
> never sent. While that is happening you remove the
> device.

I don't think sleep does a read from stdin.  As far as I can
tell, there is no read(), just an open() by the shell to connect
the device to sleep's stdin.  According to strace, every read()
that sleep issues is on file descriptor 3, which gets opened on
various things (locale, random, library stuff); none of them is
on stdin.

> 
> That may be a valid torture test for the sg driver but
> isn't something that should be encouraged from the
> user space.

Yeah, of course this test isn't exactly a sane thing to do,
but if someone happens to "echo scsi remove-single-device ... "
while some process has the corresponding /dev/sg node merely opened,
wedging things seems, well, kinda bad.  And it was something
we ran into while testing other software; I just isolated it down
to this test case.

This, however, appears to work:

[root@slicer ~]# cat /proc/scsi/scsi
Attached devices:
Host: scsi1 Channel: 00 Id: 00 Lun: 00
  Vendor: USB      Model: DISK 2.0         Rev: 0403
  Type:   Direct-Access                    ANSI  SCSI revision: 00
[root@slicer ~]# lsscsi
[1:0:0:0]    disk    USB      DISK 2.0         0403  /dev/sdb
[root@slicer ~]# sleep 10 < /dev/sdb & ( sleep 1 && echo scsi remove-single-device 1 0 0 0 > /proc/scsi/scsi )
[1] 5942
[root@slicer ~]# 
[1]+  Done                    sleep 10 < /dev/sdb
[root@slicer ~]# rmmod usb_storage

and this works too:

[root@slicer ~]# insmod /usr/src/linux-2.6.27.42/drivers/scsi/hpsa.ko
[root@slicer ~]# cat /proc/scsi/scsi
Attached devices:
Host: scsi2 Channel: 00 Id: 00 Lun: 00
  Vendor: HP       Model: 1210m            Rev: 0150
  Type:   RAID                             ANSI  SCSI revision: 05
Host: scsi2 Channel: 00 Id: 00 Lun: 01
  Vendor: HP       Model: 1210m VOLUME     Rev: 0150
  Type:   Direct-Access                    ANSI  SCSI revision: 05
[root@slicer ~]# lsscsi
[2:0:0:0]    storage HP       1210m            0150  -       
[2:0:0:1]    disk    HP       1210m VOLUME     0150  /dev/sda
[root@slicer ~]# sleep 10 < /dev/sda & ( sleep 1 && echo scsi remove-single-device 2 0 0 1 > /proc/scsi/scsi )
[1] 6087
[root@slicer ~]# 
[1]+  Done                    sleep 10 < /dev/sda
[root@slicer ~]# rmmod hpsa
[root@slicer ~]#



> 
> On an Ubuntu kernel 2.6.31-17-generic using a virtual
> device owned by the scsi_debug driver and the same
> torture test, I don't have a problem with 'rmmod scsi_debug'
> 
> IMO the usb-storage driver is not a good yardstick.
> 

Yeah, wasn't my first choice for a guinea pig, as I know it
does some strange things compared to other scsi drivers, but
it was what I had handy, hardware-wise.

I'll try to scrounge up some different hardware and see if it
behaves the same.



* Re: Is this a known problem with the SCSI mid layer?
  2010-01-14 18:24 ` Douglas Gilbert
  2010-01-14 19:00   ` scameron
@ 2010-01-14 20:07   ` scameron
  2010-01-14 23:43     ` Stefan Richter
  1 sibling, 1 reply; 13+ messages in thread
From: scameron @ 2010-01-14 20:07 UTC (permalink / raw)
  To: Douglas Gilbert
  Cc: linux-scsi, James.Bottomley, mikem, dab, hch, axboe,
	FUJITA Tomonori

On Thu, Jan 14, 2010 at 01:24:59PM -0500, Douglas Gilbert wrote:
> scameron@beardog.cce.hp.com wrote:
> >I'm seeing a problem which I think is a problem in the SCSI mid layer.
> >
> >Check this out:
> >
> >I can rmmod and insmod hpsa (a modified version from 
> >what's currently in the mainline tree, but I don't think
> >that matters.)
> >
> >I have one logical drive present
> >
> >[root@slicer ~]# rmmod hpsa
> >[root@slicer ~]# insmod /usr/src/linux-2.6.27.42/drivers/scsi/hpsa.ko
> >[root@slicer ~]# cat /proc/scsi/scsi
> >Attached devices:
> >Host: scsi1 Channel: 00 Id: 00 Lun: 00
> >  Vendor: HP       Model: 1210m            Rev: 0150
> >  Type:   RAID                             ANSI  SCSI revision: 05
> >Host: scsi1 Channel: 00 Id: 00 Lun: 01
> >  Vendor: HP       Model: 1210m VOLUME     Rev: 0150
> >  Type:   Direct-Access                    ANSI  SCSI revision: 05
> >[root@slicer ~]# rmmod hpsa
> >[root@slicer ~]# insmod /usr/src/linux-2.6.27.42/drivers/scsi/hpsa.ko
> >[root@slicer ~]# cat /proc/scsi/scsi
> >Attached devices:
> >Host: scsi2 Channel: 00 Id: 00 Lun: 00
> >  Vendor: HP       Model: 1210m            Rev: 0150
> >  Type:   RAID                             ANSI  SCSI revision: 05
> >Host: scsi2 Channel: 00 Id: 00 Lun: 01
> >  Vendor: HP       Model: 1210m VOLUME     Rev: 0150
> >  Type:   Direct-Access                    ANSI  SCSI revision: 05
> >[root@slicer ~]# lsscsi -g
> >[2:0:0:0]    storage HP       1210m            0150  -         /dev/sg0
> >[2:0:0:1]    disk    HP       1210m VOLUME     0150  /dev/sda  /dev/sg1
> >
> >So far, so good.
> >
> >Now, watch this.  Remove the device while something has it open:
> >
> >[root@slicer ~]# sleep 10 < /dev/sg1 & ( sleep 1 && echo scsi remove-single-device 2 0 0 1 > /proc/scsi/scsi )
> >[1] 6077
> >[root@slicer ~]# 
> >[1]+  Done                    sleep 10 < /dev/sg1
> >[root@slicer ~]# lsof /dev/sg1
> >lsof: status error on /dev/sg1: No such file or directory
> >lsof 4.78
> > latest revision: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/
> > latest FAQ: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/FAQ
> > latest man page: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/lsof_man
> > usage: [-?abhlnNoOPRstUvVX] [+|-c c] [+|-d s] [+D D] [+|-f]
> > [-F [f]] [-g [s]] [-i [i]] [+|-L [l]] [+m [m]] [+|-M] [-o [o]]
> > [-p s] [+|-r [t]] [-S [t]] [-T [t]] [-u s] [+|-w] [-x [fl]] [-Z [Z]] [--] [names]
> >Use the ``-h'' option to get more help information.
> >[root@slicer ~]# rmmod hpsa
> >ERROR: Module hpsa is in use
> >[root@slicer ~]#
> >
> >Hmm, that's not cool.
> 
> Steve,
> That 'sleep 10 < /dev/sg1' worries me. The purpose of a
> read() on a sg device is to fetch the response of a SCSI
> command sent by a preceding write(). So with nothing to
> read and a blocking sg device file descriptor the read()
> probably hangs. IMO the valid use of the sg driver should
> not have a read() hanging for a SCSI command that was
> never sent. While that is happening you remove the
> device.
> 
> That may be a valid torture test for the sg driver but
> isn't something that should be encouraged from the
> user space.
> 
> On an Ubuntu kernel 2.6.31-17-generic using a virtual
> device owned by the scsi_debug driver and the same
> torture test, I don't have a problem with 'rmmod scsi_debug'
> 
> IMO the usb-storage driver is not a good yardstick.

I tried it with mptsas with the same results:

mptlinuxtest:~ # cat /proc/scsi/scsi
mptlinuxtest:~ # modprobe mptsas
mptlinuxtest:~ # cat /proc/scsi/scsi
Attached devices:
Host: scsi2 Channel: 00 Id: 00 Lun: 00
  Vendor: HP       Model: DG072A9BB7       Rev: HPD0
  Type:   Direct-Access                    ANSI  SCSI revision: 05
Host: scsi2 Channel: 00 Id: 01 Lun: 00
  Vendor: HP       Model: DG072A8B54       Rev: HPD4
  Type:   Direct-Access                    ANSI  SCSI revision: 05
Host: scsi2 Channel: 01 Id: 03 Lun: 00
  Vendor: LSILOGIC Model: Logical Volume   Rev: 3000
  Type:   Direct-Access                    ANSI  SCSI revision: 02
mptlinuxtest:~ # rmmod mptsas
mptlinuxtest:~ # modprobe mptsas
mptlinuxtest:~ # cat /proc/scsi/scsi
Attached devices:
Host: scsi3 Channel: 00 Id: 00 Lun: 00
  Vendor: HP       Model: DG072A9BB7       Rev: HPD0
  Type:   Direct-Access                    ANSI  SCSI revision: 05
Host: scsi3 Channel: 00 Id: 01 Lun: 00
  Vendor: HP       Model: DG072A8B54       Rev: HPD4
  Type:   Direct-Access                    ANSI  SCSI revision: 05
Host: scsi3 Channel: 01 Id: 03 Lun: 00
  Vendor: LSILOGIC Model: Logical Volume   Rev: 3000
  Type:   Direct-Access                    ANSI  SCSI revision: 02
mptlinuxtest:~ # lsscsi -g
[3:0:0:0]    disk    HP       DG072A9BB7       HPD0  -         /dev/sg0
[3:0:1:0]    disk    HP       DG072A8B54       HPD4  -         /dev/sg1
[3:1:3:0]    disk    LSILOGIC Logical Volume   3000  /dev/sda  /dev/sg2
mptlinuxtest:~ # sleep 10 < /dev/sg2 & ( sleep 1 && echo scsi remove-single-device 3 1 3 0 > /proc/scsi/scsi )
[1] 797
mptlinuxtest:~ # 
[1]+  Done                    sleep 10 < /dev/sg2
mptlinuxtest:~ # echo $?
0
mptlinuxtest:~ # rmmod mptsas
ERROR: Module mptsas is in use
mptlinuxtest:~ #


Also, I tried using a C program rather than the sleep command, to rule
out the possibility of a read from stdin by the sleep command (even though,
according to strace, I don't think it does one).

[root@slicer ~]# cat sleeptest.c
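#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <errno.h>

/* Open the named device, hold it open for 10 seconds, then exit.
 * No read() or write() is ever issued on the descriptor. */
int main(int argc, char *argv[])
{
	int fd;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <device>\n", argv[0]);
		return 1;
	}
	fd = open(argv[1], O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	sleep(10);
	close(fd);	/* explicit close; exiting would close it anyway */
	return 0;
}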

Using that in place of the "sleep 10 < /dev/sg1" made no difference.

-- steve




* Re: Is this a known problem with the SCSI mid layer?
  2010-01-14 20:07   ` scameron
@ 2010-01-14 23:43     ` Stefan Richter
  2010-01-15  8:10       ` Jens Axboe
  0 siblings, 1 reply; 13+ messages in thread
From: Stefan Richter @ 2010-01-14 23:43 UTC (permalink / raw)
  To: scameron
  Cc: Douglas Gilbert, linux-scsi, James.Bottomley, mikem, dab, hch,
	axboe, FUJITA Tomonori

scameron@beardog.cce.hp.com wrote:
> On Thu, Jan 14, 2010 at 01:24:59PM -0500, Douglas Gilbert wrote:
>> scameron@beardog.cce.hp.com wrote:
>>> I'm seeing a problem which I think is a problem in the SCSI mid layer.
[...]
>>> Remove the device while something has it open:
[...]
>>> [root@slicer ~]# rmmod hpsa
>>> ERROR: Module hpsa is in use

The sg driver's open method takes a reference to the underlying SCSI
device representation of the mid layer.  Among other things, this step increases
the module use count of the respective low-level driver (transport layer
driver) so that the SCSI mid layer can be sure that function pointers to
driver methods stay valid during the lifetime of the SCSI device
representation.

This reference taking is of course being reversed when the sg driver
finishes its last uses of the underlying SCSI device.  This may be at
the respective close() or even later.

In short, what you are seeing is normal, expected, and necessary.
-- 
Stefan Richter
-=====-==-=- ---= -====
http://arcgraph.de/sr/


* Re: Is this a known problem with the SCSI mid layer?
@ 2010-01-15  2:14 Stephen Cameron
  2010-01-15 13:02 ` Stefan Richter
  0 siblings, 1 reply; 13+ messages in thread
From: Stephen Cameron @ 2010-01-15  2:14 UTC (permalink / raw)
  To: linux-scsi

stefanr@s5r6.in-berlin.de wrote:
> scameron@beardog.cce.hp.com wrote:
> > On Thu, Jan 14, 2010 at 01:24:59PM -0500, Douglas Gilbert wrote:
> >> scameron@beardog.cce.hp.com wrote:
> >>> I'm seeing a problem which I think is a problem in the SCSI mid layer.
> [...]
> >>> Remove the device while something has it open:
> [...]
> >>> [root@slicer ~]# rmmod hpsa
> >>> ERROR: Module hpsa is in use
> 
> The sg driver's open method takes a reference to the underlying SCSI
> device representation of the mid layer.  Among other things, this step increases
> the module use count of the respective low-level driver (transport layer
> driver) so that the SCSI mid layer can be sure that function pointers to
> driver methods stay valid during the lifetime of the SCSI device
> representation.
> 
> This reference taking is of course being reversed when the sg driver
> finishes its last uses of the underlying SCSI device.  This may be at
> the respective close() or even later.
> 
> In short, what you are seeing is normal, expected, and necessary.
> -- 
> Stefan Richter
> 

I don't think you are correct.  Look more closely at my test cases.

When I attempt the rmmod, *nothing* has the device open.  The last
close has already occurred -- albeit *after* the device was
removed.  

In my test, the module *never* becomes rmmod-able.  Doesn't matter
how long you wait, and nothing will ever decrement the reference
count.

Are you saying that this is correct behavior, that once a device
is removed while a process has it open that it should *never*
be rmmod'able?

Because that is the behavior I am seeing.

-- steve



      


* Re: Is this a known problem with the SCSI mid layer?
  2010-01-14 23:43     ` Stefan Richter
@ 2010-01-15  8:10       ` Jens Axboe
  2010-01-15  9:03         ` Stefan Richter
  0 siblings, 1 reply; 13+ messages in thread
From: Jens Axboe @ 2010-01-15  8:10 UTC (permalink / raw)
  To: Stefan Richter
  Cc: scameron, Douglas Gilbert, linux-scsi, James.Bottomley, mikem,
	dab, hch, FUJITA Tomonori

On Fri, Jan 15 2010, Stefan Richter wrote:
> scameron@beardog.cce.hp.com wrote:
> > On Thu, Jan 14, 2010 at 01:24:59PM -0500, Douglas Gilbert wrote:
> >> scameron@beardog.cce.hp.com wrote:
> >>> I'm seeing a problem which I think is a problem in the SCSI mid layer.
> [...]
> >>> Remove the device while something has it open:
> [...]
> >>> [root@slicer ~]# rmmod hpsa
> >>> ERROR: Module hpsa is in use
> 
> The sg driver's open method takes a reference to the underlying SCSI
> device representation of the mid layer.  Among other things, this step increases
> the module use count of the respective low-level driver (transport layer
> driver) so that the SCSI mid layer can be sure that function pointers to
> driver methods stay valid during the lifetime of the SCSI device
> representation.
> 
> This reference taking is of course being reversed when the sg driver
> finishes its last uses of the underlying SCSI device.  This may be at
> the respective close() or even later.
> 
> In short, what you are seeing is normal, expected, and necessary.

Hmm... Unless I'm reading Stephen's email incorrectly, he's holding the
device open, removing it, closing the device, and then attempting to
remove the host driver. So at the point that he wants to rmmod the
module, there are indeed no references to it anymore.

It looks like a bug.

-- 
Jens Axboe



* Re: Is this a known problem with the SCSI mid layer?
  2010-01-15  8:10       ` Jens Axboe
@ 2010-01-15  9:03         ` Stefan Richter
  2010-01-15 21:44           ` scameron
  0 siblings, 1 reply; 13+ messages in thread
From: Stefan Richter @ 2010-01-15  9:03 UTC (permalink / raw)
  To: Jens Axboe
  Cc: scameron, Douglas Gilbert, linux-scsi, James.Bottomley, mikem,
	dab, hch, FUJITA Tomonori

On 15 Jan, Jens Axboe wrote:
> On Fri, Jan 15 2010, Stefan Richter wrote:
>> The sg driver's open method takes a reference to the underlying SCSI
>> device representation of the mid layer.  Among other things, this step increases
>> the module use count of the respective low-level driver
...
>> In short, what you are seeing is normal, expected, and necessary.
> 
> Hmm... Unless I'm reading Stephen's email incorrectly, he's holding the
> device open, removing it, closing the device, and then attempting to
> remove the host driver. So at the point that he wants to rmmod the
> module, there are indeed no references to it anymore.
> 
> It looks like a bug.

Oh, right, I missed that the device file was indeed closed ( = the
seemingly last user of it exited) before rmmod was attempted.

I.e. remove-single-device while an sg file is open leaves a dangling
reference, while remove-single-device while an sd file is open does
not...
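
One way to watch for the dangling reference described above is to read
the low-level driver's use count before and after the last close.  A
minimal sketch, assuming the kernel exposes /sys/module/<name>/refcnt
(it does when module unloading is enabled); the function name
module_refcnt and the SYSFS_MODULE_DIR override are invented here for
illustration and are not part of any existing tool:

```shell
# Print the use count of a loaded module (the same count lsmod reports).
# SYSFS_MODULE_DIR is a hypothetical override so the function can be
# pointed at a scratch directory; it defaults to /sys/module.
module_refcnt() {
    refcnt_file="${SYSFS_MODULE_DIR:-/sys/module}/$1/refcnt"
    if [ -r "$refcnt_file" ]; then
        cat "$refcnt_file"
    else
        echo "$1: not loaded, or no refcnt exposed" >&2
        return 1
    fi
}
```

With the hpsa example from this thread, "module_refcnt hpsa" could be
run before the open, while /dev/sg1 is held open, and again after the
last close; a count that never returns to its starting value would
match the reported symptom.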
-- 
Stefan Richter
-=====-==-=- ---= -====
http://arcgraph.de/sr/




* Re: Is this a known problem with the SCSI mid layer?
  2010-01-15  2:14 Is this a known problem with the SCSI mid layer? Stephen Cameron
@ 2010-01-15 13:02 ` Stefan Richter
  2010-01-15 14:18   ` Stephen Cameron
  0 siblings, 1 reply; 13+ messages in thread
From: Stefan Richter @ 2010-01-15 13:02 UTC (permalink / raw)
  To: Stephen Cameron; +Cc: linux-scsi

Stephen Cameron wrote:
> stefanr@s5r6.in-berlin.de wrote:
>> The sg driver's open method takes a reference to the underlying SCSI
>> device representation of the mid layer.  Among other things, this step
>> increases the module use count of the respective low-level driver
>> (transport layer driver) so that the SCSI mid layer can be sure that
>> function pointers to driver methods stay valid during the lifetime of
>> the SCSI device representation.
>>
>> This reference is of course dropped again when the sg driver finishes
>> its last use of the underlying SCSI device.  This may be at the
>> respective close() or even later.
>>
>> In short, what you are seeing is normal, expected, and necessary.
>> -- 
>> Stefan Richter
>>
> 
> I don't think you are correct.  Look more closely at my test cases.
> 
> When I attempt the rmmod, *nothing* has the device open.  The last
> close has already occurred -- albeit *after* the device was
> removed.  

Yep, sorry, I didn't read your report as carefully as necessary.

> In my test, the module *never* becomes rmmod-able.  Doesn't matter
> how long you wait, and nothing will ever decrement the reference
> count.

Did you check that also after, say, several minutes?  I suppose you
already looked at the kernel log for possibly related messages and would
have reported them as well...  Nevertheless, I have seen some cases of
device removal or driver removal where the SCSI layer spent many minutes
(5? 15? I'm not sure anymore) in error handling although the lower layer
had done its best (best but perhaps not well) to report the device as
physically gone.

Did you already test really long after device removal and last close,
along the lines of this?
# while ((i++ < 20)); do rmmod $driver && break; sleep 60; done
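
The one-liner above could be spelled out as a small function that also
reports which attempt succeeded.  This is only a sketch: retry_rmmod
and the RMMOD override are names invented here, and the defaults of 20
tries at 60 seconds are taken from the loop above.

```shell
# Try to unload a module up to $2 times, sleeping $3 seconds after each
# failed attempt.  RMMOD is a hypothetical override for the unload
# command (defaults to rmmod), handy for dry runs.
retry_rmmod() {
    driver=$1
    tries=${2:-20}
    delay=${3:-60}
    attempt=1
    while [ "$attempt" -le "$tries" ]; do
        if "${RMMOD:-rmmod}" "$driver"; then
            echo "$driver removed on attempt $attempt"
            return 0
        fi
        sleep "$delay"
        attempt=$((attempt + 1))
    done
    echo "$driver still loaded after $tries attempts"
    return 1
}
```

Called as "retry_rmmod hpsa", it gives up after roughly 20 minutes.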
-- 
Stefan Richter
-=====-==-=- ---= -====
http://arcgraph.de/sr/


* Re: Is this a known problem with the SCSI mid layer?
  2010-01-15 13:02 ` Stefan Richter
@ 2010-01-15 14:18   ` Stephen Cameron
  2010-01-15 17:50     ` Douglas Gilbert
  0 siblings, 1 reply; 13+ messages in thread
From: Stephen Cameron @ 2010-01-15 14:18 UTC (permalink / raw)
  To: Stefan Richter; +Cc: linux-scsi

--- On Fri, 1/15/10, Stefan Richter <stefanr@s5r6.in-berlin.de> wrote:

...

> 
> Did you check that also after, say, several minutes?  I suppose you
> already looked at the kernel log for possibly related messages and would
> have reported them as well...  Nevertheless, I have seen some cases of
> device removal or driver removal where the SCSI layer spent many minutes
> (5? 15? I'm not sure anymore) in error handling although the lower layer
> had done its best (best but perhaps not well) to report the device as
> physically gone.
> 
> Did you already test really long after device removal and last close,
> along the lines of this?
> # while ((i++ < 20)); do rmmod $driver && break; sleep 60; done


I left it sitting since last night (~6:00pm local time), and as of this
morning (~8:00am local time) it still won't rmmod... so ~14 hours.

-- steve

> -- 
> Stefan Richter
> -=====-==-=- ---= -====
> http://arcgraph.de/sr/
> 


      


* Re: Is this a known problem with the SCSI mid layer?
  2010-01-15 14:18   ` Stephen Cameron
@ 2010-01-15 17:50     ` Douglas Gilbert
  0 siblings, 0 replies; 13+ messages in thread
From: Douglas Gilbert @ 2010-01-15 17:50 UTC (permalink / raw)
  To: Stephen Cameron; +Cc: Stefan Richter, linux-scsi

Steve,
Could you confirm whether, in your environment, the scsi_debug
driver has the same problem?

Doug Gilbert


Stephen Cameron wrote:
> --- On Fri, 1/15/10, Stefan Richter <stefanr@s5r6.in-berlin.de> wrote:
> 
> ...
> 
>> Did you check that also after, say, several minutes?  I suppose you
>> already looked at the kernel log for possibly related messages and would
>> have reported them as well...  Nevertheless, I have seen some cases of
>> device removal or driver removal where the SCSI layer spent many minutes
>> (5? 15? I'm not sure anymore) in error handling although the lower layer
>> had done its best (best but perhaps not well) to report the device as
>> physically gone.
>>
>> Did you already test really long after device removal and last close,
>> along the lines of this?
>> # while ((i++ < 20)); do rmmod $driver && break; sleep 60; done
> 
> 
> I left it sitting since last night (~6:00pm local time), and as of this
> morning (~8:00am local time) it still won't rmmod... so ~14 hours.
> 
> -- steve
> 
>> -- 
>> Stefan Richter
>> -=====-==-=- ---= -====
>> http://arcgraph.de/sr/
>>
> 
> 
>       
> 



* Re: Is this a known problem with the SCSI mid layer?
  2010-01-15  9:03         ` Stefan Richter
@ 2010-01-15 21:44           ` scameron
  2010-01-16 11:47             ` FUJITA Tomonori
  0 siblings, 1 reply; 13+ messages in thread
From: scameron @ 2010-01-15 21:44 UTC (permalink / raw)
  To: Stefan Richter
  Cc: Jens Axboe, Douglas Gilbert, linux-scsi, James.Bottomley, mikem,
	dab, hch, FUJITA Tomonori

On Fri, Jan 15, 2010 at 10:03:09AM +0100, Stefan Richter wrote:
> On 15 Jan, Jens Axboe wrote:
> > On Fri, Jan 15 2010, Stefan Richter wrote:
> >> The sg driver's open method takes a reference to the underlying SCSI
> >> device representation of the mid layer.  Among other things, this step
> >> increases the module use count of the respective low-level driver
> ...
> >> In short, what you are seeing is normal, expected, and necessary.
> > 
> > Hmm... Unless I'm reading Stephen's email incorrectly, he's holding the
> > device open, removing it, closing the device, and then attempting to
> > remove the host driver. So at the point that he wants to rmmod the
> > module, there are indeed no references to it anymore.
> > 
> > It looks like a bug.
> 
> Oh, right, I missed that the device file was indeed closed ( = the
> seemingly last user of it exited) before rmmod was attempted.
> 
> I.e. remove-single-device while an sg file is open leaves a dangling
> reference, while remove-single-device while an sd file is open does
> not...
> -- 
> Stefan Richter
> -=====-==-=- ---= -====
> http://arcgraph.de/sr/
> 

I've since tried this on a newer kernel, 2.6.33-rc4 (previously I
was using a 2.6.27 kernel).

With the 2.6.33-rc4 kernel, so far the problem hasn't appeared.

Thanks, and sorry I didn't try the newer kernel first.

-- steve



* Re: Is this a known problem with the SCSI mid layer?
  2010-01-15 21:44           ` scameron
@ 2010-01-16 11:47             ` FUJITA Tomonori
  0 siblings, 0 replies; 13+ messages in thread
From: FUJITA Tomonori @ 2010-01-16 11:47 UTC (permalink / raw)
  To: scameron
  Cc: stefanr, jens.axboe, dgilbert, linux-scsi, James.Bottomley, mikem,
	dab, hch, fujita.tomonori

On Fri, 15 Jan 2010 15:44:55 -0600
scameron@beardog.cce.hp.com wrote:

> I've since tried this on a newer kernel, 2.6.33-rc4 (previously I
> was using a 2.6.27 kernel).
> 
> With the 2.6.33-rc4 kernel, so far the problem hasn't appeared.

Some race bugs in device removal that might be related to this
issue were fixed around 2.6.30, I think.


Thread overview: 13+ messages
2010-01-15  2:14 Is this a known problem with the SCSI mid layer? Stephen Cameron
2010-01-15 13:02 ` Stefan Richter
2010-01-15 14:18   ` Stephen Cameron
2010-01-15 17:50     ` Douglas Gilbert
  -- strict thread matches above, loose matches on Subject: below --
2010-01-14 16:56 scameron
2010-01-14 18:24 ` Douglas Gilbert
2010-01-14 19:00   ` scameron
2010-01-14 20:07   ` scameron
2010-01-14 23:43     ` Stefan Richter
2010-01-15  8:10       ` Jens Axboe
2010-01-15  9:03         ` Stefan Richter
2010-01-15 21:44           ` scameron
2010-01-16 11:47             ` FUJITA Tomonori
