public inbox for linux-scsi@vger.kernel.org
 help / color / mirror / Atom feed
* Badness in scsi_single_lun_run at /root/scsi/scsi_lib.c:344
@ 2004-01-27  0:32 Willem Riede
  2004-01-28  3:30 ` Willem Riede
  0 siblings, 1 reply; 8+ messages in thread
From: Willem Riede @ 2004-01-27  0:32 UTC (permalink / raw)
  To: linux-scsi

For additional fun shaking bugs out of ide-scsi, I dusted of my old
PD/CD drive (retired years ago in favor of a CD Writer).

Besides having to set   host->max_lun = 2;  in  idescsi_attach() to
have both luns of this device detected:

Jan 26 17:13:23 fallguy kernel: scsi6 : SCSI host adapter emulation for IDE ATAPI devices
Jan 26 17:13:23 fallguy kernel:   Vendor: NEC       Model: PD-1 ODX654P      Rev: A111
Jan 26 17:13:23 fallguy kernel:   Type:   CD-ROM                             ANSI SCSI revision: 02
Jan 26 17:13:23 fallguy kernel: sr1: scsi3-mmc drive: 6x/6x xa/form2 cdda tray
Jan 26 17:13:23 fallguy kernel:   Vendor: NEC       Model: PD-1 ODX654P      Rev: A111
Jan 26 17:13:23 fallguy kernel:   Type:   Optical Device                     ANSI SCSI revision: 02
Jan 26 17:13:23 fallguy kernel: SCSI device sdc: 1298496 512-byte hdwr sectors (665 MB)
Jan 26 17:13:23 fallguy kernel: sdc: Write Protect is off
Jan 26 17:13:23 fallguy kernel: SCSI device sdc: drive cache: write back

I get a bunch of the following warning:

Jan 26 17:13:23 fallguy kernel: Badness in scsi_single_lun_run at /root/scsi/scsi_lib.c:344
Jan 26 17:13:23 fallguy kernel: Call Trace:
Jan 26 17:13:23 fallguy kernel:  [<e0850192>] scsi_single_lun_run+0x202/0x230 [scsi_mod]
Jan 26 17:13:23 fallguy kernel:  [<e084a33d>] scsi_put_command+0xbd/0x160 [scsi_mod]
Jan 26 17:13:23 fallguy kernel:  [<e0850347>] scsi_run_queue+0x187/0x190 [scsi_mod]
Jan 26 17:13:23 fallguy kernel:  [<e0850524>] scsi_end_request+0xf4/0x150 [scsi_mod]
Jan 26 17:13:23 fallguy kernel:  [<e08508f7>] scsi_io_completion+0x1c7/0x4b0 [scsi_mod]
Jan 26 17:13:23 fallguy kernel:  [<e0829a32>] sd_rw_intr+0x82/0x260 [sd_mod]
Jan 26 17:13:24 fallguy kernel:  [<e084aed1>] scsi_finish_command+0x81/0xe0 [scsi_mod]
Jan 26 17:13:24 fallguy kernel:  [<e084ade8>] scsi_softirq+0xc8/0xf0 [scsi_mod]
Jan 26 17:13:24 fallguy kernel:  [<c012d6a7>] do_softirq+0xc7/0xd0
Jan 26 17:13:24 fallguy kernel:  [<c010e395>] do_IRQ+0x165/0x220
Jan 26 17:13:24 fallguy kernel:  [<c011c04d>] smp_apic_timer_interrupt+0xdd/0x150
Jan 26 17:13:24 fallguy kernel:  [<c010c21c>] common_interrupt+0x18/0x20
Jan 26 17:13:24 fallguy kernel:  [<c01089f0>] default_idle+0x0/0x40
Jan 26 17:13:24 fallguy kernel:  [<c0108a19>] default_idle+0x29/0x40
Jan 26 17:13:24 fallguy kernel:  [<c0108aab>] cpu_idle+0x3b/0x50
Jan 26 17:13:24 fallguy kernel:  [<c0128f80>] printk+0x180/0x260
Jan 26 17:13:24 fallguy kernel:

Most are similar from do_IRQ up, io_completion called from sd_rw_intr or (sr) rw_intr,
which make its way to scsi_single_lun_run. (I don't understand why scsi_put_command
shows up, I believe it should be scsi_end_request -> scsi_next_request -> scsi_run_queue
-> scsi_single_lun_run.

One is different, don't know if that's a fluke (kernel log buffer overrun?) or evidence
of a rogue path...

Jan 26 17:14:12 fallguy kernel: Badness in scsi_single_lun_run at /root/scsi/scsi_lib.c:344
Jan 26 17:14:12 fallguy kernel: Call Trace:
Jan 26 17:14:12 fallguy kernel:  [<e0850192>] scsi_single_lun_run+0x202/0x230 [scsi_mod]
Jan 26 17:14:12 fallguy kernel:  [<e084a33d>] scsi_put_command+0xbd/0x160 [scsi_mod]
Jan 26 17:14:12 fallguy kernel:  [<e0850347>] scsi_run_queue+0x187/0x190 [scsi_mod]
Jan 26 17:14:12 fallguy kernel:  [<e084fd01>] scsi_wait_req+0xa1/0xb0 [scsi_mod]
Jan 26 17:14:12 fallguy kernel:  [<e084fb90>] scsi_wait_done+0x0/0xd0 [scsi_mod]
Jan 26 17:14:12 fallguy kernel:  [<e084a02e>] scsi_allocate_request+0x2e/0x70 [scsi_mod]
Jan 26 17:14:12 fallguy kernel:  [<e084bf7d>] ioctl_internal_command+0x6d/0x200 [scsi_mod]
Jan 26 17:14:12 fallguy kernel:  [<e084c17f>] scsi_set_medium_removal+0x6f/0xa0 [scsi_mod]
Jan 26 17:14:12 fallguy kernel:  [<e0829602>] sd_release+0x72/0x90 [sd_mod]
Jan 26 17:14:12 fallguy kernel:  [<c0176b54>] blkdev_put+0x214/0x260
Jan 26 17:14:12 fallguy kernel:  [<c016d60a>] __fput+0x10a/0x120
Jan 26 17:14:12 fallguy kernel:  [<c016b7a9>] filp_close+0x59/0x90
Jan 26 17:14:13 fallguy kernel:  [<c016b85f>] sys_close+0x7f/0x100
Jan 26 17:14:13 fallguy kernel:  [<c010b8af>] syscall_call+0x7/0xb
Jan 26 17:14:13 fallguy kernel:

Somehow it appears that io completion sometimes happens twice without 
current_sdev->sdev_target->starget_sdev_user being set to a new owner so
that the  WARN_ON(!current_sdev->sdev_target->starget_sdev_user) triggers...

(I have proven to my satisfaction that no path exists that can set
current_sdev->sdev_target->starget_sdev_user to NULL to cause this).

Setting starget_sdev_user happens in scsi_request_fn(). Do any paths exist
that start io that don't use it?

Probably to simple minded to suspect the request sense following a command that
reports "Check Condition"?

Ideas of where to look next appreciated.

Thanks, Willem Riede.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Badness in scsi_single_lun_run at /root/scsi/scsi_lib.c:344
  2004-01-27  0:32 Badness in scsi_single_lun_run at /root/scsi/scsi_lib.c:344 Willem Riede
@ 2004-01-28  3:30 ` Willem Riede
  2004-01-28 16:41   ` Patrick Mansfield
  2004-01-30 20:14   ` [PATCH] fix badness in scsi_single_lun_run Patrick Mansfield
  0 siblings, 2 replies; 8+ messages in thread
From: Willem Riede @ 2004-01-28  3:30 UTC (permalink / raw)
  To: linux-scsi

On 2004.01.26 19:32, Willem Riede wrote:
> Jan 26 17:13:23 fallguy kernel: Badness in scsi_single_lun_run at /root/scsi/scsi_lib.c:344
> Jan 26 17:13:23 fallguy kernel: Call Trace:
> Jan 26 17:13:23 fallguy kernel:  [<e0850192>] scsi_single_lun_run+0x202/0x230 [scsi_mod]
> Jan 26 17:13:23 fallguy kernel:  [<e084a33d>] scsi_put_command+0xbd/0x160 [scsi_mod]
> Jan 26 17:13:23 fallguy kernel:  [<e0850347>] scsi_run_queue+0x187/0x190 [scsi_mod]
> Jan 26 17:13:23 fallguy kernel:  [<e0850524>] scsi_end_request+0xf4/0x150 [scsi_mod]
> Jan 26 17:13:23 fallguy kernel:  [<e08508f7>] scsi_io_completion+0x1c7/0x4b0 [scsi_mod]
> Jan 26 17:13:23 fallguy kernel:  [<e0829a32>] sd_rw_intr+0x82/0x260 [sd_mod]
> Jan 26 17:13:24 fallguy kernel:  [<e084aed1>] scsi_finish_command+0x81/0xe0 [scsi_mod]
> Jan 26 17:13:24 fallguy kernel:  [<e084ade8>] scsi_softirq+0xc8/0xf0 [scsi_mod]
> Jan 26 17:13:24 fallguy kernel:  [<c012d6a7>] do_softirq+0xc7/0xd0
> Jan 26 17:13:24 fallguy kernel:  [<c010e395>] do_IRQ+0x165/0x220

scsi_lib.c contains the following code:

                spin_lock(shost->host_lock);
                                                                                                                        
                if (!scsi_host_queue_ready(q, shost, sdev))
                        goto not_ready;
                if (sdev->single_lun) {
                        if (sdev->sdev_target->starget_sdev_user &&
                            sdev->sdev_target->starget_sdev_user != sdev)
                                goto not_ready;
                        sdev->sdev_target->starget_sdev_user = sdev;
                }
                shost->host_busy++;
                                                                                                                        
                /*
                 * XXX(hch): This is rather suboptimal, scsi_dispatch_cmd will
                 *              take the lock again.
                 */
                spin_unlock_irq(shost->host_lock);


Is it a problem that the lock is taken by means of spin_lock() and then
released with spin_unlock_irq() ? We do need to lock against scsi_softirq
as in the backtrace above...

Thanks, Willem Riede.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Badness in scsi_single_lun_run at /root/scsi/scsi_lib.c:344
  2004-01-28  3:30 ` Willem Riede
@ 2004-01-28 16:41   ` Patrick Mansfield
  2004-01-28 17:36     ` Willem Riede
  2004-01-30 20:14   ` [PATCH] fix badness in scsi_single_lun_run Patrick Mansfield
  1 sibling, 1 reply; 8+ messages in thread
From: Patrick Mansfield @ 2004-01-28 16:41 UTC (permalink / raw)
  To: Willem Riede; +Cc: linux-scsi

On Tue, Jan 27, 2004 at 10:30:41PM -0500, Willem Riede wrote:
> On 2004.01.26 19:32, Willem Riede wrote:
> > Jan 26 17:13:23 fallguy kernel: Badness in scsi_single_lun_run at /root/scsi/scsi_lib.c:344

I was trying to figure out what happened, apparently I missed a check when
splitting the locks up and removing a lock hiearchy.

Per changeset 1.75.1.8, at this link:

http://linux.bkbits.net:8080/linux-2.5/diffs/drivers/scsi/scsi_lib.c@1.75.1.8?nav=index.html|src/|src/drivers|src/drivers/scsi|hist/drivers/scsi/scsi_lib.c

We should only clear starget_sdev_user when the sdev has no more IO. We
want to run (a bunch of) IO for a given starget_sdev_user, and when there is
no more IO, clear starget_sdev_user so someone else has a chance to run.

I'm trying to figure out a fix without adding any lock hiearchy. queue_lock
protects device_busy, but host_lock protects starget_sdev_user.

The single lun devices and code are annoying :-(

Do you know if the single lun code is for performance, or because of
hardware limitations - that is we don't want a disc change between IO, or
does the device just fail?

> scsi_lib.c contains the following code:
> 
>                 spin_lock(shost->host_lock);
>                                                                                                                         
>                 spin_unlock_irq(shost->host_lock);
> 
> 
> Is it a problem that the lock is taken by means of spin_lock() and then
> released with spin_unlock_irq() ? We do need to lock against scsi_softirq
> as in the backtrace above...

No, the previous unlock kept interrupts disabled, the code path looks like:

	spin_lock_irq(q->queue_lock);
	...
	spin_unlock(q->queue_lock);
	spin_lock(shost->host_lock);
	...
	spin_unlock_irq(shost->host_lock);

-- Patrick Mansfield

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Badness in scsi_single_lun_run at /root/scsi/scsi_lib.c:344
  2004-01-28 16:41   ` Patrick Mansfield
@ 2004-01-28 17:36     ` Willem Riede
  2004-01-28 17:53       ` Patrick Mansfield
  0 siblings, 1 reply; 8+ messages in thread
From: Willem Riede @ 2004-01-28 17:36 UTC (permalink / raw)
  To: Patrick Mansfield; +Cc: linux-scsi

On 2004.01.28 11:41, Patrick Mansfield wrote:
> On Tue, Jan 27, 2004 at 10:30:41PM -0500, Willem Riede wrote:
> > On 2004.01.26 19:32, Willem Riede wrote:
> > > Jan 26 17:13:23 fallguy kernel: Badness in scsi_single_lun_run at /root/scsi/scsi_lib.c:344
> 
> I was trying to figure out what happened, apparently I missed a check when
> splitting the locks up and removing a lock hiearchy.
> 
> We should only clear starget_sdev_user when the sdev has no more IO. We
> want to run (a bunch of) IO for a given starget_sdev_user, and when there is
> no more IO, clear starget_sdev_user so someone else has a chance to run.
> 
> I'm trying to figure out a fix without adding any lock hiearchy. queue_lock
> protects device_busy, but host_lock protects starget_sdev_user.

Anything I can do to help (if only to test when you have an idea)?

> The single lun devices and code are annoying :-(
> 
> Do you know if the single lun code is for performance, or because of
> hardware limitations - that is we don't want a disc change between IO, or
> does the device just fail?

In the case of the NEC PD/CD, it is a hardware limitation. You can only
have one disc inserted at a time, either CD or PD. Mixing IO will fail
for the absent media type.

Regards, Willem Riede.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Badness in scsi_single_lun_run at /root/scsi/scsi_lib.c:344
  2004-01-28 17:36     ` Willem Riede
@ 2004-01-28 17:53       ` Patrick Mansfield
  2004-01-28 18:33         ` Willem Riede
  2004-01-28 18:37         ` Matthew Wilcox
  0 siblings, 2 replies; 8+ messages in thread
From: Patrick Mansfield @ 2004-01-28 17:53 UTC (permalink / raw)
  To: Willem Riede; +Cc: linux-scsi

On Wed, Jan 28, 2004 at 12:36:09PM -0500, Willem Riede wrote:
> On 2004.01.28 11:41, Patrick Mansfield wrote:

> > Do you know if the single lun code is for performance, or because of
> > hardware limitations - that is we don't want a disc change between IO, or
> > does the device just fail?
> 
> In the case of the NEC PD/CD, it is a hardware limitation. You can only
> have one disc inserted at a time, either CD or PD. Mixing IO will fail
> for the absent media type.

BTW what is PD?

So there is one physical drive that can hold one disc, and it can only use
the LUN associated with the disc that is currently in the drive; and reading
or accessing the wrong LUN will always fail?

We don't need single_lun code for that, we can just let the IO fail, right?

I assume there are other single_lun devices that automatically change discs.

-- Patrick Mansfield

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Badness in scsi_single_lun_run at /root/scsi/scsi_lib.c:344
  2004-01-28 17:53       ` Patrick Mansfield
@ 2004-01-28 18:33         ` Willem Riede
  2004-01-28 18:37         ` Matthew Wilcox
  1 sibling, 0 replies; 8+ messages in thread
From: Willem Riede @ 2004-01-28 18:33 UTC (permalink / raw)
  To: Patrick Mansfield; +Cc: linux-scsi

On 2004.01.28 12:53, Patrick Mansfield wrote:
> On Wed, Jan 28, 2004 at 12:36:09PM -0500, Willem Riede wrote:
> > On 2004.01.28 11:41, Patrick Mansfield wrote:
> 
> > > Do you know if the single lun code is for performance, or because of
> > > hardware limitations - that is we don't want a disc change between IO, or
> > > does the device just fail?
> > 
> > In the case of the NEC PD/CD, it is a hardware limitation. You can only
> > have one disc inserted at a time, either CD or PD. Mixing IO will fail
> > for the absent media type.
> 
> BTW what is PD?

PD stands for PowerDrive, it uses 650MB (CD size) cartridges that Panasonic
invented before there were writable CDs. I assume they're a type of magneto-
optical device.
 
> So there is one physical drive that can hold one disc, and it can only use
> the LUN associated with the disc that is currently in the drive; and reading
> or accessing the wrong LUN will always fail?
> 
> We don't need single_lun code for that, we can just let the IO fail, right?

That would be no problem. All the single-lun code does is delay the inevitable:
if I address the /dev/scdX which corresponds to the /dev/sdY that's the PD disk
when it's loaded, it _will_ fail (after all IO to the PD's lun finishes).

> I assume there are other single_lun devices that automatically change discs.

Yes, there would be, like CD changers. (That reminds me, I've got one of
those lying around too - I just don't remember how broken it was when I
removed it...)

Regards, Willem Riede.



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Badness in scsi_single_lun_run at /root/scsi/scsi_lib.c:344
  2004-01-28 17:53       ` Patrick Mansfield
  2004-01-28 18:33         ` Willem Riede
@ 2004-01-28 18:37         ` Matthew Wilcox
  1 sibling, 0 replies; 8+ messages in thread
From: Matthew Wilcox @ 2004-01-28 18:37 UTC (permalink / raw)
  To: Patrick Mansfield; +Cc: Willem Riede, linux-scsi

On Wed, Jan 28, 2004 at 09:53:41AM -0800, Patrick Mansfield wrote:
> BTW what is PD?

I've got a PD drive myself (SCSI, not IDE).  I believe it stands for
Plasmon Drive.  There's one slot on the front of the 5.25" unit which
will take either CDs or PDs.  It shows up as two LUNs.  The PD discs
are rewritable but never achieved much mass-market acceptance.

-- 
"Next the statesmen will invent cheap lies, putting the blame upon 
the nation that is attacked, and every man will be glad of those
conscience-soothing falsities, and will diligently study them, and refuse
to examine any refutations of them; and thus he will by and by convince 
himself that the war is just, and will thank God for the better sleep 
he enjoys after this process of grotesque self-deception." -- Mark Twain

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH] fix badness in scsi_single_lun_run
  2004-01-28  3:30 ` Willem Riede
  2004-01-28 16:41   ` Patrick Mansfield
@ 2004-01-30 20:14   ` Patrick Mansfield
  1 sibling, 0 replies; 8+ messages in thread
From: Patrick Mansfield @ 2004-01-30 20:14 UTC (permalink / raw)
  To: Willem Riede, James Bottomley; +Cc: linux-scsi

Willem - the WARN_ON is bogus. 

James - can you please apply?

This patch against recent mainline bk removes the bogus WARN_ON for
single_lun devices, and a meaningless comment.

We clear the starget_sdev_user, and immediately blk_run_queue for the LUN
that just issued IO. Another LUN could race in scsi_request_fn, but it is
most likely that the last user will get there first, and reset
starget_sdev_user. If it does not, it will have to wait for the other LUN
to finish all of its IO.

===== drivers/scsi/scsi_lib.c 1.115 vs edited =====
--- 1.115/drivers/scsi/scsi_lib.c	Sat Nov 22 16:20:45 2003
+++ edited/drivers/scsi/scsi_lib.c	Fri Jan 30 11:18:39 2004
@@ -340,7 +340,6 @@
 	unsigned long flags;
 
 	spin_lock_irqsave(shost->host_lock, flags);
-	WARN_ON(!current_sdev->sdev_target->starget_sdev_user);
 	current_sdev->sdev_target->starget_sdev_user = NULL;
 	spin_unlock_irqrestore(shost->host_lock, flags);
 
@@ -352,10 +351,6 @@
 	 */
 	blk_run_queue(current_sdev->request_queue);
 
-	/*
-	 * After unlock, this races with anyone clearing starget_sdev_user,
-	 * but we always enter this function again, avoiding any problems.
-	 */
 	spin_lock_irqsave(shost->host_lock, flags);
 	if (current_sdev->sdev_target->starget_sdev_user)
 		goto out;

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2004-01-30 20:15 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-01-27  0:32 Badness in scsi_single_lun_run at /root/scsi/scsi_lib.c:344 Willem Riede
2004-01-28  3:30 ` Willem Riede
2004-01-28 16:41   ` Patrick Mansfield
2004-01-28 17:36     ` Willem Riede
2004-01-28 17:53       ` Patrick Mansfield
2004-01-28 18:33         ` Willem Riede
2004-01-28 18:37         ` Matthew Wilcox
2004-01-30 20:14   ` [PATCH] fix badness in scsi_single_lun_run Patrick Mansfield

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox