linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* WARNING: at drivers/scsi/scsi_lib.c:1704
@ 2011-11-07  6:24 Stephen Rothwell
  2011-11-07  7:33 ` Bart Van Assche
  2011-11-07 14:51 ` WARNING: at drivers/scsi/scsi_lib.c:1704 James Bottomley
  0 siblings, 2 replies; 12+ messages in thread
From: Stephen Rothwell @ 2011-11-07  6:24 UTC (permalink / raw)
  To: LKML; +Cc: Bart Van Assche, James Bottomley, linux-scsi

[-- Attachment #1: Type: text/plain, Size: 3541 bytes --]

Hi all,

Starting some time last week I am getting the following during boot on
our PPC970 blade:

calling  .ipr_init+0x0/0x68 @ 1
ipr: IBM Power RAID SCSI Device Driver version: 2.5.2 (April 27, 2011)
ipr 0000:01:01.0: Found IOA with IRQ: 26
ipr 0000:01:01.0: Starting IOA initialization sequence.
ipr 0000:01:01.0: Adapter firmware version: 06160039
ipr 0000:01:01.0: IOA initialized.
scsi0 : IBM 572E Storage Adapter
------------[ cut here ]------------
WARNING: at drivers/scsi/scsi_lib.c:1704
Modules linked in:
NIP: c00000000053b3d4 LR: c00000000053e5b0 CTR: c000000000541d70
REGS: c0000000783c2f60 TRAP: 0700   Not tainted  (3.1.0-autokern1)
MSR: 8000000000029032 <EE,ME,CE,IR,DR>  CR: 24002024  XER: 20000002
TASK = c0000000783b8000[1] 'swapper' THREAD: c0000000783c0000 CPU: 0
GPR00: 0000000000000001 c0000000783c31e0 c000000000cf38b0 c00000000239a9d0 
GPR04: c000000000cbe8f8 0000000000000000 c0000000783c3040 0000000000000000 
GPR08: c000000075daf488 c000000078a3b7ff c000000000bcacc8 0000000000000000 
GPR12: 0000000044002028 c000000007ffb000 0000000002e40000 000000000099b800 
GPR16: 0000000000000000 c000000000bba5fc c000000000a61db8 0000000000000000 
GPR20: 0000000001b77200 0000000000000000 c000000078990000 0000000000000001 
GPR24: c000000002396828 0000000000000000 0000000000000000 c000000078a3b938 
GPR28: fffffffffffffffa c0000000008ad2c0 c000000000c7faa8 c00000000239a9d0 
NIP [c00000000053b3d4] .scsi_free_queue+0x24/0x90
LR [c00000000053e5b0] .scsi_alloc_sdev+0x280/0x2e0
Call Trace:
[c0000000783c31e0] [c000000000c7faa8] wireless_seq_fops+0x278d0/0x2eb88 (unreliable)
[c0000000783c3270] [c00000000053e5b0] .scsi_alloc_sdev+0x280/0x2e0
[c0000000783c3330] [c00000000053eba0] .scsi_probe_and_add_lun+0x390/0xb40
[c0000000783c34a0] [c00000000053f7ec] .__scsi_scan_target+0x16c/0x650
[c0000000783c35f0] [c00000000053fd90] .scsi_scan_channel+0xc0/0x100
[c0000000783c36a0] [c00000000053fefc] .scsi_scan_host_selected+0x12c/0x1c0
[c0000000783c3750] [c00000000083dcb4] .ipr_probe+0x2c0/0x390
[c0000000783c3830] [c0000000003f50b4] .local_pci_probe+0x34/0x50
[c0000000783c38a0] [c0000000003f5f78] .pci_device_probe+0x148/0x150
[c0000000783c3950] [c0000000004e1e8c] .driver_probe_device+0xdc/0x210
[c0000000783c39f0] [c0000000004e20cc] .__driver_attach+0x10c/0x110
[c0000000783c3a80] [c0000000004e1228] .bus_for_each_dev+0x98/0xf0
[c0000000783c3b30] [c0000000004e1bf8] .driver_attach+0x28/0x40
[c0000000783c3bb0] [c0000000004e07d8] .bus_add_driver+0x218/0x340
[c0000000783c3c60] [c0000000004e2a2c] .driver_register+0x9c/0x1b0
[c0000000783c3d00] [c0000000003f62d4] .__pci_register_driver+0x64/0x140
[c0000000783c3da0] [c000000000b99f88] .ipr_init+0x4c/0x68
[c0000000783c3e20] [c00000000000ad24] .do_one_initcall+0x1a4/0x1e0
[c0000000783c3ee0] [c000000000b512d0] .kernel_init+0x14c/0x1fc
[c0000000783c3f90] [c000000000022468] .kernel_thread+0x54/0x70
Instruction dump:
ebe1fff8 7c0803a6 4e800020 7c0802a6 fba1ffe8 fbe1fff8 7c7f1b78 f8010010 
f821ff71 e8030398 3120ffff 7c090110 <0b000000> e86303b0 482de065 60000000 
---[ end trace 759bed76a85e8dec ]---
scsi 0:0:1:0: Direct-Access     IBM-ESXS MAY2036RC        T106 PQ: 0 ANSI: 5
------------[ cut here ]------------

I get lots more of these.  The obvious commit to point the finger at
is 3308511c93e6 ("[SCSI] Make scsi_free_queue() kill pending SCSI
commands") but the root cause may be something different.

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: WARNING: at drivers/scsi/scsi_lib.c:1704
  2011-11-07  6:24 WARNING: at drivers/scsi/scsi_lib.c:1704 Stephen Rothwell
@ 2011-11-07  7:33 ` Bart Van Assche
  2011-11-08  1:36   ` [PATCH] [SCSI] scsi_alloc_sdev() trips WARN_ON in scsi_free_queue() Anton Blanchard
  2011-11-07 14:51 ` WARNING: at drivers/scsi/scsi_lib.c:1704 James Bottomley
  1 sibling, 1 reply; 12+ messages in thread
From: Bart Van Assche @ 2011-11-07  7:33 UTC (permalink / raw)
  To: Stephen Rothwell; +Cc: LKML, James Bottomley, linux-scsi, Anton Blanchard

On Mon, Nov 7, 2011 at 7:24 AM, Stephen Rothwell <sfr@canb.auug.org.au> wrote:
> I get lots more of these.  The obvious commit to point the finger at
> is 3308511c93e6 ("[SCSI] Make scsi_free_queue() kill pending SCSI
> commands") but the root cause may be something different.

It's probably the combination of commit 3308511c93e6 and patch
http://marc.info/?l=linux-scsi&m=132027101521282 that's triggering
that warning: the first commit changed scsi_free_queue() such that it
expects that its callers reset the queuedata pointer but the second
patch introduced a new scsi_free_queue() call that doesn't reset the
queuedata pointer first.

Bart.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: WARNING: at drivers/scsi/scsi_lib.c:1704
  2011-11-07  6:24 WARNING: at drivers/scsi/scsi_lib.c:1704 Stephen Rothwell
  2011-11-07  7:33 ` Bart Van Assche
@ 2011-11-07 14:51 ` James Bottomley
  2011-11-08  0:03   ` Stephen Rothwell
  2011-11-10 15:51   ` Steffen Maier
  1 sibling, 2 replies; 12+ messages in thread
From: James Bottomley @ 2011-11-07 14:51 UTC (permalink / raw)
  To: Stephen Rothwell; +Cc: LKML, Bart Van Assche, linux-scsi

On Mon, 2011-11-07 at 17:24 +1100, Stephen Rothwell wrote:
> Hi all,
> 
> Starting some time last week I am getting the following during boot on
> our PPC970 blade:
> 
> calling  .ipr_init+0x0/0x68 @ 1
> ipr: IBM Power RAID SCSI Device Driver version: 2.5.2 (April 27, 2011)
> ipr 0000:01:01.0: Found IOA with IRQ: 26
> ipr 0000:01:01.0: Starting IOA initialization sequence.
> ipr 0000:01:01.0: Adapter firmware version: 06160039
> ipr 0000:01:01.0: IOA initialized.
> scsi0 : IBM 572E Storage Adapter
> ------------[ cut here ]------------
> WARNING: at drivers/scsi/scsi_lib.c:1704
> Modules linked in:
> NIP: c00000000053b3d4 LR: c00000000053e5b0 CTR: c000000000541d70
> REGS: c0000000783c2f60 TRAP: 0700   Not tainted  (3.1.0-autokern1)
> MSR: 8000000000029032 <EE,ME,CE,IR,DR>  CR: 24002024  XER: 20000002
> TASK = c0000000783b8000[1] 'swapper' THREAD: c0000000783c0000 CPU: 0
> GPR00: 0000000000000001 c0000000783c31e0 c000000000cf38b0 c00000000239a9d0 
> GPR04: c000000000cbe8f8 0000000000000000 c0000000783c3040 0000000000000000 
> GPR08: c000000075daf488 c000000078a3b7ff c000000000bcacc8 0000000000000000 
> GPR12: 0000000044002028 c000000007ffb000 0000000002e40000 000000000099b800 
> GPR16: 0000000000000000 c000000000bba5fc c000000000a61db8 0000000000000000 
> GPR20: 0000000001b77200 0000000000000000 c000000078990000 0000000000000001 
> GPR24: c000000002396828 0000000000000000 0000000000000000 c000000078a3b938 
> GPR28: fffffffffffffffa c0000000008ad2c0 c000000000c7faa8 c00000000239a9d0 
> NIP [c00000000053b3d4] .scsi_free_queue+0x24/0x90
> LR [c00000000053e5b0] .scsi_alloc_sdev+0x280/0x2e0
> Call Trace:
> [c0000000783c31e0] [c000000000c7faa8] wireless_seq_fops+0x278d0/0x2eb88 (unreliable)
> [c0000000783c3270] [c00000000053e5b0] .scsi_alloc_sdev+0x280/0x2e0
> [c0000000783c3330] [c00000000053eba0] .scsi_probe_and_add_lun+0x390/0xb40
> [c0000000783c34a0] [c00000000053f7ec] .__scsi_scan_target+0x16c/0x650
> [c0000000783c35f0] [c00000000053fd90] .scsi_scan_channel+0xc0/0x100
> [c0000000783c36a0] [c00000000053fefc] .scsi_scan_host_selected+0x12c/0x1c0
> [c0000000783c3750] [c00000000083dcb4] .ipr_probe+0x2c0/0x390
> [c0000000783c3830] [c0000000003f50b4] .local_pci_probe+0x34/0x50
> [c0000000783c38a0] [c0000000003f5f78] .pci_device_probe+0x148/0x150
> [c0000000783c3950] [c0000000004e1e8c] .driver_probe_device+0xdc/0x210
> [c0000000783c39f0] [c0000000004e20cc] .__driver_attach+0x10c/0x110
> [c0000000783c3a80] [c0000000004e1228] .bus_for_each_dev+0x98/0xf0
> [c0000000783c3b30] [c0000000004e1bf8] .driver_attach+0x28/0x40
> [c0000000783c3bb0] [c0000000004e07d8] .bus_add_driver+0x218/0x340
> [c0000000783c3c60] [c0000000004e2a2c] .driver_register+0x9c/0x1b0
> [c0000000783c3d00] [c0000000003f62d4] .__pci_register_driver+0x64/0x140
> [c0000000783c3da0] [c000000000b99f88] .ipr_init+0x4c/0x68
> [c0000000783c3e20] [c00000000000ad24] .do_one_initcall+0x1a4/0x1e0
> [c0000000783c3ee0] [c000000000b512d0] .kernel_init+0x14c/0x1fc
> [c0000000783c3f90] [c000000000022468] .kernel_thread+0x54/0x70
> Instruction dump:
> ebe1fff8 7c0803a6 4e800020 7c0802a6 fba1ffe8 fbe1fff8 7c7f1b78 f8010010 
> f821ff71 e8030398 3120ffff 7c090110 <0b000000> e86303b0 482de065 60000000 
> ---[ end trace 759bed76a85e8dec ]---
> scsi 0:0:1:0: Direct-Access     IBM-ESXS MAY2036RC        T106 PQ: 0 ANSI: 5
> ------------[ cut here ]------------
> 
> I get lots more of these.  The obvious commit to point the finger at
> is 3308511c93e6 ("[SCSI] Make scsi_free_queue() kill pending SCSI
> commands") but the root cause may be something different.

Actually, I don't think it's anything to do with this: it's Anton's
fault

commit f7c9c6bb14f3104608a3a83cadea10a6943d2804
Author: Anton Blanchard <anton@samba.org>
Date:   Thu Nov 3 08:56:22 2011 +1100

    [SCSI] Fix block queue and elevator memory leak in scsi_alloc_sdev

Doesn't completely do the teardown.  The true fix is to do a proper
teardown instead of hand rolling it.  Does this fix it for you?

James

---
diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 72273a0..b3c6d95 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -319,11 +319,7 @@ static struct scsi_device *scsi_alloc_sdev(struct scsi_target *starget,
 	return sdev;
 
 out_device_destroy:
-	scsi_device_set_state(sdev, SDEV_DEL);
-	transport_destroy_device(&sdev->sdev_gendev);
-	put_device(&sdev->sdev_dev);
-	scsi_free_queue(sdev->request_queue);
-	put_device(&sdev->sdev_gendev);
+	__scsi_remove_device(sdev);
 out:
 	if (display_failure_msg)
 		printk(ALLOC_FAILURE_MSG, __func__);

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: WARNING: at drivers/scsi/scsi_lib.c:1704
  2011-11-07 14:51 ` WARNING: at drivers/scsi/scsi_lib.c:1704 James Bottomley
@ 2011-11-08  0:03   ` Stephen Rothwell
  2011-11-08  0:36     ` Stephen Rothwell
                       ` (2 more replies)
  2011-11-10 15:51   ` Steffen Maier
  1 sibling, 3 replies; 12+ messages in thread
From: Stephen Rothwell @ 2011-11-08  0:03 UTC (permalink / raw)
  To: James Bottomley; +Cc: LKML, Bart Van Assche, linux-scsi

[-- Attachment #1: Type: text/plain, Size: 764 bytes --]

Hi James,

On Mon, 07 Nov 2011 08:51:24 -0600 James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
>
> Actually, I don't think it's anything to do with this: it's Anton's
> fault
> 
> commit f7c9c6bb14f3104608a3a83cadea10a6943d2804
> Author: Anton Blanchard <anton@samba.org>
> Date:   Thu Nov 3 08:56:22 2011 +1100
> 
>     [SCSI] Fix block queue and elevator memory leak in scsi_alloc_sdev
> 
> Doesn't completely do the teardown.  The true fix is to do a proper
> teardown instead of hand rolling it.  Does this fix it for you?

I don't get the WARNING any more, but now get lots of:

scsi: killing requests for dead queue

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: WARNING: at drivers/scsi/scsi_lib.c:1704
  2011-11-08  0:03   ` Stephen Rothwell
@ 2011-11-08  0:36     ` Stephen Rothwell
  2011-11-08  7:07     ` Bart Van Assche
  2011-11-08 15:30     ` Hannes Reinecke
  2 siblings, 0 replies; 12+ messages in thread
From: Stephen Rothwell @ 2011-11-08  0:36 UTC (permalink / raw)
  To: James Bottomley; +Cc: LKML, Bart Van Assche, linux-scsi

[-- Attachment #1: Type: text/plain, Size: 1030 bytes --]

On Tue, 8 Nov 2011 11:03:00 +1100 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>
> On Mon, 07 Nov 2011 08:51:24 -0600 James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> >
> > Actually, I don't think it's anything to do with this: it's Anton's
> > fault
> > 
> > commit f7c9c6bb14f3104608a3a83cadea10a6943d2804
> > Author: Anton Blanchard <anton@samba.org>
> > Date:   Thu Nov 3 08:56:22 2011 +1100
> > 
> >     [SCSI] Fix block queue and elevator memory leak in scsi_alloc_sdev
> > 
> > Doesn't completely do the teardown.  The true fix is to do a proper
> > teardown instead of hand rolling it.  Does this fix it for you?
> 
> I don't get the WARNING any more, but now get lots of:
> 
> scsi: killing requests for dead queue

also, ipr_init takes 494857993 usecs(!) which is a bit suboptimal.  And I
get a few

INFO: rcu_sched detected stall ...

while ipr_init is in progress.

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH] [SCSI] scsi_alloc_sdev() trips WARN_ON in scsi_free_queue()
  2011-11-07  7:33 ` Bart Van Assche
@ 2011-11-08  1:36   ` Anton Blanchard
  0 siblings, 0 replies; 12+ messages in thread
From: Anton Blanchard @ 2011-11-08  1:36 UTC (permalink / raw)
  To: Bart Van Assche, Stephen Rothwell, James Bottomley; +Cc: LKML, linux-scsi


commit 3308511c93e6 ([SCSI] Make scsi_free_queue() kill pending SCSI
commands) requires us to reset the queuedata pointer before calling
scsi_free_queue.

Signed-off-by: Anton Blanchard <anton@samba.org>
---

Thanks for the tip Bart, does this look good?

diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 72273a0..ebb1206 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -322,6 +322,7 @@ out_device_destroy:
 	scsi_device_set_state(sdev, SDEV_DEL);
 	transport_destroy_device(&sdev->sdev_gendev);
 	put_device(&sdev->sdev_dev);
+	sdev->request_queue->queuedata = NULL;
 	scsi_free_queue(sdev->request_queue);
 	put_device(&sdev->sdev_gendev);
 out:

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: WARNING: at drivers/scsi/scsi_lib.c:1704
  2011-11-08  0:03   ` Stephen Rothwell
  2011-11-08  0:36     ` Stephen Rothwell
@ 2011-11-08  7:07     ` Bart Van Assche
  2011-11-08  8:49       ` Stephen Rothwell
  2011-11-08 15:30     ` Hannes Reinecke
  2 siblings, 1 reply; 12+ messages in thread
From: Bart Van Assche @ 2011-11-08  7:07 UTC (permalink / raw)
  To: Stephen Rothwell; +Cc: James Bottomley, LKML, linux-scsi

On Tue, Nov 8, 2011 at 1:03 AM, Stephen Rothwell <sfr@canb.auug.org.au> wrote:
> On Mon, 07 Nov 2011 08:51:24 -0600 James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
>> The true fix is to do a proper
>> teardown instead of hand rolling it.  Does this fix it for you?
>
> I don't get the WARNING any more, but now get lots of:
>
> scsi: killing requests for dead queue

This patch should help: http://www.spinics.net/lists/linux-scsi/msg55373.html.

Bart.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: WARNING: at drivers/scsi/scsi_lib.c:1704
  2011-11-08  7:07     ` Bart Van Assche
@ 2011-11-08  8:49       ` Stephen Rothwell
  2011-11-16 20:58         ` Luis Henriques
  0 siblings, 1 reply; 12+ messages in thread
From: Stephen Rothwell @ 2011-11-08  8:49 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: James Bottomley, LKML, linux-scsi

[-- Attachment #1: Type: text/plain, Size: 749 bytes --]

On Tue, 8 Nov 2011 08:07:37 +0100 Bart Van Assche <bvanassche@acm.org> wrote:
>
> On Tue, Nov 8, 2011 at 1:03 AM, Stephen Rothwell <sfr@canb.auug.org.au> wrote:
> > On Mon, 07 Nov 2011 08:51:24 -0600 James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> >> The true fix is to do a proper
> >> teardown instead of hand rolling it.  Does this fix it for you?
> >
> > I don't get the WARNING any more, but now get lots of:
> >
> > scsi: killing requests for dead queue
> 
> This patch should help: http://www.spinics.net/lists/linux-scsi/msg55373.html.

Thanks.  Will that be added to the scsi{,-rc-fixes} trees soon?
-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: WARNING: at drivers/scsi/scsi_lib.c:1704
  2011-11-08  0:03   ` Stephen Rothwell
  2011-11-08  0:36     ` Stephen Rothwell
  2011-11-08  7:07     ` Bart Van Assche
@ 2011-11-08 15:30     ` Hannes Reinecke
  2 siblings, 0 replies; 12+ messages in thread
From: Hannes Reinecke @ 2011-11-08 15:30 UTC (permalink / raw)
  To: Stephen Rothwell; +Cc: James Bottomley, LKML, Bart Van Assche, linux-scsi

On 11/08/2011 01:03 AM, Stephen Rothwell wrote:
> Hi James,
>
> On Mon, 07 Nov 2011 08:51:24 -0600 James Bottomley<James.Bottomley@HansenPartnership.com>  wrote:
>>
>> Actually, I don't think it's anything to do with this: it's Anton's
>> fault
>>
>> commit f7c9c6bb14f3104608a3a83cadea10a6943d2804
>> Author: Anton Blanchard<anton@samba.org>
>> Date:   Thu Nov 3 08:56:22 2011 +1100
>>
>>      [SCSI] Fix block queue and elevator memory leak in scsi_alloc_sdev
>>
>> Doesn't completely do the teardown.  The true fix is to do a proper
>> teardown instead of hand rolling it.  Does this fix it for you?
>
> I don't get the WARNING any more, but now get lots of:
>
> scsi: killing requests for dead queue
>
Yeah, that is really annoying.
And what's more the printk is wrong, as it doesn't indicate
that we've actually killed requests. It's just announcing that we 
might start killing requests, provided the request_queue isn't empty.

I'll be sending a patch.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: WARNING: at drivers/scsi/scsi_lib.c:1704
  2011-11-07 14:51 ` WARNING: at drivers/scsi/scsi_lib.c:1704 James Bottomley
  2011-11-08  0:03   ` Stephen Rothwell
@ 2011-11-10 15:51   ` Steffen Maier
  2011-11-10 16:06     ` James Bottomley
  1 sibling, 1 reply; 12+ messages in thread
From: Steffen Maier @ 2011-11-10 15:51 UTC (permalink / raw)
  To: James Bottomley; +Cc: Stephen Rothwell, LKML, Bart Van Assche, linux-scsi

On 11/07/2011 03:51 PM, James Bottomley wrote:
> On Mon, 2011-11-07 at 17:24 +1100, Stephen Rothwell wrote:

>> WARNING: at drivers/scsi/scsi_lib.c:1704

>> I get lots more of these.  The obvious commit to point the finger at
>> is 3308511c93e6 ("[SCSI] Make scsi_free_queue() kill pending SCSI
>> commands") but the root cause may be something different.
>
> Actually, I don't think it's anything to do with this: it's Anton's
> fault
>
> commit f7c9c6bb14f3104608a3a83cadea10a6943d2804
> Author: Anton Blanchard<anton@samba.org>
> Date:   Thu Nov 3 08:56:22 2011 +1100
>
>      [SCSI] Fix block queue and elevator memory leak in scsi_alloc_sdev
>
> Doesn't completely do the teardown.  The true fix is to do a proper
> teardown instead of hand rolling it.  Does this fix it for you?
>
> James
>
> ---
> diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
> index 72273a0..b3c6d95 100644
> --- a/drivers/scsi/scsi_scan.c
> +++ b/drivers/scsi/scsi_scan.c
> @@ -319,11 +319,7 @@ static struct scsi_device *scsi_alloc_sdev(struct scsi_target *starget,
>   	return sdev;
>
>   out_device_destroy:
> -	scsi_device_set_state(sdev, SDEV_DEL);
> -	transport_destroy_device(&sdev->sdev_gendev);
> -	put_device(&sdev->sdev_dev);
> -	scsi_free_queue(sdev->request_queue);
> -	put_device(&sdev->sdev_gendev);
> +	__scsi_remove_device(sdev);
>   out:
>   	if (display_failure_msg)
>   		printk(ALLOC_FAILURE_MSG, __func__);

James, is it OK that __scsi_remove_device() now also calls 
sdev->host->hostt->slave_destroy(sdev) which wasn't there before?

I cannot prove it yet, but with this patch and some asorted others on 
top of 3.1 our zfcp LLD gets called with an sdev argument that was freed 
before or at least before dereferencing (found with DEBUG_PAGEALLOC).

Steffen

Linux on System z Development

IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: WARNING: at drivers/scsi/scsi_lib.c:1704
  2011-11-10 15:51   ` Steffen Maier
@ 2011-11-10 16:06     ` James Bottomley
  0 siblings, 0 replies; 12+ messages in thread
From: James Bottomley @ 2011-11-10 16:06 UTC (permalink / raw)
  To: Steffen Maier; +Cc: Stephen Rothwell, LKML, Bart Van Assche, linux-scsi

On Thu, 2011-11-10 at 16:51 +0100, Steffen Maier wrote:
> On 11/07/2011 03:51 PM, James Bottomley wrote:
> > On Mon, 2011-11-07 at 17:24 +1100, Stephen Rothwell wrote:
> 
> >> WARNING: at drivers/scsi/scsi_lib.c:1704
> 
> >> I get lots more of these.  The obvious commit to point the finger at
> >> is 3308511c93e6 ("[SCSI] Make scsi_free_queue() kill pending SCSI
> >> commands") but the root cause may be something different.
> >
> > Actually, I don't think it's anything to do with this: it's Anton's
> > fault
> >
> > commit f7c9c6bb14f3104608a3a83cadea10a6943d2804
> > Author: Anton Blanchard<anton@samba.org>
> > Date:   Thu Nov 3 08:56:22 2011 +1100
> >
> >      [SCSI] Fix block queue and elevator memory leak in scsi_alloc_sdev
> >
> > Doesn't completely do the teardown.  The true fix is to do a proper
> > teardown instead of hand rolling it.  Does this fix it for you?
> >
> > James
> >
> > ---
> > diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
> > index 72273a0..b3c6d95 100644
> > --- a/drivers/scsi/scsi_scan.c
> > +++ b/drivers/scsi/scsi_scan.c
> > @@ -319,11 +319,7 @@ static struct scsi_device *scsi_alloc_sdev(struct scsi_target *starget,
> >   	return sdev;
> >
> >   out_device_destroy:
> > -	scsi_device_set_state(sdev, SDEV_DEL);
> > -	transport_destroy_device(&sdev->sdev_gendev);
> > -	put_device(&sdev->sdev_dev);
> > -	scsi_free_queue(sdev->request_queue);
> > -	put_device(&sdev->sdev_gendev);
> > +	__scsi_remove_device(sdev);
> >   out:
> >   	if (display_failure_msg)
> >   		printk(ALLOC_FAILURE_MSG, __func__);
> 
> James, is it OK that __scsi_remove_device() now also calls 
> sdev->host->hostt->slave_destroy(sdev) which wasn't there before?
> 
> I cannot prove it yet, but with this patch and some asorted others on 
> top of 3.1 our zfcp LLD gets called with an sdev argument that was freed 
> before or at least before dereferencing (found with DEBUG_PAGEALLOC).

You can argue it either way, but we're supposed to pair slave_destroy
with slave_alloc and we already called the latter.

zfcp just unconditionally assumes that if destroy is called, the
allocation returned success.  This fixes it, doesn't it?

James

---

diff --git a/drivers/s390/scsi/zfcp_scsi.c b/drivers/s390/scsi/zfcp_scsi.c
index 11f07f8..4008ec0 100644
--- a/drivers/s390/scsi/zfcp_scsi.c
+++ b/drivers/s390/scsi/zfcp_scsi.c
@@ -55,8 +55,10 @@ static void zfcp_scsi_slave_destroy(struct scsi_device *sdev)
 {
 	struct zfcp_scsi_dev *zfcp_sdev = sdev_to_zfcp(sdev);
 
-	zfcp_erp_lun_shutdown_wait(sdev, "scssd_1");
-	put_device(&zfcp_sdev->port->dev);
+	if (zfcp_sdev->port) {
+		zfcp_erp_lun_shutdown_wait(sdev, "scssd_1");
+		put_device(&zfcp_sdev->port->dev);
+	}
 }
 
 static int zfcp_scsi_slave_configure(struct scsi_device *sdp)

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: WARNING: at drivers/scsi/scsi_lib.c:1704
  2011-11-08  8:49       ` Stephen Rothwell
@ 2011-11-16 20:58         ` Luis Henriques
  0 siblings, 0 replies; 12+ messages in thread
From: Luis Henriques @ 2011-11-16 20:58 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: Bart Van Assche, James Bottomley, linux-kernel, linux-scsi

On Tue, 8 Nov 2011 19:49:39 +1100
Stephen Rothwell <sfr@canb.auug.org.au> wrote:

> > > I don't get the WARNING any more, but now get lots of:
> > >
> > > scsi: killing requests for dead queue
> > 
> > This patch should help:
> > http://www.spinics.net/lists/linux-scsi/msg55373.html.
> 
> Thanks.  Will that be added to the scsi{,-rc-fixes} trees soon?

Any idea when this will hit mainline?  Its really annoying having all
these messages on the logs...

Cheers,
--
Luis Henriques

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2011-11-16 20:58 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-11-07  6:24 WARNING: at drivers/scsi/scsi_lib.c:1704 Stephen Rothwell
2011-11-07  7:33 ` Bart Van Assche
2011-11-08  1:36   ` [PATCH] [SCSI] scsi_alloc_sdev() trips WARN_ON in scsi_free_queue() Anton Blanchard
2011-11-07 14:51 ` WARNING: at drivers/scsi/scsi_lib.c:1704 James Bottomley
2011-11-08  0:03   ` Stephen Rothwell
2011-11-08  0:36     ` Stephen Rothwell
2011-11-08  7:07     ` Bart Van Assche
2011-11-08  8:49       ` Stephen Rothwell
2011-11-16 20:58         ` Luis Henriques
2011-11-08 15:30     ` Hannes Reinecke
2011-11-10 15:51   ` Steffen Maier
2011-11-10 16:06     ` James Bottomley

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).