From: Chris Webb <chris@arachsys.com>
To: James Bottomley <James.Bottomley@suse.de>
Cc: linux-scsi@vger.kernel.org
Subject: Re: oops during scsi scanning disk setup
Date: Sat, 22 Aug 2009 12:55:35 +0100
Message-ID: <20090822115535.GB1976@arachsys.com>
In-Reply-To: <1250869674.7363.89.camel@mulgrave.site>
James Bottomley <James.Bottomley@suse.de> writes:
> Can you try this as a partial fix? (It should prevent the oops, but
> you'll still lose the disk).
Hi James. Thanks for the patch. I've applied it, although the context in the
released 2.6.30.x is quite a bit different from your patch against head. (For
example, in sd_probe there's no get_device(&sdp->sdev_gendev) at all before the
async_schedule(); that happens in sd_probe_async instead.)
I'm now seeing a warning backtrace for every scsi attach in the machine,
including the main system hard drives, so I think something's not quite
right. For instance, in my test virtual machine:
scsi0 : ata_piix
scsi1 : ata_piix
ata1: PATA max MWDMA2 cmd 0x1f0 ctl 0x3f6 bmdma 0xc000 irq 14
ata2: PATA max MWDMA2 cmd 0x170 ctl 0x376 bmdma 0xc008 irq 15
Intel(R) PRO/1000 Network Driver - version 7.3.21-k3-NAPI
Copyright (c) 1999-2006 Intel Corporation.
ata1.01: NODEV after polling detection
ata1.00: ATA-7: QEMU HARDDISK, 0.10.6, max UDMA/100
ata1.00: 20971520 sectors, multi 16: LBA48
ata1.00: configured for MWDMA2
scsi 0:0:0:0: Direct-Access ATA QEMU HARDDISK 0.10 PQ: 0 ANSI: 5
------------[ cut here ]------------
WARNING: at lib/kref.c:43 kref_get+0x23/0x2d()
Hardware name:
Modules linked in:
Pid: 578, comm: async/0 Not tainted 2.6.30.4-elastic-lon-p #3
Call Trace:
[<ffffffff80419d84>] ? vgacon_set_cursor_size+0xfd/0x109
[<ffffffff80257fa5>] warn_slowpath_common+0x77/0x8f
[<ffffffff80257fcc>] warn_slowpath_null+0xf/0x11
[<ffffffff803fbfb6>] kref_get+0x23/0x2d
[<ffffffff803fb167>] kobject_get+0x1a/0x22
[<ffffffff804708c1>] get_device+0x14/0x1a
[<ffffffff80493d56>] sd_probe+0x1b7/0x21d
[<ffffffff80473a1e>] driver_probe_device+0x9a/0x11f
[<ffffffff80473b54>] __device_attach+0x35/0x3a
[<ffffffff80473b1f>] ? __device_attach+0x0/0x3a
[<ffffffff80472fd4>] bus_for_each_drv+0x51/0x88
[<ffffffff80473be1>] device_attach+0x5e/0x75
[<ffffffff80472e3c>] bus_attach_device+0x26/0x58
[<ffffffff80471a5d>] device_add+0x3ff/0x562
[<ffffffff80485104>] scsi_sysfs_add_sdev+0xb5/0x252
[<ffffffff80482f72>] scsi_probe_and_add_lun+0x910/0xa32
[<ffffffff80483e98>] __scsi_add_device+0xb3/0xdf
[<ffffffff804a104d>] ata_scsi_scan_host+0x74/0x16e
[<ffffffff8026b1c3>] ? autoremove_wake_function+0x0/0x34
[<ffffffff8049f3b8>] async_port_probe+0xab/0xb3
[<ffffffff80270482>] async_thread+0x10c/0x20d
[<ffffffff802545ff>] ? default_wake_function+0x0/0xf
[<ffffffff80270376>] ? async_thread+0x0/0x20d
[<ffffffff8026ad89>] kthread+0x55/0x80
[<ffffffff8022be6a>] child_rip+0xa/0x20
[<ffffffff8026ad34>] ? kthread+0x0/0x80
[<ffffffff8022be60>] ? child_rip+0x0/0x20
---[ end trace cce8275f5d03fa65 ]---
sd 0:0:0:0: Attached scsi generic sg0 type 0
sd 0:0:0:0: [sda] 20971520 512-byte hardware sectors: (10.7 GB/10.0 GiB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
sda: sda1 sda2
sd 0:0:0:0: [sda] Attached SCSI disk
[...]
scsi2 : iSCSI Initiator over TCP/IP
scsi 2:0:0:0: RAID IET Controller 0001 PQ: 0 ANSI: 5
scsi 2:0:0:0: Attached scsi generic sg1 type 12
scsi 2:0:0:1: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
------------[ cut here ]------------
WARNING: at lib/kref.c:43 kref_get+0x23/0x2d()
Hardware name:
Modules linked in:
Pid: 1156, comm: iscsid Tainted: G W 2.6.30.4-elastic-lon-p #3
Call Trace:
[<ffffffff80419d84>] ? vgacon_set_cursor_size+0xfd/0x109
[<ffffffff80257fa5>] warn_slowpath_common+0x77/0x8f
[<ffffffff80257fcc>] warn_slowpath_null+0xf/0x11
[<ffffffff803fbfb6>] kref_get+0x23/0x2d
[<ffffffff803fb167>] kobject_get+0x1a/0x22
[<ffffffff804708c1>] get_device+0x14/0x1a
[<ffffffff80493d56>] sd_probe+0x1b7/0x21d
[<ffffffff80473a1e>] driver_probe_device+0x9a/0x11f
[<ffffffff80473b54>] __device_attach+0x35/0x3a
[<ffffffff80473b1f>] ? __device_attach+0x0/0x3a
[<ffffffff80472fd4>] bus_for_each_drv+0x51/0x88
[<ffffffff80473be1>] device_attach+0x5e/0x75
[<ffffffff80472e3c>] bus_attach_device+0x26/0x58
[<ffffffff80471a5d>] device_add+0x3ff/0x562
[<ffffffff80485104>] scsi_sysfs_add_sdev+0xb5/0x252
[<ffffffff80482f72>] scsi_probe_and_add_lun+0x910/0xa32
[<ffffffff8048363c>] __scsi_scan_target+0x3a5/0x542
[<ffffffff8029e08d>] ? zone_statistics+0x60/0x65
[<ffffffff80293369>] ? get_page_from_freelist+0x4ad/0x67a
[<ffffffff80483dce>] scsi_scan_target+0x97/0xae
[<ffffffff80487c3b>] iscsi_user_scan_session+0xcd/0xe4
[<ffffffff80487b6e>] ? iscsi_user_scan_session+0x0/0xe4
[<ffffffff80470f95>] device_for_each_child+0x35/0x6c
[<ffffffff80487b53>] iscsi_user_scan+0x28/0x2a
[<ffffffff8048471c>] store_scan+0x9b/0xc6
[<ffffffff80470765>] dev_attr_store+0x1b/0x1d
[<ffffffff8030b61d>] sysfs_write_file+0xf2/0x12e
[<ffffffff802c1711>] vfs_write+0xad/0x129
[<ffffffff802c1846>] sys_write+0x45/0x6c
[<ffffffff8022aeeb>] system_call_fastpath+0x16/0x1b
---[ end trace cce8275f5d03fa67 ]---
sd 2:0:0:1: Attached scsi generic sg2 type 0
sd 2:0:0:1: [sdb] 10485760 512-byte hardware sectors: (5.36 GB/5.00 GiB)
sd 2:0:0:1: [sdb] Write Protect is off
sd 2:0:0:1: [sdb] Mode Sense: 79 00 00 08
sd 2:0:0:1: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
sdb: unknown partition table
sd 2:0:0:1: [sdb] Attached SCSI disk
[...]
etc.
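A minimal user-space model of the check behind "WARNING: at lib/kref.c:43", assuming (as in kernels of this era) that kref_get() warns when asked to take a reference on a count that is already zero, but still performs the increment. struct model_kref and the model_* functions are hypothetical stand-ins, not the kernel's API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical user-space stand-in for struct kref; not the kernel type. */
struct model_kref {
	atomic_int refcount;
};

/* How many times the modelled WARN_ON has fired. */
static int model_warnings;

/* Like kref_init(): an initialised object starts with one reference. */
static void model_kref_init(struct model_kref *kref)
{
	atomic_store(&kref->refcount, 1);
}

/* Models the assumed kref_get() behaviour: taking a reference on an
 * object whose count is zero is reported as a bug, but the increment
 * still happens, so execution carries on after the warning. */
static void model_kref_get(struct model_kref *kref)
{
	if (atomic_load(&kref->refcount) == 0) {
		model_warnings++;
		fprintf(stderr, "WARNING: kref_get on a zero refcount\n");
	}
	atomic_fetch_add(&kref->refcount, 1);
}
```

If that assumption holds, the traces above mean that the object get_device() was called on from sd_probe() still had a reference count of zero at that point.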
> As for a printk, there's no real way to do that. What I did was make sure
> we take a reference to the scsi disk. Holding that reference should
> prevent us from losing the partition table ... but the issue itself is
> legitimate (add racing with remove), and there's not really a good way of
> detecting it.
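The race described above can be sketched with a single-threaded user-space model that replays one bad interleaving deterministically. Probe hands the disk to asynchronous setup work, and remove drops its reference before that work runs. The work then finds the object alive or freed depending on whether probe pinned it first. struct model_disk and all functions here are hypothetical and do not correspond to sd.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical object standing in for the scsi disk; not kernel code. */
struct model_disk {
	int refcount;
	bool freed;
};

static void model_get(struct model_disk *d)
{
	d->refcount++;
}

static void model_put(struct model_disk *d)
{
	assert(d->refcount > 0);
	if (--d->refcount == 0)
		d->freed = true;	/* a real object would be released here */
}

/* Replays: probe schedules async work, remove runs first, then the async
 * work runs. Returns true if the async work found the object alive. */
static bool model_probe_race(bool get_before_schedule)
{
	struct model_disk d = { .refcount = 1, .freed = false };

	/* probe: optionally pin the object before handing it to async work */
	if (get_before_schedule)
		model_get(&d);

	/* remove races in and drops the reference held by registration */
	model_put(&d);

	/* the async work finally runs; touching a freed object would be a
	 * use-after-free in real code */
	if (d.freed)
		return false;
	model_put(&d);			/* async work drops the pinned reference */
	return true;
}
```

In this model, taking the reference only once the async work starts would be too late, because the object may already be gone; the pin has to be taken before the work is scheduled.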
I was thinking of a debug hack like
	if (atomic_read(&sdkp->dev.kobj.kref.refcount) < 2)
		printk(KERN_WARNING "James' patch has just protected us from a crash: send him a beer\n");
just before
	put_device(&sdkp->dev);
in sd_probe_async(). I know the refcount could still drop between the
atomic_read() and the put_device(), but we wouldn't have crashed in that case
anyway, and if we do see the message in our kernel logs over the next few
days, that would confirm your theory. Otherwise, given how rare the crash is,
I couldn't tell whether the patch is working or we've just been lucky for a
couple of weeks!
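A sketch of what the debug check distinguishes, under the assumption that, just before the final put_device() in sd_probe_async(), the only references to sdkp->dev are the one dropped by remove and the extra one taken for the async work. A count below 2 then means remove has already run and the extra reference alone is keeping the object alive. The check is racy and only a hint. struct model_dev and its helpers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the reference count of sdkp->dev. */
struct model_dev {
	int refcount;
};

/* Called just before the async path drops its extra reference. Returns
 * true (and logs) when that reference is the last one, i.e. a racing
 * remove has already dropped its own reference. */
static bool model_extra_ref_was_needed(const struct model_dev *dev)
{
	if (dev->refcount < 2) {
		printf("extra reference kept the disk alive\n");
		return true;
	}
	return false;
}

/* Drops one reference; returns true if the object is now released. */
static bool model_put(struct model_dev *dev)
{
	assert(dev->refcount > 0);
	return --dev->refcount == 0;
}
```

In the normal case the check stays quiet and the put leaves the object alive; only after a racing remove does it fire, and the subsequent put is the one that releases the object.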
Best wishes,
Chris.
Thread overview: 17+ messages
2009-08-20 18:05 oops during scsi scanning disk setup Chris Webb
2009-08-20 20:01 ` Matthew Wilcox
2009-08-20 20:10 ` Yinghai Lu
2009-08-21 4:26 ` Arjan van de Ven
2009-08-20 22:26 ` James Bottomley
2009-08-21 8:16 ` Chris Webb
2009-08-21 8:33 ` Chris Webb
2009-08-21 9:23 ` Chris Webb
2009-08-21 14:00 ` James Bottomley
2009-08-21 14:51 ` Chris Webb
2009-08-21 15:47 ` James Bottomley
2009-08-21 22:59 ` Chris Webb
2009-08-21 23:39 ` James Bottomley
2009-08-22 11:55 ` Chris Webb [this message]
2009-08-22 14:56 ` James Bottomley
2009-08-22 15:50 ` Chris Webb
2009-09-05 16:45 ` Chris Webb