From: Andrey Zonov <andrey.zonov@gmail.com>
To: linux-scsi@vger.kernel.org
Subject: SCSI: race condition between scsi_remove_target and scsi_probe_and_add_lun
Date: Thu, 20 Mar 2014 22:12:17 -0700 [thread overview]
Message-ID: <532BCA31.20902@gmail.com> (raw)
Hi,
I've got kernel panic on my box which works as FibreChannel initiator.
I was able to reproduce this panic by setting dev_loss_tmo=2 and
enabling/disabling ports every 5 seconds on the switch in 5 minutes. I
added some debug points in the kernel code and that's what I've got so far:
1. system is inserting new device into __devices list
DEBUG: scsi_sysfs_device_initialize(): sdev=ffff88046a931000 7:0:5:0
Pid: 910, comm: kworker/u:2 Tainted: P O 3.2.48-swt9004 #33
Call Trace:
[<ffffffff81245c42>] ? scsi_alloc_sdev+0x1d2/0x240
[<ffffffff8123d4bd>] ? scsi_device_lookup_by_target+0x8d/0xc0
[<ffffffff8124623a>] ? scsi_probe_and_add_lun+0x42a/0xb20
[<ffffffff811acb7d>] ? kobject_set_name_vargs+0x6d/0x80
[<ffffffff81230b4f>] ? dev_set_name+0x3f/0x50
[<ffffffff811ac782>] ? kobject_get+0x12/0x20
[<ffffffffa000b3e4>] ? fc_host_match+0x14/0x70 [scsi_transport_fc]
[<ffffffff8123726f>] ? attribute_container_add_device+0x4f/0x160
[<ffffffff811ac782>] ? kobject_get+0x12/0x20
[<ffffffff812304e4>] ? get_device+0x14/0x20
[<ffffffff812459c5>] ? scsi_alloc_target+0x295/0x2d0
[<ffffffff81230bca>] ? device_release+0x1a/0x80
[<ffffffff81246bce>] ? __scsi_scan_target+0xce/0x5f0
[<ffffffff8102ec22>] ? dequeue_task_fair+0x52/0x150
[<ffffffff8139a91d>] ? __schedule+0x25d/0x7d0
[<ffffffff812477b6>] ? scsi_scan_target+0xc6/0xe0
[<ffffffffa000e75f>] ? fc_scsi_scan_rport+0xaf/0xc0 [scsi_transport_fc]
[<ffffffff8104eb06>] ? process_one_work+0x116/0x3a0
[<ffffffff8104f1ec>] ? worker_thread+0x14c/0x400
[<ffffffff8104f0a0>] ? rescuer_thread+0x310/0x310
[<ffffffff8104f0a0>] ? rescuer_thread+0x310/0x310
[<ffffffff81052f06>] ? kthread+0x96/0xa0
[<ffffffff8139f134>] ? kernel_thread_helper+0x4/0x10
[<ffffffff81052e70>] ? kthread_worker_fn+0x120/0x120
[<ffffffff8139f130>] ? gs_change+0xb/0xb
2. later in scsi_probe_and_add_lun() this device is removing
DEBUG: __scsi_remove_device(): sdev=ffff88046a931000 7:0:5:0
Pid: 910, comm: kworker/u:2 Tainted: P O 3.2.48-swt9004 #33
Call Trace:
[<ffffffff81248c66>] ? __scsi_remove_device+0x46/0x110
[<ffffffff81246268>] ? scsi_probe_and_add_lun+0x458/0xb20
[<ffffffff81230b4f>] ? dev_set_name+0x3f/0x50
[<ffffffff811ac782>] ? kobject_get+0x12/0x20
[<ffffffff812459c5>] ? scsi_alloc_target+0x295/0x2d0
[<ffffffff81230bca>] ? device_release+0x1a/0x80
[<ffffffff81246bce>] ? __scsi_scan_target+0xce/0x5f0
[<ffffffff8102ec22>] ? dequeue_task_fair+0x52/0x150
[<ffffffff8139a91d>] ? __schedule+0x25d/0x7d0
[<ffffffff812477b6>] ? scsi_scan_target+0xc6/0xe0
[<ffffffffa000e75f>] ? fc_scsi_scan_rport+0xaf/0xc0 [scsi_transport_fc]
[<ffffffff8104eb06>] ? process_one_work+0x116/0x3a0
[<ffffffff8104f1ec>] ? worker_thread+0x14c/0x400
[<ffffffff8104f0a0>] ? rescuer_thread+0x310/0x310
[<ffffffff8104f0a0>] ? rescuer_thread+0x310/0x310
[<ffffffff81052f06>] ? kthread+0x96/0xa0
[<ffffffff8139f134>] ? kernel_thread_helper+0x4/0x10
[<ffffffff81052e70>] ? kthread_worker_fn+0x120/0x120
[<ffffffff8139f130>] ? gs_change+0xb/0xb
3. another thread is trying to remove this device because of timeout
DEBUG: __scsi_remove_device(): sdev=ffff88046a931000 7:0:5:0
Pid: 4, comm: kworker/0:0 Tainted: P O 3.2.48-swt9004 #33
Call Trace:
[<ffffffff81248c66>] ? __scsi_remove_device+0x46/0x110
[<ffffffff8139b63a>] ? mutex_lock+0x1a/0x40
[<ffffffff81248d58>] ? scsi_remove_device+0x28/0x40
[<ffffffff81242a00>] ? scsi_kmap_atomic_sg+0x180/0x180
[<ffffffff81248ed1>] ? scsi_remove_target+0x141/0x1e0
[<ffffffff8104eb06>] ? process_one_work+0x116/0x3a0
[<ffffffff8104f1ec>] ? worker_thread+0x14c/0x400
[<ffffffff8104f0a0>] ? rescuer_thread+0x310/0x310
[<ffffffff8104f0a0>] ? rescuer_thread+0x310/0x310
[<ffffffff81052f06>] ? kthread+0x96/0xa0
[<ffffffff8139f134>] ? kernel_thread_helper+0x4/0x10
[<ffffffff81052e70>] ? kthread_worker_fn+0x120/0x120
[<ffffffff8139f130>] ? gs_change+0xb/0xb
and it's got dead sdev object. I don't understand how this can happen
because __scsi_remove_target() iterating over __devices and getting sdev
reference under host_lock and that should be enough.
DEBUG: kref_put(): kref=ffff88046a9312e0 val=-1
------------[ cut here ]------------
WARNING: at lib/kref.c:61 kref_put+0x88/0xc0()
Hardware name: X9DRi-LN4+/X9DR3-LN4+
Modules linked in: qla2xxx(O) igb ehci_hcd scsi_transport_fc
Pid: 4, comm: kworker/0:0 Tainted: P O 3.2.48-swt9004 #33
Call Trace:
[<ffffffff81037bfb>] ? warn_slowpath_common+0x7b/0xc0
[<ffffffff811ac6c0>] ? kobject_del+0x30/0x30
[<ffffffff811adb08>] ? kref_put+0x88/0xc0
[<ffffffff81248cac>] ? __scsi_remove_device+0x8c/0x110
[<ffffffff8139b63a>] ? mutex_lock+0x1a/0x40
[<ffffffff81248d58>] ? scsi_remove_device+0x28/0x40
[<ffffffff81242a00>] ? scsi_kmap_atomic_sg+0x180/0x180
[<ffffffff81248ed1>] ? scsi_remove_target+0x141/0x1e0
[<ffffffff8104eb06>] ? process_one_work+0x116/0x3a0
[<ffffffff8104f1ec>] ? worker_thread+0x14c/0x400
[<ffffffff8104f0a0>] ? rescuer_thread+0x310/0x310
[<ffffffff8104f0a0>] ? rescuer_thread+0x310/0x310
[<ffffffff81052f06>] ? kthread+0x96/0xa0
[<ffffffff8139f134>] ? kernel_thread_helper+0x4/0x10
[<ffffffff81052e70>] ? kthread_worker_fn+0x120/0x120
[<ffffffff8139f130>] ? gs_change+0xb/0xb
Here is the patch which helped me:
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 9117d0b..676e5ff 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -1094,6 +1094,7 @@ static void __scsi_remove_target(struct
scsi_target *starget)
unsigned long flags;
struct scsi_device *sdev;
+ mutex_lock(&shost->scan_mutex);
spin_lock_irqsave(shost->host_lock, flags);
restart:
list_for_each_entry(sdev, &shost->__devices, siblings) {
@@ -1102,12 +1103,13 @@ static void __scsi_remove_target(struct
scsi_target *starget)
scsi_device_get(sdev))
continue;
spin_unlock_irqrestore(shost->host_lock, flags);
- scsi_remove_device(sdev);
+ __scsi_remove_device(sdev);
scsi_device_put(sdev);
spin_lock_irqsave(shost->host_lock, flags);
goto restart;
}
spin_unlock_irqrestore(shost->host_lock, flags);
+ mutex_unlock(&shost->scan_mutex);
}
/**
I'm not sure about the fix is correct, but I was not able to reproduce
the panic.
P.S. Here is another patch which help to detect reference count underflow
diff --git a/include/linux/kref.h b/include/linux/kref.h
index 484604d..05dd2b3 100644
--- a/include/linux/kref.h
+++ b/include/linux/kref.h
@@ -68,9 +68,13 @@ static inline void kref_get(struct kref *kref)
static inline int kref_sub(struct kref *kref, unsigned int count,
void (*release)(struct kref *kref))
{
+ long refs;
+
WARN_ON(release == NULL);
- if (atomic_sub_and_test((int) count, &kref->refcount)) {
+ refs = atomic_sub_return((int) count, &kref->refcount);
+ WARN_ON(refs < 0);
+ if (refs == 0) {
release(kref);
return 1;
}
--
Andrey Zonov
next reply other threads:[~2014-03-21 5:12 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-03-21 5:12 Andrey Zonov [this message]
-- strict thread matches above, loose matches on Subject: below --
2014-03-21 1:42 SCSI: race condition between scsi_remove_target and scsi_probe_and_add_lun Andrey Zonov
2015-10-24 3:15 ` Alexey Ivanov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=532BCA31.20902@gmail.com \
--to=andrey.zonov@gmail.com \
--cc=linux-scsi@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.