[PATCH 0/2] avoid crashing when reading /proc/scsi/scsi and simultaneously removing devices

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/2] avoid crashing when reading /proc/scsi/scsi and simultaneously removing devices
@ 2016-01-11 17:28 Ewan D. Milne
  2016-01-11 17:28 ` [PATCH 1/2] drivers/base: add bus_device_iter_init, bus_device_iter_next, bus_device_iter_exit Ewan D. Milne
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Ewan D. Milne @ 2016-01-11 17:28 UTC (permalink / raw)
  To: linux-kernel, linux-scsi; +Cc: gregkh, martin.petersen, hare

From: "Ewan D. Milne" <emilne@redhat.com>

The klist traversal used by the reading of /proc/scsi/scsi is not interlocked
against device removal.  It takes a reference on the containing object, but
this does not prevent the device from being removed from the list.  Thus, we
get errors and eventually panic, as shown in the traces below.  Fix this by
keeping a klist iterator in the seq_file private data.

The problem can be easily reproduced by repeatedly increasing scsi_debug's
max_luns to 30 and then deleting the devices via sysfs, while simulatenously
accessing /proc/scsi/scsi.
    
>From a patch originally developed by David Jeffery <djeffery@redhat.com>

Dec  3 13:22:02 localhost kernel: WARNING: CPU: 2 PID: 28073 at include/linux/kref.h:47 klist_iter_init_node+0x3d/0x50()
Dec  3 13:22:02 localhost kernel: Modules linked in: scsi_debug x86_pkg_temp_thermal kvm_intel kvm irqbypass crc32c_intel joydev iTCO_wdt dcdbas ipmi_devintf acpi_power_meter iTCO_vendor_support ipmi_si imsghandler pcspkr wmi acpi_cpufreq tpm_tis tpm shpchp lpc_ich mfd_core nfsd nfs_acl lockd grace sunrpc tg3 ptp pps_core
Dec  3 13:22:02 localhost kernel: CPU: 2 PID: 28073 Comm: cat Not tainted 4.4.0-rc1+ #2
Dec  3 13:22:02 localhost kernel: Hardware name: Dell Inc. PowerEdge R320/08VT7V, BIOS 2.0.22 11/19/2013
Dec  3 13:22:02 localhost kernel: ffffffff81a20e77 ffff880613acfd18 ffffffff81321eef 0000000000000000
Dec  3 13:22:02 localhost kernel: ffff880613acfd50 ffffffff8107ca52 ffff88061176b198 0000000000000000
Dec  3 13:22:02 localhost kernel: ffffffff814542b0 ffff880610cfb100 ffff88061176b198 ffff880613acfd60
Dec  3 13:22:02 localhost kernel: Call Trace:
Dec  3 13:22:02 localhost kernel: [<ffffffff81321eef>] dump_stack+0x44/0x55
Dec  3 13:22:02 localhost kernel: [<ffffffff8107ca52>] warn_slowpath_common+0x82/0xc0
Dec  3 13:22:02 localhost kernel: [<ffffffff814542b0>] ? proc_scsi_show+0x20/0x20
Dec  3 13:22:02 localhost kernel: [<ffffffff8107cb4a>] warn_slowpath_null+0x1a/0x20
Dec  3 13:22:02 localhost kernel: [<ffffffff8167225d>] klist_iter_init_node+0x3d/0x50
Dec  3 13:22:02 localhost kernel: [<ffffffff81421d41>] bus_find_device+0x51/0xb0
Dec  3 13:22:02 localhost kernel: [<ffffffff814545ad>] scsi_seq_next+0x2d/0x40
Dec  3 13:22:02 localhost kernel: [<ffffffff811e2e50>] seq_read+0x290/0x370
Dec  3 13:22:02 localhost kernel: [<ffffffff81225be8>] proc_reg_read+0x48/0x70
Dec  3 13:22:02 localhost kernel: [<ffffffff811c07c8>] __vfs_read+0x28/0xd0
Dec  3 13:22:02 localhost kernel: [<ffffffff812b5323>] ? security_file_permission+0xa3/0xc0
Dec  3 13:22:02 localhost kernel: [<ffffffff811c0cf3>] ? rw_verify_area+0x53/0xf0
Dec  3 13:22:02 localhost kernel: [<ffffffff811c0e16>] vfs_read+0x86/0x130
Dec  3 13:22:02 localhost kernel: [<ffffffff811c1bb6>] SyS_read+0x46/0xa0
Dec  3 13:22:02 localhost kernel: [<ffffffff8167c717>] entry_SYSCALL_64_fastpath+0x12/0x6a
Dec  3 13:22:02 localhost kernel: ---[ end trace 99a60fb1c41fc8c9 ]---
Dec  3 13:22:02 localhost kernel: ------------[ cut here ]------------
Dec  3 13:22:02 localhost kernel: WARNING: CPU: 2 PID: 28073 at lib/klist.c:189 klist_release+0xa8/0xb0()
Dec  3 13:22:02 localhost kernel: Modules linked in: scsi_debug x86_pkg_temp_thermal kvm_intel kvm irqbypass crc32c_intel joydev iTCO_wdt dcdbas ipmi_devintf acpi_power_meter iTCO_vendor_support ipmi_si imsghandler pcspkr wmi acpi_cpufreq tpm_tis tpm shpchp lpc_ich mfd_core nfsd nfs_acl lockd grace sunrpc tg3 ptp pps_core
Dec  3 13:22:02 localhost kernel: CPU: 2 PID: 28073 Comm: cat Tainted: G        W       4.4.0-rc1+ #2
Dec  3 13:22:02 localhost kernel: Hardware name: Dell Inc. PowerEdge R320/08VT7V, BIOS 2.0.22 11/19/2013
Dec  3 13:22:02 localhost kernel: ffffffff81aaa040 ffff880613acfcc0 ffffffff81321eef 0000000000000000
Dec  3 13:22:02 localhost kernel: ffff880613acfcf8 ffffffff8107ca52 dead0000000000f8 ffff880613acfd80
Dec  3 13:22:02 localhost kernel: ffff88060f7aa368 ffff88060f7aa380 ffff88061176b198 ffff880613acfd08
Dec  3 13:22:02 localhost kernel: Call Trace:
Dec  3 13:22:02 localhost kernel: [<ffffffff81321eef>] dump_stack+0x44/0x55
Dec  3 13:22:02 localhost kernel: [<ffffffff8107ca52>] warn_slowpath_common+0x82/0xc0
Dec  3 13:22:02 localhost kernel: [<ffffffff8107cb4a>] warn_slowpath_null+0x1a/0x20
Dec  3 13:22:02 localhost kernel: [<ffffffff816720a8>] klist_release+0xa8/0xb0
Dec  3 13:22:02 localhost kernel: [<ffffffff814220a0>] ? bus_uevent_store+0x50/0x50
Dec  3 13:22:02 localhost kernel: [<ffffffff816723f5>] klist_next+0x95/0xf0
Dec  3 13:22:02 localhost kernel: [<ffffffff814542b0>] ? proc_scsi_show+0x20/0x20
Dec  3 13:22:02 localhost kernel: [<ffffffff81421d62>] bus_find_device+0x72/0xb0
Dec  3 13:22:02 localhost kernel: [<ffffffff814545ad>] scsi_seq_next+0x2d/0x40
Dec  3 13:22:02 localhost kernel: [<ffffffff811e2e50>] seq_read+0x290/0x370
Dec  3 13:22:02 localhost kernel: [<ffffffff81225be8>] proc_reg_read+0x48/0x70
Dec  3 13:22:02 localhost kernel: [<ffffffff811c07c8>] __vfs_read+0x28/0xd0
Dec  3 13:22:02 localhost kernel: [<ffffffff812b5323>] ? security_file_permission+0xa3/0xc0
Dec  3 13:22:02 localhost kernel: [<ffffffff811c0cf3>] ? rw_verify_area+0x53/0xf0
Dec  3 13:22:02 localhost kernel: [<ffffffff811c0e16>] vfs_read+0x86/0x130
Dec  3 13:22:02 localhost kernel: [<ffffffff811c1bb6>] SyS_read+0x46/0xa0
Dec  3 13:22:02 localhost kernel: [<ffffffff8167c717>] entry_SYSCALL_64_fastpath+0x12/0x6a
Dec  3 13:22:02 localhost kernel: ---[ end trace 99a60fb1c41fc8ca ]---
Dec  3 13:22:02 localhost kernel: ------------[ cut here ]------------
Dec  3 13:22:02 localhost kernel: WARNING: CPU: 2 PID: 28073 at lib/list_debug.c:53 __list_del_entry+0x61/0xc0()
Dec  3 13:22:02 localhost kernel: list_del corruption, ffff88060f7aa370->next is LIST_POISON1 (dead000000000100)
Dec  3 13:22:02 localhost kernel: Modules linked in: scsi_debug x86_pkg_temp_thermal kvm_intel kvm irqbypass crc32c_intel joydev iTCO_wdt dcdbas ipmi_devintf acpi_power_meter iTCO_vendor_support ipmi_si imsghandler pcspkr wmi acpi_cpufreq tpm_tis tpm shpchp lpc_ich mfd_core nfsd nfs_acl lockd grace sunrpc tg3 ptp pps_core
Dec  3 13:22:02 localhost kernel: CPU: 2 PID: 28073 Comm: cat Tainted: G        W       4.4.0-rc1+ #2
Dec  3 13:22:02 localhost kernel: Hardware name: Dell Inc. PowerEdge R320/08VT7V, BIOS 2.0.22 11/19/2013
Dec  3 13:22:02 localhost kernel: ffffffff81a516b1 ffff880613acfc48 ffffffff81321eef ffff880613acfc90
Dec  3 13:22:02 localhost kernel: ffff880613acfc80 ffffffff8107ca52 ffff88060f7aa370 ffff880613acfd80
Dec  3 13:22:02 localhost kernel: ffff88060f7aa368 ffff88060f7aa380 ffff88061176b198 ffff880613acfce0
Dec  3 13:22:02 localhost kernel: Call Trace:
Dec  3 13:22:02 localhost kernel: [<ffffffff81321eef>] dump_stack+0x44/0x55
Dec  3 13:22:02 localhost kernel: [<ffffffff8107ca52>] warn_slowpath_common+0x82/0xc0
Dec  3 13:22:02 localhost kernel: [<ffffffff8107cadc>] warn_slowpath_fmt+0x4c/0x50
Dec  3 13:22:02 localhost kernel: [<ffffffff8133e581>] __list_del_entry+0x61/0xc0
Dec  3 13:22:02 localhost kernel: [<ffffffff8133e5ed>] list_del+0xd/0x30
Dec  3 13:22:02 localhost kernel: [<ffffffff81672021>] klist_release+0x21/0xb0
Dec  3 13:22:02 localhost kernel: [<ffffffff814220a0>] ? bus_uevent_store+0x50/0x50
Dec  3 13:22:02 localhost kernel: [<ffffffff816723f5>] klist_next+0x95/0xf0
Dec  3 13:22:02 localhost kernel: [<ffffffff814542b0>] ? proc_scsi_show+0x20/0x20
Dec  3 13:22:02 localhost kernel: [<ffffffff81421d62>] bus_find_device+0x72/0xb0
Dec  3 13:22:02 localhost kernel: [<ffffffff814545ad>] scsi_seq_next+0x2d/0x40
Dec  3 13:22:02 localhost kernel: [<ffffffff811e2e50>] seq_read+0x290/0x370
Dec  3 13:22:02 localhost kernel: [<ffffffff81225be8>] proc_reg_read+0x48/0x70
Dec  3 13:22:02 localhost kernel: [<ffffffff811c07c8>] __vfs_read+0x28/0xd0
Dec  3 13:22:02 localhost kernel: [<ffffffff812b5323>] ? security_file_permission+0xa3/0xc0
Dec  3 13:22:02 localhost kernel: [<ffffffff811c0cf3>] ? rw_verify_area+0x53/0xf0
Dec  3 13:22:02 localhost kernel: [<ffffffff811c0e16>] vfs_read+0x86/0x130
Dec  3 13:22:02 localhost kernel: [<ffffffff811c1bb6>] SyS_read+0x46/0xa0
Dec  3 13:22:02 localhost kernel: [<ffffffff8167c717>] entry_SYSCALL_64_fastpath+0x12/0x6a
Dec  3 13:22:02 localhost kernel: ---[ end trace 99a60fb1c41fc8cb ]---
Dec  3 13:22:03 localhost kernel: general protection fault: 0000 [#1] SMP
Dec  3 13:22:03 localhost kernel: Modules linked in: scsi_debug x86_pkg_temp_thermal kvm_intel kvm irqbypass crc32c_intel joydev iTCO_wdt dcdbas ipmi_devintf acpi_power_meter iTCO_vendor_support ipmi_si imsghandler pcspkr wmi acpi_cpufreq tpm_tis tpm shpchp lpc_ich mfd_core nfsd nfs_acl lockd grace sunrpc tg3 ptp pps_core
Dec  3 13:22:03 localhost kernel: CPU: 2 PID: 28073 Comm: cat Tainted: G        W       4.4.0-rc1+ #2
Dec  3 13:22:03 localhost kernel: Hardware name: Dell Inc. PowerEdge R320/08VT7V, BIOS 2.0.22 11/19/2013
Dec  3 13:22:03 localhost kernel: task: ffff8806090ef080 ti: ffff880613acc000 task.ti: ffff880613acc000
Dec  3 13:22:03 localhost kernel: RIP: 0010:[<ffffffff816723b2>]  [<ffffffff816723b2>] klist_next+0x52/0xf0
Dec  3 13:22:03 localhost kernel: RSP: 0018:ffff880613acfd48  EFLAGS: 00010287
Dec  3 13:22:03 localhost kernel: RAX: ffff880612ea2ea8 RBX: dead0000000000f8 RCX: 0000000000000000
Dec  3 13:22:03 localhost kernel: RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffffff81cfcab0
Dec  3 13:22:03 localhost kernel: RBP: ffff880613acfd70 R08: 0000000000000001 R09: 000000000000fffe
Dec  3 13:22:03 localhost kernel: R10: 0000000000002a81 R11: 0000000000000002 R12: ffff880613acfd80
Dec  3 13:22:03 localhost kernel: R13: ffffffff814220a0 R14: ffff88060f7aa368 R15: ffff88061176b101
Dec  3 13:22:03 localhost kernel: FS:  00007f47e9ca4700(0000) GS:ffff880617040000(0000) knlGS:0000000000000000
Dec  3 13:22:03 localhost kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec  3 13:22:03 localhost kernel: CR2: 0000000001d22038 CR3: 000000060724c000 CR4: 00000000000406e0
Dec  3 13:22:03 localhost kernel: Stack:
Dec  3 13:22:03 localhost kernel: ffff88061176b198 0000000000000000 ffffffff814542b0 ffff880610cfb100
Dec  3 13:22:03 localhost kernel: ffff88061176b198 ffff880613acfda8 ffffffff81421d62 ffff880612ea2ea8
Dec  3 13:22:03 localhost kernel: 0000000000000000 ffff88061176b198 ffff880613acff20 ffff880035f39b00
Dec  3 13:22:03 localhost kernel: Call Trace:
Dec  3 13:22:03 localhost kernel: [<ffffffff814542b0>] ? proc_scsi_show+0x20/0x20
Dec  3 13:22:03 localhost kernel: [<ffffffff81421d62>] bus_find_device+0x72/0xb0
Dec  3 13:22:03 localhost kernel: [<ffffffff814545ad>] scsi_seq_next+0x2d/0x40
Dec  3 13:22:03 localhost kernel: [<ffffffff811e2e50>] seq_read+0x290/0x370
Dec  3 13:22:03 localhost kernel: [<ffffffff81225be8>] proc_reg_read+0x48/0x70
Dec  3 13:22:03 localhost kernel: [<ffffffff811c07c8>] __vfs_read+0x28/0xd0
Dec  3 13:22:03 localhost kernel: [<ffffffff812b5323>] ? security_file_permission+0xa3/0xc0
Dec  3 13:22:03 localhost kernel: [<ffffffff811c0cf3>] ? rw_verify_area+0x53/0xf0
Dec  3 13:22:03 localhost kernel: [<ffffffff811c0e16>] vfs_read+0x86/0x130
Dec  3 13:22:03 localhost kernel: [<ffffffff811c1bb6>] SyS_read+0x46/0xa0
Dec  3 13:22:03 localhost kernel: [<ffffffff8167c717>] entry_SYSCALL_64_fastpath+0x12/0x6a
Dec  3 13:22:03 localhost kernel: Code: 8b 46 08 49 8d 7e 18 48 8d 58 f8 f0 41 83 6e 18 01 74 56 49 8b 04 24 45 31 ff 45 31 ed 48 39 c3 49 c7 44 24 08 00 00 00 00 74 20 <f6> 03 01 75 5c b8 01 00 00 00 f0 1 43 18 83 c0 01 83 f8 01
Dec  3 13:22:03 localhost kernel: RIP  [<ffffffff816723b2>] klist_next+0x52/0xf0
Dec  3 13:22:03 localhost kernel: RSP <ffff880613acfd48>
Dec  3 13:22:03 localhost kernel: ---[ end trace 99a60fb1c41fc8cc ]---

Ewan D. Milne (2):
  drivers/base: add bus_device_iter_init, bus_device_iter_next,
    bus_device_iter_exit
  scsi_proc: Change /proc/scsi/scsi to use bus device iterator

 drivers/base/bus.c       | 59 ++++++++++++++++++++++++++++++++++++++++++++++++
 drivers/scsi/scsi_proc.c | 49 ++++++++++++++++++++++++----------------
 include/linux/device.h   |  5 ++++
 3 files changed, 93 insertions(+), 20 deletions(-)

-- 
2.1.0

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/2] drivers/base: add bus_device_iter_init, bus_device_iter_next, bus_device_iter_exit
  2016-01-11 17:28 [PATCH 0/2] avoid crashing when reading /proc/scsi/scsi and simultaneously removing devices Ewan D. Milne
@ 2016-01-11 17:28 ` Ewan D. Milne
  2016-01-11 17:28 ` [PATCH 2/2] scsi_proc: Change /proc/scsi/scsi to use bus device iterator Ewan D. Milne
  2016-01-11 19:15 ` [PATCH 0/2] avoid crashing when reading /proc/scsi/scsi and simultaneously removing devices James Bottomley
  2 siblings, 0 replies; 6+ messages in thread
From: Ewan D. Milne @ 2016-01-11 17:28 UTC (permalink / raw)
  To: linux-kernel, linux-scsi; +Cc: gregkh, martin.petersen, hare

From: "Ewan D. Milne" <emilne@redhat.com>

These functions are needed to expose an iterator for SCSI usage.

>From a patch originally developed by David Jeffery <djeffery@redhat.com>

Signed-off-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
---
 drivers/base/bus.c     | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/device.h |  5 +++++
 2 files changed, 64 insertions(+)

diff --git a/drivers/base/bus.c b/drivers/base/bus.c
index 5005924..a472e46 100644
--- a/drivers/base/bus.c
+++ b/drivers/base/bus.c
@@ -318,6 +318,65 @@ int bus_for_each_dev(struct bus_type *bus, struct device *start,
 EXPORT_SYMBOL_GPL(bus_for_each_dev);
 
 /**
+ * bus_device_iter_init - Initialize an iterator for walking a bus's devices.
+ * @iter: iterator structure to initialize
+ * @bus: bus type
+ *
+ * Initializes an iterator for safely walking a bus's list of devices.  The
+ * iterator can be used by bus_device_iter_next() to safely walk the list, even
+ * if a device is removed from the list while being examined.  Needs to be
+ * matched with a call to bus_device_iter_exit() to clean up the iterator when
+ * finished.
+ *
+ * Return 0 on success, non-zero on failure.  Iterator cannot be used
+ * for a non-zero result
+ */
+int bus_device_iter_init(struct klist_iter *iter,
+			 struct bus_type *bus)
+{
+	if (!bus || !bus->p)
+		return -EINVAL;
+
+	klist_iter_init_node(&bus->p->klist_devices, iter, NULL);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(bus_device_iter_init);
+
+/**
+ * bus_device_iter_next - Get a bus's next device from the iterator.
+ * @iter: iterator structure from bus_device_iter_init()
+ *
+ * Finds the next valid device entry on a bus's device list.  Allows the list
+ * to be safely traversed by the caller even when other tasks remove devices
+ * from the list.
+ *
+ * Returns a reference to the next device, or NULL if no more devices.
+ */
+struct device *bus_device_iter_next(struct klist_iter *iter)
+{
+	struct device *dev;
+
+	while ((dev = next_device(iter)))
+		if (get_device(dev))
+			break;
+
+	return dev;
+}
+EXPORT_SYMBOL_GPL(bus_device_iter_next);
+
+/**
+ * bus_device_iter_exit - Clean up an iterator from walking a bus's device list.
+ * @iter: iterator structure from bus_device_iter_init() to clean up
+ *
+ * Clean up any remaining state after finishing walking a bus's device list.
+ */
+void bus_device_iter_exit(struct klist_iter *iter)
+{
+	klist_iter_exit(iter);
+}
+EXPORT_SYMBOL_GPL(bus_device_iter_exit);
+
+/**
  * bus_find_device - device iterator for locating a particular device.
  * @bus: bus type
  * @start: Device to begin with
diff --git a/include/linux/device.h b/include/linux/device.h
index b8f411b..a44d912 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -151,6 +151,11 @@ void subsys_dev_iter_exit(struct subsys_dev_iter *iter);
 
 int bus_for_each_dev(struct bus_type *bus, struct device *start, void *data,
 		     int (*fn)(struct device *dev, void *data));
+
+int bus_device_iter_init(struct klist_iter *iter, struct bus_type *bus);
+struct device *bus_device_iter_next(struct klist_iter *iter);
+void bus_device_iter_exit(struct klist_iter *iter);
+
 struct device *bus_find_device(struct bus_type *bus, struct device *start,
 			       void *data,
 			       int (*match)(struct device *dev, void *data));
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/2] scsi_proc: Change /proc/scsi/scsi to use bus device iterator
  2016-01-11 17:28 [PATCH 0/2] avoid crashing when reading /proc/scsi/scsi and simultaneously removing devices Ewan D. Milne
  2016-01-11 17:28 ` [PATCH 1/2] drivers/base: add bus_device_iter_init, bus_device_iter_next, bus_device_iter_exit Ewan D. Milne
@ 2016-01-11 17:28 ` Ewan D. Milne
  2016-01-11 19:15 ` [PATCH 0/2] avoid crashing when reading /proc/scsi/scsi and simultaneously removing devices James Bottomley
  2 siblings, 0 replies; 6+ messages in thread
From: Ewan D. Milne @ 2016-01-11 17:28 UTC (permalink / raw)
  To: linux-kernel, linux-scsi; +Cc: gregkh, martin.petersen, hare

From: "Ewan D. Milne" <emilne@redhat.com>

This prevents crashing due to accessing a removed element on the list,
the iterator will now hold the correct reference.  It was not sufficient
to rely on the klist's reference on the containing device object.

>From a patch originally developed by David Jeffery <djeffery@redhat.com>

Signed-off-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
---
 drivers/scsi/scsi_proc.c | 49 ++++++++++++++++++++++++++++--------------------
 1 file changed, 29 insertions(+), 20 deletions(-)

diff --git a/drivers/scsi/scsi_proc.c b/drivers/scsi/scsi_proc.c
index 251598e..6c1b79d 100644
--- a/drivers/scsi/scsi_proc.c
+++ b/drivers/scsi/scsi_proc.c
@@ -40,6 +40,11 @@
 /* 4K page size, but our output routines, use some slack for overruns */
 #define PROC_BLOCK_SIZE (3*1024)
 
+struct scsi_proc_state {
+	struct klist_iter iter;
+	int pos;
+};
+
 static struct proc_dir_entry *proc_scsi;
 
 /* Protect sht->present and sht->proc_dir */
@@ -370,47 +375,50 @@ static ssize_t proc_scsi_write(struct file *file, const char __user *buf,
 	return err;
 }
 
-static int always_match(struct device *dev, void *data)
-{
-	return 1;
-}
-
-static inline struct device *next_scsi_device(struct device *start)
-{
-	struct device *next = bus_find_device(&scsi_bus_type, start, NULL,
-					      always_match);
-	put_device(start);
-	return next;
-}
-
 static void *scsi_seq_start(struct seq_file *sfile, loff_t *pos)
 {
+	struct scsi_proc_state *state = sfile->private;
 	struct device *dev = NULL;
 	loff_t n = *pos;
+	int err;
+
+	err = bus_device_iter_init(&state->iter, &scsi_bus_type);
+	if (err < 0)
+		return ERR_PTR(err);
 
-	while ((dev = next_scsi_device(dev))) {
+	while ((dev = bus_device_iter_next(&state->iter))) {
 		if (!n--)
 			break;
-		sfile->private++;
+		put_device(dev);
+		state->pos++;
 	}
 	return dev;
 }
 
 static void *scsi_seq_next(struct seq_file *sfile, void *v, loff_t *pos)
 {
+	struct scsi_proc_state *state = sfile->private;
+
 	(*pos)++;
-	sfile->private++;
-	return next_scsi_device(v);
+	put_device(v);
+	state->pos++;
+
+	return bus_device_iter_next(&state->iter);
 }
 
 static void scsi_seq_stop(struct seq_file *sfile, void *v)
 {
+	struct scsi_proc_state *state = sfile->private;
+
 	put_device(v);
+	bus_device_iter_exit(&state->iter);
 }
 
 static int scsi_seq_show(struct seq_file *sfile, void *dev)
 {
-	if (!sfile->private)
+	struct scsi_proc_state *state = sfile->private;
+
+	if (!state->pos)
 		seq_puts(sfile, "Attached devices:\n");
 
 	return proc_print_scsidevice(dev, sfile);
@@ -436,7 +444,8 @@ static int proc_scsi_open(struct inode *inode, struct file *file)
 	 * We don't really need this for the write case but it doesn't
 	 * harm either.
 	 */
-	return seq_open(file, &scsi_seq_ops);
+	return seq_open_private(file, &scsi_seq_ops,
+				sizeof(struct scsi_proc_state));
 }
 
 static const struct file_operations proc_scsi_operations = {
@@ -445,7 +454,7 @@ static const struct file_operations proc_scsi_operations = {
 	.read		= seq_read,
 	.write		= proc_scsi_write,
 	.llseek		= seq_lseek,
-	.release	= seq_release,
+	.release	= seq_release_private,
 };
 
 /**
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 0/2] avoid crashing when reading /proc/scsi/scsi and simultaneously removing devices
  2016-01-11 17:28 [PATCH 0/2] avoid crashing when reading /proc/scsi/scsi and simultaneously removing devices Ewan D. Milne
  2016-01-11 17:28 ` [PATCH 1/2] drivers/base: add bus_device_iter_init, bus_device_iter_next, bus_device_iter_exit Ewan D. Milne
  2016-01-11 17:28 ` [PATCH 2/2] scsi_proc: Change /proc/scsi/scsi to use bus device iterator Ewan D. Milne
@ 2016-01-11 19:15 ` James Bottomley
  2016-01-11 21:32   ` Ewan Milne
  2 siblings, 1 reply; 6+ messages in thread
From: James Bottomley @ 2016-01-11 19:15 UTC (permalink / raw)
  To: Ewan D. Milne, linux-kernel, linux-scsi; +Cc: gregkh, martin.petersen, hare

On Mon, 2016-01-11 at 12:28 -0500, Ewan D. Milne wrote:
> From: "Ewan D. Milne" <emilne@redhat.com>
> 
> The klist traversal used by the reading of /proc/scsi/scsi is not
> interlocked
> against device removal.  It takes a reference on the containing
> object, but
> this does not prevent the device from being removed from the list. 
>  Thus, we
> get errors and eventually panic, as shown in the traces below.  Fix
> this by
> keeping a klist iterator in the seq_file private data.
> 
> The problem can be easily reproduced by repeatedly increasing
> scsi_debug's
> max_luns to 30 and then deleting the devices via sysfs, while
> simulatenously
> accessing /proc/scsi/scsi.
>     
> From a patch originally developed by David Jeffery <
> djeffery@redhat.com>

OK, so it looks like this is a bug in the klist system.  When a
starting point is used, there should be a check to see if it's still
active otherwise the whole thing is racy.  If it's fixed in klist, the
fix works for everyone, not just SCSI.

How about this?  It causes the iterator to start at the beginning if
the node has been deleted.  That will produce double output during some
of your test, but I think that's OK given that this is a rare race.

James

---

diff --git a/lib/klist.c b/lib/klist.c
index d74cf7a..0507fa5 100644
--- a/lib/klist.c
+++ b/lib/klist.c
@@ -282,9 +282,9 @@ void klist_iter_init_node(struct klist *k, struct klist_iter *i,
 			  struct klist_node *n)
 {
 	i->i_klist = k;
-	i->i_cur = n;
-	if (n)
-		kref_get(&n->n_ref);
+	i->i_cur = NULL;
+	if (n && kref_get_unless_zero(&n->n_ref))
+		i->i_cur = n;
 }
 EXPORT_SYMBOL_GPL(klist_iter_init_node);
 

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 0/2] avoid crashing when reading /proc/scsi/scsi and simultaneously removing devices
  2016-01-11 19:15 ` [PATCH 0/2] avoid crashing when reading /proc/scsi/scsi and simultaneously removing devices James Bottomley
@ 2016-01-11 21:32   ` Ewan Milne
  2016-01-12  2:35     ` James Bottomley
  0 siblings, 1 reply; 6+ messages in thread
From: Ewan Milne @ 2016-01-11 21:32 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-kernel, linux-scsi, gregkh, martin.petersen, hare

On Mon, 2016-01-11 at 11:15 -0800, James Bottomley wrote:
> On Mon, 2016-01-11 at 12:28 -0500, Ewan D. Milne wrote:
> > From: "Ewan D. Milne" <emilne@redhat.com>
> > 
> > The klist traversal used by the reading of /proc/scsi/scsi is not
> > interlocked
> > against device removal.  It takes a reference on the containing
> > object, but
> > this does not prevent the device from being removed from the list. 
> >  Thus, we
> > get errors and eventually panic, as shown in the traces below.  Fix
> > this by
> > keeping a klist iterator in the seq_file private data.
> > 
> > The problem can be easily reproduced by repeatedly increasing
> > scsi_debug's
> > max_luns to 30 and then deleting the devices via sysfs, while
> > simulatenously
> > accessing /proc/scsi/scsi.
> >     
> > From a patch originally developed by David Jeffery <
> > djeffery@redhat.com>
> 
> OK, so it looks like this is a bug in the klist system.  When a
> starting point is used, there should be a check to see if it's still
> active otherwise the whole thing is racy.  If it's fixed in klist, the
> fix works for everyone, not just SCSI.
> 
> How about this?  It causes the iterator to start at the beginning if
> the node has been deleted.  That will produce double output during some
> of your test, but I think that's OK given that this is a rare race.
> 
> James

I'm running with your change now, it does appear to fix the problem.
I guess the question is whether this behavior would trip up any other
klist users, for /proc/scsi/scsi it is probably not a problem.  The
worst that might happen is that userspace tools that parse the output
would get duplicate entries.

-Ewan

> ---
> 
> diff --git a/lib/klist.c b/lib/klist.c
> index d74cf7a..0507fa5 100644
> --- a/lib/klist.c
> +++ b/lib/klist.c
> @@ -282,9 +282,9 @@ void klist_iter_init_node(struct klist *k, struct klist_iter *i,
>  			  struct klist_node *n)
>  {
>  	i->i_klist = k;
> -	i->i_cur = n;
> -	if (n)
> -		kref_get(&n->n_ref);
> +	i->i_cur = NULL;
> +	if (n && kref_get_unless_zero(&n->n_ref))
> +		i->i_cur = n;
>  }
>  EXPORT_SYMBOL_GPL(klist_iter_init_node);
>  

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 0/2] avoid crashing when reading /proc/scsi/scsi and simultaneously removing devices
  2016-01-11 21:32   ` Ewan Milne
@ 2016-01-12  2:35     ` James Bottomley
  0 siblings, 0 replies; 6+ messages in thread
From: James Bottomley @ 2016-01-12  2:35 UTC (permalink / raw)
  To: emilne; +Cc: linux-kernel, linux-scsi, gregkh, martin.petersen, hare

On Mon, 2016-01-11 at 16:32 -0500, Ewan Milne wrote:
> On Mon, 2016-01-11 at 11:15 -0800, James Bottomley wrote:
> > On Mon, 2016-01-11 at 12:28 -0500, Ewan D. Milne wrote:
> > > From: "Ewan D. Milne" <emilne@redhat.com>
> > > 
> > > The klist traversal used by the reading of /proc/scsi/scsi is not
> > > interlocked
> > > against device removal.  It takes a reference on the containing
> > > object, but
> > > this does not prevent the device from being removed from the
> > > list. 
> > >  Thus, we
> > > get errors and eventually panic, as shown in the traces below. 
> > >  Fix
> > > this by
> > > keeping a klist iterator in the seq_file private data.
> > > 
> > > The problem can be easily reproduced by repeatedly increasing
> > > scsi_debug's
> > > max_luns to 30 and then deleting the devices via sysfs, while
> > > simulatenously
> > > accessing /proc/scsi/scsi.
> > >     
> > > From a patch originally developed by David Jeffery <
> > > djeffery@redhat.com>
> > 
> > OK, so it looks like this is a bug in the klist system.  When a
> > starting point is used, there should be a check to see if it's
> > still
> > active otherwise the whole thing is racy.  If it's fixed in klist,
> > the
> > fix works for everyone, not just SCSI.
> > 
> > How about this?  It causes the iterator to start at the beginning
> > if
> > the node has been deleted.  That will produce double output during
> > some
> > of your test, but I think that's OK given that this is a rare race.
> > 
> > James
> 
> I'm running with your change now, it does appear to fix the problem.
> I guess the question is whether this behavior would trip up any other
> klist users,

I don't see how it can.  We simply can't use a removed node as the
starting point without triggering the kref warn on you see in your log.
 Things just go downhill from there.  This will happen to any klist
user, not just us.  I'd contend that starting at the beginning is
better than an eventual panic for anyone.

What you should see currently is actually the list is truncated when
the problem is hit because the next pointer is set to the poison value.
 That causes another warn on and traversal truncation (or straight
dereference if you're unlucky).

>  for /proc/scsi/scsi it is probably not a problem.  The
> worst that might happen is that userspace tools that parse the output
> would get duplicate entries.

Well, it's not really possible to keep the current behaviour and
truncate the list (before oopsing).  There's no non-racy way of
positioning the iterator at the last element so it exits on
klist_next() because the list can be added to during the iteration.

James

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2016-01-12  2:35 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-01-11 17:28 [PATCH 0/2] avoid crashing when reading /proc/scsi/scsi and simultaneously removing devices Ewan D. Milne
2016-01-11 17:28 ` [PATCH 1/2] drivers/base: add bus_device_iter_init, bus_device_iter_next, bus_device_iter_exit Ewan D. Milne
2016-01-11 17:28 ` [PATCH 2/2] scsi_proc: Change /proc/scsi/scsi to use bus device iterator Ewan D. Milne
2016-01-11 19:15 ` [PATCH 0/2] avoid crashing when reading /proc/scsi/scsi and simultaneously removing devices James Bottomley
2016-01-11 21:32   ` Ewan Milne
2016-01-12  2:35     ` James Bottomley

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox