* [PATCH] mpt2sas: mpt3sas: Fix memory corruption during initialization
@ 2015-04-10 7:14 Calvin Owens
From: Calvin Owens @ 2015-04-10 7:14 UTC
To: Nagalakshmi Nandigama, Praveen Krishnamoorthy, Sreekanth Reddy,
Abhijit Mahajan
Cc: James E.J. Bottomley, MPT-FusionLinux.pdl, linux-kernel,
kernel-team, stable, Calvin Owens
While _scsih_probe_sas() is initializing devices, the hardware can trigger an
MPI2_EVENT_SAS_TOPO_RC_TARG_NOT_RESPONDING interrupt.
The handler for TARG_NOT_RESPONDING calls _scsih_device_remove_by_handle(),
which deletes the device in question from either ioc->sas_device_list or
ioc->sas_device_init_list. Since _scsih_probe_sas() uses no exclusion when
iterating over ioc->sas_device_init_list, this results in a use-after-free
in _scsih_probe_sas(), and also corrupts the list:
mpt2sas1: removing handle(0x0020), sas_addr(0x5f80f418573360e0)
mpt2sas1: log_info(0x31111000): originator(PL), code(0x11), sub_code(0x1000)
------------[ cut here ]------------
WARNING: at lib/list_debug.c:56 __list_del_entry+0xc3/0xd0()
list_del corruption, ffff88240012fa00->prev is LIST_POISON2 (dead000000200200)
<snip>
Workqueue: events work_for_cpu_fn
ffffffff810c4f17 ffff881214825b38 0000000000000009 ffff881214825ae8
ffffffff8169b61e ffff881214825b28 ffffffff8104a990 0000000000000002
ffff88240012f900 ffff88240012fa00 ffff881217595af8 0000000000000282
Call Trace:
[<ffffffff810c4f17>] ? print_modules+0xd7/0x120
[<ffffffff8169b61e>] dump_stack+0x19/0x1b
[<ffffffff8104a990>] warn_slowpath_common+0x70/0xa0
[<ffffffff8104aa76>] warn_slowpath_fmt+0x46/0x50
[<ffffffff816a2234>] ? _raw_spin_lock_irqsave+0x84/0xa0
[<ffffffffa0010e8e>] ? _scsih_probe_sas+0x8e/0x110 [mpt2sas]
[<ffffffff8132a5a3>] __list_del_entry+0xc3/0xd0
[<ffffffffa0010e99>] _scsih_probe_sas+0x99/0x110 [mpt2sas]
[<ffffffffa0011d5f>] _scsih_scan_finished+0x19f/0x2c0 [mpt2sas]
[<ffffffff81429d67>] do_scsi_scan_host+0x77/0xa0
[<ffffffff81429f20>] scsi_scan_host+0x190/0x1c0
[<ffffffffa0011402>] _scsih_probe+0x452/0x640 [mpt2sas]
[<ffffffff813444eb>] local_pci_probe+0x4b/0x80
[<ffffffff8106b848>] work_for_cpu_fn+0x18/0x30
[<ffffffff81070012>] process_one_work+0x212/0x6e0
[<ffffffff8106ffa6>] ? process_one_work+0x1a6/0x6e0
[<ffffffff8108ed1f>] ? local_clock+0x4f/0x60
[<ffffffff8107050c>] process_scheduled_works+0x2c/0x40
[<ffffffff81070ae2>] worker_thread+0x262/0x370
[<ffffffff81070880>] ? rescuer_thread+0x360/0x360
[<ffffffff81078f3b>] kthread+0xdb/0xe0
[<ffffffff810b5e8d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffff81078e60>] ? kthread_create_on_node+0x140/0x140
[<ffffffff816ac01c>] ret_from_fork+0x7c/0xb0
[<ffffffff81078e60>] ? kthread_create_on_node+0x140/0x140
---[ end trace 41352a0bd2d0d61b ]---
This either results in an immediate panic, or corrupts random memory and
causes nasty problems later in the box's uptime.
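To make the failure mode concrete, here is a minimal userspace sketch (not
driver code) of why a "safe" list walk does not protect against this. The
struct node type, the POISON constant and walk_hits_poison() are hypothetical
stand-ins invented for illustration; only the general shape mirrors the
kernel's list_del() and list_for_each_entry_safe(). The walker caches the
successor before running the loop body, so it tolerates removal of the
current entry, but not removal of the cached successor by another context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's LIST_POISON markers. */
#define POISON ((struct node *)0x100)

struct node {
	struct node *prev, *next;
};

static void list_init(struct node *head)
{
	head->prev = head->next = head;
}

static void list_add_tail(struct node *n, struct node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink an entry and poison its pointers, in the spirit of list_del(). */
static void list_del(struct node *n)
{
	/* Deleting an already-deleted entry is the corruption seen above. */
	assert(n->next != POISON && n->prev != POISON);
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = POISON;
}

/*
 * Walk the list with the successor cached up front, as a "safe" iterator
 * does. `victim` (may be NULL) is removed once from inside the loop body,
 * simulating the event handler firing mid-walk. Returns true if the walker
 * ends up standing on an entry that has already been removed.
 */
static bool walk_hits_poison(struct node *head, struct node *victim)
{
	struct node *pos, *next;

	for (pos = head->next, next = pos->next; pos != head;
	     pos = next, next = pos->next) {
		if (pos->next == POISON)
			return true;	/* stale entry reached via cached pointer */
		if (victim) {
			list_del(victim);	/* the "interrupt" fires here */
			victim = NULL;
		}
	}
	return false;
}
```

With a list of two entries, removing the first (current) entry during the
walk is harmless, while removing the second (the cached successor) leaves the
walker on a poisoned entry. In the driver the entry is also freed, which is
where the use-after-free comes from.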
This patch splices the discovered devices out of the global list while
holding the lock, since _scsih_probe_sas() always removes them from that
global list anyway (either deleting them if initialization fails, or
moving them onto ioc->sas_device_list if it succeeds). The interrupt that
caused this bug will no longer cause the device to be removed during
initialization, since it won't exist on the global lists, but
_scsih_probe_sas() will remove it anyway when it fails to initialize.
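The splice-under-lock idea can be sketched in userspace roughly as follows.
This is an illustration under stated assumptions, not the driver code: a
pthread mutex stands in for ioc->sas_device_lock, struct adapter and
take_pending() are hypothetical names, and list_take_all() is a simplified
stand-in for list_splice_init() that requires an empty destination.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct node {
	struct node *prev, *next;
};

static void list_init(struct node *head)
{
	head->prev = head->next = head;
}

static int list_empty(const struct node *head)
{
	return head->next == head;
}

static void list_add_tail(struct node *n, struct node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/*
 * Move every entry from `from` onto the empty list `to` and leave `from`
 * empty. Simplified stand-in for list_splice_init().
 */
static void list_take_all(struct node *from, struct node *to)
{
	assert(list_empty(to));
	if (list_empty(from))
		return;
	to->next = from->next;
	to->prev = from->prev;
	to->next->prev = to;
	to->prev->next = to;
	list_init(from);
}

/* Hypothetical adapter: a shared list guarded by a lock. */
struct adapter {
	pthread_mutex_t lock;	/* plays the role of ioc->sas_device_lock */
	struct node init_list;	/* plays the role of ioc->sas_device_init_list */
};

/*
 * Detach all pending entries onto a caller-owned list while holding the
 * lock, then count them without the lock. Once detached, the entries are
 * unreachable through the shared list, so a concurrent remover that only
 * searches the shared list cannot touch them.
 */
static size_t take_pending(struct adapter *ad, struct node *private_head)
{
	struct node *pos;
	size_t n = 0;

	list_init(private_head);

	pthread_mutex_lock(&ad->lock);
	list_take_all(&ad->init_list, private_head);
	pthread_mutex_unlock(&ad->lock);

	for (pos = private_head->next; pos != private_head; pos = pos->next)
		n++;
	return n;
}
```

The lock is held only for the constant-time pointer swap, so the potentially
slow per-device work still runs unlocked, just on a list nobody else can see.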
Cc: stable@vger.kernel.org
Signed-off-by: Calvin Owens <calvinowens@fb.com>
---
drivers/scsi/mpt2sas/mpt2sas_scsih.c | 14 +++++++++++---
drivers/scsi/mpt3sas/mpt3sas_scsih.c | 14 +++++++++++---
2 files changed, 22 insertions(+), 6 deletions(-)
diff --git a/drivers/scsi/mpt2sas/mpt2sas_scsih.c b/drivers/scsi/mpt2sas/mpt2sas_scsih.c
index 3f26147..4d603cb 100644
--- a/drivers/scsi/mpt2sas/mpt2sas_scsih.c
+++ b/drivers/scsi/mpt2sas/mpt2sas_scsih.c
@@ -7977,11 +7977,19 @@ _scsih_probe_sas(struct MPT2SAS_ADAPTER *ioc)
{
struct _sas_device *sas_device, *next;
unsigned long flags;
+ LIST_HEAD(head);
- /* SAS Device List */
- list_for_each_entry_safe(sas_device, next, &ioc->sas_device_init_list,
- list) {
+ /*
+ * Yank the entries out of the global list before attempting to iterate
+ * over them, since interrupts can delete sas_device entries out of the
+ * global list while we iterate.
+ */
+ spin_lock_irqsave(&ioc->sas_device_lock, flags);
+ list_splice_init(&ioc->sas_device_init_list, &head);
+ spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+ /* SAS Device List */
+ list_for_each_entry_safe(sas_device, next, &head, list) {
if (ioc->hide_drives)
continue;
diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index 5a97e32..1a6a6a3 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -7609,11 +7609,19 @@ _scsih_probe_sas(struct MPT3SAS_ADAPTER *ioc)
{
struct _sas_device *sas_device, *next;
unsigned long flags;
+ LIST_HEAD(head);
- /* SAS Device List */
- list_for_each_entry_safe(sas_device, next, &ioc->sas_device_init_list,
- list) {
+ /*
+ * Yank the entries out of the global list before attempting to iterate
+ * over them, since interrupts can delete sas_device entries out of the
+ * global list while we iterate.
+ */
+ spin_lock_irqsave(&ioc->sas_device_lock, flags);
+ list_splice_init(&ioc->sas_device_init_list, &head);
+ spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+ /* SAS Device List */
+ list_for_each_entry_safe(sas_device, next, &head, list) {
if (!mpt3sas_transport_port_add(ioc, sas_device->handle,
sas_device->sas_address_parent)) {
list_del(&sas_device->list);
--
1.8.1
* Re: [PATCH] mpt2sas: mpt3sas: Fix memory corruption during initialization
@ 2015-04-10 14:30 ` James Bottomley
From: James Bottomley @ 2015-04-10 14:30 UTC
To: calvinowens@fb.com
Cc: linux-kernel@vger.kernel.org, MPT-FusionLinux.pdl@avagotech.com,
kernel-team@fb.com, stable@vger.kernel.org,
praveen.krishnamoorthy@avagotech.com,
abhijit.mahajan@avagotech.com,
nagalakshmi.nandigama@avagotech.com,
sreekanth.reddy@avagotech.com
On Fri, 2015-04-10 at 00:14 -0700, Calvin Owens wrote:
> While _scsih_probe_sas() is initializing devices, the hardware can trigger a
> MPI2_EVENT_SAS_TOPO_RC_TARG_NOT_RESPONDING interrupt.
>
> The handler for TARG_NOT_RESPONDING calls _scsih_device_remove_by_handle(),
> which deletes the device in question from either ioc->sas_device_list or
> ioc->sas_device_init_list. Since _scsih_probe_sas() uses no exclusion when
> iterating over ioc->sas_device_init_list, this results in a use-after-free
> in _scsih_probe_sas(), and also corrupts the list:
>
> mpt2sas1: removing handle(0x0020), sas_addr(0x5f80f418573360e0)
> mpt2sas1: log_info(0x31111000): originator(PL), code(0x11), sub_code(0x1000)
> ------------[ cut here ]------------
> WARNING: at lib/list_debug.c:56 __list_del_entry+0xc3/0xd0()
> list_del corruption, ffff88240012fa00->prev is LIST_POISON2 (dead000000200200)
> <snip>
> Workqueue: events work_for_cpu_fn
> ffffffff810c4f17 ffff881214825b38 0000000000000009 ffff881214825ae8
> ffffffff8169b61e ffff881214825b28 ffffffff8104a990 0000000000000002
> ffff88240012f900 ffff88240012fa00 ffff881217595af8 0000000000000282
> Call Trace:
> [<ffffffff810c4f17>] ? print_modules+0xd7/0x120
> [<ffffffff8169b61e>] dump_stack+0x19/0x1b
> [<ffffffff8104a990>] warn_slowpath_common+0x70/0xa0
> [<ffffffff8104aa76>] warn_slowpath_fmt+0x46/0x50
> [<ffffffff816a2234>] ? _raw_spin_lock_irqsave+0x84/0xa0
> [<ffffffffa0010e8e>] ? _scsih_probe_sas+0x8e/0x110 [mpt2sas]
> [<ffffffff8132a5a3>] __list_del_entry+0xc3/0xd0
> [<ffffffffa0010e99>] _scsih_probe_sas+0x99/0x110 [mpt2sas]
> [<ffffffffa0011d5f>] _scsih_scan_finished+0x19f/0x2c0 [mpt2sas]
> [<ffffffff81429d67>] do_scsi_scan_host+0x77/0xa0
> [<ffffffff81429f20>] scsi_scan_host+0x190/0x1c0
> [<ffffffffa0011402>] _scsih_probe+0x452/0x640 [mpt2sas]
> [<ffffffff813444eb>] local_pci_probe+0x4b/0x80
> [<ffffffff8106b848>] work_for_cpu_fn+0x18/0x30
> [<ffffffff81070012>] process_one_work+0x212/0x6e0
> [<ffffffff8106ffa6>] ? process_one_work+0x1a6/0x6e0
> [<ffffffff8108ed1f>] ? local_clock+0x4f/0x60
> [<ffffffff8107050c>] process_scheduled_works+0x2c/0x40
> [<ffffffff81070ae2>] worker_thread+0x262/0x370
> [<ffffffff81070880>] ? rescuer_thread+0x360/0x360
> [<ffffffff81078f3b>] kthread+0xdb/0xe0
> [<ffffffff810b5e8d>] ? trace_hardirqs_on+0xd/0x10
> [<ffffffff81078e60>] ? kthread_create_on_node+0x140/0x140
> [<ffffffff816ac01c>] ret_from_fork+0x7c/0xb0
> [<ffffffff81078e60>] ? kthread_create_on_node+0x140/0x140
> ---[ end trace 41352a0bd2d0d61b ]---
>
> This either results in an immediate panic, or corrupts random memory and
> causes nasty problems later in the box's uptime.
>
> This patch splices the discovered devices out of the global list while
> holding the lock, since _scsih_probe_sas() always removes them from that
> global list anyway (either deleting them if initialization fails, or
> moving them onto ioc->sas_device_list if it succeeds). The interrupt that
> caused this bug will no longer cause the device to be removed during
> initialization, since it won't exist on the global lists, but
> _scsih_probe_sas() will remove it anyway when it fails to initialize.
Hopefully the Avago team will curate this, but just in case they don't,
the correct list to make sure it gets the attention of storage people
should be
linux-scsi@vger.kernel.org
James
* RE: [PATCH] mpt2sas: mpt3sas: Fix memory corruption during initialization
@ 2015-04-10 16:43 ` Sathya Prakash
From: Sathya Prakash @ 2015-04-10 16:43 UTC
To: James Bottomley, calvinowens
Cc: linux-kernel, PDL-MPT-FUSIONLINUX, kernel-team, stable,
praveen.krishnamoorthy, abhijit.mahajan, nagalakshmi.nandigama,
Sreekanth Reddy
James & Calvin,
Noted. We will review this and either ACK it or get back to you with further
questions in the next couple of weeks.
Thanks
Sathya