* [PATCH] mpt2sas: mpt3sas: Fix memory corruption during initialization
@ 2015-04-10 7:14 Calvin Owens
2015-04-10 14:30 ` James Bottomley
0 siblings, 1 reply; 7+ messages in thread
From: Calvin Owens @ 2015-04-10 7:14 UTC (permalink / raw)
To: Nagalakshmi Nandigama, Praveen Krishnamoorthy, Sreekanth Reddy,
Abhijit Mahajan
Cc: James E.J. Bottomley, MPT-FusionLinux.pdl, linux-kernel,
kernel-team, stable, Calvin Owens
While _scsih_probe_sas() is initializing devices, the hardware can trigger a
MPI2_EVENT_SAS_TOPO_RC_TARG_NOT_RESPONDING interrupt.
The handler for TARG_NOT_RESPONDING calls _scsih_device_remove_by_handle(),
which deletes the device in question from either ioc->sas_device_list or
ioc->sas_device_init_list. Since _scsih_probe_sas() uses no exclusion when
iterating over ioc->sas_device_init_list, this results in a use-after-free
in _scsih_probe_sas(), and also corrupts the list:
mpt2sas1: removing handle(0x0020), sas_addr(0x5f80f418573360e0)
mpt2sas1: log_info(0x31111000): originator(PL), code(0x11), sub_code(0x1000)
------------[ cut here ]------------
WARNING: at lib/list_debug.c:56 __list_del_entry+0xc3/0xd0()
list_del corruption, ffff88240012fa00->prev is LIST_POISON2 (dead000000200200)
<snip>
Workqueue: events work_for_cpu_fn
ffffffff810c4f17 ffff881214825b38 0000000000000009 ffff881214825ae8
ffffffff8169b61e ffff881214825b28 ffffffff8104a990 0000000000000002
ffff88240012f900 ffff88240012fa00 ffff881217595af8 0000000000000282
Call Trace:
[<ffffffff810c4f17>] ? print_modules+0xd7/0x120
[<ffffffff8169b61e>] dump_stack+0x19/0x1b
[<ffffffff8104a990>] warn_slowpath_common+0x70/0xa0
[<ffffffff8104aa76>] warn_slowpath_fmt+0x46/0x50
[<ffffffff816a2234>] ? _raw_spin_lock_irqsave+0x84/0xa0
[<ffffffffa0010e8e>] ? _scsih_probe_sas+0x8e/0x110 [mpt2sas]
[<ffffffff8132a5a3>] __list_del_entry+0xc3/0xd0
[<ffffffffa0010e99>] _scsih_probe_sas+0x99/0x110 [mpt2sas]
[<ffffffffa0011d5f>] _scsih_scan_finished+0x19f/0x2c0 [mpt2sas]
[<ffffffff81429d67>] do_scsi_scan_host+0x77/0xa0
[<ffffffff81429f20>] scsi_scan_host+0x190/0x1c0
[<ffffffffa0011402>] _scsih_probe+0x452/0x640 [mpt2sas]
[<ffffffff813444eb>] local_pci_probe+0x4b/0x80
[<ffffffff8106b848>] work_for_cpu_fn+0x18/0x30
[<ffffffff81070012>] process_one_work+0x212/0x6e0
[<ffffffff8106ffa6>] ? process_one_work+0x1a6/0x6e0
[<ffffffff8108ed1f>] ? local_clock+0x4f/0x60
[<ffffffff8107050c>] process_scheduled_works+0x2c/0x40
[<ffffffff81070ae2>] worker_thread+0x262/0x370
[<ffffffff81070880>] ? rescuer_thread+0x360/0x360
[<ffffffff81078f3b>] kthread+0xdb/0xe0
[<ffffffff810b5e8d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffff81078e60>] ? kthread_create_on_node+0x140/0x140
[<ffffffff816ac01c>] ret_from_fork+0x7c/0xb0
[<ffffffff81078e60>] ? kthread_create_on_node+0x140/0x140
---[ end trace 41352a0bd2d0d61b ]---
This either results in an immediate panic, or corrupts random memory and
causes nasty problems later in the box's uptime.
This patch splices the discovered devices out of the global list while
holding the lock, since _scsih_probe_sas() always removes them from that
global list anyway (either deleting them if initialization fails, or
moving them onto ioc->sas_device_list if it succeeds). The interrupt that
caused this bug will no longer cause the device to be removed during
initialization, since it won't exist on the global lists, but
_scsih_probe_sas() will remove it anyway when it fails to initialize.
Cc: stable@vger.kernel.org
Signed-off-by: Calvin Owens <calvinowens@fb.com>
---
drivers/scsi/mpt2sas/mpt2sas_scsih.c | 14 +++++++++++---
drivers/scsi/mpt3sas/mpt3sas_scsih.c | 14 +++++++++++---
2 files changed, 22 insertions(+), 6 deletions(-)
diff --git a/drivers/scsi/mpt2sas/mpt2sas_scsih.c b/drivers/scsi/mpt2sas/mpt2sas_scsih.c
index 3f26147..4d603cb 100644
--- a/drivers/scsi/mpt2sas/mpt2sas_scsih.c
+++ b/drivers/scsi/mpt2sas/mpt2sas_scsih.c
@@ -7977,11 +7977,19 @@ _scsih_probe_sas(struct MPT2SAS_ADAPTER *ioc)
{
struct _sas_device *sas_device, *next;
unsigned long flags;
+ LIST_HEAD(head);
- /* SAS Device List */
- list_for_each_entry_safe(sas_device, next, &ioc->sas_device_init_list,
- list) {
+ /*
+ * Yank the entries out of the global list before attempting to iterate
+ * over them, since interrupts can delete sas_device entries out of the
+ * global list while we iterate.
+ */
+ spin_lock_irqsave(&ioc->sas_device_lock, flags);
+ list_splice_init(&ioc->sas_device_init_list, &head);
+ spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+ /* SAS Device List */
+ list_for_each_entry_safe(sas_device, next, &head, list) {
if (ioc->hide_drives)
continue;
diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index 5a97e32..1a6a6a3 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -7609,11 +7609,19 @@ _scsih_probe_sas(struct MPT3SAS_ADAPTER *ioc)
{
struct _sas_device *sas_device, *next;
unsigned long flags;
+ LIST_HEAD(head);
- /* SAS Device List */
- list_for_each_entry_safe(sas_device, next, &ioc->sas_device_init_list,
- list) {
+ /*
+ * Yank the entries out of the global list before attempting to iterate
+ * over them, since interrupts can delete sas_device entries out of the
+ * global list while we iterate.
+ */
+ spin_lock_irqsave(&ioc->sas_device_lock, flags);
+ list_splice_init(&ioc->sas_device_init_list, &head);
+ spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+ /* SAS Device List */
+ list_for_each_entry_safe(sas_device, next, &head, list) {
if (!mpt3sas_transport_port_add(ioc, sas_device->handle,
sas_device->sas_address_parent)) {
list_del(&sas_device->list);
--
1.8.1
^ permalink raw reply related [flat|nested] 7+ messages in thread* Re: [PATCH] mpt2sas: mpt3sas: Fix memory corruption during initialization
2015-04-10 7:14 [PATCH] mpt2sas: mpt3sas: Fix memory corruption during initialization Calvin Owens
@ 2015-04-10 14:30 ` James Bottomley
2015-04-10 16:43 ` Sathya Prakash
0 siblings, 1 reply; 7+ messages in thread
From: James Bottomley @ 2015-04-10 14:30 UTC (permalink / raw)
To: calvinowens@fb.com
Cc: linux-kernel@vger.kernel.org, MPT-FusionLinux.pdl@avagotech.com,
kernel-team@fb.com, stable@vger.kernel.org,
praveen.krishnamoorthy@avagotech.com,
abhijit.mahajan@avagotech.com,
nagalakshmi.nandigama@avagotech.com,
sreekanth.reddy@avagotech.com
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 3602 bytes --]
On Fri, 2015-04-10 at 00:14 -0700, Calvin Owens wrote:
> While _scsih_probe_sas() is initializing devices, the hardware can trigger a
> MPI2_EVENT_SAS_TOPO_RC_TARG_NOT_RESPONDING interrupt.
>
> The handler for TARG_NOT_RESPONDING calls _scsih_device_remove_by_handle(),
> which deletes the device in question from either ioc->sas_device_list or
> ioc->sas_device_init_list. Since _scsih_probe_sas() uses no exclusion when
> iterating over ioc->sas_device_init_list, this results in a use-after-free
> in _scsih_probe_sas(), and also corrupts the list:
>
> mpt2sas1: removing handle(0x0020), sas_addr(0x5f80f418573360e0)
> mpt2sas1: log_info(0x31111000): originator(PL), code(0x11), sub_code(0x1000)
> ------------[ cut here ]------------
> WARNING: at lib/list_debug.c:56 __list_del_entry+0xc3/0xd0()
> list_del corruption, ffff88240012fa00->prev is LIST_POISON2 (dead000000200200)
> <snip>
> Workqueue: events work_for_cpu_fn
> ffffffff810c4f17 ffff881214825b38 0000000000000009 ffff881214825ae8
> ffffffff8169b61e ffff881214825b28 ffffffff8104a990 0000000000000002
> ffff88240012f900 ffff88240012fa00 ffff881217595af8 0000000000000282
> Call Trace:
> [<ffffffff810c4f17>] ? print_modules+0xd7/0x120
> [<ffffffff8169b61e>] dump_stack+0x19/0x1b
> [<ffffffff8104a990>] warn_slowpath_common+0x70/0xa0
> [<ffffffff8104aa76>] warn_slowpath_fmt+0x46/0x50
> [<ffffffff816a2234>] ? _raw_spin_lock_irqsave+0x84/0xa0
> [<ffffffffa0010e8e>] ? _scsih_probe_sas+0x8e/0x110 [mpt2sas]
> [<ffffffff8132a5a3>] __list_del_entry+0xc3/0xd0
> [<ffffffffa0010e99>] _scsih_probe_sas+0x99/0x110 [mpt2sas]
> [<ffffffffa0011d5f>] _scsih_scan_finished+0x19f/0x2c0 [mpt2sas]
> [<ffffffff81429d67>] do_scsi_scan_host+0x77/0xa0
> [<ffffffff81429f20>] scsi_scan_host+0x190/0x1c0
> [<ffffffffa0011402>] _scsih_probe+0x452/0x640 [mpt2sas]
> [<ffffffff813444eb>] local_pci_probe+0x4b/0x80
> [<ffffffff8106b848>] work_for_cpu_fn+0x18/0x30
> [<ffffffff81070012>] process_one_work+0x212/0x6e0
> [<ffffffff8106ffa6>] ? process_one_work+0x1a6/0x6e0
> [<ffffffff8108ed1f>] ? local_clock+0x4f/0x60
> [<ffffffff8107050c>] process_scheduled_works+0x2c/0x40
> [<ffffffff81070ae2>] worker_thread+0x262/0x370
> [<ffffffff81070880>] ? rescuer_thread+0x360/0x360
> [<ffffffff81078f3b>] kthread+0xdb/0xe0
> [<ffffffff810b5e8d>] ? trace_hardirqs_on+0xd/0x10
> [<ffffffff81078e60>] ? kthread_create_on_node+0x140/0x140
> [<ffffffff816ac01c>] ret_from_fork+0x7c/0xb0
> [<ffffffff81078e60>] ? kthread_create_on_node+0x140/0x140
> ---[ end trace 41352a0bd2d0d61b ]---
>
> This either results in an immediate panic, or corrupts random memory and
> causes nasty problems later in the box's uptime.
>
> This patch splices the discovered devices out of the global list while
> holding the lock, since _scsih_probe_sas() always removes them from that
> global list anyway (either deleting them if initialization fails, or
> moving them onto ioc->sas_device_list if it succeeds). The interrupt that
> caused this bug will no longer cause the device to be removed during
> initialization, since it won't exist on the global lists, but
> _scsih_probe_sas() will remove it anyway when it fails to initialize.
Hopefully the avago team will curate this, but just in case they don't,
the correct list to make sure it gets the attention of storage people
should be
linux-scsi@vger.kernel.org
James
ÿôèº{.nÇ+·®+%Ëÿ±éݶ\x17¥wÿº{.nÇ+·¥{±þG«éÿ{ayº\x1dÊÚë,j\a¢f£¢·hïêÿêçz_è®\x03(éÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?¨èÚ&£ø§~á¶iOæ¬z·vØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?I¥
^ permalink raw reply [flat|nested] 7+ messages in thread* RE: [PATCH] mpt2sas: mpt3sas: Fix memory corruption during initialization
2015-04-10 14:30 ` James Bottomley
@ 2015-04-10 16:43 ` Sathya Prakash
0 siblings, 0 replies; 7+ messages in thread
From: Sathya Prakash @ 2015-04-10 16:43 UTC (permalink / raw)
To: James Bottomley, calvinowens
Cc: linux-kernel, PDL-MPT-FUSIONLINUX, kernel-team, stable,
praveen.krishnamoorthy, abhijit.mahajan, nagalakshmi.nandigama,
Sreekanth Reddy
James & Calvin,
Noted this, we will review and ACK/revert back with further questions in
next couple of weeks.
Thanks
Sathya
-----Original Message-----
From: James Bottomley [mailto:jbottomley@odin.com]
Sent: Friday, April 10, 2015 9:31 AM
To: calvinowens@fb.com
Cc: linux-kernel@vger.kernel.org; MPT-FusionLinux.pdl@avagotech.com;
kernel-team@fb.com; stable@vger.kernel.org;
praveen.krishnamoorthy@avagotech.com; abhijit.mahajan@avagotech.com;
nagalakshmi.nandigama@avagotech.com; sreekanth.reddy@avagotech.com
Subject: Re: [PATCH] mpt2sas: mpt3sas: Fix memory corruption during
initialization
On Fri, 2015-04-10 at 00:14 -0700, Calvin Owens wrote:
> While _scsih_probe_sas() is initializing devices, the hardware can
> trigger a MPI2_EVENT_SAS_TOPO_RC_TARG_NOT_RESPONDING interrupt.
>
> The handler for TARG_NOT_RESPONDING calls
> _scsih_device_remove_by_handle(), which deletes the device in question
> from either ioc->sas_device_list or
> ioc->sas_device_init_list. Since _scsih_probe_sas() uses no exclusion
> ioc->when
> iterating over ioc->sas_device_init_list, this results in a
> use-after-free in _scsih_probe_sas(), and also corrupts the list:
>
> mpt2sas1: removing handle(0x0020), sas_addr(0x5f80f418573360e0)
> mpt2sas1: log_info(0x31111000): originator(PL), code(0x11),
> sub_code(0x1000)
> ------------[ cut here ]------------
> WARNING: at lib/list_debug.c:56 __list_del_entry+0xc3/0xd0()
> list_del corruption, ffff88240012fa00->prev is LIST_POISON2
> (dead000000200200)
> <snip>
> Workqueue: events work_for_cpu_fn
> ffffffff810c4f17 ffff881214825b38 0000000000000009 ffff881214825ae8
> ffffffff8169b61e ffff881214825b28 ffffffff8104a990 0000000000000002
> ffff88240012f900 ffff88240012fa00 ffff881217595af8 0000000000000282
> Call Trace:
> [<ffffffff810c4f17>] ? print_modules+0xd7/0x120
> [<ffffffff8169b61e>] dump_stack+0x19/0x1b
> [<ffffffff8104a990>] warn_slowpath_common+0x70/0xa0
> [<ffffffff8104aa76>] warn_slowpath_fmt+0x46/0x50
> [<ffffffff816a2234>] ? _raw_spin_lock_irqsave+0x84/0xa0
> [<ffffffffa0010e8e>] ? _scsih_probe_sas+0x8e/0x110 [mpt2sas]
> [<ffffffff8132a5a3>] __list_del_entry+0xc3/0xd0
> [<ffffffffa0010e99>] _scsih_probe_sas+0x99/0x110 [mpt2sas]
> [<ffffffffa0011d5f>] _scsih_scan_finished+0x19f/0x2c0 [mpt2sas]
> [<ffffffff81429d67>] do_scsi_scan_host+0x77/0xa0
> [<ffffffff81429f20>] scsi_scan_host+0x190/0x1c0
> [<ffffffffa0011402>] _scsih_probe+0x452/0x640 [mpt2sas]
> [<ffffffff813444eb>] local_pci_probe+0x4b/0x80
> [<ffffffff8106b848>] work_for_cpu_fn+0x18/0x30
> [<ffffffff81070012>] process_one_work+0x212/0x6e0
> [<ffffffff8106ffa6>] ? process_one_work+0x1a6/0x6e0
> [<ffffffff8108ed1f>] ? local_clock+0x4f/0x60
> [<ffffffff8107050c>] process_scheduled_works+0x2c/0x40
> [<ffffffff81070ae2>] worker_thread+0x262/0x370
> [<ffffffff81070880>] ? rescuer_thread+0x360/0x360
> [<ffffffff81078f3b>] kthread+0xdb/0xe0
> [<ffffffff810b5e8d>] ? trace_hardirqs_on+0xd/0x10
> [<ffffffff81078e60>] ? kthread_create_on_node+0x140/0x140
> [<ffffffff816ac01c>] ret_from_fork+0x7c/0xb0
> [<ffffffff81078e60>] ? kthread_create_on_node+0x140/0x140
> ---[ end trace 41352a0bd2d0d61b ]---
>
> This either results in an immediate panic, or corrupts random memory
> and causes nasty problems later in the box's uptime.
>
> This patch splices the discovered devices out of the global list while
> holding the lock, since _scsih_probe_sas() always removes them from
> that global list anyway (either deleting them if initialization fails,
> or moving them onto ioc->sas_device_list if it succeeds). The
> interrupt that caused this bug will no longer cause the device to be
> removed during initialization, since it won't exist on the global
> lists, but
> _scsih_probe_sas() will remove it anyway when it fails to initialize.
Hopefully the avago team will curate this, but just in case they don't, the
correct list to make sure it gets the attention of storage people should be
linux-scsi@vger.kernel.org
James
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] mpt2sas: mpt3sas: Fix memory corruption during initialization
@ 2015-05-04 15:05 Sreekanth Reddy
2015-05-05 15:35 ` Tomas Henzl
2015-05-06 18:48 ` Calvin Owens
0 siblings, 2 replies; 7+ messages in thread
From: Sreekanth Reddy @ 2015-05-04 15:05 UTC (permalink / raw)
To: calvinowens
Cc: martin.petersen, linux-scsi, jejb, JBottomley, Sathya.Prakash,
chaitra.basappa, linux-kernel, hch, Sreekanth Reddy
I have applied this patch on the latest upstream mpt3sas driver, then I have compiled and loaded the driver.
In the driver logs I didn't see any attached drives are added to the OS, 'fdisk -l' command also doesn't list
the drives which are actually attached to the HBA.
When I debug this issue then I see that in '_scsih_target_alloc'
driver is searching for sas_device from the lists 'sas_device_init_list' & 'sas_device_list'
based on the device sas address using the function mpt3sas_scsih_sas_device_find_by_sas_address(),
since this device is not in the 'sas_device_init_list' (as it is moved it to head list) driver exit
from this function without updating the required device addition information.
To solve the original problem (i.e memory corruption), here I have attached the patch,
in this patch I have added one atomic flag is_on_sas_device_init_list in _sas_device_structure
and I followed below algorithm.
1. when ever a device is added to sas_device_init_list then driver will set this atomic flag of this device to one.
2. And during the addition of this device to SCSI mid layer,
if the device is successfully added to the OS then driver will move this device list in to sas_device_list list from sas_device_init_list list and at this time driver will reset this flag to zero.
if device is failed to register with SCSI mid layer then also driver will reset this flag to zero in function _scsih_sas_device_remove and will remove the device entry from sas_device_init_list and will free the device structure.
3. Now when a device is removed then driver will receive target not responding event and in the function _scsih_device_remove_by_handle,
a. driver will check whether addition of discovered devices to SML process is currently running or not,
i. if addition (or registration) of discovered devices to SML process is running then driver will check whether device is in sas_device_init_list or not (by reading the atomic flag)?.
if it is in a sas_device_init_list then driver will ignore this device removal event (since device registration with SML will fail and it is removed in function _scsih_sas_device_remove as mentioned in step 2).
ii. if the device is not in a sas_device_init_list or addition (or registration) of discovered devices to SML process is already completed then device structure is removed from this function and this device entry is removed from sas_device_list.
4. if the device removal event is received after device structure is freed due to failure of device registration with SML them in the function _scsih_device_remove_by_handle driver won't find this device in the sas_device_list or in a sas_device_init_list and so driver will ignore this device removal event.
Signed-off-by: Sreekanth Reddy <Sreekanth.Reddy@avagotech.com>
---
drivers/scsi/mpt2sas/mpt2sas_base.h | 2 ++
drivers/scsi/mpt2sas/mpt2sas_scsih.c | 45 +++++++++++++++++++++++++++---------
drivers/scsi/mpt3sas/mpt3sas_base.h | 2 ++
drivers/scsi/mpt3sas/mpt3sas_scsih.c | 43 ++++++++++++++++++++++++++--------
4 files changed, 71 insertions(+), 21 deletions(-)
diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.h b/drivers/scsi/mpt2sas/mpt2sas_base.h
index caff8d1..1aa10d2 100644
--- a/drivers/scsi/mpt2sas/mpt2sas_base.h
+++ b/drivers/scsi/mpt2sas/mpt2sas_base.h
@@ -376,6 +376,7 @@ struct _sas_device {
u8 phy;
u8 responding;
u8 pfa_led_on;
+ atomic_t is_on_sas_device_init_list;
};
/**
@@ -833,6 +834,7 @@ struct MPT2SAS_ADAPTER {
u8 broadcast_aen_busy;
u16 broadcast_aen_pending;
u8 shost_recovery;
+ u8 discovered_device_addition_on;
struct mutex reset_in_progress_mutex;
spinlock_t ioc_reset_in_progress_lock;
diff --git a/drivers/scsi/mpt2sas/mpt2sas_scsih.c b/drivers/scsi/mpt2sas/mpt2sas_scsih.c
index 3f26147..2a61286 100644
--- a/drivers/scsi/mpt2sas/mpt2sas_scsih.c
+++ b/drivers/scsi/mpt2sas/mpt2sas_scsih.c
@@ -590,13 +590,20 @@ _scsih_sas_device_remove(struct MPT2SAS_ADAPTER *ioc,
struct _sas_device *sas_device)
{
unsigned long flags;
+ struct _sas_device *same_sas_device;
if (!sas_device)
return;
spin_lock_irqsave(&ioc->sas_device_lock, flags);
- list_del(&sas_device->list);
- kfree(sas_device);
+ same_sas_device = _scsih_sas_device_find_by_handle(ioc,
+ sas_device->handle);
+ if (same_sas_device) {
+ list_del(&same_sas_device->list);
+ if (atomic_read(&sas_device->is_on_sas_device_init_list))
+ atomic_set(&sas_device->is_on_sas_device_init_list, 0);
+ kfree(same_sas_device);
+ }
spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
}
@@ -658,6 +664,7 @@ _scsih_sas_device_init_add(struct MPT2SAS_ADAPTER *ioc,
"(0x%04x), sas_addr(0x%016llx)\n", ioc->name, __func__,
sas_device->handle, (unsigned long long)sas_device->sas_address));
+ atomic_set(&sas_device->is_on_sas_device_init_list, 1);
spin_lock_irqsave(&ioc->sas_device_lock, flags);
list_add_tail(&sas_device->list, &ioc->sas_device_init_list);
_scsih_determine_boot_device(ioc, sas_device, 0);
@@ -5364,8 +5371,14 @@ _scsih_device_remove_by_handle(struct MPT2SAS_ADAPTER *ioc, u16 handle)
spin_lock_irqsave(&ioc->sas_device_lock, flags);
sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
- if (sas_device)
- list_del(&sas_device->list);
+ if (sas_device) {
+ if (ioc->discovered_device_addition_on &&
+ atomic_read(&sas_device->is_on_sas_device_init_list)) {
+ spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+ return;
+ } else
+ list_del(&sas_device->list);
+ }
spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
if (sas_device)
_scsih_remove_device(ioc, sas_device);
@@ -5391,8 +5404,14 @@ mpt2sas_device_remove_by_sas_address(struct MPT2SAS_ADAPTER *ioc,
spin_lock_irqsave(&ioc->sas_device_lock, flags);
sas_device = mpt2sas_scsih_sas_device_find_by_sas_address(ioc,
sas_address);
- if (sas_device)
- list_del(&sas_device->list);
+ if (sas_device) {
+ if (ioc->discovered_device_addition_on &&
+ atomic_read(&sas_device->is_on_sas_device_init_list)) {
+ spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+ return;
+ } else
+ list_del(&sas_device->list);
+ }
spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
if (sas_device)
_scsih_remove_device(ioc, sas_device);
@@ -7978,32 +7997,36 @@ _scsih_probe_sas(struct MPT2SAS_ADAPTER *ioc)
struct _sas_device *sas_device, *next;
unsigned long flags;
+ ioc->discovered_device_addition_on = 1;
/* SAS Device List */
list_for_each_entry_safe(sas_device, next, &ioc->sas_device_init_list,
list) {
if (ioc->hide_drives)
continue;
-
+
if (!mpt2sas_transport_port_add(ioc, sas_device->handle,
sas_device->sas_address_parent)) {
- list_del(&sas_device->list);
- kfree(sas_device);
+ mpt2sas_transport_port_remove(ioc,
+ sas_device->sas_address,
+ sas_device->sas_address_parent);
+ _scsih_sas_device_remove(ioc, sas_device);
continue;
} else if (!sas_device->starget) {
if (!ioc->is_driver_loading) {
mpt2sas_transport_port_remove(ioc,
sas_device->sas_address,
sas_device->sas_address_parent);
- list_del(&sas_device->list);
- kfree(sas_device);
+ _scsih_sas_device_remove(ioc, sas_device);
continue;
}
}
spin_lock_irqsave(&ioc->sas_device_lock, flags);
list_move_tail(&sas_device->list, &ioc->sas_device_list);
+ atomic_dec(&sas_device->is_on_sas_device_init_list);
spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
}
+ ioc->discovered_device_addition_on = 0;
}
/**
diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.h b/drivers/scsi/mpt3sas/mpt3sas_base.h
index afa8816..6188490 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.h
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.h
@@ -315,6 +315,7 @@ struct _sas_device {
u8 responding;
u8 fast_path;
u8 pfa_led_on;
+ atomic_t is_on_sas_device_init_list;
};
/**
@@ -766,6 +767,7 @@ struct MPT3SAS_ADAPTER {
u8 broadcast_aen_busy;
u16 broadcast_aen_pending;
u8 shost_recovery;
+ u8 discovered_device_addition_on;
struct mutex reset_in_progress_mutex;
spinlock_t ioc_reset_in_progress_lock;
diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index 5a97e32..53cc9ea 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -582,13 +582,20 @@ _scsih_sas_device_remove(struct MPT3SAS_ADAPTER *ioc,
struct _sas_device *sas_device)
{
unsigned long flags;
+ struct _sas_device *same_sas_device;
if (!sas_device)
return;
spin_lock_irqsave(&ioc->sas_device_lock, flags);
- list_del(&sas_device->list);
- kfree(sas_device);
+ same_sas_device = _scsih_sas_device_find_by_handle(ioc,
+ sas_device->handle);
+ if (same_sas_device) {
+ list_del(&same_sas_device->list);
+ if (atomic_read(&sas_device->is_on_sas_device_init_list))
+ atomic_set(&sas_device->is_on_sas_device_init_list, 0);
+ kfree(same_sas_device);
+ }
spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
}
@@ -610,8 +616,14 @@ _scsih_device_remove_by_handle(struct MPT3SAS_ADAPTER *ioc, u16 handle)
spin_lock_irqsave(&ioc->sas_device_lock, flags);
sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
- if (sas_device)
- list_del(&sas_device->list);
+ if (sas_device) {
+ if (ioc->discovered_device_addition_on &&
+ atomic_read(&sas_device->is_on_sas_device_init_list)) {
+ spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+ return;
+ } else
+ list_del(&sas_device->list);
+ }
spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
if (sas_device)
_scsih_remove_device(ioc, sas_device);
@@ -637,8 +649,14 @@ mpt3sas_device_remove_by_sas_address(struct MPT3SAS_ADAPTER *ioc,
spin_lock_irqsave(&ioc->sas_device_lock, flags);
sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
sas_address);
- if (sas_device)
- list_del(&sas_device->list);
+ if (sas_device) {
+ if (ioc->discovered_device_addition_on &&
+ atomic_read(&sas_device->is_on_sas_device_init_list)) {
+ spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
+ return;
+ } else
+ list_del(&sas_device->list);
+ }
spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
if (sas_device)
_scsih_remove_device(ioc, sas_device);
@@ -663,6 +681,7 @@ _scsih_sas_device_add(struct MPT3SAS_ADAPTER *ioc,
ioc->name, __func__, sas_device->handle,
(unsigned long long)sas_device->sas_address));
+ atomic_set(&sas_device->is_on_sas_device_init_list, 1);
spin_lock_irqsave(&ioc->sas_device_lock, flags);
list_add_tail(&sas_device->list, &ioc->sas_device_list);
spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
@@ -7610,14 +7629,17 @@ _scsih_probe_sas(struct MPT3SAS_ADAPTER *ioc)
struct _sas_device *sas_device, *next;
unsigned long flags;
+ ioc->discovered_device_addition_on = 1;
/* SAS Device List */
list_for_each_entry_safe(sas_device, next, &ioc->sas_device_init_list,
list) {
if (!mpt3sas_transport_port_add(ioc, sas_device->handle,
sas_device->sas_address_parent)) {
- list_del(&sas_device->list);
- kfree(sas_device);
+ mpt3sas_transport_port_remove(ioc,
+ sas_device->sas_address,
+ sas_device->sas_address_parent);
+ _scsih_sas_device_remove(ioc, sas_device);
continue;
} else if (!sas_device->starget) {
/*
@@ -7630,16 +7652,17 @@ _scsih_probe_sas(struct MPT3SAS_ADAPTER *ioc)
mpt3sas_transport_port_remove(ioc,
sas_device->sas_address,
sas_device->sas_address_parent);
- list_del(&sas_device->list);
- kfree(sas_device);
+ _scsih_sas_device_remove(ioc, sas_device);
continue;
}
}
spin_lock_irqsave(&ioc->sas_device_lock, flags);
list_move_tail(&sas_device->list, &ioc->sas_device_list);
+ atomic_dec(&sas_device->is_on_sas_device_init_list);
spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
}
+ ioc->discovered_device_addition_on = 0;
}
/**
--
2.0.2
^ permalink raw reply related [flat|nested] 7+ messages in thread* Re: [PATCH] mpt2sas: mpt3sas: Fix memory corruption during initialization
2015-05-04 15:05 Sreekanth Reddy
@ 2015-05-05 15:35 ` Tomas Henzl
2015-05-12 9:38 ` Sreekanth Reddy
2015-05-06 18:48 ` Calvin Owens
1 sibling, 1 reply; 7+ messages in thread
From: Tomas Henzl @ 2015-05-05 15:35 UTC (permalink / raw)
To: Sreekanth Reddy, calvinowens
Cc: martin.petersen, linux-scsi, jejb, JBottomley, Sathya.Prakash,
chaitra.basappa, linux-kernel, hch
On 05/04/2015 05:05 PM, Sreekanth Reddy wrote:
> I have applied this patch on the latest upstream mpt3sas driver, then I have compiled and loaded the driver.
> In the driver logs I didn't see any attached drives are added to the OS, 'fdisk -l' command also doesn't list
> the drives which are actually attached to the HBA.
>
> When I debug this issue then I see that in '_scsih_target_alloc'
> driver is searching for sas_device from the lists 'sas_device_init_list' & 'sas_device_list'
> based on the device sas address using the function mpt3sas_scsih_sas_device_find_by_sas_address(),
> since this device is not in the 'sas_device_init_list' (as it is moved it to head list) driver exit
> from this function without updating the required device addition information.
>
> To solve the original problem (i.e memory corruption), here I have attached the patch,
> in this patch I have added one atomic flag is_on_sas_device_init_list in _sas_device_structure
> and I followed below algorithm.
>
> 1. when ever a device is added to sas_device_init_list then driver will set this atomic flag of this device to one.
>
> 2. And during the addition of this device to SCSI mid layer,
> if the device is successfully added to the OS then driver will move this device list in to sas_device_list list from sas_device_init_list list and at this time driver will reset this flag to zero.
> if device is failed to register with SCSI mid layer then also driver will reset this flag to zero in function _scsih_sas_device_remove and will remove the device entry from sas_device_init_list and will free the device structure.
>
> 3. Now when a device is removed then driver will receive target not responding event and in the function _scsih_device_remove_by_handle,
> a. driver will check whether addition of discovered devices to SML process is currently running or not,
> i. if addition (or registration) of discovered devices to SML process is running then driver will check whether device is in sas_device_init_list or not (by reading the atomic flag)?.
> if it is in a sas_device_init_list then driver will ignore this device removal event (since device registration with SML will fail and it is removed in function _scsih_sas_device_remove as mentioned in step 2).
> ii. if the device is not in a sas_device_init_list or addition (or registration) of discovered devices to SML process is already completed then device structure is removed from this function and this device entry is removed from sas_device_list.
>
> 4. if the device removal event is received after device structure is freed due to failure of device registration with SML them in the function _scsih_device_remove_by_handle driver won't find this device in the sas_device_list or in a sas_device_init_list and so driver will ignore this device removal event.
>
> Signed-off-by: Sreekanth Reddy <Sreekanth.Reddy@avagotech.com>
> ---
> drivers/scsi/mpt2sas/mpt2sas_base.h | 2 ++
> drivers/scsi/mpt2sas/mpt2sas_scsih.c | 45 +++++++++++++++++++++++++++---------
> drivers/scsi/mpt3sas/mpt3sas_base.h | 2 ++
> drivers/scsi/mpt3sas/mpt3sas_scsih.c | 43 ++++++++++++++++++++++++++--------
> 4 files changed, 71 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.h b/drivers/scsi/mpt2sas/mpt2sas_base.h
> index caff8d1..1aa10d2 100644
> --- a/drivers/scsi/mpt2sas/mpt2sas_base.h
> +++ b/drivers/scsi/mpt2sas/mpt2sas_base.h
> @@ -376,6 +376,7 @@ struct _sas_device {
> u8 phy;
> u8 responding;
> u8 pfa_led_on;
> + atomic_t is_on_sas_device_init_list;
Hi Sreekanth,
when is_on_sas_device_init_list is used it's protected
by ioc->sas_device_lock - why do you need a atomic_t ?
There is one exception, but easily fixable.
> };
>
> /**
> @@ -833,6 +834,7 @@ struct MPT2SAS_ADAPTER {
> u8 broadcast_aen_busy;
> u16 broadcast_aen_pending;
> u8 shost_recovery;
> + u8 discovered_device_addition_on;
>
> struct mutex reset_in_progress_mutex;
> spinlock_t ioc_reset_in_progress_lock;
> diff --git a/drivers/scsi/mpt2sas/mpt2sas_scsih.c b/drivers/scsi/mpt2sas/mpt2sas_scsih.c
> index 3f26147..2a61286 100644
> --- a/drivers/scsi/mpt2sas/mpt2sas_scsih.c
> +++ b/drivers/scsi/mpt2sas/mpt2sas_scsih.c
> @@ -590,13 +590,20 @@ _scsih_sas_device_remove(struct MPT2SAS_ADAPTER *ioc,
> struct _sas_device *sas_device)
> {
> unsigned long flags;
> + struct _sas_device *same_sas_device;
>
> if (!sas_device)
> return;
>
> spin_lock_irqsave(&ioc->sas_device_lock, flags);
> - list_del(&sas_device->list);
> - kfree(sas_device);
> + same_sas_device = _scsih_sas_device_find_by_handle(ioc,
> + sas_device->handle);
Is it possible that when same_sas_device is not null, that the
value is not the same as for the sas_device ?
> + if (same_sas_device) {
> + list_del(&same_sas_device->list);
> + if (atomic_read(&sas_device->is_on_sas_device_init_list))
Seems easier to just set the variable without a test.
> + atomic_set(&sas_device->is_on_sas_device_init_list, 0);
> + kfree(same_sas_device);
> + }
> spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> }
>
> @@ -658,6 +664,7 @@ _scsih_sas_device_init_add(struct MPT2SAS_ADAPTER *ioc,
> "(0x%04x), sas_addr(0x%016llx)\n", ioc->name, __func__,
> sas_device->handle, (unsigned long long)sas_device->sas_address));
>
> + atomic_set(&sas_device->is_on_sas_device_init_list, 1);
> spin_lock_irqsave(&ioc->sas_device_lock, flags);
> list_add_tail(&sas_device->list, &ioc->sas_device_init_list);
> _scsih_determine_boot_device(ioc, sas_device, 0);
> @@ -5364,8 +5371,14 @@ _scsih_device_remove_by_handle(struct MPT2SAS_ADAPTER *ioc, u16 handle)
>
> spin_lock_irqsave(&ioc->sas_device_lock, flags);
> sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
> - if (sas_device)
> - list_del(&sas_device->list);
> + if (sas_device) {
> + if (ioc->discovered_device_addition_on &&
> + atomic_read(&sas_device->is_on_sas_device_init_list)) {
> + spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> + return;
> + } else
> + list_del(&sas_device->list);
> + }
> spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> if (sas_device)
> _scsih_remove_device(ioc, sas_device);
> @@ -5391,8 +5404,14 @@ mpt2sas_device_remove_by_sas_address(struct MPT2SAS_ADAPTER *ioc,
> spin_lock_irqsave(&ioc->sas_device_lock, flags);
> sas_device = mpt2sas_scsih_sas_device_find_by_sas_address(ioc,
> sas_address);
> - if (sas_device)
> - list_del(&sas_device->list);
> + if (sas_device) {
> + if (ioc->discovered_device_addition_on &&
> + atomic_read(&sas_device->is_on_sas_device_init_list)) {
> + spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> + return;
> + } else
> + list_del(&sas_device->list);
> + }
> spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> if (sas_device)
> _scsih_remove_device(ioc, sas_device);
> @@ -7978,32 +7997,36 @@ _scsih_probe_sas(struct MPT2SAS_ADAPTER *ioc)
> struct _sas_device *sas_device, *next;
> unsigned long flags;
>
> + ioc->discovered_device_addition_on = 1;
> /* SAS Device List */
> list_for_each_entry_safe(sas_device, next, &ioc->sas_device_init_list,
> list) {
>
> if (ioc->hide_drives)
> continue;
> -
> +
> if (!mpt2sas_transport_port_add(ioc, sas_device->handle,
> sas_device->sas_address_parent)) {
> - list_del(&sas_device->list);
> - kfree(sas_device);
> + mpt2sas_transport_port_remove(ioc,
> + sas_device->sas_address,
> + sas_device->sas_address_parent);
> + _scsih_sas_device_remove(ioc, sas_device);
> continue;
> } else if (!sas_device->starget) {
> if (!ioc->is_driver_loading) {
> mpt2sas_transport_port_remove(ioc,
> sas_device->sas_address,
> sas_device->sas_address_parent);
> - list_del(&sas_device->list);
> - kfree(sas_device);
> + _scsih_sas_device_remove(ioc, sas_device);
> continue;
> }
> }
> spin_lock_irqsave(&ioc->sas_device_lock, flags);
> list_move_tail(&sas_device->list, &ioc->sas_device_list);
> + atomic_dec(&sas_device->is_on_sas_device_init_list);
Why not 'atomic_set(&sas_device->is_on_sas_device_init_list, 0);' ?
There is no place where you set the value of is_on_sas_device_init_list
higher than '1'.
Cheers,
Tomas
> spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> }
> + ioc->discovered_device_addition_on = 0;
> }
>
> /**
> diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.h b/drivers/scsi/mpt3sas/mpt3sas_base.h
> index afa8816..6188490 100644
> --- a/drivers/scsi/mpt3sas/mpt3sas_base.h
> +++ b/drivers/scsi/mpt3sas/mpt3sas_base.h
> @@ -315,6 +315,7 @@ struct _sas_device {
> u8 responding;
> u8 fast_path;
> u8 pfa_led_on;
> + atomic_t is_on_sas_device_init_list;
> };
>
> /**
> @@ -766,6 +767,7 @@ struct MPT3SAS_ADAPTER {
> u8 broadcast_aen_busy;
> u16 broadcast_aen_pending;
> u8 shost_recovery;
> + u8 discovered_device_addition_on;
>
> struct mutex reset_in_progress_mutex;
> spinlock_t ioc_reset_in_progress_lock;
> diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> index 5a97e32..53cc9ea 100644
> --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> @@ -582,13 +582,20 @@ _scsih_sas_device_remove(struct MPT3SAS_ADAPTER *ioc,
> struct _sas_device *sas_device)
> {
> unsigned long flags;
> + struct _sas_device *same_sas_device;
>
> if (!sas_device)
> return;
>
> spin_lock_irqsave(&ioc->sas_device_lock, flags);
> - list_del(&sas_device->list);
> - kfree(sas_device);
> + same_sas_device = _scsih_sas_device_find_by_handle(ioc,
> + sas_device->handle);
> + if (same_sas_device) {
> + list_del(&same_sas_device->list);
> + if (atomic_read(&sas_device->is_on_sas_device_init_list))
> + atomic_set(&sas_device->is_on_sas_device_init_list, 0);
> + kfree(same_sas_device);
> + }
> spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> }
>
> @@ -610,8 +616,14 @@ _scsih_device_remove_by_handle(struct MPT3SAS_ADAPTER *ioc, u16 handle)
>
> spin_lock_irqsave(&ioc->sas_device_lock, flags);
> sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
> - if (sas_device)
> - list_del(&sas_device->list);
> + if (sas_device) {
> + if (ioc->discovered_device_addition_on &&
> + atomic_read(&sas_device->is_on_sas_device_init_list)) {
> + spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> + return;
> + } else
> + list_del(&sas_device->list);
> + }
> spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> if (sas_device)
> _scsih_remove_device(ioc, sas_device);
> @@ -637,8 +649,14 @@ mpt3sas_device_remove_by_sas_address(struct MPT3SAS_ADAPTER *ioc,
> spin_lock_irqsave(&ioc->sas_device_lock, flags);
> sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
> sas_address);
> - if (sas_device)
> - list_del(&sas_device->list);
> + if (sas_device) {
> + if (ioc->discovered_device_addition_on &&
> + atomic_read(&sas_device->is_on_sas_device_init_list)) {
> + spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> + return;
> + } else
> + list_del(&sas_device->list);
> + }
> spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> if (sas_device)
> _scsih_remove_device(ioc, sas_device);
> @@ -663,6 +681,7 @@ _scsih_sas_device_add(struct MPT3SAS_ADAPTER *ioc,
> ioc->name, __func__, sas_device->handle,
> (unsigned long long)sas_device->sas_address));
>
> + atomic_set(&sas_device->is_on_sas_device_init_list, 1);
> spin_lock_irqsave(&ioc->sas_device_lock, flags);
> list_add_tail(&sas_device->list, &ioc->sas_device_list);
> spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> @@ -7610,14 +7629,17 @@ _scsih_probe_sas(struct MPT3SAS_ADAPTER *ioc)
> struct _sas_device *sas_device, *next;
> unsigned long flags;
>
> + ioc->discovered_device_addition_on = 1;
> /* SAS Device List */
> list_for_each_entry_safe(sas_device, next, &ioc->sas_device_init_list,
> list) {
>
> if (!mpt3sas_transport_port_add(ioc, sas_device->handle,
> sas_device->sas_address_parent)) {
> - list_del(&sas_device->list);
> - kfree(sas_device);
> + mpt3sas_transport_port_remove(ioc,
> + sas_device->sas_address,
> + sas_device->sas_address_parent);
> + _scsih_sas_device_remove(ioc, sas_device);
> continue;
> } else if (!sas_device->starget) {
> /*
> @@ -7630,16 +7652,17 @@ _scsih_probe_sas(struct MPT3SAS_ADAPTER *ioc)
> mpt3sas_transport_port_remove(ioc,
> sas_device->sas_address,
> sas_device->sas_address_parent);
> - list_del(&sas_device->list);
> - kfree(sas_device);
> + _scsih_sas_device_remove(ioc, sas_device);
> continue;
> }
> }
>
> spin_lock_irqsave(&ioc->sas_device_lock, flags);
> list_move_tail(&sas_device->list, &ioc->sas_device_list);
> + atomic_dec(&sas_device->is_on_sas_device_init_list);
> spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> }
> + ioc->discovered_device_addition_on = 0;
> }
>
> /**
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: [PATCH] mpt2sas: mpt3sas: Fix memory corruption during initialization
2015-05-05 15:35 ` Tomas Henzl
@ 2015-05-12 9:38 ` Sreekanth Reddy
0 siblings, 0 replies; 7+ messages in thread
From: Sreekanth Reddy @ 2015-05-12 9:38 UTC (permalink / raw)
To: Tomas Henzl
Cc: calvinowens, Martin K. Petersen, linux-scsi@vger.kernel.org,
jejb@kernel.org, James E.J. Bottomley, Sathya Prakash,
Chaitra Basappa, linux-kernel@vger.kernel.org, Christoph Hellwig
HI Tomas & Calvin,
Thanks for reviewing this patch.
There is some problem with this patch, In this patch as the driver is
ignoring the device removal event (when ioc's
discovered_device_addition_on flag and device's
is_on_sas_device_init_list is one) so driver not freeing the
sas_device structure from the sas_device_init_list list.
Due to this when ever same device is hot plugged then the device
addition to the SML is not happing.
I have one more patch to fix the original issue and I will post it today.
Regards,
Sreekanth
On Tue, May 5, 2015 at 9:05 PM, Tomas Henzl <thenzl@redhat.com> wrote:
>
> On 05/04/2015 05:05 PM, Sreekanth Reddy wrote:
> > I have applied this patch on the latest upstream mpt3sas driver, then I have compiled and loaded the driver.
> > In the driver logs I didn't see any attached drives are added to the OS, 'fdisk -l' command also doesn't list
> > the drives which are actually attached to the HBA.
> >
> > When I debug this issue then I see that in '_scsih_target_alloc'
> > driver is searching for sas_device from the lists 'sas_device_init_list' & 'sas_device_list'
> > based on the device sas address using the function mpt3sas_scsih_sas_device_find_by_sas_address(),
> > since this device is not in the 'sas_device_init_list' (as it is moved it to head list) driver exit
> > from this function without updating the required device addition information.
> >
> > To solve the original problem (i.e memory corruption), here I have attached the patch,
> > in this patch I have added one atomic flag is_on_sas_device_init_list in _sas_device_structure
> > and I followed below algorithm.
> >
> > 1. when ever a device is added to sas_device_init_list then driver will set this atomic flag of this device to one.
> >
> > 2. And during the addition of this device to SCSI mid layer,
> > if the device is successfully added to the OS then driver will move this device list in to sas_device_list list from sas_device_init_list list and at this time driver will reset this flag to zero.
> > if device is failed to register with SCSI mid layer then also driver will reset this flag to zero in function _scsih_sas_device_remove and will remove the device entry from sas_device_init_list and will free the device structure.
> >
> > 3. Now when a device is removed then driver will receive target not responding event and in the function _scsih_device_remove_by_handle,
> > a. driver will check whether addition of discovered devices to SML process is currently running or not,
> > i. if addition (or registration) of discovered devices to SML process is running then driver will check whether device is in sas_device_init_list or not (by reading the atomic flag)?.
> > if it is in a sas_device_init_list then driver will ignore this device removal event (since device registration with SML will fail and it is removed in function _scsih_sas_device_remove as mentioned in step 2).
> > ii. if the device is not in a sas_device_init_list or addition (or registration) of discovered devices to SML process is already completed then device structure is removed from this function and this device entry is removed from sas_device_list.
> >
> > 4. if the device removal event is received after device structure is freed due to failure of device registration with SML them in the function _scsih_device_remove_by_handle driver won't find this device in the sas_device_list or in a sas_device_init_list and so driver will ignore this device removal event.
> >
> > Signed-off-by: Sreekanth Reddy <Sreekanth.Reddy@avagotech.com>
> > ---
> > drivers/scsi/mpt2sas/mpt2sas_base.h | 2 ++
> > drivers/scsi/mpt2sas/mpt2sas_scsih.c | 45 +++++++++++++++++++++++++++---------
> > drivers/scsi/mpt3sas/mpt3sas_base.h | 2 ++
> > drivers/scsi/mpt3sas/mpt3sas_scsih.c | 43 ++++++++++++++++++++++++++--------
> > 4 files changed, 71 insertions(+), 21 deletions(-)
> >
> > diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.h b/drivers/scsi/mpt2sas/mpt2sas_base.h
> > index caff8d1..1aa10d2 100644
> > --- a/drivers/scsi/mpt2sas/mpt2sas_base.h
> > +++ b/drivers/scsi/mpt2sas/mpt2sas_base.h
> > @@ -376,6 +376,7 @@ struct _sas_device {
> > u8 phy;
> > u8 responding;
> > u8 pfa_led_on;
> > + atomic_t is_on_sas_device_init_list;
>
> Hi Sreekanth,
> when is_on_sas_device_init_list is used it's protected
> by ioc->sas_device_lock - why do you need a atomic_t ?
> There is one exception, but easily fixable.
>
> > };
> >
> > /**
> > @@ -833,6 +834,7 @@ struct MPT2SAS_ADAPTER {
> > u8 broadcast_aen_busy;
> > u16 broadcast_aen_pending;
> > u8 shost_recovery;
> > + u8 discovered_device_addition_on;
> >
> > struct mutex reset_in_progress_mutex;
> > spinlock_t ioc_reset_in_progress_lock;
> > diff --git a/drivers/scsi/mpt2sas/mpt2sas_scsih.c b/drivers/scsi/mpt2sas/mpt2sas_scsih.c
> > index 3f26147..2a61286 100644
> > --- a/drivers/scsi/mpt2sas/mpt2sas_scsih.c
> > +++ b/drivers/scsi/mpt2sas/mpt2sas_scsih.c
> > @@ -590,13 +590,20 @@ _scsih_sas_device_remove(struct MPT2SAS_ADAPTER *ioc,
> > struct _sas_device *sas_device)
> > {
> > unsigned long flags;
> > + struct _sas_device *same_sas_device;
> >
> > if (!sas_device)
> > return;
> >
> > spin_lock_irqsave(&ioc->sas_device_lock, flags);
> > - list_del(&sas_device->list);
> > - kfree(sas_device);
> > + same_sas_device = _scsih_sas_device_find_by_handle(ioc,
> > + sas_device->handle);
>
> Is it possible that when same_sas_device is not null, that the
> value is not the same as for the sas_device ?
>
> > + if (same_sas_device) {
> > + list_del(&same_sas_device->list);
> > + if (atomic_read(&sas_device->is_on_sas_device_init_list))
>
> Seems easier to just set the variable without a test.
>
> > + atomic_set(&sas_device->is_on_sas_device_init_list, 0);
> > + kfree(same_sas_device);
> > + }
> > spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> > }
> >
> > @@ -658,6 +664,7 @@ _scsih_sas_device_init_add(struct MPT2SAS_ADAPTER *ioc,
> > "(0x%04x), sas_addr(0x%016llx)\n", ioc->name, __func__,
> > sas_device->handle, (unsigned long long)sas_device->sas_address));
> >
> > + atomic_set(&sas_device->is_on_sas_device_init_list, 1);
> > spin_lock_irqsave(&ioc->sas_device_lock, flags);
> > list_add_tail(&sas_device->list, &ioc->sas_device_init_list);
> > _scsih_determine_boot_device(ioc, sas_device, 0);
> > @@ -5364,8 +5371,14 @@ _scsih_device_remove_by_handle(struct MPT2SAS_ADAPTER *ioc, u16 handle)
> >
> > spin_lock_irqsave(&ioc->sas_device_lock, flags);
> > sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
> > - if (sas_device)
> > - list_del(&sas_device->list);
> > + if (sas_device) {
> > + if (ioc->discovered_device_addition_on &&
> > + atomic_read(&sas_device->is_on_sas_device_init_list)) {
> > + spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> > + return;
> > + } else
> > + list_del(&sas_device->list);
> > + }
> > spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> > if (sas_device)
> > _scsih_remove_device(ioc, sas_device);
> > @@ -5391,8 +5404,14 @@ mpt2sas_device_remove_by_sas_address(struct MPT2SAS_ADAPTER *ioc,
> > spin_lock_irqsave(&ioc->sas_device_lock, flags);
> > sas_device = mpt2sas_scsih_sas_device_find_by_sas_address(ioc,
> > sas_address);
> > - if (sas_device)
> > - list_del(&sas_device->list);
> > + if (sas_device) {
> > + if (ioc->discovered_device_addition_on &&
> > + atomic_read(&sas_device->is_on_sas_device_init_list)) {
> > + spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> > + return;
> > + } else
> > + list_del(&sas_device->list);
> > + }
> > spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> > if (sas_device)
> > _scsih_remove_device(ioc, sas_device);
> > @@ -7978,32 +7997,36 @@ _scsih_probe_sas(struct MPT2SAS_ADAPTER *ioc)
> > struct _sas_device *sas_device, *next;
> > unsigned long flags;
> >
> > + ioc->discovered_device_addition_on = 1;
> > /* SAS Device List */
> > list_for_each_entry_safe(sas_device, next, &ioc->sas_device_init_list,
> > list) {
> >
> > if (ioc->hide_drives)
> > continue;
> > -
> > +
> > if (!mpt2sas_transport_port_add(ioc, sas_device->handle,
> > sas_device->sas_address_parent)) {
> > - list_del(&sas_device->list);
> > - kfree(sas_device);
> > + mpt2sas_transport_port_remove(ioc,
> > + sas_device->sas_address,
> > + sas_device->sas_address_parent);
> > + _scsih_sas_device_remove(ioc, sas_device);
> > continue;
> > } else if (!sas_device->starget) {
> > if (!ioc->is_driver_loading) {
> > mpt2sas_transport_port_remove(ioc,
> > sas_device->sas_address,
> > sas_device->sas_address_parent);
> > - list_del(&sas_device->list);
> > - kfree(sas_device);
> > + _scsih_sas_device_remove(ioc, sas_device);
> > continue;
> > }
> > }
> > spin_lock_irqsave(&ioc->sas_device_lock, flags);
> > list_move_tail(&sas_device->list, &ioc->sas_device_list);
> > + atomic_dec(&sas_device->is_on_sas_device_init_list);
>
> Why not 'atomic_set(&sas_device->is_on_sas_device_init_list, 0);' ?
> There is no place where you set the value of is_on_sas_device_init_list
> higher than '1'.
>
> Cheers,
> Tomas
>
> > spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> > }
> > + ioc->discovered_device_addition_on = 0;
> > }
> >
> > /**
> > diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.h b/drivers/scsi/mpt3sas/mpt3sas_base.h
> > index afa8816..6188490 100644
> > --- a/drivers/scsi/mpt3sas/mpt3sas_base.h
> > +++ b/drivers/scsi/mpt3sas/mpt3sas_base.h
> > @@ -315,6 +315,7 @@ struct _sas_device {
> > u8 responding;
> > u8 fast_path;
> > u8 pfa_led_on;
> > + atomic_t is_on_sas_device_init_list;
> > };
> >
> > /**
> > @@ -766,6 +767,7 @@ struct MPT3SAS_ADAPTER {
> > u8 broadcast_aen_busy;
> > u16 broadcast_aen_pending;
> > u8 shost_recovery;
> > + u8 discovered_device_addition_on;
> >
> > struct mutex reset_in_progress_mutex;
> > spinlock_t ioc_reset_in_progress_lock;
> > diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> > index 5a97e32..53cc9ea 100644
> > --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> > +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> > @@ -582,13 +582,20 @@ _scsih_sas_device_remove(struct MPT3SAS_ADAPTER *ioc,
> > struct _sas_device *sas_device)
> > {
> > unsigned long flags;
> > + struct _sas_device *same_sas_device;
> >
> > if (!sas_device)
> > return;
> >
> > spin_lock_irqsave(&ioc->sas_device_lock, flags);
> > - list_del(&sas_device->list);
> > - kfree(sas_device);
> > + same_sas_device = _scsih_sas_device_find_by_handle(ioc,
> > + sas_device->handle);
> > + if (same_sas_device) {
> > + list_del(&same_sas_device->list);
> > + if (atomic_read(&sas_device->is_on_sas_device_init_list))
> > + atomic_set(&sas_device->is_on_sas_device_init_list, 0);
> > + kfree(same_sas_device);
> > + }
> > spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> > }
> >
> > @@ -610,8 +616,14 @@ _scsih_device_remove_by_handle(struct MPT3SAS_ADAPTER *ioc, u16 handle)
> >
> > spin_lock_irqsave(&ioc->sas_device_lock, flags);
> > sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
> > - if (sas_device)
> > - list_del(&sas_device->list);
> > + if (sas_device) {
> > + if (ioc->discovered_device_addition_on &&
> > + atomic_read(&sas_device->is_on_sas_device_init_list)) {
> > + spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> > + return;
> > + } else
> > + list_del(&sas_device->list);
> > + }
> > spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> > if (sas_device)
> > _scsih_remove_device(ioc, sas_device);
> > @@ -637,8 +649,14 @@ mpt3sas_device_remove_by_sas_address(struct MPT3SAS_ADAPTER *ioc,
> > spin_lock_irqsave(&ioc->sas_device_lock, flags);
> > sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
> > sas_address);
> > - if (sas_device)
> > - list_del(&sas_device->list);
> > + if (sas_device) {
> > + if (ioc->discovered_device_addition_on &&
> > + atomic_read(&sas_device->is_on_sas_device_init_list)) {
> > + spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> > + return;
> > + } else
> > + list_del(&sas_device->list);
> > + }
> > spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> > if (sas_device)
> > _scsih_remove_device(ioc, sas_device);
> > @@ -663,6 +681,7 @@ _scsih_sas_device_add(struct MPT3SAS_ADAPTER *ioc,
> > ioc->name, __func__, sas_device->handle,
> > (unsigned long long)sas_device->sas_address));
> >
> > + atomic_set(&sas_device->is_on_sas_device_init_list, 1);
> > spin_lock_irqsave(&ioc->sas_device_lock, flags);
> > list_add_tail(&sas_device->list, &ioc->sas_device_list);
> > spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> > @@ -7610,14 +7629,17 @@ _scsih_probe_sas(struct MPT3SAS_ADAPTER *ioc)
> > struct _sas_device *sas_device, *next;
> > unsigned long flags;
> >
> > + ioc->discovered_device_addition_on = 1;
> > /* SAS Device List */
> > list_for_each_entry_safe(sas_device, next, &ioc->sas_device_init_list,
> > list) {
> >
> > if (!mpt3sas_transport_port_add(ioc, sas_device->handle,
> > sas_device->sas_address_parent)) {
> > - list_del(&sas_device->list);
> > - kfree(sas_device);
> > + mpt3sas_transport_port_remove(ioc,
> > + sas_device->sas_address,
> > + sas_device->sas_address_parent);
> > + _scsih_sas_device_remove(ioc, sas_device);
> > continue;
> > } else if (!sas_device->starget) {
> > /*
> > @@ -7630,16 +7652,17 @@ _scsih_probe_sas(struct MPT3SAS_ADAPTER *ioc)
> > mpt3sas_transport_port_remove(ioc,
> > sas_device->sas_address,
> > sas_device->sas_address_parent);
> > - list_del(&sas_device->list);
> > - kfree(sas_device);
> > + _scsih_sas_device_remove(ioc, sas_device);
> > continue;
> > }
> > }
> >
> > spin_lock_irqsave(&ioc->sas_device_lock, flags);
> > list_move_tail(&sas_device->list, &ioc->sas_device_list);
> > + atomic_dec(&sas_device->is_on_sas_device_init_list);
> > spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> > }
> > + ioc->discovered_device_addition_on = 0;
> > }
> >
> > /**
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] mpt2sas: mpt3sas: Fix memory corruption during initialization
2015-05-04 15:05 Sreekanth Reddy
2015-05-05 15:35 ` Tomas Henzl
@ 2015-05-06 18:48 ` Calvin Owens
1 sibling, 0 replies; 7+ messages in thread
From: Calvin Owens @ 2015-05-06 18:48 UTC (permalink / raw)
To: Sreekanth Reddy
Cc: martin.petersen, linux-scsi, jejb, JBottomley, Sathya.Prakash,
chaitra.basappa, linux-kernel, hch, calvinowens
On Monday 05/04 at 20:35 +0530, Sreekanth Reddy wrote:
> I have applied this patch on the latest upstream mpt3sas driver, then
> I have compiled and loaded the driver. In the driver logs I didn't
> see any attached drives are added to the OS, 'fdisk -l' command also
> doesn't list the drives which are actually attached to the HBA.
>
> When I debug this issue then I see that in '_scsih_target_alloc'
> driver is searching for sas_device from the lists
> 'sas_device_init_list' & 'sas_device_list' based on the device sas
> address using the function
> mpt3sas_scsih_sas_device_find_by_sas_address(), since this device is
> not in the 'sas_device_init_list' (as it is moved it to head list)
> driver exit from this function without updating the required device
> addition information.
Yes, I misunderstood that the initialization depended on the devices
still being on the init_list.
What's interesting about this is that when I tested it, it still worked.
I think the MPT2SAS_PORT_ENABLE_COMPLETE fw_event might zero
ioc->start_scan and allow scsih_scan_finished() to start probing devices
before all the devices are actually on the init_list. It seems to be
very repeatable per-machine whether or not it works.
But in any case, my patch was wrong.
> To solve the original problem (i.e memory corruption), here I have
> attached the patch, in this patch I have added one atomic flag
> is_on_sas_device_init_list in _sas_device_structure and I followed
> below algorithm.
The problem is that this only solves a single case. There isn't anything
to enforce that this or a similar chain of events can't happen elsewhere
in the code.
I think the best general solution would be to add a refcount to these
objects. They sit on a list that can be concurrently accessed from
multiple threads, so I think a refcount is the best way to ensure that
objects aren't freed out from under other users.
I'm working on a patchset that does this. I'm starting by adding a
refcount to the sas_device object only, and refactoring the code in
mpt2sas_scsih.c to use it. I should be able to send up a first version
of that pretty soon to get some feedback.
Thanks,
Calvin
> 1. when ever a device is added to sas_device_init_list then driver
> will set this atomic flag of this device to one.
>
> 2. And during the addition of this device to SCSI mid layer, if the
> device is successfully added to the OS then driver will move this
> device list in to sas_device_list list from sas_device_init_list list
> and at this time driver will reset this flag to zero. if device is
> failed to register with SCSI mid layer then also driver will reset
> this flag to zero in function _scsih_sas_device_remove and will remove
> the device entry from sas_device_init_list and will free the device
> structure.
>
> 3. Now when a device is removed then driver will receive target not responding event and in the function _scsih_device_remove_by_handle,
> a. driver will check whether addition of discovered devices to SML process is currently running or not,
> i. if addition (or registration) of discovered devices to SML process is running then driver will check whether device is in sas_device_init_list or not (by reading the atomic flag)?.
> if it is in a sas_device_init_list then driver will ignore this device removal event (since device registration with SML will fail and it is removed in function _scsih_sas_device_remove as mentioned in step 2).
> ii. if the device is not in a sas_device_init_list or addition (or registration) of discovered devices to SML process is already completed then device structure is removed from this function and this device entry is removed from sas_device_list.
>
> 4. if the device removal event is received after device structure is freed due to failure of device registration with SML them in the function _scsih_device_remove_by_handle driver won't find this device in the sas_device_list or in a sas_device_init_list and so driver will ignore this device removal event.
>
> Signed-off-by: Sreekanth Reddy <Sreekanth.Reddy@avagotech.com>
> ---
> drivers/scsi/mpt2sas/mpt2sas_base.h | 2 ++
> drivers/scsi/mpt2sas/mpt2sas_scsih.c | 45 +++++++++++++++++++++++++++---------
> drivers/scsi/mpt3sas/mpt3sas_base.h | 2 ++
> drivers/scsi/mpt3sas/mpt3sas_scsih.c | 43 ++++++++++++++++++++++++++--------
> 4 files changed, 71 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.h b/drivers/scsi/mpt2sas/mpt2sas_base.h
> index caff8d1..1aa10d2 100644
> --- a/drivers/scsi/mpt2sas/mpt2sas_base.h
> +++ b/drivers/scsi/mpt2sas/mpt2sas_base.h
> @@ -376,6 +376,7 @@ struct _sas_device {
> u8 phy;
> u8 responding;
> u8 pfa_led_on;
> + atomic_t is_on_sas_device_init_list;
> };
>
> /**
> @@ -833,6 +834,7 @@ struct MPT2SAS_ADAPTER {
> u8 broadcast_aen_busy;
> u16 broadcast_aen_pending;
> u8 shost_recovery;
> + u8 discovered_device_addition_on;
>
> struct mutex reset_in_progress_mutex;
> spinlock_t ioc_reset_in_progress_lock;
> diff --git a/drivers/scsi/mpt2sas/mpt2sas_scsih.c b/drivers/scsi/mpt2sas/mpt2sas_scsih.c
> index 3f26147..2a61286 100644
> --- a/drivers/scsi/mpt2sas/mpt2sas_scsih.c
> +++ b/drivers/scsi/mpt2sas/mpt2sas_scsih.c
> @@ -590,13 +590,20 @@ _scsih_sas_device_remove(struct MPT2SAS_ADAPTER *ioc,
> struct _sas_device *sas_device)
> {
> unsigned long flags;
> + struct _sas_device *same_sas_device;
>
> if (!sas_device)
> return;
>
> spin_lock_irqsave(&ioc->sas_device_lock, flags);
> - list_del(&sas_device->list);
> - kfree(sas_device);
> + same_sas_device = _scsih_sas_device_find_by_handle(ioc,
> + sas_device->handle);
> + if (same_sas_device) {
> + list_del(&same_sas_device->list);
> + if (atomic_read(&sas_device->is_on_sas_device_init_list))
> + atomic_set(&sas_device->is_on_sas_device_init_list, 0);
> + kfree(same_sas_device);
> + }
> spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> }
>
> @@ -658,6 +664,7 @@ _scsih_sas_device_init_add(struct MPT2SAS_ADAPTER *ioc,
> "(0x%04x), sas_addr(0x%016llx)\n", ioc->name, __func__,
> sas_device->handle, (unsigned long long)sas_device->sas_address));
>
> + atomic_set(&sas_device->is_on_sas_device_init_list, 1);
> spin_lock_irqsave(&ioc->sas_device_lock, flags);
> list_add_tail(&sas_device->list, &ioc->sas_device_init_list);
> _scsih_determine_boot_device(ioc, sas_device, 0);
> @@ -5364,8 +5371,14 @@ _scsih_device_remove_by_handle(struct MPT2SAS_ADAPTER *ioc, u16 handle)
>
> spin_lock_irqsave(&ioc->sas_device_lock, flags);
> sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
> - if (sas_device)
> - list_del(&sas_device->list);
> + if (sas_device) {
> + if (ioc->discovered_device_addition_on &&
> + atomic_read(&sas_device->is_on_sas_device_init_list)) {
> + spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> + return;
> + } else
> + list_del(&sas_device->list);
> + }
> spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> if (sas_device)
> _scsih_remove_device(ioc, sas_device);
> @@ -5391,8 +5404,14 @@ mpt2sas_device_remove_by_sas_address(struct MPT2SAS_ADAPTER *ioc,
> spin_lock_irqsave(&ioc->sas_device_lock, flags);
> sas_device = mpt2sas_scsih_sas_device_find_by_sas_address(ioc,
> sas_address);
> - if (sas_device)
> - list_del(&sas_device->list);
> + if (sas_device) {
> + if (ioc->discovered_device_addition_on &&
> + atomic_read(&sas_device->is_on_sas_device_init_list)) {
> + spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> + return;
> + } else
> + list_del(&sas_device->list);
> + }
> spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> if (sas_device)
> _scsih_remove_device(ioc, sas_device);
> @@ -7978,32 +7997,36 @@ _scsih_probe_sas(struct MPT2SAS_ADAPTER *ioc)
> struct _sas_device *sas_device, *next;
> unsigned long flags;
>
> + ioc->discovered_device_addition_on = 1;
> /* SAS Device List */
> list_for_each_entry_safe(sas_device, next, &ioc->sas_device_init_list,
> list) {
>
> if (ioc->hide_drives)
> continue;
> -
> +
> if (!mpt2sas_transport_port_add(ioc, sas_device->handle,
> sas_device->sas_address_parent)) {
> - list_del(&sas_device->list);
> - kfree(sas_device);
> + mpt2sas_transport_port_remove(ioc,
> + sas_device->sas_address,
> + sas_device->sas_address_parent);
> + _scsih_sas_device_remove(ioc, sas_device);
> continue;
> } else if (!sas_device->starget) {
> if (!ioc->is_driver_loading) {
> mpt2sas_transport_port_remove(ioc,
> sas_device->sas_address,
> sas_device->sas_address_parent);
> - list_del(&sas_device->list);
> - kfree(sas_device);
> + _scsih_sas_device_remove(ioc, sas_device);
> continue;
> }
> }
> spin_lock_irqsave(&ioc->sas_device_lock, flags);
> list_move_tail(&sas_device->list, &ioc->sas_device_list);
> + atomic_dec(&sas_device->is_on_sas_device_init_list);
> spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> }
> + ioc->discovered_device_addition_on = 0;
> }
>
> /**
> diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.h b/drivers/scsi/mpt3sas/mpt3sas_base.h
> index afa8816..6188490 100644
> --- a/drivers/scsi/mpt3sas/mpt3sas_base.h
> +++ b/drivers/scsi/mpt3sas/mpt3sas_base.h
> @@ -315,6 +315,7 @@ struct _sas_device {
> u8 responding;
> u8 fast_path;
> u8 pfa_led_on;
> + atomic_t is_on_sas_device_init_list;
> };
>
> /**
> @@ -766,6 +767,7 @@ struct MPT3SAS_ADAPTER {
> u8 broadcast_aen_busy;
> u16 broadcast_aen_pending;
> u8 shost_recovery;
> + u8 discovered_device_addition_on;
>
> struct mutex reset_in_progress_mutex;
> spinlock_t ioc_reset_in_progress_lock;
> diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> index 5a97e32..53cc9ea 100644
> --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> @@ -582,13 +582,20 @@ _scsih_sas_device_remove(struct MPT3SAS_ADAPTER *ioc,
> struct _sas_device *sas_device)
> {
> unsigned long flags;
> + struct _sas_device *same_sas_device;
>
> if (!sas_device)
> return;
>
> spin_lock_irqsave(&ioc->sas_device_lock, flags);
> - list_del(&sas_device->list);
> - kfree(sas_device);
> + same_sas_device = _scsih_sas_device_find_by_handle(ioc,
> + sas_device->handle);
> + if (same_sas_device) {
> + list_del(&same_sas_device->list);
> + if (atomic_read(&sas_device->is_on_sas_device_init_list))
> + atomic_set(&sas_device->is_on_sas_device_init_list, 0);
> + kfree(same_sas_device);
> + }
> spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> }
>
> @@ -610,8 +616,14 @@ _scsih_device_remove_by_handle(struct MPT3SAS_ADAPTER *ioc, u16 handle)
>
> spin_lock_irqsave(&ioc->sas_device_lock, flags);
> sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
> - if (sas_device)
> - list_del(&sas_device->list);
> + if (sas_device) {
> + if (ioc->discovered_device_addition_on &&
> + atomic_read(&sas_device->is_on_sas_device_init_list)) {
> + spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> + return;
> + } else
> + list_del(&sas_device->list);
> + }
> spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> if (sas_device)
> _scsih_remove_device(ioc, sas_device);
> @@ -637,8 +649,14 @@ mpt3sas_device_remove_by_sas_address(struct MPT3SAS_ADAPTER *ioc,
> spin_lock_irqsave(&ioc->sas_device_lock, flags);
> sas_device = mpt3sas_scsih_sas_device_find_by_sas_address(ioc,
> sas_address);
> - if (sas_device)
> - list_del(&sas_device->list);
> + if (sas_device) {
> + if (ioc->discovered_device_addition_on &&
> + atomic_read(&sas_device->is_on_sas_device_init_list)) {
> + spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> + return;
> + } else
> + list_del(&sas_device->list);
> + }
> spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> if (sas_device)
> _scsih_remove_device(ioc, sas_device);
> @@ -663,6 +681,7 @@ _scsih_sas_device_add(struct MPT3SAS_ADAPTER *ioc,
> ioc->name, __func__, sas_device->handle,
> (unsigned long long)sas_device->sas_address));
>
> + atomic_set(&sas_device->is_on_sas_device_init_list, 1);
> spin_lock_irqsave(&ioc->sas_device_lock, flags);
> list_add_tail(&sas_device->list, &ioc->sas_device_list);
> spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> @@ -7610,14 +7629,17 @@ _scsih_probe_sas(struct MPT3SAS_ADAPTER *ioc)
> struct _sas_device *sas_device, *next;
> unsigned long flags;
>
> + ioc->discovered_device_addition_on = 1;
> /* SAS Device List */
> list_for_each_entry_safe(sas_device, next, &ioc->sas_device_init_list,
> list) {
>
> if (!mpt3sas_transport_port_add(ioc, sas_device->handle,
> sas_device->sas_address_parent)) {
> - list_del(&sas_device->list);
> - kfree(sas_device);
> + mpt3sas_transport_port_remove(ioc,
> + sas_device->sas_address,
> + sas_device->sas_address_parent);
> + _scsih_sas_device_remove(ioc, sas_device);
> continue;
> } else if (!sas_device->starget) {
> /*
> @@ -7630,16 +7652,17 @@ _scsih_probe_sas(struct MPT3SAS_ADAPTER *ioc)
> mpt3sas_transport_port_remove(ioc,
> sas_device->sas_address,
> sas_device->sas_address_parent);
> - list_del(&sas_device->list);
> - kfree(sas_device);
> + _scsih_sas_device_remove(ioc, sas_device);
> continue;
> }
> }
>
> spin_lock_irqsave(&ioc->sas_device_lock, flags);
> list_move_tail(&sas_device->list, &ioc->sas_device_list);
> + atomic_dec(&sas_device->is_on_sas_device_init_list);
> spin_unlock_irqrestore(&ioc->sas_device_lock, flags);
> }
> + ioc->discovered_device_addition_on = 0;
> }
>
> /**
> --
> 2.0.2
>
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2015-05-12 9:39 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-04-10 7:14 [PATCH] mpt2sas: mpt3sas: Fix memory corruption during initialization Calvin Owens
2015-04-10 14:30 ` James Bottomley
2015-04-10 16:43 ` Sathya Prakash
-- strict thread matches above, loose matches on Subject: below --
2015-05-04 15:05 Sreekanth Reddy
2015-05-05 15:35 ` Tomas Henzl
2015-05-12 9:38 ` Sreekanth Reddy
2015-05-06 18:48 ` Calvin Owens
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox