linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [REGRESSION v4.3] scsi_dh: use-after-free when removing scsi device
@ 2015-09-30  0:35 Junichi Nomura
  2015-09-30  9:28 ` Hannes Reinecke
  2015-09-30 15:18 ` Christoph Hellwig
  0 siblings, 2 replies; 12+ messages in thread
From: Junichi Nomura @ 2015-09-30  0:35 UTC (permalink / raw)
  To: linux-scsi, Christoph Hellwig, Hannes Reinecke

With v4.3-rc3, stress testing of SCSI device addition/removal quickly
trigger random crash in memory allocator (e.g. __kmalloc).  I found that
a commit 086b91d052eb ("scsi_dh: integrate into the core SCSI code")
moved the call of scsi_dh->detach() to very early part of sdev tear down
process (scsi_remove_device()). As a result, related data structure such
as alua_dh_data can be freed while rtpg/stpg are still in-flight.

Here is an example of crash log.  You can see strange rtpg log
'port group 25a43100' for 514:0:0:0 after 'Detached':

>> [  170.518427] sd 517:0:0:0: [sdd] Attached SCSI disk
>> [  170.528681] sd 514:0:0:0: alua: Detached
>> [  170.528720] sd 517:0:0:0: alua: Detached
>> [  170.528827] sd 515:0:0:0: alua: Detached
>> [  170.528922] sd 514:0:0:0: alua: port group 25a43100 state A non-preferred supports tolusna
>> [  170.528924] device-mapper: multipath: Failing path 8:16.
>> [  170.529059] device-mapper: table: 253:0: multipath: error getting device
>> [  170.529060] device-mapper: ioctl: error adding target to table
>> [  170.529212] device-mapper: multipath: Failing path 8:0.
>> [  170.540498] sd 516:0:0:0: alua: Detached
>> [  170.540543] sd 515:0:0:0: [sdb] Synchronizing SCSI cache
>> [  170.541849] sd 517:0:0:0: [sdd] Synchronizing SCSI cache
>> [  170.542393] sd 514:0:0:0: [sda] Synchronizing SCSI cache
>> [  170.546836] BUG: unable to handle kernel paging request at 0000001028000000
>> [  170.548499] IP: [<ffffffff811dce0a>] kmem_cache_alloc_trace+0x7a/0x1f0
>> [  170.550069] sd 516:0:0:0: [sdc] Synchronizing SCSI cache
>> [  170.551540] PGD 0 
>> [  170.552686] Oops: 0000 [#1] SMP 
>> [  170.553946] Modules linked in: dm_queue_length sd_mod iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nls_utf8 isofs cirrus ttm drm_kms_helper syscopyarea crct10dif_pclmul sysfillrect crc32_pclmul sys imgblt ghash_clmulni_intel fb_sys_fops aesni_intel drm lrw gf128mul virtio_console glue_helper ablk_helper ppdev cryptd parport_pc i2c_piix4 serio_raw pcspkr parport i2c_core virtio_balloon acpi_cpufreq xfs l ibcrc32c sr_mod cdrom ata_generic pata_acpi virtio_blk virtio_net crc32c_intel floppy ata_piix virtio_pci libata virtio_ring virtio sunrpc dm_mirror dm_region_hash dm_log dm_multipath dm_mod
>> [  170.564500] CPU: 2 PID: 2255 Comm: systemd-udevd Not tainted 4.3.0-rc3 #1
>> [  170.566078] Hardware name: 
>> [  170.567655] task: ffff88002c729580 ti: ffff8802212e8000 task.ti: ffff8802212e8000
>> [  170.569320] RIP: 0010:[<ffffffff811dce0a>]  [<ffffffff811dce0a>] kmem_cache_alloc_trace+0x7a/0x1f0
>> [  170.571208] RSP: 0018:ffff8802212ebe40  EFLAGS: 00010282
>> [  170.572715] RAX: 0000000000000000 RBX: 00000000000080d0 RCX:
>> 0000000000017622
>> [  170.574461] RDX: 0000000000017621 RSI: 00000000000080d0 RDI:
>> ffff880237001800
>> [  170.576148] RBP: ffff8802212ebe78 R08: 000000000001a160 R09:
>> ffff8800afda5460
>> [  170.577902] R10: ffffffff81206740 R11: 0000000000000202 R12:
>> 00000000000080d0
>> [  170.579530] R13: ffff880237001800 R14: 0000001028000000 R15:
>> ffff880237001800
>> [  170.581276] FS:  00007fd7236ae880(0000) GS:ffff88023fd00000(0000)
>> knlGS:0000000000000000
>> [  170.583110] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [  170.584596] CR2: 0000001028000000 CR3: 0000000225199000 CR4:
>> 00000000001406e0
>> [  170.586320] Stack:
>> [  170.587495]  ffffffff81206740 0000000000000088 ffff8800bb1cec60
>> ffff8802212ebf28
>> [  170.589280]  0000000000000800 ffff8802212ebf20 0000558c20687b50
>> ffff8802212ebe90
>> [  170.591131]  ffffffff81206740 ffff8800bb1cec60 ffff8802212ebed8
>> ffffffff81206c56
>> [  170.592929] Call Trace:
>> [  170.594228]  [<ffffffff81206740>] ? alloc_pipe_info+0x20/0xb0
>> [  170.595895]  [<ffffffff81206740>] alloc_pipe_info+0x20/0xb0
>> [  170.597637]  [<ffffffff81206c56>] create_pipe_files+0x56/0x230
>> [  170.599337]  [<ffffffff8112408f>] ? __audit_syscall_entry+0xaf/0x100
>> [  170.601071]  [<ffffffff81206e64>] __do_pipe_flags+0x34/0xe0
>> [  170.602688]  [<ffffffff81206faf>] SyS_pipe2+0x2f/0xb0
>> [  170.604215]  [<ffffffff816859ae>] entry_SYSCALL_64_fastpath+0x12/0x71
>> [  170.605863] Code: 4c 03 05 2a d3 e2 7e 4d 8b 30 49 8b 40 10 4d 85 f6 0f 84 22 01 00 00 48 85 c0 0f 84 19 01 00 00 49 63 47 20 48 8d 4a 01 4d 8b 07 <49> 8b 1c 06 4c 89 f0 65 49 0f c7 08 0f 94 c0 84 c0 74 b9 49 63 
>> [  170.610895] RIP  [<ffffffff811dce0a>] kmem_cache_alloc_trace+0x7a/0x1f0
>> [  170.612728]  RSP <ffff8802212ebe40>
>> [  170.614244] CR2: 0000001028000000
>> [  170.615745] ---[ end trace 02faa1e1a864f6bd ]---
>> [  170.617351] Kernel panic - not syncing: Fatal exception
>> [  170.619462] Kernel Offset: disabled
>> [  170.620946] ---[ end Kernel panic - not syncing: Fatal exception


Attached below is a sample reproducer for ALUA capable storage.

--- cut here --
#!/bin/bash

# Set appropriate values for your storage
hostno=2
host=host${hostno}
dev=${hostno}:0:0:0

service multipathd stop
multipath -F

n=1
while true; do
	echo "== loop $n ==" > /dev/kmsg

	echo "-- scan $host --" > /dev/kmsg
	echo '- - -' > /sys/class/scsi_host/${host}/scan

	echo "-- activate $dev --" > /dev/kmsg
	echo activate > /sys/class/scsi_device/${dev}/device/dh_state &

	echo "-- delete $dev --" > /dev/kmsg
	echo 1 > /sys/class/scsi_device/${dev}/device/delete

	n=$((n + 1))
done
--- cut here --

# lsscsi
[0:0:0:0]    cd/dvd  QEMU     QEMU DVD-ROM     1.5.  /dev/sr0 
[2:0:0:0]    disk    LIO-ORG  VirtioDisk       4.0   /dev/sda 
# ./a.sh

<system crash>

-- 
Jun'ichi Nomura, NEC Corporation

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [REGRESSION v4.3] scsi_dh: use-after-free when removing scsi device
  2015-09-30  0:35 [REGRESSION v4.3] scsi_dh: use-after-free when removing scsi device Junichi Nomura
@ 2015-09-30  9:28 ` Hannes Reinecke
  2015-09-30 10:35   ` Boaz Harrosh
  2015-09-30 15:18 ` Christoph Hellwig
  1 sibling, 1 reply; 12+ messages in thread
From: Hannes Reinecke @ 2015-09-30  9:28 UTC (permalink / raw)
  To: Junichi Nomura, linux-scsi, Christoph Hellwig

On 09/30/2015 02:35 AM, Junichi Nomura wrote:
> With v4.3-rc3, stress testing of SCSI device addition/removal quickly
> trigger random crash in memory allocator (e.g. __kmalloc).  I found that
> a commit 086b91d052eb ("scsi_dh: integrate into the core SCSI code")
> moved the call of scsi_dh->detach() to very early part of sdev tear down
> process (scsi_remove_device()). As a result, related data structure such
> as alua_dh_data can be freed while rtpg/stpg are still in-flight.
> 
> Here is an example of crash log.  You can see strange rtpg log
> 'port group 25a43100' for 514:0:0:0 after 'Detached':
> 
>>> [  170.518427] sd 517:0:0:0: [sdd] Attached SCSI disk
>>> [  170.528681] sd 514:0:0:0: alua: Detached
>>> [  170.528720] sd 517:0:0:0: alua: Detached
>>> [  170.528827] sd 515:0:0:0: alua: Detached
>>> [  170.528922] sd 514:0:0:0: alua: port group 25a43100 state A non-preferred supports tolusna
>>> [  170.528924] device-mapper: multipath: Failing path 8:16.
>>> [  170.529059] device-mapper: table: 253:0: multipath: error getting device
>>> [  170.529060] device-mapper: ioctl: error adding target to table
>>> [  170.529212] device-mapper: multipath: Failing path 8:0.
>>> [  170.540498] sd 516:0:0:0: alua: Detached
>>> [  170.540543] sd 515:0:0:0: [sdb] Synchronizing SCSI cache
>>> [  170.541849] sd 517:0:0:0: [sdd] Synchronizing SCSI cache
>>> [  170.542393] sd 514:0:0:0: [sda] Synchronizing SCSI cache
>>> [  170.546836] BUG: unable to handle kernel paging request at 0000001028000000
>>> [  170.548499] IP: [<ffffffff811dce0a>] kmem_cache_alloc_trace+0x7a/0x1f0
>>> [  170.550069] sd 516:0:0:0: [sdc] Synchronizing SCSI cache
>>> [  170.551540] PGD 0 
>>> [  170.552686] Oops: 0000 [#1] SMP 
>>> [  170.553946] Modules linked in: dm_queue_length sd_mod iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nls_utf8 isofs cirrus ttm drm_kms_helper syscopyarea crct10dif_pclmul sysfillrect crc32_pclmul sys imgblt ghash_clmulni_intel fb_sys_fops aesni_intel drm lrw gf128mul virtio_console glue_helper ablk_helper ppdev cryptd parport_pc i2c_piix4 serio_raw pcspkr parport i2c_core virtio_balloon acpi_cpufreq xfs l ibcrc32c sr_mod cdrom ata_generic pata_acpi virtio_blk virtio_net crc32c_intel floppy ata_piix virtio_pci libata virtio_ring virtio sunrpc dm_mirror dm_region_hash dm_log dm_multipath dm_mod
>>> [  170.564500] CPU: 2 PID: 2255 Comm: systemd-udevd Not tainted 4.3.0-rc3 #1
>>> [  170.566078] Hardware name: 
>>> [  170.567655] task: ffff88002c729580 ti: ffff8802212e8000 task.ti: ffff8802212e8000
>>> [  170.569320] RIP: 0010:[<ffffffff811dce0a>]  [<ffffffff811dce0a>] kmem_cache_alloc_trace+0x7a/0x1f0
>>> [  170.571208] RSP: 0018:ffff8802212ebe40  EFLAGS: 00010282
>>> [  170.572715] RAX: 0000000000000000 RBX: 00000000000080d0 RCX:
>>> 0000000000017622
>>> [  170.574461] RDX: 0000000000017621 RSI: 00000000000080d0 RDI:
>>> ffff880237001800
>>> [  170.576148] RBP: ffff8802212ebe78 R08: 000000000001a160 R09:
>>> ffff8800afda5460
>>> [  170.577902] R10: ffffffff81206740 R11: 0000000000000202 R12:
>>> 00000000000080d0
>>> [  170.579530] R13: ffff880237001800 R14: 0000001028000000 R15:
>>> ffff880237001800
>>> [  170.581276] FS:  00007fd7236ae880(0000) GS:ffff88023fd00000(0000)
>>> knlGS:0000000000000000
>>> [  170.583110] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [  170.584596] CR2: 0000001028000000 CR3: 0000000225199000 CR4:
>>> 00000000001406e0
>>> [  170.586320] Stack:
>>> [  170.587495]  ffffffff81206740 0000000000000088 ffff8800bb1cec60
>>> ffff8802212ebf28
>>> [  170.589280]  0000000000000800 ffff8802212ebf20 0000558c20687b50
>>> ffff8802212ebe90
>>> [  170.591131]  ffffffff81206740 ffff8800bb1cec60 ffff8802212ebed8
>>> ffffffff81206c56
>>> [  170.592929] Call Trace:
>>> [  170.594228]  [<ffffffff81206740>] ? alloc_pipe_info+0x20/0xb0
>>> [  170.595895]  [<ffffffff81206740>] alloc_pipe_info+0x20/0xb0
>>> [  170.597637]  [<ffffffff81206c56>] create_pipe_files+0x56/0x230
>>> [  170.599337]  [<ffffffff8112408f>] ? __audit_syscall_entry+0xaf/0x100
>>> [  170.601071]  [<ffffffff81206e64>] __do_pipe_flags+0x34/0xe0
>>> [  170.602688]  [<ffffffff81206faf>] SyS_pipe2+0x2f/0xb0
>>> [  170.604215]  [<ffffffff816859ae>] entry_SYSCALL_64_fastpath+0x12/0x71
>>> [  170.605863] Code: 4c 03 05 2a d3 e2 7e 4d 8b 30 49 8b 40 10 4d 85 f6 0f 84 22 01 00 00 48 85 c0 0f 84 19 01 00 00 49 63 47 20 48 8d 4a 01 4d 8b 07 <49> 8b 1c 06 4c 89 f0 65 49 0f c7 08 0f 94 c0 84 c0 74 b9 49 63 
>>> [  170.610895] RIP  [<ffffffff811dce0a>] kmem_cache_alloc_trace+0x7a/0x1f0
>>> [  170.612728]  RSP <ffff8802212ebe40>
>>> [  170.614244] CR2: 0000001028000000
>>> [  170.615745] ---[ end trace 02faa1e1a864f6bd ]---
>>> [  170.617351] Kernel panic - not syncing: Fatal exception
>>> [  170.619462] Kernel Offset: disabled
>>> [  170.620946] ---[ end Kernel panic - not syncing: Fatal exception
> 
> 
> Attached below is a sample reproducer for ALUA capable storage.
> 
> --- cut here --
> #!/bin/bash
> 
> # Set appropriate values for your storage
> hostno=2
> host=host${hostno}
> dev=${hostno}:0:0:0
> 
> service multipathd stop
> multipath -F
> 
> n=1
> while true; do
> 	echo "== loop $n ==" > /dev/kmsg
> 
> 	echo "-- scan $host --" > /dev/kmsg
> 	echo '- - -' > /sys/class/scsi_host/${host}/scan
> 
> 	echo "-- activate $dev --" > /dev/kmsg
> 	echo activate > /sys/class/scsi_device/${dev}/device/dh_state &
> 
Pushing things into the background is typically not the best of
ideas; actually I've been running into issues with udev not being
complete by the time the next round is started. So more often than
not I would be greeted with messages:

'write: no such file or directory'

when executing this line. Removing the '&' at the end made this
warning go away.

And actually I'm not sure if the above script is a valid testcase;
from what I've seen there is no locking / reference counting when
accessing sysfs attributes. So as soon as you _can_ access the sysfs
attribute it is implicitly assumed to be valid.
In your case you will be _removing_ the sysfs attribute even though
it is still accessed, which of course will crash.

Can you still reproduce this problem after removing the '&' in that
line?

> 	echo "-- delete $dev --" > /dev/kmsg
> 	echo 1 > /sys/class/scsi_device/${dev}/device/delete
> 
> 	n=$((n + 1))
> done
> --- cut here --

Having said that I've retried your test script with my ALUA handler
update, and didn't find any issues there.
It happily completed about 500 rounds at which point I got bored.
Of course, this is after removing the '&' in the said line.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		               zSeries & Storage
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [REGRESSION v4.3] scsi_dh: use-after-free when removing scsi device
  2015-09-30  9:28 ` Hannes Reinecke
@ 2015-09-30 10:35   ` Boaz Harrosh
  2015-09-30 14:49     ` Hannes Reinecke
  0 siblings, 1 reply; 12+ messages in thread
From: Boaz Harrosh @ 2015-09-30 10:35 UTC (permalink / raw)
  To: Hannes Reinecke, Junichi Nomura, linux-scsi, Christoph Hellwig

On 09/30/2015 12:28 PM, Hannes Reinecke wrote:
<>
> Pushing things into the background is typically not the best of
> ideas; actually I've been running into issues with udev not being
> complete by the time the next round is started. So more often than
> not I would be greeted with messages:
> 
> 'write: no such file or directory'
> 
> when executing this line. Removing the '&' at the end made this
> warning go away.
> 
> And actually I'm not sure if the above script is a valid testcase;

So are you saying it is allowed to crash the Kernel with a crappy
script?

> from what I've seen there is no locking / reference counting when
> accessing sysfs attributes. So as soon as you _can_ access the sysfs
> attribute it is implicitly assumed to be valid.
> In your case you will be _removing_ the sysfs attribute even though
> it is still accessed, which of course will crash.
> 

Is that allowed? for usermode script to race and crash the Kernel?

>From the original email it sounds like this used to be fine and it
now crashes (with the &)

Thanks
Boaz

> Can you still reproduce this problem after removing the '&' in that
> line?
> 
>> 	echo "-- delete $dev --" > /dev/kmsg
>> 	echo 1 > /sys/class/scsi_device/${dev}/device/delete
>>
>> 	n=$((n + 1))
>> done
>> --- cut here --
> 
> Having said that I've retried your test script with my ALUA handler
> update, and didn't find any issues there.
> It happily completed about 500 rounds at which point I got bored.
> Of course, this is after removing the '&' in the said line.
> 
> Cheers,
> 
> Hannes
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [REGRESSION v4.3] scsi_dh: use-after-free when removing scsi device
  2015-09-30 10:35   ` Boaz Harrosh
@ 2015-09-30 14:49     ` Hannes Reinecke
  0 siblings, 0 replies; 12+ messages in thread
From: Hannes Reinecke @ 2015-09-30 14:49 UTC (permalink / raw)
  To: Boaz Harrosh, Junichi Nomura, linux-scsi, Christoph Hellwig

On 09/30/2015 12:35 PM, Boaz Harrosh wrote:
> On 09/30/2015 12:28 PM, Hannes Reinecke wrote:
> <>
>> Pushing things into the background is typically not the best of
>> ideas; actually I've been running into issues with udev not being
>> complete by the time the next round is started. So more often than
>> not I would be greeted with messages:
>>
>> 'write: no such file or directory'
>>
>> when executing this line. Removing the '&' at the end made this
>> warning go away.
>>
>> And actually I'm not sure if the above script is a valid testcase;
> 
> So are you saying it is allowed to crash the Kernel with a crappy
> script?
> 
I'm just saying there might be race conditions lurking in the sysfs
code which just now came to light, without the patch being the
actual culprit.

>> from what I've seen there is no locking / reference counting when
>> accessing sysfs attributes. So as soon as you _can_ access the sysfs
>> attribute it is implicitly assumed to be valid.
>> In your case you will be _removing_ the sysfs attribute even though
>> it is still accessed, which of course will crash.
>>
> 
> Is that allowed? for usermode script to race and crash the Kernel?
> 
> From the original email it sounds like this used to be fine and it
> now crashes (with the &)
> 
Yeah, it is not meant to be as an excuse. Just an observation.
I still would like to see the results with my ALUA handler update;
there's a fair chance the issue is solved with that.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		               zSeries & Storage
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [REGRESSION v4.3] scsi_dh: use-after-free when removing scsi device
  2015-09-30  0:35 [REGRESSION v4.3] scsi_dh: use-after-free when removing scsi device Junichi Nomura
  2015-09-30  9:28 ` Hannes Reinecke
@ 2015-09-30 15:18 ` Christoph Hellwig
  2015-10-01  0:56   ` Junichi Nomura
  1 sibling, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2015-09-30 15:18 UTC (permalink / raw)
  To: Junichi Nomura; +Cc: linux-scsi, Christoph Hellwig, Hannes Reinecke

On Wed, Sep 30, 2015 at 12:35:50AM +0000, Junichi Nomura wrote:
> With v4.3-rc3, stress testing of SCSI device addition/removal quickly
> trigger random crash in memory allocator (e.g. __kmalloc).  I found that
> a commit 086b91d052eb ("scsi_dh: integrate into the core SCSI code")
> moved the call of scsi_dh->detach() to very early part of sdev tear down
> process (scsi_remove_device()). As a result, related data structure such
> as alua_dh_data can be freed while rtpg/stpg are still in-flight.

Hi Junichi,

the code should have been called from that early in the process before,
as it was called from the bus notifier that was called first in device_del.

While something in this series obviously caused the regression are you
sure it's exactly this patch?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [REGRESSION v4.3] scsi_dh: use-after-free when removing scsi device
  2015-09-30 15:18 ` Christoph Hellwig
@ 2015-10-01  0:56   ` Junichi Nomura
  2015-10-01  4:38     ` Junichi Nomura
  0 siblings, 1 reply; 12+ messages in thread
From: Junichi Nomura @ 2015-10-01  0:56 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-scsi, Hannes Reinecke

On 10/01/15 00:18, Christoph Hellwig wrote:
> On Wed, Sep 30, 2015 at 12:35:50AM +0000, Junichi Nomura wrote:
>> With v4.3-rc3, stress testing of SCSI device addition/removal quickly
>> trigger random crash in memory allocator (e.g. __kmalloc).  I found that
>> a commit 086b91d052eb ("scsi_dh: integrate into the core SCSI code")
>> moved the call of scsi_dh->detach() to very early part of sdev tear down
>> process (scsi_remove_device()). As a result, related data structure such
>> as alua_dh_data can be freed while rtpg/stpg are still in-flight.
> 
> Hi Junichi,
> 
> the code should have been called from that early in the process before,
> as it was called from the bus notifier that was called first in device_del.

With 4.2 kernel, scsi_dh->detach() was not called until the last
reference has gone.  With 4.3-rc3, scsi_dh->detach() is directly called
from the context of scsi_remove_device(). That's the point.

And in terms of that, my example script might not be reproducing the
situation I'm claiming because activation via sysfs doesn't seem to take
refcount anyway..  The original crash I saw happend when dm-mpath was
involved, which used to take refcount of scsi_dh while in-use.

> While something in this series obviously caused the regression are you
> sure it's exactly this patch?

So it might be the commit 1bab0de0274f ("dm-mpath, scsi_dh: don't let dm
detach device handlers"), which eliminates refcounting of scsi_dh.

-- 
Jun'ichi Nomura, NEC Corporation

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [REGRESSION v4.3] scsi_dh: use-after-free when removing scsi device
  2015-10-01  0:56   ` Junichi Nomura
@ 2015-10-01  4:38     ` Junichi Nomura
  2015-10-01  5:21       ` Christoph Hellwig
  2015-10-04  7:43       ` Christoph Hellwig
  0 siblings, 2 replies; 12+ messages in thread
From: Junichi Nomura @ 2015-10-01  4:38 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-scsi, Hannes Reinecke

On 10/01/15 09:56, Junichi Nomura wrote:
> With 4.2 kernel, scsi_dh->detach() was not called until the last
> reference has gone.  With 4.3-rc3, scsi_dh->detach() is directly called
> from the context of scsi_remove_device(). That's the point.

For my test script in the original post, I can't reproduce the crash
by just swapping the order of scsi_dh_handler_detach() and
device_remove_file() like this:

  void scsi_dh_remove_device(struct scsi_device *sdev)
  {
        device_remove_file(&sdev->sdev_gendev, &scsi_dh_state_attr);
        if (sdev->handler)
                scsi_dh_handler_detach(sdev);
  }

But another workload using dm-multipath still caues the same crash.
I think a patch like the following is needed.
What do you think?


The commit 1bab0de0274f ("dm-mpath, scsi_dh: don't let dm detach device
handlers") removed reference counting of attached scsi device handler.
As a result scsi_dh->detach() is directly called in the context of
scsi_remove_device() where activation request can be still in flight.

This patch moves scsi_dh_handler_detach() to
scsi_device_dev_release_usercontext(), at that point the device
is already in quiesced state.

diff --git a/drivers/scsi/scsi_dh.c b/drivers/scsi/scsi_dh.c
index edb044a..19bf9db 100644
--- a/drivers/scsi/scsi_dh.c
+++ b/drivers/scsi/scsi_dh.c
@@ -232,10 +232,14 @@ int scsi_dh_add_device(struct scsi_device *sdev)
 	return err;
 }
 
-void scsi_dh_remove_device(struct scsi_device *sdev)
+void scsi_dh_release_device(struct scsi_device *sdev)
 {
 	if (sdev->handler)
 		scsi_dh_handler_detach(sdev);
+}
+
+void scsi_dh_remove_device(struct scsi_device *sdev)
+{
 	device_remove_file(&sdev->sdev_gendev, &scsi_dh_state_attr);
 }
 
diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
index 644bb73..4d01cdb 100644
--- a/drivers/scsi/scsi_priv.h
+++ b/drivers/scsi/scsi_priv.h
@@ -173,9 +173,11 @@ extern struct async_domain scsi_sd_probe_domain;
 /* scsi_dh.c */
 #ifdef CONFIG_SCSI_DH
 int scsi_dh_add_device(struct scsi_device *sdev);
+void scsi_dh_release_device(struct scsi_device *sdev);
 void scsi_dh_remove_device(struct scsi_device *sdev);
 #else
 static inline int scsi_dh_add_device(struct scsi_device *sdev) { return 0; }
+static inline void scsi_dh_release_device(struct scsi_device *sdev) { }
 static inline void scsi_dh_remove_device(struct scsi_device *sdev) { }
 #endif
 
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index b333389..dff8faf 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -399,6 +399,8 @@ static void scsi_device_dev_release_usercontext(struct work_struct *work)
 
 	sdev = container_of(work, struct scsi_device, ew.work);
 
+	scsi_dh_release_device(sdev);
+
 	parent = sdev->sdev_gendev.parent;
 
 	spin_lock_irqsave(sdev->host->host_lock, flags);

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [REGRESSION v4.3] scsi_dh: use-after-free when removing scsi device
  2015-10-01  4:38     ` Junichi Nomura
@ 2015-10-01  5:21       ` Christoph Hellwig
  2015-10-01 11:40         ` Junichi Nomura
  2015-10-04  7:43       ` Christoph Hellwig
  1 sibling, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2015-10-01  5:21 UTC (permalink / raw)
  To: Junichi Nomura; +Cc: Christoph Hellwig, linux-scsi, Hannes Reinecke

On Thu, Oct 01, 2015 at 04:38:45AM +0000, Junichi Nomura wrote:
> But another workload using dm-multipath still caues the same crash.
> I think a patch like the following is needed.
> What do you think?

Thanks Junichi,

I suspected that we should defer the release until we tear down
the sdev instead of unregister time.  Give me a little more time
to review your patch and actually reproduce your issue before a formal
ACK, though.

Any chance you could share all your multipath tests in a git repository
somewhere?  It seems like you're the only one actually having a good
set of reproducable but minimalistic tests.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [REGRESSION v4.3] scsi_dh: use-after-free when removing scsi device
  2015-10-01  5:21       ` Christoph Hellwig
@ 2015-10-01 11:40         ` Junichi Nomura
  2015-10-04  7:42           ` Christoph Hellwig
  0 siblings, 1 reply; 12+ messages in thread
From: Junichi Nomura @ 2015-10-01 11:40 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-scsi, Hannes Reinecke

On 10/01/15 14:21, Christoph Hellwig wrote:
> Any chance you could share all your multipath tests in a git repository
> somewhere?  It seems like you're the only one actually having a good
> set of reproducable but minimalistic tests.

Hmm, sorry I don't have a public git repository...

I'm using a pair of KVM guest, one for iSCSI target and the other
for testing.

I could reproduce the crash using this loop, within a few minutes:

   service multipathd start
   while true; do
     multipath -F
     iscsiadm -m node --logout
     iscsiadm -m node --login
   done

It might implicitly depend on udev to do some small amount of I/O
after device uevent though.

/etc/multipath.conf is like this:

blacklist {
      devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st|vd)[0-9]*"
}
defaults {
      user_friendly_names no
      find_multipaths yes
      path_grouping_policy    group_by_prio
      path_selector           "queue-length 0"
      path_checker            "tur"
      failback immediate
      prio alua
      retain_attached_hw_handler yes
}

-- 
Jun'ichi Nomura, NEC Corporation

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [REGRESSION v4.3] scsi_dh: use-after-free when removing scsi device
  2015-10-01 11:40         ` Junichi Nomura
@ 2015-10-04  7:42           ` Christoph Hellwig
  2015-10-07  5:55             ` Junichi Nomura
  0 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2015-10-04  7:42 UTC (permalink / raw)
  To: Junichi Nomura; +Cc: Christoph Hellwig, linux-scsi, Hannes Reinecke

On Thu, Oct 01, 2015 at 11:40:23AM +0000, Junichi Nomura wrote:
> On 10/01/15 14:21, Christoph Hellwig wrote:
> > Any chance you could share all your multipath tests in a git repository
> > somewhere?  It seems like you're the only one actually having a good
> > set of reproducable but minimalistic tests.
> 
> Hmm, sorry I don't have a public git repository...
> 
> I'm using a pair of KVM guest, one for iSCSI target and the other
> for testing.

Any chance you could share your various scripts in some way to that
people doing multipath changes can run them to verify those changes?

> 
> I could reproduce the crash using this loop, within a few minutes:
> 
>    service multipathd start
>    while true; do
>      multipath -F
>      iscsiadm -m node --logout
>      iscsiadm -m node --login
>    done
> 
> It might implicitly depend on udev to do some small amount of I/O
> after device uevent though.

I can't reproduce this unfortunately.  I suspect udev doesn't do enough
stupid things on my old Debian test system.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [REGRESSION v4.3] scsi_dh: use-after-free when removing scsi device
  2015-10-01  4:38     ` Junichi Nomura
  2015-10-01  5:21       ` Christoph Hellwig
@ 2015-10-04  7:43       ` Christoph Hellwig
  1 sibling, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2015-10-04  7:43 UTC (permalink / raw)
  To: Junichi Nomura; +Cc: linux-scsi, Hannes Reinecke

This patch looks fine to me:

Acked-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [REGRESSION v4.3] scsi_dh: use-after-free when removing scsi device
  2015-10-04  7:42           ` Christoph Hellwig
@ 2015-10-07  5:55             ` Junichi Nomura
  0 siblings, 0 replies; 12+ messages in thread
From: Junichi Nomura @ 2015-10-07  5:55 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-scsi, Hannes Reinecke

On 10/04/15 16:42, Christoph Hellwig wrote:
> Any chance you could share your various scripts in some way to that
> people doing multipath changes can run them to verify those changes?

Hi Christoph,

I've refined some pieces of test scripts which could be useful for
people and posted here:
  http://marc.info/?l=linux-scsi&m=144419652913948&w=2
  https://www.redhat.com/archives/dm-devel/2015-October/msg00034.html

The scripts are stress testing and expected to trigger crash/stall/
data corruption if you run long enough. Though it's not perfect I
hope this helps people do regression testing for dm-mpath related
code changes.

-- 
Jun'ichi Nomura, NEC Corporation

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2015-10-07  5:56 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-09-30  0:35 [REGRESSION v4.3] scsi_dh: use-after-free when removing scsi device Junichi Nomura
2015-09-30  9:28 ` Hannes Reinecke
2015-09-30 10:35   ` Boaz Harrosh
2015-09-30 14:49     ` Hannes Reinecke
2015-09-30 15:18 ` Christoph Hellwig
2015-10-01  0:56   ` Junichi Nomura
2015-10-01  4:38     ` Junichi Nomura
2015-10-01  5:21       ` Christoph Hellwig
2015-10-01 11:40         ` Junichi Nomura
2015-10-04  7:42           ` Christoph Hellwig
2015-10-07  5:55             ` Junichi Nomura
2015-10-04  7:43       ` Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).