* v4.3-rc2 dm-mq bug
@ 2015-09-24 19:09 Bart Van Assche
2015-09-24 21:18 ` Mike Snitzer
0 siblings, 1 reply; 11+ messages in thread
From: Bart Van Assche @ 2015-09-24 19:09 UTC (permalink / raw)
To: Mike Snitzer; +Cc: device-mapper development
Hello Mike,
The below was reported while testing the SRP initiator in kernel
v4.3-rc2 with scsi-mq and dm-mq enabled. Can you have a look at this ?
Thanks,
Bart.
------------[ cut here ]------------
kernel BUG at drivers/md/dm.c:2811!
invalid opcode: 0000 [#1] SMP
Modules linked in: dm_queue_length ib_srp(O) scsi_transport_srp(O) netconsole configfs af_packet msr sg coretemp x86_pkg_temp_thermal crct10dif_pclmul crc32c_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul ipmi_devintf ablk_helper cryptd tg3 lpc_ich microcode pcspkr libphy mfd_core ipmi_si ipmi_msghandler ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad dm_multipath dm_mod rdma_cm ib_cm acpi_power_meter hwmon iw_cm processor button wmi ext4 crc16 mbcache jbd2 mlx4_ib mlx4_en ib_sa ptp pps_core ib_mad ib_core ib_addr sd_mod sr_mod cdrom hid_generic usbhid hid ahci libahci libata ehci_pci ehci_hcd mlx4_core usbcore usb_common scsi_mod autofs4 [last unloaded: brd]
CPU: 0 PID: 334 Comm: kworker/0:1H Tainted: G W O 4.3.0-rc2-debug+ #1
Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.0.2 11/17/2014
Workqueue: kblockd blk_mq_run_work_fn
task: ffff880450604380 ti: ffff880465864000 task.ti: ffff880465864000
RIP: 0010:[<ffffffffa03db95d>] [<ffffffffa03db95d>] dm_get+0x1d/0x20 [dm_mod]
RSP: 0018:ffff880465867c90 EFLAGS: 00010202
RAX: 0000000000000018 RBX: ffff88006cc5a520 RCX: 000000010000affd
RDX: 0000000000000000 RSI: ffffffff81a47140 RDI: ffff88006cc5a520
RBP: ffff880465867c90 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: ffff8804036c0600
R13: ffffc90000ca0040 R14: ffff88040429f620 R15: 0000000000000001
FS: 0000000000000000(0000) GS:ffff88047fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000560a1f749178 CR3: 000000044d3f8000 CR4: 00000000001406f0
Stack:
ffff880465867cb8 ffffffffa03dba40 ffff88006cc5a520 ffff8804036c0600
ffff88006cc5a520 ffff880465867cf8 ffffffffa03dbd95 0000000100000001
ffff8804034ba4f8 ffff880465867d20 0000000000000000 ffff8804036c0600
Call Trace:
[<ffffffffa03dba40>] dm_start_request+0x60/0x100 [dm_mod]
[<ffffffffa03dbd95>] dm_mq_queue_rq+0xa5/0x240 [dm_mod]
[<ffffffff81273275>] __blk_mq_run_hw_queue+0x1c5/0x360
[<ffffffff812739d5>] blk_mq_run_work_fn+0x15/0x20
[<ffffffff8108d9b8>] process_one_work+0x1d8/0x610
[<ffffffff8108d92b>] ? process_one_work+0x14b/0x610
[<ffffffff8108df04>] worker_thread+0x114/0x460
[<ffffffff8108ddf0>] ? process_one_work+0x610/0x610
[<ffffffff810941f8>] kthread+0xf8/0x110
[<ffffffff81094100>] ? kthread_create_on_node+0x200/0x200
[<ffffffff814f5eaf>] ret_from_fork+0x3f/0x70
[<ffffffff81094100>] ? kthread_create_on_node+0x200/0x200
Code: 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 f0 ff 87 d0 01 00 00 48 8b 87 70 02 00 00 a8 08 75 02 5d c3 <0f> 0b 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 c7 c7 a0 b0
RIP [<ffffffffa03db95d>] dm_get+0x1d/0x20 [dm_mod]
RSP <ffff880465867c90>
---[ end trace e0f10c6e2c55ad9a ]---
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: v4.3-rc2 dm-mq bug
2015-09-24 19:09 v4.3-rc2 dm-mq bug Bart Van Assche
@ 2015-09-24 21:18 ` Mike Snitzer
2015-09-24 21:27 ` Bart Van Assche
2015-09-25 0:37 ` Junichi Nomura
0 siblings, 2 replies; 11+ messages in thread
From: Mike Snitzer @ 2015-09-24 21:18 UTC (permalink / raw)
To: Bart Van Assche; +Cc: Jun'ichi Nomura, device-mapper development
On Thu, Sep 24 2015 at 3:09pm -0400,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> Hello Mike,
>
> The below was reported while testing the SRP initiator in kernel
> v4.3-rc2 with scsi-mq and dm-mq enabled. Can you have a look at this ?
>
> Thanks,
>
> Bart.
>
> ------------[ cut here ]------------
> kernel BUG at drivers/md/dm.c:2811!
> invalid opcode: 0000 [#1] SMP
...
> CPU: 0 PID: 334 Comm: kworker/0:1H Tainted: G W O 4.3.0-rc2-debug+ #1
> Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.0.2 11/17/2014
> Workqueue: kblockd blk_mq_run_work_fn
> task: ffff880450604380 ti: ffff880465864000 task.ti: ffff880465864000
> RIP: 0010:[<ffffffffa03db95d>] [<ffffffffa03db95d>] dm_get+0x1d/0x20 [dm_mod]
> RSP: 0018:ffff880465867c90 EFLAGS: 00010202
> RAX: 0000000000000018 RBX: ffff88006cc5a520 RCX: 000000010000affd
> RDX: 0000000000000000 RSI: ffffffff81a47140 RDI: ffff88006cc5a520
> RBP: ffff880465867c90 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000001 R11: 0000000000000000 R12: ffff8804036c0600
> R13: ffffc90000ca0040 R14: ffff88040429f620 R15: 0000000000000001
> FS: 0000000000000000(0000) GS:ffff88047fc00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000560a1f749178 CR3: 000000044d3f8000 CR4: 00000000001406f0
> Stack:
> ffff880465867cb8 ffffffffa03dba40 ffff88006cc5a520 ffff8804036c0600
> ffff88006cc5a520 ffff880465867cf8 ffffffffa03dbd95 0000000100000001
> ffff8804034ba4f8 ffff880465867d20 0000000000000000 ffff8804036c0600
> Call Trace:
> [<ffffffffa03dba40>] dm_start_request+0x60/0x100 [dm_mod]
> [<ffffffffa03dbd95>] dm_mq_queue_rq+0xa5/0x240 [dm_mod]
> [<ffffffff81273275>] __blk_mq_run_hw_queue+0x1c5/0x360
> [<ffffffff812739d5>] blk_mq_run_work_fn+0x15/0x20
> [<ffffffff8108d9b8>] process_one_work+0x1d8/0x610
> [<ffffffff8108d92b>] ? process_one_work+0x14b/0x610
> [<ffffffff8108df04>] worker_thread+0x114/0x460
> [<ffffffff8108ddf0>] ? process_one_work+0x610/0x610
> [<ffffffff810941f8>] kthread+0xf8/0x110
> [<ffffffff81094100>] ? kthread_create_on_node+0x200/0x200
> [<ffffffff814f5eaf>] ret_from_fork+0x3f/0x70
> [<ffffffff81094100>] ? kthread_create_on_node+0x200/0x200
> Code: 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 f0 ff 87 d0 01 00 00 48 8b 87 70 02 00 00 a8 08 75 02 5d c3 <0f> 0b 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 c7 c7 a0 b0
> RIP [<ffffffffa03db95d>] dm_get+0x1d/0x20 [dm_mod]
> RSP <ffff880465867c90>
> ---[ end trace e0f10c6e2c55ad9a ]---
That is dm_get's BUG_ON(test_bit(DMF_FREEING, &md->flags));
which was introduced back in 2010 via commit 3f77316de0.
In 2012 that dm_get moved from map_request to dm_start_request() with
commit ba1cbad93d.
dm_get() is called at the end of dm_start_request().
__dm_destroy() sets DMF_FREEING when a device is in the process of being
destroyed. And the reason for dm_get's BUG_ON() is detailed in
__dm_destroy's comment:
/*
* Rare, but there may be I/O requests still going to complete,
* for example. Wait for all references to disappear.
* No one should increment the reference count of the mapped_device,
* after the mapped_device state becomes DMF_FREEING.
*/
It could be the original intent was that no _new_ requests be
initiated if/when the DM device is destroyed. But AFAICT nothing ever
actually enforced that distinction.
And clearly in your test dm_start_request() is called after
__dm_destroy() sets DMF_FREEING.
But of all the dm_get() callers dm_start_request() does strike me as odd
because it is a catch-22. We need to start requests that were issued
even if the device is being destroyed. And __dm_destroy() needs to wait
for them (by waiting for all dm_get references to be dropped).
It should be noted that bio-based DM doesn't call dm_get() in its IO
submission path. So this is unique to request-based DM (be it blk-mq or
not).
In short: dm_get's BUG_ON() looks bogus and should likely be removed.
But I've cc'd Junichi to see what he thinks.
Mike
p.s. Bart, why is your test destroying the DM mpath device while it is
issuing IO to the device? ;)
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: v4.3-rc2 dm-mq bug
2015-09-24 21:18 ` Mike Snitzer
@ 2015-09-24 21:27 ` Bart Van Assche
2015-09-24 23:01 ` Mike Snitzer
2015-09-25 0:37 ` Junichi Nomura
1 sibling, 1 reply; 11+ messages in thread
From: Bart Van Assche @ 2015-09-24 21:27 UTC (permalink / raw)
To: Mike Snitzer; +Cc: Jun'ichi Nomura, device-mapper development
On 09/24/2015 02:18 PM, Mike Snitzer wrote:
> p.s. Bart, why is your test destroying the DM mpath device while it is
> issuing IO to the device? ;)
Removing SCSI device instances created by the SRP initiator while I/O is
being submitted is a great way to stress-test the SRP initiator driver
but apparently the test I'm running triggers bugs in other drivers than
the SRP initiator driver ...
Bart.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: v4.3-rc2 dm-mq bug
2015-09-24 21:27 ` Bart Van Assche
@ 2015-09-24 23:01 ` Mike Snitzer
2015-09-25 15:34 ` Bart Van Assche
0 siblings, 1 reply; 11+ messages in thread
From: Mike Snitzer @ 2015-09-24 23:01 UTC (permalink / raw)
To: Bart Van Assche; +Cc: Jun'ichi Nomura, device-mapper development
On Thu, Sep 24 2015 at 5:27pm -0400,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> On 09/24/2015 02:18 PM, Mike Snitzer wrote:
> >p.s. Bart, why is your test destroying the DM mpath device while it is
> >issuing IO to the device? ;)
>
> Removing SCSI device instances created by the SRP initiator while
> I/O is being submitted is a great way to stress-test the SRP
> initiator driver but apparently the test I'm running triggers bugs
> in other drivers than the SRP initiator driver ...
I'm not referring to the removal of a DM mpath device's underlying SRP
paths. The __dm_destroy() path is only used when destroying a DM
device. The only DM device used by multipath is the DM mpath device.
So somehow your test is running the equivalent of:
dmsetup remove <toplevel_DM_mpath_device>
While IO is being issued directly to that device.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: v4.3-rc2 dm-mq bug
2015-09-24 23:01 ` Mike Snitzer
@ 2015-09-25 15:34 ` Bart Van Assche
2015-09-25 18:49 ` Mike Snitzer
0 siblings, 1 reply; 11+ messages in thread
From: Bart Van Assche @ 2015-09-25 15:34 UTC (permalink / raw)
To: Mike Snitzer; +Cc: Jun'ichi Nomura, device-mapper development
On 09/24/2015 04:01 PM, Mike Snitzer wrote:
> I'm not referring to the removal of a DM mpath device's underlying SRP
> paths. The __dm_destroy() path is only used when destroying a DM
> device. The only DM device used by multipath is the DM mpath device.
>
> So somehow your test is running the equivalent of:
> dmsetup remove <toplevel_DM_mpath_device>
>
> While IO is being issued directly to that device.
Hello Mike,
One of the steps in my test is to remove all dm devices that have been
created on top of SRP paths. The call trace at the start of this thread
was triggered by the following command (unloading the SRP initiator is
only possible after all holders of SRP paths have been removed):
for p in /sys/class/srp_remote_ports/*; do echo 1 >$p/delete & done;
wait; dmsetup remove_all; modprobe -r ib_srp
The I/O was probably generated by multipathd itself (a path check).
Bart.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: v4.3-rc2 dm-mq bug
2015-09-25 15:34 ` Bart Van Assche
@ 2015-09-25 18:49 ` Mike Snitzer
0 siblings, 0 replies; 11+ messages in thread
From: Mike Snitzer @ 2015-09-25 18:49 UTC (permalink / raw)
To: Bart Van Assche; +Cc: Jun'ichi Nomura, device-mapper development
On Fri, Sep 25 2015 at 11:34am -0400,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> On 09/24/2015 04:01 PM, Mike Snitzer wrote:
> >I'm not referring to the removal of a DM mpath device's underlying SRP
> >paths. The __dm_destroy() path is only used when destroying a DM
> >device. The only DM device used by multipath is the DM mpath device.
> >
> >So somehow your test is running the equivalent of:
> > dmsetup remove <toplevel_DM_mpath_device>
> >
> >While IO is being issued directly to that device.
>
> Hello Mike,
>
> One of the steps in my test is to remove all dm devices that have
> been created on top of SRP paths. The call trace at the start of
> this thread was triggered by the following command (unloading the
> SRP initiator is only possible after all holders of SRP paths have
> been removed):
>
> for p in /sys/class/srp_remote_ports/*; do echo 1 >$p/delete & done;
> wait; dmsetup remove_all; modprobe -r ib_srp
>
> The I/O was probably generated by multipathd itself (a path check).
I'd be surprised, the multipathd path checks go direct to the underlying
paths. But I would also use 'multipath -F' before 'dmsetup remove_all'
in your test sequence -- could be that alone "fixes" the problem.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: v4.3-rc2 dm-mq bug
2015-09-24 21:18 ` Mike Snitzer
2015-09-24 21:27 ` Bart Van Assche
@ 2015-09-25 0:37 ` Junichi Nomura
2015-09-25 2:13 ` Mike Snitzer
2015-09-25 15:37 ` Bart Van Assche
1 sibling, 2 replies; 11+ messages in thread
From: Junichi Nomura @ 2015-09-25 0:37 UTC (permalink / raw)
To: Mike Snitzer, Bart Van Assche; +Cc: device-mapper development
On 09/25/15 06:18, Mike Snitzer wrote:
> On Thu, Sep 24 2015 at 3:09pm -0400,
> Bart Van Assche <bart.vanassche@sandisk.com> wrote:
>> kernel BUG at drivers/md/dm.c:2811!
>> invalid opcode: 0000 [#1] SMP
> ...
>> CPU: 0 PID: 334 Comm: kworker/0:1H Tainted: G W O 4.3.0-rc2-debug+ #1
>> Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.0.2 11/17/2014
>> Workqueue: kblockd blk_mq_run_work_fn
>> task: ffff880450604380 ti: ffff880465864000 task.ti: ffff880465864000
>> RIP: 0010:[<ffffffffa03db95d>] [<ffffffffa03db95d>] dm_get+0x1d/0x20 [dm_mod]
>> RSP: 0018:ffff880465867c90 EFLAGS: 00010202
>> RAX: 0000000000000018 RBX: ffff88006cc5a520 RCX: 000000010000affd
>> RDX: 0000000000000000 RSI: ffffffff81a47140 RDI: ffff88006cc5a520
>> RBP: ffff880465867c90 R08: 0000000000000000 R09: 0000000000000000
>> R10: 0000000000000001 R11: 0000000000000000 R12: ffff8804036c0600
>> R13: ffffc90000ca0040 R14: ffff88040429f620 R15: 0000000000000001
>> FS: 0000000000000000(0000) GS:ffff88047fc00000(0000) knlGS:0000000000000000
>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 0000560a1f749178 CR3: 000000044d3f8000 CR4: 00000000001406f0
>> Stack:
>> ffff880465867cb8 ffffffffa03dba40 ffff88006cc5a520 ffff8804036c0600
>> ffff88006cc5a520 ffff880465867cf8 ffffffffa03dbd95 0000000100000001
>> ffff8804034ba4f8 ffff880465867d20 0000000000000000 ffff8804036c0600
>> Call Trace:
>> [<ffffffffa03dba40>] dm_start_request+0x60/0x100 [dm_mod]
>> [<ffffffffa03dbd95>] dm_mq_queue_rq+0xa5/0x240 [dm_mod]
>> [<ffffffff81273275>] __blk_mq_run_hw_queue+0x1c5/0x360
>> [<ffffffff812739d5>] blk_mq_run_work_fn+0x15/0x20
>> [<ffffffff8108d9b8>] process_one_work+0x1d8/0x610
>> [<ffffffff8108d92b>] ? process_one_work+0x14b/0x610
>> [<ffffffff8108df04>] worker_thread+0x114/0x460
>> [<ffffffff8108ddf0>] ? process_one_work+0x610/0x610
>> [<ffffffff810941f8>] kthread+0xf8/0x110
>> [<ffffffff81094100>] ? kthread_create_on_node+0x200/0x200
>> [<ffffffff814f5eaf>] ret_from_fork+0x3f/0x70
>> [<ffffffff81094100>] ? kthread_create_on_node+0x200/0x200
>> Code: 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 f0 ff 87 d0 01 00 00 48 8b 87 70 02 00 00 a8 08 75 02 5d c3 <0f> 0b 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 c7 c7 a0 b0
>> RIP [<ffffffffa03db95d>] dm_get+0x1d/0x20 [dm_mod]
>> RSP <ffff880465867c90>
>> ---[ end trace e0f10c6e2c55ad9a ]---
>
> That is dm_get's BUG_ON(test_bit(DMF_FREEING, &md->flags));
> which was introduced back in 2010 via commit 3f77316de0.
> In 2012 that dm_get moved from map_request to dm_start_request() with
> commit ba1cbad93d.
>
> dm_get() is called at the end of dm_start_request().
>
> __dm_destroy() sets DMF_FREEING when a device is in the process of being
> destroyed. And the reason for dm_get's BUG_ON() is detailed in
> __dm_destroy's comment:
>
> /*
> * Rare, but there may be I/O requests still going to complete,
> * for example. Wait for all references to disappear.
> * No one should increment the reference count of the mapped_device,
> * after the mapped_device state becomes DMF_FREEING.
> */
>
> It could be the original intent was that no _new_ requests be
> initiated if/when the DM device is destroyed. But AFAICT nothing ever
> actually enforced that distinction.
>
> And clearly in your test dm_start_request() is called after
> __dm_destroy() sets DMF_FREEING.
>
> But of all the dm_get() callers dm_start_request() does strike me as odd
> because it is a catch-22. We need to start requests that were issued
> even if the device is being destroyed. And __dm_destroy() needs to wait
> for them (by waiting for all dm_get references to be dropped).
>
> It should be noted that bio-based DM doesn't call dm_get() in its IO
> submission path. So this is unique to request-based DM (be it blk-mq or
> not).
>
> In short: dm_get's BUG_ON() looks bogus and should likely be removed.
> But I've cc'd Junichi to see what he thinks.
Since __dm_destroy() depends on monotonic decrease of md->holders,
assertion check of !DMF_FREEING in dm_get() is a valid protection
from use-after-free. If we are to remove the check, __dm_destroy()
should be changed to cope with the situation.
I'm curious why there were pending I/Os after DMF_FEEING set.
Can this problem be reproducible with non dm-mq setup or older kernels?
How did you remove the dm device in your testing?
--
Jun'ichi Nomura, NEC Corporation
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: v4.3-rc2 dm-mq bug
2015-09-25 0:37 ` Junichi Nomura
@ 2015-09-25 2:13 ` Mike Snitzer
2015-09-25 15:37 ` Bart Van Assche
1 sibling, 0 replies; 11+ messages in thread
From: Mike Snitzer @ 2015-09-25 2:13 UTC (permalink / raw)
To: Junichi Nomura; +Cc: Bart Van Assche, device-mapper development
On Thu, Sep 24 2015 at 8:37pm -0400,
Junichi Nomura <j-nomura@ce.jp.nec.com> wrote:
> On 09/25/15 06:18, Mike Snitzer wrote:
> > On Thu, Sep 24 2015 at 3:09pm -0400,
> > Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> >> kernel BUG at drivers/md/dm.c:2811!
> >> invalid opcode: 0000 [#1] SMP
> > ...
> >> CPU: 0 PID: 334 Comm: kworker/0:1H Tainted: G W O 4.3.0-rc2-debug+ #1
> >> Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.0.2 11/17/2014
> >> Workqueue: kblockd blk_mq_run_work_fn
> >> task: ffff880450604380 ti: ffff880465864000 task.ti: ffff880465864000
> >> RIP: 0010:[<ffffffffa03db95d>] [<ffffffffa03db95d>] dm_get+0x1d/0x20 [dm_mod]
> >> RSP: 0018:ffff880465867c90 EFLAGS: 00010202
> >> RAX: 0000000000000018 RBX: ffff88006cc5a520 RCX: 000000010000affd
> >> RDX: 0000000000000000 RSI: ffffffff81a47140 RDI: ffff88006cc5a520
> >> RBP: ffff880465867c90 R08: 0000000000000000 R09: 0000000000000000
> >> R10: 0000000000000001 R11: 0000000000000000 R12: ffff8804036c0600
> >> R13: ffffc90000ca0040 R14: ffff88040429f620 R15: 0000000000000001
> >> FS: 0000000000000000(0000) GS:ffff88047fc00000(0000) knlGS:0000000000000000
> >> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> CR2: 0000560a1f749178 CR3: 000000044d3f8000 CR4: 00000000001406f0
> >> Stack:
> >> ffff880465867cb8 ffffffffa03dba40 ffff88006cc5a520 ffff8804036c0600
> >> ffff88006cc5a520 ffff880465867cf8 ffffffffa03dbd95 0000000100000001
> >> ffff8804034ba4f8 ffff880465867d20 0000000000000000 ffff8804036c0600
> >> Call Trace:
> >> [<ffffffffa03dba40>] dm_start_request+0x60/0x100 [dm_mod]
> >> [<ffffffffa03dbd95>] dm_mq_queue_rq+0xa5/0x240 [dm_mod]
> >> [<ffffffff81273275>] __blk_mq_run_hw_queue+0x1c5/0x360
> >> [<ffffffff812739d5>] blk_mq_run_work_fn+0x15/0x20
> >> [<ffffffff8108d9b8>] process_one_work+0x1d8/0x610
> >> [<ffffffff8108d92b>] ? process_one_work+0x14b/0x610
> >> [<ffffffff8108df04>] worker_thread+0x114/0x460
> >> [<ffffffff8108ddf0>] ? process_one_work+0x610/0x610
> >> [<ffffffff810941f8>] kthread+0xf8/0x110
> >> [<ffffffff81094100>] ? kthread_create_on_node+0x200/0x200
> >> [<ffffffff814f5eaf>] ret_from_fork+0x3f/0x70
> >> [<ffffffff81094100>] ? kthread_create_on_node+0x200/0x200
> >> Code: 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 f0 ff 87 d0 01 00 00 48 8b 87 70 02 00 00 a8 08 75 02 5d c3 <0f> 0b 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 c7 c7 a0 b0
> >> RIP [<ffffffffa03db95d>] dm_get+0x1d/0x20 [dm_mod]
> >> RSP <ffff880465867c90>
> >> ---[ end trace e0f10c6e2c55ad9a ]---
> >
> > That is dm_get's BUG_ON(test_bit(DMF_FREEING, &md->flags));
> > which was introduced back in 2010 via commit 3f77316de0.
> > In 2012 that dm_get moved from map_request to dm_start_request() with
> > commit ba1cbad93d.
> >
> > dm_get() is called at the end of dm_start_request().
> >
> > __dm_destroy() sets DMF_FREEING when a device is in the process of being
> > destroyed. And the reason for dm_get's BUG_ON() is detailed in
> > __dm_destroy's comment:
> >
> > /*
> > * Rare, but there may be I/O requests still going to complete,
> > * for example. Wait for all references to disappear.
> > * No one should increment the reference count of the mapped_device,
> > * after the mapped_device state becomes DMF_FREEING.
> > */
> >
> > It could be the original intent was that no _new_ requests be
> > initiated if/when the DM device is destroyed. But AFAICT nothing ever
> > actually enforced that distinction.
> >
> > And clearly in your test dm_start_request() is called after
> > __dm_destroy() sets DMF_FREEING.
> >
> > But of all the dm_get() callers dm_start_request() does strike me as odd
> > because it is a catch-22. We need to start requests that were issued
> > even if the device is being destroyed. And __dm_destroy() needs to wait
> > for them (by waiting for all dm_get references to be dropped).
> >
> > It should be noted that bio-based DM doesn't call dm_get() in its IO
> > submission path. So this is unique to request-based DM (be it blk-mq or
> > not).
> >
> > In short: dm_get's BUG_ON() looks bogus and should likely be removed.
> > But I've cc'd Junichi to see what he thinks.
>
> Since __dm_destroy() depends on monotonic decrease of md->holders,
> assertion check of !DMF_FREEING in dm_get() is a valid protection
> from use-after-free. If we are to remove the check, __dm_destroy()
> should be changed to cope with the situation.
Good point. dm_remove() _should_ protect us from ever destroying a
device that is still in use.
Unless there is some bug that caused something like
dm_lock_for_deletion() to regress -- which seems unlikely.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: v4.3-rc2 dm-mq bug
2015-09-25 0:37 ` Junichi Nomura
2015-09-25 2:13 ` Mike Snitzer
@ 2015-09-25 15:37 ` Bart Van Assche
2015-09-30 0:42 ` Junichi Nomura
1 sibling, 1 reply; 11+ messages in thread
From: Bart Van Assche @ 2015-09-25 15:37 UTC (permalink / raw)
To: Junichi Nomura, Mike Snitzer; +Cc: device-mapper development
On 09/24/2015 05:42 PM, Junichi Nomura wrote:
> Since __dm_destroy() depends on monotonic decrease of md->holders,
> assertion check of !DMF_FREEING in dm_get() is a valid protection
> from use-after-free. If we are to remove the check, __dm_destroy()
> should be changed to cope with the situation.
>
> I'm curious why there were pending I/Os after DMF_FEEING set.
> Can this problem be reproducible with non dm-mq setup or older kernels?
> How did you remove the dm device in your testing?
Hello Junichi,
Thanks for stepping in.
Sorry but I do not know whether or not this problem is reproducible
without dm-mq or with older kernels.
The dm device was removed via the command "dmsetup remove_all".
Bart.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: v4.3-rc2 dm-mq bug
2015-09-25 15:37 ` Bart Van Assche
@ 2015-09-30 0:42 ` Junichi Nomura
2015-09-30 2:54 ` Bart Van Assche
0 siblings, 1 reply; 11+ messages in thread
From: Junichi Nomura @ 2015-09-30 0:42 UTC (permalink / raw)
To: Bart Van Assche, Mike Snitzer; +Cc: device-mapper development
On 09/26/15 00:37, Bart Van Assche wrote:
> On 09/24/2015 05:42 PM, Junichi Nomura wrote:
>> Since __dm_destroy() depends on monotonic decrease of md->holders,
>> assertion check of !DMF_FREEING in dm_get() is a valid protection
>> from use-after-free. If we are to remove the check, __dm_destroy()
>> should be changed to cope with the situation.
>>
>> I'm curious why there were pending I/Os after DMF_FEEING set.
>> Can this problem be reproducible with non dm-mq setup or older kernels?
>> How did you remove the dm device in your testing?
>
> Hello Junichi,
>
> Thanks for stepping in.
>
> Sorry but I do not know whether or not this problem is reproducible without dm-mq or with older kernels.
>
> The dm device was removed via the command "dmsetup remove_all".
I tried simply repeating 'dmsetup remove_all' and multipath scan
but couldn't reproduce the problem.
However, when I added scsi device removal and rescan to the mix
the system crashed within a few seconds. It looks like the change
in v4.3-rc which integrates scsi_dh to scsi core introduced
use-after-free. I reported the problem to linux-scsi:
[REGRESSION v4.3] scsi_dh: use-after-free when removing scsi device
http://marc.info/?l=linux-scsi&m=144357350800712&w=2
Though I'm not sure if it's related to your issue, just FYI.
--
Jun'ichi Nomura, NEC Corporation
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2015-09-30 2:54 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-09-24 19:09 v4.3-rc2 dm-mq bug Bart Van Assche
2015-09-24 21:18 ` Mike Snitzer
2015-09-24 21:27 ` Bart Van Assche
2015-09-24 23:01 ` Mike Snitzer
2015-09-25 15:34 ` Bart Van Assche
2015-09-25 18:49 ` Mike Snitzer
2015-09-25 0:37 ` Junichi Nomura
2015-09-25 2:13 ` Mike Snitzer
2015-09-25 15:37 ` Bart Van Assche
2015-09-30 0:42 ` Junichi Nomura
2015-09-30 2:54 ` Bart Van Assche
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.