* Kernel v4.1-rc1 + MQ dm-multipath + MQ SRP oops
@ 2015-04-28 11:52 Bart Van Assche
2015-04-28 13:52 ` Mike Snitzer
2015-04-29 13:20 ` Christoph Hellwig
0 siblings, 2 replies; 15+ messages in thread
From: Bart Van Assche @ 2015-04-28 11:52 UTC (permalink / raw)
To: Mike Snitzer, Christoph Hellwig; +Cc: device-mapper development
Hello,
Earlier today I started testing an SRP initiator patch series on top of
Linux kernel v4.1-rc1. Although that patch series works reliably on top
of kernel v4.0, a test during which I triggered scsi_remove_host() +
relogin (for p in /sys/class/srp_remote_ports/*; do echo 1 >$p/delete &
done; wait; srp_daemon -oaec) triggered the following kernel oops:
device-mapper: multipath: Failing path 8:0.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000138
IP: [<ffffffffa045f8e9>] free_rq_clone+0x29/0xb0 [dm_mod]
PGD 0
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: dm_queue_length scsi_dh_alua dm_round_robin
dm_multipath scsi_dh dm_mod sd_mod ib_uverbs mlx4_ib ib_umad netconsole
ib_srp scsi_transport_srp configfs ib_iser rdma_cm iw_cm ib_cm ib_sa
ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi
scsi_transport_iscsi hid_generic af_packet sg microcode sr_mod i2c_piix4
usbhid hid intel_agp intel_gtt i2c_core cdrom virtio_balloon agpgart
acpi_cpufreq processor thermal_sys fuse hwmon button ata_generic
pata_acpi mlx4_en ptp pps_core ext4 crc16 jbd2 mbcache virtio_blk
virtio_net ata_piix libata uhci_hcd virtio_pci virtio_ring mlx4_core
usbcore virtio scsi_mod usb_common
CPU: 1 PID: 5423 Comm: kdmwork-252:0 Not tainted 4.1.0-rc1-debug+ #1
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
task: ffff880056568000 ti: ffff8800516c0000 task.ti: ffff8800516c0000
RIP: 0010:[<ffffffffa045f8e9>] [<ffffffffa045f8e9>]
free_rq_clone+0x29/0xb0 [dm_mod]
RSP: 0018:ffff8800516c3d18 EFLAGS: 00010296
RAX: 0000000000000000 RBX: ffff880058bb7040 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff8179d723 RDI: 0000000000000000
RBP: ffff8800516c3d38 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff880058bb6ff0
R13: ffff880051570948 R14: 0000000000000000 R15: ffffc900000b7040
FS: 0000000000000000(0000) GS:ffff88005fc80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000138 CR3: 000000005dab8000 CR4: 00000000000007e0
Stack:
0000000000000001 ffff880058bb6e80 ffff8800342ab0a0 ffff880051570948
ffff8800516c3d78 ffffffffa045fa67 ffff880058bb6e80 ffff880058bb6ff0
0000000000000002 ffff880058bb6e80 ffff880051570c60 ffffc900000b7040
Call Trace:
[<ffffffffa045fa67>] dm_requeue_unmapped_original_request+0x47/0xe0
[dm_mod]
[<ffffffffa0461daf>] map_request.isra.35+0x9f/0x430 [dm_mod]
[<ffffffff814ff2d0>] ? _raw_spin_unlock_irq+0x30/0x70
[<ffffffffa0462166>] map_tio_request+0x26/0x40 [dm_mod]
[<ffffffff8108382f>] kthread_worker_fn+0xaf/0x200
[<ffffffff81083780>] ? __init_kthread_worker+0x60/0x60
[<ffffffff8108370a>] kthread+0x10a/0x120
[<ffffffff814ff2d0>] ? _raw_spin_unlock_irq+0x30/0x70
[<ffffffff81083600>] ? kthread_create_on_node+0x220/0x220
[<ffffffff815001a2>] ret_from_fork+0x42/0x70
[<ffffffff81083600>] ? kthread_create_on_node+0x220/0x220
Code: 00 00 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 48 89 fb 48 83 ec
08 4c 8b a7 60 01 00 00 4d 8b 2c 24 e8 bb 6e e0 e0 48 8b 43 30 <48> 83
b8 38 01 00 00 00 74 2d 49 8b 44 24 08 48 89 df 48 8b 40
RIP [<ffffffffa045f8e9>] free_rq_clone+0x29/0xb0 [dm_mod]
RSP <ffff8800516c3d18>
CR2: 0000000000000138
---[ end trace ba598d96706a7353 ]---
In case anyone wants to see the translation of the crash address:
(gdb) list *(free_rq_clone+0x29)
0x919 is in free_rq_clone (drivers/md/dm.c:1092).
1087 struct dm_rq_target_io *tio = clone->end_io_data;
1088 struct mapped_device *md = tio->md;
1089
1090 blk_rq_unprep_clone(clone);
1091
1092 if (clone->q->mq_ops)
1093 tio->ti->type->release_clone_rq(clone);
1094 else if (!md->queue->mq_ops)
1095 /* request_fn queue stacked on request_fn
queue(s) */
1096 free_clone_request(md, clone);
(gdb) list *(dm_requeue_unmapped_original_request+0x47)
0xa97 is in dm_requeue_unmapped_original_request (drivers/md/dm.c:1146).
1141 rq->special = NULL;
1142 rq->cmd_flags &= ~REQ_DONTPREP;
1143 }
1144
1145 if (clone)
1146 free_rq_clone(clone);
1147 }
1148
1149 /*
1150 * Requeue the original request of a clone.
Bart.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Kernel v4.1-rc1 + MQ dm-multipath + MQ SRP oops
2015-04-28 11:52 Kernel v4.1-rc1 + MQ dm-multipath + MQ SRP oops Bart Van Assche
@ 2015-04-28 13:52 ` Mike Snitzer
2015-04-28 21:54 ` Mike Snitzer
2015-04-29 13:20 ` Christoph Hellwig
1 sibling, 1 reply; 15+ messages in thread
From: Mike Snitzer @ 2015-04-28 13:52 UTC (permalink / raw)
To: Bart Van Assche; +Cc: device-mapper development, Christoph Hellwig
On Tue, Apr 28 2015 at 7:52am -0400,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> Hello,
>
> Earlier today I started testing an SRP initiator patch series on top
> of Linux kernel v4.1-rc1. Although that patch series works reliably
> on top of kernel v4.0, a test during which I triggered
> scsi_remove_host() + relogin (for p in
> /sys/class/srp_remote_ports/*; do echo 1 >$p/delete & done; wait;
> srp_daemon -oaec) triggered the following kernel oops:
>
> device-mapper: multipath: Failing path 8:0.
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000138
> IP: [<ffffffffa045f8e9>] free_rq_clone+0x29/0xb0 [dm_mod]
...
> In case anyone wants to see the translation of the crash address:
>
> (gdb) list *(free_rq_clone+0x29)
> 0x919 is in free_rq_clone (drivers/md/dm.c:1092).
> 1087 struct dm_rq_target_io *tio = clone->end_io_data;
> 1088 struct mapped_device *md = tio->md;
> 1089
> 1090 blk_rq_unprep_clone(clone);
> 1091
> 1092 if (clone->q->mq_ops)
> 1093 tio->ti->type->release_clone_rq(clone);
> 1094 else if (!md->queue->mq_ops)
> 1095 /* request_fn queue stacked on request_fn
> queue(s) */
> 1096 free_clone_request(md, clone);
I saw a crash like this yesterday with 4.1-rc1 (definitely due to
clone->q being NULL) but I didn't get a full backtrace over serial
console so I cannot be sure it is exactly like yours.
In my case I was using hch's lio-utils based test setup that he
documented here:
https://www.redhat.com/archives/dm-devel/2015-April/msg00138.html
But I got the crash the first time I ran this script:
multipathd -F
tcm_loop --unload
tcm_node --freedev iblock_0/array
Rough first experience with LIO ;) So I just chalked it up to tcm_loop
or something not being careful about device lifetime.
So we now have 2 data points (each using different storage backend). I
haven't been able to reproduce the issue again though -- but I switch
away from using multipathd to create the multipath device and resorted
to using dmsetup directly (with a dmsetup remove for cleanup instead of
multipath -F).
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Kernel v4.1-rc1 + MQ dm-multipath + MQ SRP oops
2015-04-28 13:52 ` Mike Snitzer
@ 2015-04-28 21:54 ` Mike Snitzer
2015-04-29 13:24 ` Christoph Hellwig
0 siblings, 1 reply; 15+ messages in thread
From: Mike Snitzer @ 2015-04-28 21:54 UTC (permalink / raw)
To: Bart Van Assche; +Cc: device-mapper development, Christoph Hellwig
On Tue, Apr 28 2015 at 9:52am -0400,
Mike Snitzer <snitzer@redhat.com> wrote:
> On Tue, Apr 28 2015 at 7:52am -0400,
> Bart Van Assche <bart.vanassche@sandisk.com> wrote:
>
> > Hello,
> >
> > Earlier today I started testing an SRP initiator patch series on top
> > of Linux kernel v4.1-rc1. Although that patch series works reliably
> > on top of kernel v4.0, a test during which I triggered
> > scsi_remove_host() + relogin (for p in
> > /sys/class/srp_remote_ports/*; do echo 1 >$p/delete & done; wait;
> > srp_daemon -oaec) triggered the following kernel oops:
> >
> > device-mapper: multipath: Failing path 8:0.
> > BUG: unable to handle kernel NULL pointer dereference at 0000000000000138
> > IP: [<ffffffffa045f8e9>] free_rq_clone+0x29/0xb0 [dm_mod]
> ...
>
> > In case anyone wants to see the translation of the crash address:
> >
> > (gdb) list *(free_rq_clone+0x29)
> > 0x919 is in free_rq_clone (drivers/md/dm.c:1092).
> > 1087 struct dm_rq_target_io *tio = clone->end_io_data;
> > 1088 struct mapped_device *md = tio->md;
> > 1089
> > 1090 blk_rq_unprep_clone(clone);
> > 1091
> > 1092 if (clone->q->mq_ops)
> > 1093 tio->ti->type->release_clone_rq(clone);
> > 1094 else if (!md->queue->mq_ops)
> > 1095 /* request_fn queue stacked on request_fn
> > queue(s) */
> > 1096 free_clone_request(md, clone);
>
> I saw a crash like this yesterday with 4.1-rc1 (definitely due to
> clone->q being NULL) but I didn't get a full backtrace over serial
> console so I cannot be sure it is exactly like yours.
>
> In my case I was using hch's lio-utils based test setup that he
> documented here:
> https://www.redhat.com/archives/dm-devel/2015-April/msg00138.html
>
> But I got the crash the first time I ran this script:
> multipathd -F
> tcm_loop --unload
> tcm_node --freedev iblock_0/array
Turns out that after running hch's script, with multipathd having setup
an mpath device ontop of tcm_loop, if I just do:
tcm_loop --unload
(tcm_loop appears to allow me to remove the devices out from underneath dm-multipath)
I get:
[ 463.168175] sd 10:0:1:0: alua: Detached
[ 463.172808] sd 10:0:1:0: [sdk] Synchronizing SCSI cache
[ 463.211388] device-mapper: multipath: Failing path 8:144.
[ 463.211494] sd 9:0:0:0: alua: Detached
[ 463.211621] sd 9:0:0:0: [sdj] Synchronizing SCSI cache
[ 463.227362] blk_update_request: I/O error, dev dm-13, sector 32
[ 463.234238] BUG: unable to handle kernel NULL pointer dereference at 00000000000000d8
[ 463.242990] IP: [<ffffffffa0000bd7>] free_rq_clone+0x17/0x90 [dm_mod]
[ 463.250188] PGD 0
[ 463.252437] Oops: 0000 [#1] SMP
[ 463.256051] Modules linked in: tcm_loop dm_service_time dm_multipath target_core_iblock target_core_mod sg iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi core
temp kvm_intel kvm ixgbe igb crct10dif_pclmul crc32_pclmul crc32c_intel mdio ptp ghash_clmulni_intel pps_core aesni_intel dca ipmi_si iTCO_wdt glue_helper lrw gf128mul
shpchp ablk_helper ipmi_msghandler cryptd iTCO_vendor_support lpc_ich mfd_core i7core_edac pcspkr edac_core i2c_i801 acpi_power_meter acpi_cpufreq xfs libcrc32c sr_mo
d cdrom sd_mod mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper ata_generic ttm pata_acpi ata_piix drm libata iomemory_vsl(POE) megaraid_sas i2c_c
ore skd dm_mirror dm_region_hash dm_log dm_mod [last unloaded: tcm_loop]
[ 463.328610] CPU: 3 PID: 29 Comm: ksoftirqd/3 Tainted: P W IOE 4.1.0-rc1.snitm+ #55
[ 463.337923] Hardware name: FUJITSU PRIMERGY RX300 S6 /D2619, BIOS 6.00 Rev. 1.10.2619.N1 05/24/2011
[ 463.352566] task: ffff880330c60000 ti: ffff880330c5c000 task.ti: ffff880330c5c000
[ 463.360911] RIP: 0010:[<ffffffffa0000bd7>] [<ffffffffa0000bd7>] free_rq_clone+0x17/0x90 [dm_mod]
[ 463.370817] RSP: 0018:ffff880330c5fd58 EFLAGS: 00010297
[ 463.376740] RAX: ffff8803301f0000 RBX: ffff8803301f0000 RCX: dead000000200200
[ 463.384698] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880310bb8000
[ 463.392654] RBP: ffff880330c5fd68 R08: ffffffffa04cf970 R09: ffff880321bf4d80
[ 463.400610] R10: 0000000000000d71 R11: 000000000000bc00 R12: ffff880310bb8000
[ 463.408568] R13: ffff88032aa1db50 R14: ffff88032bcae800 R15: 0000000000000000
[ 463.416524] FS: 0000000000000000(0000) GS:ffff8803332c0000(0000) knlGS:0000000000000000
[ 463.425548] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 463.431955] CR2: 00000000000000d8 CR3: 000000000198a000 CR4: 00000000000007e0
[ 463.439910] Stack:
[ 463.442150] 00000000fffffffb ffff880310bb8000 ffff880330c5fda8 ffffffffa0001837
[ 463.450434] ffff8803332d7540 ffff88032aa1dbc8 ffff88032aa1db50 ffff880330c5fdc0
[ 463.458719] 0000000000000100 0000000000000000 ffff880330c5fdf8 ffffffff812f1c10
[ 463.467001] Call Trace:
[ 463.469730] [<ffffffffa0001837>] dm_softirq_done+0x137/0x250 [dm_mod]
[ 463.477010] [<ffffffff812f1c10>] blk_done_softirq+0x90/0xc0
[ 463.483321] [<ffffffff81080264>] __do_softirq+0xf4/0x2d0
[ 463.489340] [<ffffffff81080469>] run_ksoftirqd+0x29/0x50
[ 463.495360] [<ffffffff8109eddf>] smpboot_thread_fn+0x12f/0x180
[ 463.501963] [<ffffffff8109ecb0>] ? sort_range+0x30/0x30
[ 463.507886] [<ffffffff8109b448>] kthread+0xd8/0xf0
[ 463.513323] [<ffffffff8109b370>] ? kthread_create_on_node+0x1b0/0x1b0
[ 463.520604] [<ffffffff81683322>] ret_from_fork+0x42/0x70
[ 463.526624] [<ffffffff8109b370>] ? kthread_create_on_node+0x1b0/0x1b0
[ 463.533921] Code: a4 62 67 e1 0f 0b 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 41 54 53 48 8b 57 30 48 8b 9f 60 01 00 00 <48> 83 ba d8 00
00 00 00 4c 8b 23 74 2c 48 8b 53 08 48 8b 52 08
[ 463.555555] RIP [<ffffffffa0000bd7>] free_rq_clone+0x17/0x90 [dm_mod]
[ 463.562846] RSP <ffff880330c5fd58>
[ 463.566733] CR2: 00000000000000d8
[ 463.570426] ---[ end trace 4f71f94089df7bda ]---
[ 465.155661] INFO: NMI handler (ghes_notify_nmi) took too long to run: 2.541 msecs
[ 469.833245] Kernel panic - not syncing: Fatal exception in interrupt
[ 469.833257] ------------[ cut here ]------------
> Rough first experience with LIO ;) So I just chalked it up to tcm_loop
> or something not being careful about device lifetime.
Think this is an issue where the cloned request doesn't have a reference
on the underlying device (with above I'm testing with old request_fn
_not_ blk-mq dm-mpath).
Christoph, if I do use blk-mq dm-multpath with:
echo Y > /sys/module/dm_mod/parameters/use_blk_mq
Then run your script; I'm able to reproduce the failure you reported as
soon as I run:
# tcm_loop --unload
sas_target_naa: naa.6001405ac0a3d2eb
sas_target_tpgt: tpgt_1
sas_target_lun: lun_0
Succesfully deleted local virtual SCSI Logical Unit from SAS Target Port
Successfully deleted virtual SCSI I_T Nexus between TCM and Linux/SCSI HBA
Successfully removed NAA based SAS Target Address: /sys/kernel/config/target/loopback/naa.6001405ac0a3d2eb//sys/kernel/config/target/loopback/naa.6001405ac0a3d2eb/tpgt_1
sas_target_naa: naa.6001405b0a839d09
sas_target_tpgt: tpgt_0
sas_target_lun: lun_0
Falls over much like you saw but then ultimately crashes with the same
as above:
[ 232.179074] WARNING: CPU: 9 PID: 2836 at lib/list_debug.c:36 __list_add+0x92/0xc0()
...
[ 232.443124] kobject (ffff880035625fd0): tried to init an initialized object, something is seriously wrong.
...
[ 232.621489] sysfs: cannot create duplicate filename '/devices/virtual/block/dm-13/mq'
...
[ 234.514036] NULL pointer dereference at 00000000000000d8
[ 234.520168] IP: [<ffffffffa0000bd7>] free_rq_clone+0x17/0x90 [dm_mod]
[ 234.527363] PGD 0
[ 234.529610] Oops: 0000 [#1] SMP
Basically, the dm-mq error path needs proper testing (multipathd's
reloading of the associated blk-mq request-based DM table is causing
these list_add and duplicate object problems; will hopefully be easy
enough to fix).
But as for the NULL pointer crash, in both cases the underlying paths
are being ripped out from under dm-multipath. That just should not be
possible while IO is still in-flight/completing.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Kernel v4.1-rc1 + MQ dm-multipath + MQ SRP oops
2015-04-28 11:52 Kernel v4.1-rc1 + MQ dm-multipath + MQ SRP oops Bart Van Assche
2015-04-28 13:52 ` Mike Snitzer
@ 2015-04-29 13:20 ` Christoph Hellwig
2015-04-29 13:34 ` Mike Snitzer
1 sibling, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2015-04-29 13:20 UTC (permalink / raw)
To: Bart Van Assche
Cc: device-mapper development, Christoph Hellwig, Mike Snitzer
On Tue, Apr 28, 2015 at 01:52:20PM +0200, Bart Van Assche wrote:
> Hello,
>
> Earlier today I started testing an SRP initiator patch series on top of
> Linux kernel v4.1-rc1. Although that patch series works reliably on top of
> kernel v4.0, a test during which I triggered scsi_remove_host() + relogin
> (for p in /sys/class/srp_remote_ports/*; do echo 1 >$p/delete & done; wait;
> srp_daemon -oaec) triggered the following kernel oops:
Can you try the patch below? From my cursory reading of the dm code
it can have tio->clone allocated for a while before it sets up the ->q
pointer for it:
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index f8c7ca3..ee74764 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1089,7 +1089,7 @@ static void free_rq_clone(struct request *clone)
blk_rq_unprep_clone(clone);
- if (clone->q->mq_ops)
+ if (clone->q && clone->q->mq_ops)
tio->ti->type->release_clone_rq(clone);
else if (!md->queue->mq_ops)
/* request_fn queue stacked on request_fn queue(s) */
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: Kernel v4.1-rc1 + MQ dm-multipath + MQ SRP oops
2015-04-28 21:54 ` Mike Snitzer
@ 2015-04-29 13:24 ` Christoph Hellwig
2015-04-29 13:43 ` Mike Snitzer
0 siblings, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2015-04-29 13:24 UTC (permalink / raw)
To: Mike Snitzer
Cc: Bart Van Assche, device-mapper development, Christoph Hellwig
On Tue, Apr 28, 2015 at 05:54:57PM -0400, Mike Snitzer wrote:
> Turns out that after running hch's script, with multipathd having setup
> an mpath device ontop of tcm_loop, if I just do:
>
> tcm_loop --unload
>
> (tcm_loop appears to allow me to remove the devices out from underneath dm-multipath)
That has nothing to do with tcm_loop. SCSI devices can get unregistered
any time, and we use refcountin to ensure the structures stay in place
forever.
See my reply to Bart for what I suspect is the real issue.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Kernel v4.1-rc1 + MQ dm-multipath + MQ SRP oops
2015-04-29 13:20 ` Christoph Hellwig
@ 2015-04-29 13:34 ` Mike Snitzer
2015-04-29 13:37 ` Christoph Hellwig
2015-04-29 18:53 ` [PATCH] dm: fix free_rq_clone() NULL pointer when requeueing unmapped request Mike Snitzer
0 siblings, 2 replies; 15+ messages in thread
From: Mike Snitzer @ 2015-04-29 13:34 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Bart Van Assche, device-mapper development
On Wed, Apr 29 2015 at 9:20am -0400,
Christoph Hellwig <hch@lst.de> wrote:
> On Tue, Apr 28, 2015 at 01:52:20PM +0200, Bart Van Assche wrote:
> > Hello,
> >
> > Earlier today I started testing an SRP initiator patch series on top of
> > Linux kernel v4.1-rc1. Although that patch series works reliably on top of
> > kernel v4.0, a test during which I triggered scsi_remove_host() + relogin
> > (for p in /sys/class/srp_remote_ports/*; do echo 1 >$p/delete & done; wait;
> > srp_daemon -oaec) triggered the following kernel oops:
>
> Can you try the patch below? From my cursory reading of the dm code
> it can have tio->clone allocated for a while before it sets up the ->q
> pointer for it:
>
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index f8c7ca3..ee74764 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -1089,7 +1089,7 @@ static void free_rq_clone(struct request *clone)
>
> blk_rq_unprep_clone(clone);
>
> - if (clone->q->mq_ops)
> + if (clone->q && clone->q->mq_ops)
> tio->ti->type->release_clone_rq(clone);
> else if (!md->queue->mq_ops)
> /* request_fn queue stacked on request_fn queue(s) */
I'm seeing this same crash on the completion path (when using your
tcm_loop script). But for Bart's case his stacktrace included
dm_requeue_unmapped_original_request() -- which if called from
map_request() implies clone->q won't have been initialized given
__multipath_map()'s code for setting up the old request_fn case.
Long story short: your fix is right for Bart's crash (but not the ones
I'm seeing with tcm_loop) -- I'll get it queued up with a proper header
attributed to you and cc'ing stable as needed.
Thanks,
Mike
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Kernel v4.1-rc1 + MQ dm-multipath + MQ SRP oops
2015-04-29 13:34 ` Mike Snitzer
@ 2015-04-29 13:37 ` Christoph Hellwig
2015-04-29 18:53 ` [PATCH] dm: fix free_rq_clone() NULL pointer when requeueing unmapped request Mike Snitzer
1 sibling, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2015-04-29 13:37 UTC (permalink / raw)
To: Mike Snitzer
Cc: Bart Van Assche, device-mapper development, Christoph Hellwig
On Wed, Apr 29, 2015 at 09:34:33AM -0400, Mike Snitzer wrote:
> I'm seeing this same crash on the completion path (when using your
> tcm_loop script). But for Bart's case his stacktrace included
> dm_requeue_unmapped_original_request() -- which if called from
> map_request() implies clone->q won't have been initialized given
> __multipath_map()'s code for setting up the old request_fn case.
>
> Long story short: your fix is right for Bart's crash (but not the ones
> I'm seeing with tcm_loop) -- I'll get it queued up with a proper header
> attributed to you and cc'ing stable as needed.
I don't really have a good explanation for your crash then, but no
block driver will zero out ->q on outstanding requests, so I'd look for
the cause for your issue in device mapper. That allocation of request
from a mempool for dm-mpath is something very special that I don't
think anyone else does.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Kernel v4.1-rc1 + MQ dm-multipath + MQ SRP oops
2015-04-29 13:24 ` Christoph Hellwig
@ 2015-04-29 13:43 ` Mike Snitzer
0 siblings, 0 replies; 15+ messages in thread
From: Mike Snitzer @ 2015-04-29 13:43 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Bart Van Assche, device-mapper development
On Wed, Apr 29 2015 at 9:24am -0400,
Christoph Hellwig <hch@lst.de> wrote:
> On Tue, Apr 28, 2015 at 05:54:57PM -0400, Mike Snitzer wrote:
> > Turns out that after running hch's script, with multipathd having setup
> > an mpath device ontop of tcm_loop, if I just do:
> >
> > tcm_loop --unload
> >
> > (tcm_loop appears to allow me to remove the devices out from underneath dm-multipath)
>
> That has nothing to do with tcm_loop. SCSI devices can get unregistered
> any time, and we use refcountin to ensure the structures stay in place
> forever.
>
> See my reply to Bart for what I suspect is the real issue.
Already saw it, see my reply ;)
Your fix only applies to he case where DM_MAPIO_REQUEUE is returned from
map_request() due to .map_rq or .clone_and_map_rq failing. It doesn't
apply to IO completion, example stack here:
https://www.redhat.com/archives/dm-devel/2015-April/msg00157.html
SCSI unregistering shouldn't nuke an in-flight request's queue
(refcounting should work like you say) -- so my crashes are still
unexplained, no?
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH] dm: fix free_rq_clone() NULL pointer when requeueing unmapped request
2015-04-29 13:34 ` Mike Snitzer
2015-04-29 13:37 ` Christoph Hellwig
@ 2015-04-29 18:53 ` Mike Snitzer
2015-04-29 19:11 ` Bart Van Assche
1 sibling, 1 reply; 15+ messages in thread
From: Mike Snitzer @ 2015-04-29 18:53 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Bart Van Assche, device-mapper development, Aaro Koskinen
On Wed, Apr 29 2015 at 9:34am -0400,
Mike Snitzer <snitzer@redhat.com> wrote:
> On Wed, Apr 29 2015 at 9:20am -0400,
> Christoph Hellwig <hch@lst.de> wrote:
>
> > On Tue, Apr 28, 2015 at 01:52:20PM +0200, Bart Van Assche wrote:
> > > Hello,
> > >
> > > Earlier today I started testing an SRP initiator patch series on top of
> > > Linux kernel v4.1-rc1. Although that patch series works reliably on top of
> > > kernel v4.0, a test during which I triggered scsi_remove_host() + relogin
> > > (for p in /sys/class/srp_remote_ports/*; do echo 1 >$p/delete & done; wait;
> > > srp_daemon -oaec) triggered the following kernel oops:
> >
> > Can you try the patch below? From my cursory reading of the dm code
> > it can have tio->clone allocated for a while before it sets up the ->q
> > pointer for it:
> >
> > diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> > index f8c7ca3..ee74764 100644
> > --- a/drivers/md/dm.c
> > +++ b/drivers/md/dm.c
> > @@ -1089,7 +1089,7 @@ static void free_rq_clone(struct request *clone)
> >
> > blk_rq_unprep_clone(clone);
> >
> > - if (clone->q->mq_ops)
> > + if (clone->q && clone->q->mq_ops)
> > tio->ti->type->release_clone_rq(clone);
> > else if (!md->queue->mq_ops)
> > /* request_fn queue stacked on request_fn queue(s) */
>
> I'm seeing this same crash on the completion path (when using your
> tcm_loop script). But for Bart's case his stacktrace included
> dm_requeue_unmapped_original_request() -- which if called from
> map_request() implies clone->q won't have been initialized given
> __multipath_map()'s code for setting up the old request_fn case.
>
> Long story short: your fix is right for Bart's crash (but not the ones
> I'm seeing with tcm_loop) -- I'll get it queued up with a proper header
> attributed to you and cc'ing stable as needed.
Actually, here is the proper 4.1-only fix (Bart please verify this works
for you):
From: Mike Snitzer <snitzer@redhat.com>
Date: Wed, 29 Apr 2015 10:48:09 -0400
Subject: dm: fix free_rq_clone() NULL pointer when requeueing unmapped request
Commit 022333427a ("dm: optimize dm_mq_queue_rq to _not_ use kthread if
using pure blk-mq") mistakenly removed free_rq_clone()'s clone->q check
before testing clone->q->mq_ops. It was an oversight to discontinue
that check for 1 of the 2 use-cases for free_rq_clone():
1) free_rq_clone() called when an unmapped original request is requeued
2) free_rq_clone() called in the request-based IO completion path
The clone->q check made sense for case #1 but not for #2. However, we
cannot just reinstate the check as it'd mask a serious bug in the IO
completion case #2 -- no in-flight request should have an uninitialized
request_queue (basic block layer refcounting _should_ ensure this).
The NULL pointer seen for case #1 is detailed here:
https://www.redhat.com/archives/dm-devel/2015-April/msg00160.html
Fix this free_rq_clone() NULL pointer by simply checking if the
mapped_device's type is DM_TYPE_MQ_REQUEST_BASED (clone's queue is
blk-mq) rather than checking clone->q->mq_ops. This avoids the need to
dereference clone->q, but a WARN_ON_ONCE is added to let us know if an
uninitialized clone request is being completed.
Reported-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
drivers/md/dm.c | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 3d34b5d..5998c26 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1031,16 +1031,24 @@ static void rq_completed(struct mapped_device *md, int rw, bool run_queue)
dm_put(md);
}
-static void free_rq_clone(struct request *clone)
+static void free_rq_clone(struct request *clone, bool must_be_mapped)
{
struct dm_rq_target_io *tio = clone->end_io_data;
struct mapped_device *md = tio->md;
- if (clone->q->mq_ops)
+ WARN_ON_ONCE(must_be_mapped && !clone->q);
+
+ if (md->type == DM_TYPE_MQ_REQUEST_BASED)
+ /* stacked on blk-mq queue(s) */
tio->ti->type->release_clone_rq(clone);
else if (!md->queue->mq_ops)
/* request_fn queue stacked on request_fn queue(s) */
free_clone_request(md, clone);
+ /*
+ * NOTE: for the blk-mq queue stacked on request_fn queue(s) case:
+ * no need to call free_clone_request() because we leverage blk-mq by
+ * allocating the clone at the end of the blk-mq pdu (see: clone_rq)
+ */
if (!md->queue->mq_ops)
free_rq_tio(tio);
@@ -1071,7 +1079,7 @@ static void dm_end_request(struct request *clone, int error)
rq->sense_len = clone->sense_len;
}
- free_rq_clone(clone);
+ free_rq_clone(clone, true);
if (!rq->q->mq_ops)
blk_end_request_all(rq, error);
else
@@ -1090,7 +1098,7 @@ static void dm_unprep_request(struct request *rq)
}
if (clone)
- free_rq_clone(clone);
+ free_rq_clone(clone, false);
}
/*
--
2.3.2 (Apple Git-55)
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH] dm: fix free_rq_clone() NULL pointer when requeueing unmapped request
2015-04-29 18:53 ` [PATCH] dm: fix free_rq_clone() NULL pointer when requeueing unmapped request Mike Snitzer
@ 2015-04-29 19:11 ` Bart Van Assche
2015-04-29 19:53 ` Mike Snitzer
0 siblings, 1 reply; 15+ messages in thread
From: Bart Van Assche @ 2015-04-29 19:11 UTC (permalink / raw)
To: Mike Snitzer, Christoph Hellwig; +Cc: device-mapper development, Aaro Koskinen
On 04/29/15 20:53, Mike Snitzer wrote:
> Actually, here is the proper 4.1-only fix (Bart please verify this works
> for you):
Hello Mike,
Thanks for the patch. But against which tree has this patch been
generated ? It doesn't seem to apply on v4.1-rc1:
$ git reset --hard v4.1-rc1
HEAD is now at b787f68 Linux 4.1-rc1
$ patch -p1 < ~/\[PATCH\]\ dm\:\ fix\ free_rq_clone\(\)\ NULL\ pointer\
when\ requeueing\ unmapped\ request.eml
(Stripping trailing CRs from patch; use --binary to disable.)
patching file drivers/md/dm.c
Hunk #1 FAILED at 1031.
Hunk #2 succeeded at 1124 (offset 53 lines).
Hunk #3 succeeded at 1143 (offset 53 lines).
1 out of 3 hunks FAILED -- saving rejects to file drivers/md/dm.c.rej
Bart.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: dm: fix free_rq_clone() NULL pointer when requeueing unmapped request
2015-04-29 19:11 ` Bart Van Assche
@ 2015-04-29 19:53 ` Mike Snitzer
2015-04-30 9:07 ` Bart Van Assche
2015-04-30 9:11 ` Aaro Koskinen
0 siblings, 2 replies; 15+ messages in thread
From: Mike Snitzer @ 2015-04-29 19:53 UTC (permalink / raw)
To: Bart Van Assche
Cc: device-mapper development, Christoph Hellwig, Aaro Koskinen
On Wed, Apr 29 2015 at 3:11P -0400,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> On 04/29/15 20:53, Mike Snitzer wrote:
> >Actually, here is the proper 4.1-only fix (Bart please verify this works
> >for you):
>
> Hello Mike,
>
> Thanks for the patch. But against which tree has this patch been generated ?
> It doesn't seem to apply on v4.1-rc1:
>
> $ git reset --hard v4.1-rc1
> HEAD is now at b787f68 Linux 4.1-rc1
> $ patch -p1 < ~/\[PATCH\]\ dm\:\ fix\ free_rq_clone\(\)\ NULL\ pointer\
> when\ requeueing\ unmapped\ request.eml
> (Stripping trailing CRs from patch; use --binary to disable.)
> patching file drivers/md/dm.c
> Hunk #1 FAILED at 1031.
> Hunk #2 succeeded at 1124 (offset 53 lines).
> Hunk #3 succeeded at 1143 (offset 53 lines).
> 1 out of 3 hunks FAILED -- saving rejects to file drivers/md/dm.c.rej
It was implemented against my "private" wip2 branch (since rebased):
http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/log/?h=wip2
Anyway, here it is rebased to 4.1-rc1 (BTW, I'm open to dropping the
WARN_ON_ONCE but I need to research further.. if you guys think that
there are perfectly resonable ways to explain why clone->q is NULL in
the IO completion path then I'm all ears):
From: Mike Snitzer <snitzer@redhat.com>
Date: Wed, 29 Apr 2015 10:48:09 -0400
Subject: dm: fix free_rq_clone() NULL pointer when requeueing unmapped request
Commit 022333427a ("dm: optimize dm_mq_queue_rq to _not_ use kthread if
using pure blk-mq") mistakenly removed free_rq_clone()'s clone->q check
before testing clone->q->mq_ops. It was an oversight to discontinue
that check for 1 of the 2 use-cases for free_rq_clone():
1) free_rq_clone() called when an unmapped original request is requeued
2) free_rq_clone() called in the request-based IO completion path
The clone->q check made sense for case #1 but not for #2. However, we
cannot just reinstate the check as it'd mask a serious bug in the IO
completion case #2 -- no in-flight request should have an uninitialized
request_queue (basic block layer refcounting _should_ ensure this).
The NULL pointer seen for case #1 is detailed here:
https://www.redhat.com/archives/dm-devel/2015-April/msg00160.html
Fix this free_rq_clone() NULL pointer by simply checking if the
mapped_device's type is DM_TYPE_MQ_REQUEST_BASED (clone's queue is
blk-mq) rather than checking clone->q->mq_ops. This avoids the need to
dereference clone->q, but a WARN_ON_ONCE is added to let us know if an
uninitialized clone request is being completed.
Reported-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
drivers/md/dm.c | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 6754bbd..dfb7bde 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1082,18 +1082,26 @@ static void rq_completed(struct mapped_device *md, int rw, bool run_queue)
dm_put(md);
}
-static void free_rq_clone(struct request *clone)
+static void free_rq_clone(struct request *clone, bool must_be_mapped)
{
struct dm_rq_target_io *tio = clone->end_io_data;
struct mapped_device *md = tio->md;
+ WARN_ON_ONCE(must_be_mapped && !clone->q);
+
blk_rq_unprep_clone(clone);
- if (clone->q->mq_ops)
+ if (md->type == DM_TYPE_MQ_REQUEST_BASED)
+ /* stacked on blk-mq queue(s) */
tio->ti->type->release_clone_rq(clone);
else if (!md->queue->mq_ops)
/* request_fn queue stacked on request_fn queue(s) */
free_clone_request(md, clone);
+ /*
+ * NOTE: for the blk-mq queue stacked on request_fn queue(s) case:
+ * no need to call free_clone_request() because we leverage blk-mq by
+ * allocating the clone at the end of the blk-mq pdu (see: clone_rq)
+ */
if (!md->queue->mq_ops)
free_rq_tio(tio);
@@ -1124,7 +1132,7 @@ static void dm_end_request(struct request *clone, int error)
rq->sense_len = clone->sense_len;
}
- free_rq_clone(clone);
+ free_rq_clone(clone, true);
if (!rq->q->mq_ops)
blk_end_request_all(rq, error);
else
@@ -1143,7 +1151,7 @@ static void dm_unprep_request(struct request *rq)
}
if (clone)
- free_rq_clone(clone);
+ free_rq_clone(clone, false);
}
/*
--
2.3.2 (Apple Git-55)
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: dm: fix free_rq_clone() NULL pointer when requeueing unmapped request
2015-04-29 19:53 ` Mike Snitzer
@ 2015-04-30 9:07 ` Bart Van Assche
2015-04-30 12:57 ` Mike Snitzer
2015-04-30 9:11 ` Aaro Koskinen
1 sibling, 1 reply; 15+ messages in thread
From: Bart Van Assche @ 2015-04-30 9:07 UTC (permalink / raw)
To: Mike Snitzer; +Cc: device-mapper development, Christoph Hellwig, Aaro Koskinen
On 04/29/15 21:53, Mike Snitzer wrote:
> On Wed, Apr 29 2015 at 3:11P -0400,
> Bart Van Assche <bart.vanassche@sandisk.com> wrote:
>
>> On 04/29/15 20:53, Mike Snitzer wrote:
>>> Actually, here is the proper 4.1-only fix (Bart please verify this works
>>> for you):
>>
>> Hello Mike,
>>
>> Thanks for the patch. But against which tree has this patch been generated ?
>> It doesn't seem to apply on v4.1-rc1:
>>
>> $ git reset --hard v4.1-rc1
>> HEAD is now at b787f68 Linux 4.1-rc1
>> $ patch -p1 < ~/\[PATCH\]\ dm\:\ fix\ free_rq_clone\(\)\ NULL\ pointer\
>> when\ requeueing\ unmapped\ request.eml
>> (Stripping trailing CRs from patch; use --binary to disable.)
>> patching file drivers/md/dm.c
>> Hunk #1 FAILED at 1031.
>> Hunk #2 succeeded at 1124 (offset 53 lines).
>> Hunk #3 succeeded at 1143 (offset 53 lines).
>> 1 out of 3 hunks FAILED -- saving rejects to file drivers/md/dm.c.rej
>
> It was implemented against my "private" wip2 branch (since rebased):
> http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/log/?h=wip2
>
> Anyway, here it is rebased to 4.1-rc1 (BTW, I'm open to dropping the
> WARN_ON_ONCE but I need to research further.. if you guys think that
> there are perfectly resonable ways to explain why clone->q is NULL in
> the IO completion path then I'm all ears):
>
> From: Mike Snitzer <snitzer@redhat.com>
> Date: Wed, 29 Apr 2015 10:48:09 -0400
> Subject: dm: fix free_rq_clone() NULL pointer when requeueing unmapped request
>
> Commit 022333427a ("dm: optimize dm_mq_queue_rq to _not_ use kthread if
> using pure blk-mq") mistakenly removed free_rq_clone()'s clone->q check
> before testing clone->q->mq_ops. It was an oversight to discontinue
> that check for 1 of the 2 use-cases for free_rq_clone():
> 1) free_rq_clone() called when an unmapped original request is requeued
> 2) free_rq_clone() called in the request-based IO completion path
>
> The clone->q check made sense for case #1 but not for #2. However, we
> cannot just reinstate the check as it'd mask a serious bug in the IO
> completion case #2 -- no in-flight request should have an uninitialized
> request_queue (basic block layer refcounting _should_ ensure this).
>
> The NULL pointer seen for case #1 is detailed here:
> https://www.redhat.com/archives/dm-devel/2015-April/msg00160.html
>
> Fix this free_rq_clone() NULL pointer by simply checking if the
> mapped_device's type is DM_TYPE_MQ_REQUEST_BASED (clone's queue is
> blk-mq) rather than checking clone->q->mq_ops. This avoids the need to
> dereference clone->q, but a WARN_ON_ONCE is added to let us know if an
> uninitialized clone request is being completed.
>
> Reported-by: Bart Van Assche <bart.vanassche@sandisk.com>
> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> ---
> drivers/md/dm.c | 16 ++++++++++++----
> 1 file changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index 6754bbd..dfb7bde 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -1082,18 +1082,26 @@ static void rq_completed(struct mapped_device *md, int rw, bool run_queue)
> dm_put(md);
> }
>
> -static void free_rq_clone(struct request *clone)
> +static void free_rq_clone(struct request *clone, bool must_be_mapped)
> {
> struct dm_rq_target_io *tio = clone->end_io_data;
> struct mapped_device *md = tio->md;
>
> + WARN_ON_ONCE(must_be_mapped && !clone->q);
> +
> blk_rq_unprep_clone(clone);
>
> - if (clone->q->mq_ops)
> + if (md->type == DM_TYPE_MQ_REQUEST_BASED)
> + /* stacked on blk-mq queue(s) */
> tio->ti->type->release_clone_rq(clone);
> else if (!md->queue->mq_ops)
> /* request_fn queue stacked on request_fn queue(s) */
> free_clone_request(md, clone);
> + /*
> + * NOTE: for the blk-mq queue stacked on request_fn queue(s) case:
> + * no need to call free_clone_request() because we leverage blk-mq by
> + * allocating the clone at the end of the blk-mq pdu (see: clone_rq)
> + */
>
> if (!md->queue->mq_ops)
> free_rq_tio(tio);
> @@ -1124,7 +1132,7 @@ static void dm_end_request(struct request *clone, int error)
> rq->sense_len = clone->sense_len;
> }
>
> - free_rq_clone(clone);
> + free_rq_clone(clone, true);
> if (!rq->q->mq_ops)
> blk_end_request_all(rq, error);
> else
> @@ -1143,7 +1151,7 @@ static void dm_unprep_request(struct request *rq)
> }
>
> if (clone)
> - free_rq_clone(clone);
> + free_rq_clone(clone, false);
> }
>
> /*
Hello Mike,
This patch survives my SRP initiator tests without triggering any kernel
warning.
Thanks !
Bart.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: dm: fix free_rq_clone() NULL pointer when requeueing unmapped request
2015-04-29 19:53 ` Mike Snitzer
2015-04-30 9:07 ` Bart Van Assche
@ 2015-04-30 9:11 ` Aaro Koskinen
2015-04-30 12:56 ` Mike Snitzer
1 sibling, 1 reply; 15+ messages in thread
From: Aaro Koskinen @ 2015-04-30 9:11 UTC (permalink / raw)
To: Mike Snitzer
Cc: Bart Van Assche, device-mapper development, Christoph Hellwig
Hi,
On Wed, Apr 29, 2015 at 03:53:42PM -0400, Mike Snitzer wrote:
> http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/log/?h=wip2
>
> Anyway, here it is rebased to 4.1-rc1 (BTW, I'm open to dropping the
> WARN_ON_ONCE but I need to research further.. if you guys think that
> there are perfectly resonable ways to explain why clone->q is NULL in
> the IO completion path then I'm all ears):
This fixes the crash I'm seeing, but the WARN ON is still triggering
on almost (*) every boot. I'm using rootfs where multipathd is built
and started with the default configuration it ships with, and it looks
like this:
[ OK ] Started Device-Mapper Multipath Device Controller.
[ OK ] Started Network Service.
Starting Network Name Resolution...
[ OK ] Reached target Network.
Starting GlusterFS, a clustered file-system server...
[ 16.562604] device-mapper: multipath service-time: version 0.2.0 loaded
[ 16.586067] device-mapper: table: 253:0: multipath: error getting device
[ 16.586428] device-mapper: ioctl: error adding target to table
[ 16.679048] device-mapper: multipath: Failing path 8:16.
[ OK ] Started Network Name Resolution.
[* ] A start job is running for GlusterF...le-system server (13s / 5min 7s)
[...]
[ 23.034550] ------------[ cut here ]------------
[ 23.035525] WARNING: CPU: 0 PID: 3 at /home/aakoskin/linux/drivers/md/dm.c:1090 free_rq_clone+0xbc/0x130 [dm_mod]()
[...]
[ 23.041885] Call Trace:
[ 23.042064] [<ffffffff8010bc90>] show_stack+0x78/0x90
[ 23.042505] [<ffffffff80133764>] warn_slowpath_common+0xa4/0xe0
[ 23.043019] [<ffffffffc000e37c>] free_rq_clone+0xbc/0x130 [dm_mod]
[ 23.043412] [<ffffffffc000e830>] dm_softirq_done+0x198/0x2c0 [dm_mod]
[ 23.043775] [<ffffffff803388dc>] blk_done_softirq+0xac/0xc0
[ 23.044076] [<ffffffff80136894>] __do_softirq+0x174/0x368
[ 23.044376] [<ffffffff80136af8>] run_ksoftirqd+0x70/0xa8
[ 23.044668] [<ffffffff8015604c>] smpboot_thread_fn+0x1bc/0x1c8
[ 23.044980] [<ffffffff80152440>] kthread+0xe0/0xf8
[ 23.045247] [<ffffffff80105768>] ret_from_kernel_thread+0x14/0x1c
[ 23.045673]
[ 23.045824] ---[ end trace e0e5377c5d7b858b ]---
[ 23.046326] blk_update_request: I/O error, dev dm-0, sector 0
[ 23.056271] blk_update_request: I/O error, dev dm-0, sector 0
[ 23.056745] Buffer I/O error on dev dm-0, logical block 0, async page read
[ 23.070427] blk_update_request: I/O error, dev dm-0, sector 0
[ 23.070833] Buffer I/O error on dev dm-0, logical block 0, async page read
(*) Strange thing is that it only happens when my test bot is booting
the system. With interactive console it's OK without any I/O errors.
A.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: dm: fix free_rq_clone() NULL pointer when requeueing unmapped request
2015-04-30 9:11 ` Aaro Koskinen
@ 2015-04-30 12:56 ` Mike Snitzer
0 siblings, 0 replies; 15+ messages in thread
From: Mike Snitzer @ 2015-04-30 12:56 UTC (permalink / raw)
To: Aaro Koskinen
Cc: Bart Van Assche, device-mapper development, Christoph Hellwig
On Thu, Apr 30 2015 at 5:11am -0400,
Aaro Koskinen <aaro.koskinen@nokia.com> wrote:
> Hi,
>
> On Wed, Apr 29, 2015 at 03:53:42PM -0400, Mike Snitzer wrote:
> > http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/log/?h=wip2
> >
> > Anyway, here it is rebased to 4.1-rc1 (BTW, I'm open to dropping the
> > WARN_ON_ONCE but I need to research further.. if you guys think that
> > there are perfectly resonable ways to explain why clone->q is NULL in
> > the IO completion path then I'm all ears):
>
> This fixes the crash I'm seeing, but the WARN ON is still triggering
> on almost (*) every boot. I'm using rootfs where multipathd is built
> and started with the default configuration it ships with, and it looks
> like this:
Can you show multipath -ll for the device in question? Are you saying
that you're using multipath for the root device?
Do you have the scsi_dh module that the device uses getting preloaded
at boot? (e.g. add "rdloaddriver=scsi_dh_alua" to the grub kernel
commandline). Alternatively the relevant scsi_dh can just be built-in
to the kernel, that way it'll always get attached when the SCSI device
scan occurs.
> [ OK ] Started Device-Mapper Multipath Device Controller.
> [ OK ] Started Network Service.
> Starting Network Name Resolution...
> [ OK ] Reached target Network.
> Starting GlusterFS, a clustered file-system server...
> [ 16.562604] device-mapper: multipath service-time: version 0.2.0 loaded
> [ 16.586067] device-mapper: table: 253:0: multipath: error getting device
> [ 16.586428] device-mapper: ioctl: error adding target to table
> [ 16.679048] device-mapper: multipath: Failing path 8:16.
> [ OK ] Started Network Name Resolution.
> [* ] A start job is running for GlusterF...le-system server (13s / 5min 7s)
> [...]
> [ 23.034550] ------------[ cut here ]------------
> [ 23.035525] WARNING: CPU: 0 PID: 3 at /home/aakoskin/linux/drivers/md/dm.c:1090 free_rq_clone+0xbc/0x130 [dm_mod]()
> [...]
> [ 23.041885] Call Trace:
> [ 23.042064] [<ffffffff8010bc90>] show_stack+0x78/0x90
> [ 23.042505] [<ffffffff80133764>] warn_slowpath_common+0xa4/0xe0
> [ 23.043019] [<ffffffffc000e37c>] free_rq_clone+0xbc/0x130 [dm_mod]
> [ 23.043412] [<ffffffffc000e830>] dm_softirq_done+0x198/0x2c0 [dm_mod]
> [ 23.043775] [<ffffffff803388dc>] blk_done_softirq+0xac/0xc0
> [ 23.044076] [<ffffffff80136894>] __do_softirq+0x174/0x368
> [ 23.044376] [<ffffffff80136af8>] run_ksoftirqd+0x70/0xa8
> [ 23.044668] [<ffffffff8015604c>] smpboot_thread_fn+0x1bc/0x1c8
> [ 23.044980] [<ffffffff80152440>] kthread+0xe0/0xf8
> [ 23.045247] [<ffffffff80105768>] ret_from_kernel_thread+0x14/0x1c
> [ 23.045673]
> [ 23.045824] ---[ end trace e0e5377c5d7b858b ]---
> [ 23.046326] blk_update_request: I/O error, dev dm-0, sector 0
> [ 23.056271] blk_update_request: I/O error, dev dm-0, sector 0
> [ 23.056745] Buffer I/O error on dev dm-0, logical block 0, async page read
> [ 23.070427] blk_update_request: I/O error, dev dm-0, sector 0
> [ 23.070833] Buffer I/O error on dev dm-0, logical block 0, async page read
>
> (*) Strange thing is that it only happens when my test bot is booting
> the system. With interactive console it's OK without any I/O errors.
>
> A.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: dm: fix free_rq_clone() NULL pointer when requeueing unmapped request
2015-04-30 9:07 ` Bart Van Assche
@ 2015-04-30 12:57 ` Mike Snitzer
0 siblings, 0 replies; 15+ messages in thread
From: Mike Snitzer @ 2015-04-30 12:57 UTC (permalink / raw)
To: Bart Van Assche
Cc: device-mapper development, Christoph Hellwig, Aaro Koskinen
On Thu, Apr 30 2015 at 5:07am -0400,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> On 04/29/15 21:53, Mike Snitzer wrote:
> >On Wed, Apr 29 2015 at 3:11P -0400,
> >Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> >
> >>On 04/29/15 20:53, Mike Snitzer wrote:
> >>>Actually, here is the proper 4.1-only fix (Bart please verify this works
> >>>for you):
> >>
> >>Hello Mike,
> >>
> >>Thanks for the patch. But against which tree has this patch been generated ?
> >>It doesn't seem to apply on v4.1-rc1:
> >>
> >>$ git reset --hard v4.1-rc1
> >>HEAD is now at b787f68 Linux 4.1-rc1
> >>$ patch -p1 < ~/\[PATCH\]\ dm\:\ fix\ free_rq_clone\(\)\ NULL\ pointer\
> >>when\ requeueing\ unmapped\ request.eml
> >>(Stripping trailing CRs from patch; use --binary to disable.)
> >>patching file drivers/md/dm.c
> >>Hunk #1 FAILED at 1031.
> >>Hunk #2 succeeded at 1124 (offset 53 lines).
> >>Hunk #3 succeeded at 1143 (offset 53 lines).
> >>1 out of 3 hunks FAILED -- saving rejects to file drivers/md/dm.c.rej
> >
> >It was implemented against my "private" wip2 branch (since rebased):
> >http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/log/?h=wip2
> >
> >Anyway, here it is rebased to 4.1-rc1 (BTW, I'm open to dropping the
> >WARN_ON_ONCE but I need to research further.. if you guys think that
> >there are perfectly resonable ways to explain why clone->q is NULL in
> >the IO completion path then I'm all ears):
> >
> >From: Mike Snitzer <snitzer@redhat.com>
> >Date: Wed, 29 Apr 2015 10:48:09 -0400
> >Subject: dm: fix free_rq_clone() NULL pointer when requeueing unmapped request
> >
> >Commit 022333427a ("dm: optimize dm_mq_queue_rq to _not_ use kthread if
> >using pure blk-mq") mistakenly removed free_rq_clone()'s clone->q check
> >before testing clone->q->mq_ops. It was an oversight to discontinue
> >that check for 1 of the 2 use-cases for free_rq_clone():
> >1) free_rq_clone() called when an unmapped original request is requeued
> >2) free_rq_clone() called in the request-based IO completion path
> >
> >The clone->q check made sense for case #1 but not for #2. However, we
> >cannot just reinstate the check as it'd mask a serious bug in the IO
> >completion case #2 -- no in-flight request should have an uninitialized
> >request_queue (basic block layer refcounting _should_ ensure this).
> >
> >The NULL pointer seen for case #1 is detailed here:
> >https://www.redhat.com/archives/dm-devel/2015-April/msg00160.html
> >
> >Fix this free_rq_clone() NULL pointer by simply checking if the
> >mapped_device's type is DM_TYPE_MQ_REQUEST_BASED (clone's queue is
> >blk-mq) rather than checking clone->q->mq_ops. This avoids the need to
> >dereference clone->q, but a WARN_ON_ONCE is added to let us know if an
> >uninitialized clone request is being completed.
> >
> >Reported-by: Bart Van Assche <bart.vanassche@sandisk.com>
> >Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> >---
> > drivers/md/dm.c | 16 ++++++++++++----
> > 1 file changed, 12 insertions(+), 4 deletions(-)
> >
> >diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> >index 6754bbd..dfb7bde 100644
> >--- a/drivers/md/dm.c
> >+++ b/drivers/md/dm.c
> >@@ -1082,18 +1082,26 @@ static void rq_completed(struct mapped_device *md, int rw, bool run_queue)
> > dm_put(md);
> > }
> >
> >-static void free_rq_clone(struct request *clone)
> >+static void free_rq_clone(struct request *clone, bool must_be_mapped)
> > {
> > struct dm_rq_target_io *tio = clone->end_io_data;
> > struct mapped_device *md = tio->md;
> >
> >+ WARN_ON_ONCE(must_be_mapped && !clone->q);
> >+
> > blk_rq_unprep_clone(clone);
> >
> >- if (clone->q->mq_ops)
> >+ if (md->type == DM_TYPE_MQ_REQUEST_BASED)
> >+ /* stacked on blk-mq queue(s) */
> > tio->ti->type->release_clone_rq(clone);
> > else if (!md->queue->mq_ops)
> > /* request_fn queue stacked on request_fn queue(s) */
> > free_clone_request(md, clone);
> >+ /*
> >+ * NOTE: for the blk-mq queue stacked on request_fn queue(s) case:
> >+ * no need to call free_clone_request() because we leverage blk-mq by
> >+ * allocating the clone at the end of the blk-mq pdu (see: clone_rq)
> >+ */
> >
> > if (!md->queue->mq_ops)
> > free_rq_tio(tio);
> >@@ -1124,7 +1132,7 @@ static void dm_end_request(struct request *clone, int error)
> > rq->sense_len = clone->sense_len;
> > }
> >
> >- free_rq_clone(clone);
> >+ free_rq_clone(clone, true);
> > if (!rq->q->mq_ops)
> > blk_end_request_all(rq, error);
> > else
> >@@ -1143,7 +1151,7 @@ static void dm_unprep_request(struct request *rq)
> > }
> >
> > if (clone)
> >- free_rq_clone(clone);
> >+ free_rq_clone(clone, false);
> > }
> >
> > /*
>
> Hello Mike,
>
> This patch survives my SRP initiator tests without triggering any
> kernel warning.
Great.
> Thanks !
No problem, thanks for testing.
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2015-04-30 12:57 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-04-28 11:52 Kernel v4.1-rc1 + MQ dm-multipath + MQ SRP oops Bart Van Assche
2015-04-28 13:52 ` Mike Snitzer
2015-04-28 21:54 ` Mike Snitzer
2015-04-29 13:24 ` Christoph Hellwig
2015-04-29 13:43 ` Mike Snitzer
2015-04-29 13:20 ` Christoph Hellwig
2015-04-29 13:34 ` Mike Snitzer
2015-04-29 13:37 ` Christoph Hellwig
2015-04-29 18:53 ` [PATCH] dm: fix free_rq_clone() NULL pointer when requeueing unmapped request Mike Snitzer
2015-04-29 19:11 ` Bart Van Assche
2015-04-29 19:53 ` Mike Snitzer
2015-04-30 9:07 ` Bart Van Assche
2015-04-30 12:57 ` Mike Snitzer
2015-04-30 9:11 ` Aaro Koskinen
2015-04-30 12:56 ` Mike Snitzer
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.