Kernel v4.1-rc1 + MQ dm-multipath + MQ SRP oops

All of lore.kernel.org
 help / color / mirror / Atom feed

* Kernel v4.1-rc1 + MQ dm-multipath + MQ SRP oops
@ 2015-04-28 11:52 Bart Van Assche
  2015-04-28 13:52 ` Mike Snitzer
  2015-04-29 13:20 ` Christoph Hellwig
  0 siblings, 2 replies; 15+ messages in thread
From: Bart Van Assche @ 2015-04-28 11:52 UTC (permalink / raw)
  To: Mike Snitzer, Christoph Hellwig; +Cc: device-mapper development

Hello,

Earlier today I started testing an SRP initiator patch series on top of 
Linux kernel v4.1-rc1. Although that patch series works reliably on top 
of kernel v4.0, a test during which I triggered scsi_remove_host() + 
relogin (for p in /sys/class/srp_remote_ports/*; do echo 1 >$p/delete & 
done; wait; srp_daemon -oaec) triggered the following kernel oops:

device-mapper: multipath: Failing path 8:0.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000138
IP: [<ffffffffa045f8e9>] free_rq_clone+0x29/0xb0 [dm_mod]
PGD 0
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: dm_queue_length scsi_dh_alua dm_round_robin 
dm_multipath scsi_dh dm_mod sd_mod ib_uverbs mlx4_ib ib_umad netconsole 
ib_srp scsi_transport_srp configfs ib_iser rdma_cm iw_cm ib_cm ib_sa 
ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi 
scsi_transport_iscsi hid_generic af_packet sg microcode sr_mod i2c_piix4 
usbhid hid intel_agp intel_gtt i2c_core cdrom virtio_balloon agpgart 
acpi_cpufreq processor thermal_sys fuse hwmon button ata_generic 
pata_acpi mlx4_en ptp pps_core ext4 crc16 jbd2 mbcache virtio_blk 
virtio_net ata_piix libata uhci_hcd virtio_pci virtio_ring mlx4_core 
usbcore virtio scsi_mod usb_common
CPU: 1 PID: 5423 Comm: kdmwork-252:0 Not tainted 4.1.0-rc1-debug+ #1
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
task: ffff880056568000 ti: ffff8800516c0000 task.ti: ffff8800516c0000
RIP: 0010:[<ffffffffa045f8e9>]  [<ffffffffa045f8e9>] 
free_rq_clone+0x29/0xb0 [dm_mod]
RSP: 0018:ffff8800516c3d18  EFLAGS: 00010296
RAX: 0000000000000000 RBX: ffff880058bb7040 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff8179d723 RDI: 0000000000000000
RBP: ffff8800516c3d38 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff880058bb6ff0
R13: ffff880051570948 R14: 0000000000000000 R15: ffffc900000b7040
FS:  0000000000000000(0000) GS:ffff88005fc80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000138 CR3: 000000005dab8000 CR4: 00000000000007e0
Stack:
  0000000000000001 ffff880058bb6e80 ffff8800342ab0a0 ffff880051570948
  ffff8800516c3d78 ffffffffa045fa67 ffff880058bb6e80 ffff880058bb6ff0
  0000000000000002 ffff880058bb6e80 ffff880051570c60 ffffc900000b7040
Call Trace:
  [<ffffffffa045fa67>] dm_requeue_unmapped_original_request+0x47/0xe0 
[dm_mod]
  [<ffffffffa0461daf>] map_request.isra.35+0x9f/0x430 [dm_mod]
  [<ffffffff814ff2d0>] ? _raw_spin_unlock_irq+0x30/0x70
  [<ffffffffa0462166>] map_tio_request+0x26/0x40 [dm_mod]
  [<ffffffff8108382f>] kthread_worker_fn+0xaf/0x200
  [<ffffffff81083780>] ? __init_kthread_worker+0x60/0x60
  [<ffffffff8108370a>] kthread+0x10a/0x120
  [<ffffffff814ff2d0>] ? _raw_spin_unlock_irq+0x30/0x70
  [<ffffffff81083600>] ? kthread_create_on_node+0x220/0x220
  [<ffffffff815001a2>] ret_from_fork+0x42/0x70
  [<ffffffff81083600>] ? kthread_create_on_node+0x220/0x220
Code: 00 00 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 48 89 fb 48 83 ec 
08 4c 8b a7 60 01 00 00 4d 8b 2c 24 e8 bb 6e e0 e0 48 8b 43 30 <48> 83 
b8 38 01 00 00 00 74 2d 49 8b 44 24 08 48 89 df 48 8b 40
RIP  [<ffffffffa045f8e9>] free_rq_clone+0x29/0xb0 [dm_mod]
  RSP <ffff8800516c3d18>
CR2: 0000000000000138
---[ end trace ba598d96706a7353 ]---

In case anyone wants to see the translation of the crash address:

(gdb) list *(free_rq_clone+0x29)
0x919 is in free_rq_clone (drivers/md/dm.c:1092).
1087            struct dm_rq_target_io *tio = clone->end_io_data;
1088            struct mapped_device *md = tio->md;
1089
1090            blk_rq_unprep_clone(clone);
1091
1092            if (clone->q->mq_ops)
1093                    tio->ti->type->release_clone_rq(clone);
1094            else if (!md->queue->mq_ops)
1095                    /* request_fn queue stacked on request_fn 
queue(s) */
1096                    free_clone_request(md, clone);
(gdb) list *(dm_requeue_unmapped_original_request+0x47)
0xa97 is in dm_requeue_unmapped_original_request (drivers/md/dm.c:1146).
1141                    rq->special = NULL;
1142                    rq->cmd_flags &= ~REQ_DONTPREP;
1143            }
1144
1145            if (clone)
1146                    free_rq_clone(clone);
1147    }
1148
1149    /*
1150     * Requeue the original request of a clone.

Bart.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Kernel v4.1-rc1 + MQ dm-multipath + MQ SRP oops
  2015-04-28 11:52 Kernel v4.1-rc1 + MQ dm-multipath + MQ SRP oops Bart Van Assche
@ 2015-04-28 13:52 ` Mike Snitzer
  2015-04-28 21:54   ` Mike Snitzer
  2015-04-29 13:20 ` Christoph Hellwig
  1 sibling, 1 reply; 15+ messages in thread
From: Mike Snitzer @ 2015-04-28 13:52 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: device-mapper development, Christoph Hellwig

On Tue, Apr 28 2015 at  7:52am -0400,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:

> Hello,
> 
> Earlier today I started testing an SRP initiator patch series on top
> of Linux kernel v4.1-rc1. Although that patch series works reliably
> on top of kernel v4.0, a test during which I triggered
> scsi_remove_host() + relogin (for p in
> /sys/class/srp_remote_ports/*; do echo 1 >$p/delete & done; wait;
> srp_daemon -oaec) triggered the following kernel oops:
> 
> device-mapper: multipath: Failing path 8:0.
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000138
> IP: [<ffffffffa045f8e9>] free_rq_clone+0x29/0xb0 [dm_mod]
...
 
> In case anyone wants to see the translation of the crash address:
> 
> (gdb) list *(free_rq_clone+0x29)
> 0x919 is in free_rq_clone (drivers/md/dm.c:1092).
> 1087            struct dm_rq_target_io *tio = clone->end_io_data;
> 1088            struct mapped_device *md = tio->md;
> 1089
> 1090            blk_rq_unprep_clone(clone);
> 1091
> 1092            if (clone->q->mq_ops)
> 1093                    tio->ti->type->release_clone_rq(clone);
> 1094            else if (!md->queue->mq_ops)
> 1095                    /* request_fn queue stacked on request_fn
> queue(s) */
> 1096                    free_clone_request(md, clone);

I saw a crash like this yesterday with 4.1-rc1 (definitely due to
clone->q being NULL) but I didn't get a full backtrace over serial
console so I cannot be sure it is exactly like yours.

In my case I was using hch's lio-utils based test setup that he
documented here:
https://www.redhat.com/archives/dm-devel/2015-April/msg00138.html

But I got the crash the first time I ran this script:
multipathd -F
tcm_loop --unload
tcm_node --freedev iblock_0/array

Rough first experience with LIO ;)  So I just chalked it up to tcm_loop
or something not being careful about device lifetime.

So we now have 2 data points (each using different storage backend).  I
haven't been able to reproduce the issue again though -- but I switch
away from using multipathd to create the multipath device and resorted
to using dmsetup directly (with a dmsetup remove for cleanup instead of
multipath -F).

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Kernel v4.1-rc1 + MQ dm-multipath + MQ SRP oops
  2015-04-28 13:52 ` Mike Snitzer
@ 2015-04-28 21:54   ` Mike Snitzer
  2015-04-29 13:24     ` Christoph Hellwig
  0 siblings, 1 reply; 15+ messages in thread
From: Mike Snitzer @ 2015-04-28 21:54 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: device-mapper development, Christoph Hellwig

On Tue, Apr 28 2015 at  9:52am -0400,
Mike Snitzer <snitzer@redhat.com> wrote:

> On Tue, Apr 28 2015 at  7:52am -0400,
> Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> 
> > Hello,
> > 
> > Earlier today I started testing an SRP initiator patch series on top
> > of Linux kernel v4.1-rc1. Although that patch series works reliably
> > on top of kernel v4.0, a test during which I triggered
> > scsi_remove_host() + relogin (for p in
> > /sys/class/srp_remote_ports/*; do echo 1 >$p/delete & done; wait;
> > srp_daemon -oaec) triggered the following kernel oops:
> > 
> > device-mapper: multipath: Failing path 8:0.
> > BUG: unable to handle kernel NULL pointer dereference at 0000000000000138
> > IP: [<ffffffffa045f8e9>] free_rq_clone+0x29/0xb0 [dm_mod]
> ...
>  
> > In case anyone wants to see the translation of the crash address:
> > 
> > (gdb) list *(free_rq_clone+0x29)
> > 0x919 is in free_rq_clone (drivers/md/dm.c:1092).
> > 1087            struct dm_rq_target_io *tio = clone->end_io_data;
> > 1088            struct mapped_device *md = tio->md;
> > 1089
> > 1090            blk_rq_unprep_clone(clone);
> > 1091
> > 1092            if (clone->q->mq_ops)
> > 1093                    tio->ti->type->release_clone_rq(clone);
> > 1094            else if (!md->queue->mq_ops)
> > 1095                    /* request_fn queue stacked on request_fn
> > queue(s) */
> > 1096                    free_clone_request(md, clone);
> 
> I saw a crash like this yesterday with 4.1-rc1 (definitely due to
> clone->q being NULL) but I didn't get a full backtrace over serial
> console so I cannot be sure it is exactly like yours.
> 
> In my case I was using hch's lio-utils based test setup that he
> documented here:
> https://www.redhat.com/archives/dm-devel/2015-April/msg00138.html
> 
> But I got the crash the first time I ran this script:
> multipathd -F
> tcm_loop --unload
> tcm_node --freedev iblock_0/array

Turns out that after running hch's script, with multipathd having setup
an mpath device ontop of tcm_loop, if I just do:

tcm_loop --unload

(tcm_loop appears to allow me to remove the devices out from underneath dm-multipath)

I get:

[  463.168175] sd 10:0:1:0: alua: Detached
[  463.172808] sd 10:0:1:0: [sdk] Synchronizing SCSI cache
[  463.211388] device-mapper: multipath: Failing path 8:144.
[  463.211494] sd 9:0:0:0: alua: Detached
[  463.211621] sd 9:0:0:0: [sdj] Synchronizing SCSI cache
[  463.227362] blk_update_request: I/O error, dev dm-13, sector 32
[  463.234238] BUG: unable to handle kernel NULL pointer dereference at 00000000000000d8
[  463.242990] IP: [<ffffffffa0000bd7>] free_rq_clone+0x17/0x90 [dm_mod]
[  463.250188] PGD 0
[  463.252437] Oops: 0000 [#1] SMP
[  463.256051] Modules linked in: tcm_loop dm_service_time dm_multipath target_core_iblock target_core_mod sg iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi core
temp kvm_intel kvm ixgbe igb crct10dif_pclmul crc32_pclmul crc32c_intel mdio ptp ghash_clmulni_intel pps_core aesni_intel dca ipmi_si iTCO_wdt glue_helper lrw gf128mul
 shpchp ablk_helper ipmi_msghandler cryptd iTCO_vendor_support lpc_ich mfd_core i7core_edac pcspkr edac_core i2c_i801 acpi_power_meter acpi_cpufreq xfs libcrc32c sr_mo
d cdrom sd_mod mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper ata_generic ttm pata_acpi ata_piix drm libata iomemory_vsl(POE) megaraid_sas i2c_c
ore skd dm_mirror dm_region_hash dm_log dm_mod [last unloaded: tcm_loop]
[  463.328610] CPU: 3 PID: 29 Comm: ksoftirqd/3 Tainted: P        W IOE   4.1.0-rc1.snitm+ #55
[  463.337923] Hardware name: FUJITSU                          PRIMERGY RX300 S6             /D2619, BIOS 6.00 Rev. 1.10.2619.N1           05/24/2011
[  463.352566] task: ffff880330c60000 ti: ffff880330c5c000 task.ti: ffff880330c5c000
[  463.360911] RIP: 0010:[<ffffffffa0000bd7>]  [<ffffffffa0000bd7>] free_rq_clone+0x17/0x90 [dm_mod]
[  463.370817] RSP: 0018:ffff880330c5fd58  EFLAGS: 00010297
[  463.376740] RAX: ffff8803301f0000 RBX: ffff8803301f0000 RCX: dead000000200200
[  463.384698] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880310bb8000
[  463.392654] RBP: ffff880330c5fd68 R08: ffffffffa04cf970 R09: ffff880321bf4d80
[  463.400610] R10: 0000000000000d71 R11: 000000000000bc00 R12: ffff880310bb8000
[  463.408568] R13: ffff88032aa1db50 R14: ffff88032bcae800 R15: 0000000000000000
[  463.416524] FS:  0000000000000000(0000) GS:ffff8803332c0000(0000) knlGS:0000000000000000
[  463.425548] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  463.431955] CR2: 00000000000000d8 CR3: 000000000198a000 CR4: 00000000000007e0
[  463.439910] Stack:
[  463.442150]  00000000fffffffb ffff880310bb8000 ffff880330c5fda8 ffffffffa0001837
[  463.450434]  ffff8803332d7540 ffff88032aa1dbc8 ffff88032aa1db50 ffff880330c5fdc0
[  463.458719]  0000000000000100 0000000000000000 ffff880330c5fdf8 ffffffff812f1c10
[  463.467001] Call Trace:
[  463.469730]  [<ffffffffa0001837>] dm_softirq_done+0x137/0x250 [dm_mod]
[  463.477010]  [<ffffffff812f1c10>] blk_done_softirq+0x90/0xc0
[  463.483321]  [<ffffffff81080264>] __do_softirq+0xf4/0x2d0
[  463.489340]  [<ffffffff81080469>] run_ksoftirqd+0x29/0x50
[  463.495360]  [<ffffffff8109eddf>] smpboot_thread_fn+0x12f/0x180
[  463.501963]  [<ffffffff8109ecb0>] ? sort_range+0x30/0x30
[  463.507886]  [<ffffffff8109b448>] kthread+0xd8/0xf0
[  463.513323]  [<ffffffff8109b370>] ? kthread_create_on_node+0x1b0/0x1b0
[  463.520604]  [<ffffffff81683322>] ret_from_fork+0x42/0x70
[  463.526624]  [<ffffffff8109b370>] ? kthread_create_on_node+0x1b0/0x1b0
[  463.533921] Code: a4 62 67 e1 0f 0b 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 41 54 53 48 8b 57 30 48 8b 9f 60 01 00 00 <48> 83 ba d8 00
00 00 00 4c 8b 23 74 2c 48 8b 53 08 48 8b 52 08
[  463.555555] RIP  [<ffffffffa0000bd7>] free_rq_clone+0x17/0x90 [dm_mod]
[  463.562846]  RSP <ffff880330c5fd58>
[  463.566733] CR2: 00000000000000d8
[  463.570426] ---[ end trace 4f71f94089df7bda ]---
[  465.155661] INFO: NMI handler (ghes_notify_nmi) took too long to run: 2.541 msecs
[  469.833245] Kernel panic - not syncing: Fatal exception in interrupt
[  469.833257] ------------[ cut here ]------------

> Rough first experience with LIO ;)  So I just chalked it up to tcm_loop
> or something not being careful about device lifetime.

Think this is an issue where the cloned request doesn't have a reference
on the underlying device (with above I'm testing with old request_fn
_not_ blk-mq dm-mpath).

Christoph, if I do use blk-mq dm-multpath with:
echo Y > /sys/module/dm_mod/parameters/use_blk_mq

Then run your script; I'm able to reproduce the failure you reported as
soon as I run:

# tcm_loop --unload
sas_target_naa: naa.6001405ac0a3d2eb
sas_target_tpgt: tpgt_1
sas_target_lun: lun_0
Succesfully deleted local virtual SCSI Logical Unit from SAS Target Port
Successfully deleted virtual SCSI I_T Nexus between TCM and Linux/SCSI HBA
Successfully removed NAA based SAS Target Address: /sys/kernel/config/target/loopback/naa.6001405ac0a3d2eb//sys/kernel/config/target/loopback/naa.6001405ac0a3d2eb/tpgt_1
sas_target_naa: naa.6001405b0a839d09
sas_target_tpgt: tpgt_0
sas_target_lun: lun_0

Falls over much like you saw but then ultimately crashes with the same
as above:

[  232.179074] WARNING: CPU: 9 PID: 2836 at lib/list_debug.c:36 __list_add+0x92/0xc0()
...
[  232.443124] kobject (ffff880035625fd0): tried to init an initialized object, something is seriously wrong.
...
[  232.621489] sysfs: cannot create duplicate filename '/devices/virtual/block/dm-13/mq'
...
[  234.514036] NULL pointer dereference at 00000000000000d8
[  234.520168] IP: [<ffffffffa0000bd7>] free_rq_clone+0x17/0x90 [dm_mod]
[  234.527363] PGD 0
[  234.529610] Oops: 0000 [#1] SMP

Basically, the dm-mq error path needs proper testing (multipathd's
reloading of the associated blk-mq request-based DM table is causing
these list_add and duplicate object problems; will hopefully be easy
enough to fix).

But as for the NULL pointer crash, in both cases the underlying paths
are being ripped out from under dm-multipath.  That just should not be
possible while IO is still in-flight/completing.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Kernel v4.1-rc1 + MQ dm-multipath + MQ SRP oops
  2015-04-28 21:54   ` Mike Snitzer
@ 2015-04-29 13:24     ` Christoph Hellwig
  2015-04-29 13:43       ` Mike Snitzer
  0 siblings, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2015-04-29 13:24 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Bart Van Assche, device-mapper development, Christoph Hellwig

On Tue, Apr 28, 2015 at 05:54:57PM -0400, Mike Snitzer wrote:
> Turns out that after running hch's script, with multipathd having setup
> an mpath device ontop of tcm_loop, if I just do:
> 
> tcm_loop --unload
> 
> (tcm_loop appears to allow me to remove the devices out from underneath dm-multipath)

That has nothing to do with tcm_loop.  SCSI devices can get unregistered
any time, and we use refcountin to ensure the structures stay in place
forever.

See my reply to Bart for what I suspect is the real issue.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Kernel v4.1-rc1 + MQ dm-multipath + MQ SRP oops
  2015-04-29 13:24     ` Christoph Hellwig
@ 2015-04-29 13:43       ` Mike Snitzer
  0 siblings, 0 replies; 15+ messages in thread
From: Mike Snitzer @ 2015-04-29 13:43 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Bart Van Assche, device-mapper development

On Wed, Apr 29 2015 at  9:24am -0400,
Christoph Hellwig <hch@lst.de> wrote:

> On Tue, Apr 28, 2015 at 05:54:57PM -0400, Mike Snitzer wrote:
> > Turns out that after running hch's script, with multipathd having setup
> > an mpath device ontop of tcm_loop, if I just do:
> > 
> > tcm_loop --unload
> > 
> > (tcm_loop appears to allow me to remove the devices out from underneath dm-multipath)
> 
> That has nothing to do with tcm_loop.  SCSI devices can get unregistered
> any time, and we use refcountin to ensure the structures stay in place
> forever.
> 
> See my reply to Bart for what I suspect is the real issue.

Already saw it, see my reply ;)

Your fix only applies to he case where DM_MAPIO_REQUEUE is returned from
map_request() due to .map_rq or .clone_and_map_rq failing.  It doesn't
apply to IO completion, example stack here:
https://www.redhat.com/archives/dm-devel/2015-April/msg00157.html

SCSI unregistering shouldn't nuke an in-flight request's queue
(refcounting should work like you say) -- so my crashes are still
unexplained, no?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Kernel v4.1-rc1 + MQ dm-multipath + MQ SRP oops
  2015-04-28 11:52 Kernel v4.1-rc1 + MQ dm-multipath + MQ SRP oops Bart Van Assche
  2015-04-28 13:52 ` Mike Snitzer
@ 2015-04-29 13:20 ` Christoph Hellwig
  2015-04-29 13:34   ` Mike Snitzer
  1 sibling, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2015-04-29 13:20 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: device-mapper development, Christoph Hellwig, Mike Snitzer

On Tue, Apr 28, 2015 at 01:52:20PM +0200, Bart Van Assche wrote:
> Hello,
>
> Earlier today I started testing an SRP initiator patch series on top of 
> Linux kernel v4.1-rc1. Although that patch series works reliably on top of 
> kernel v4.0, a test during which I triggered scsi_remove_host() + relogin 
> (for p in /sys/class/srp_remote_ports/*; do echo 1 >$p/delete & done; wait; 
> srp_daemon -oaec) triggered the following kernel oops:

Can you try the patch below?  From my cursory reading of the dm code
it can have tio->clone allocated for a while before it sets up the ->q
pointer for it:

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index f8c7ca3..ee74764 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1089,7 +1089,7 @@ static void free_rq_clone(struct request *clone)
 
 	blk_rq_unprep_clone(clone);
 
-	if (clone->q->mq_ops)
+	if (clone->q && clone->q->mq_ops)
 		tio->ti->type->release_clone_rq(clone);
 	else if (!md->queue->mq_ops)
 		/* request_fn queue stacked on request_fn queue(s) */

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: Kernel v4.1-rc1 + MQ dm-multipath + MQ SRP oops
  2015-04-29 13:20 ` Christoph Hellwig
@ 2015-04-29 13:34   ` Mike Snitzer
  2015-04-29 13:37     ` Christoph Hellwig
  2015-04-29 18:53     ` [PATCH] dm: fix free_rq_clone() NULL pointer when requeueing unmapped request Mike Snitzer
  0 siblings, 2 replies; 15+ messages in thread
From: Mike Snitzer @ 2015-04-29 13:34 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Bart Van Assche, device-mapper development

On Wed, Apr 29 2015 at  9:20am -0400,
Christoph Hellwig <hch@lst.de> wrote:

> On Tue, Apr 28, 2015 at 01:52:20PM +0200, Bart Van Assche wrote:
> > Hello,
> >
> > Earlier today I started testing an SRP initiator patch series on top of 
> > Linux kernel v4.1-rc1. Although that patch series works reliably on top of 
> > kernel v4.0, a test during which I triggered scsi_remove_host() + relogin 
> > (for p in /sys/class/srp_remote_ports/*; do echo 1 >$p/delete & done; wait; 
> > srp_daemon -oaec) triggered the following kernel oops:
> 
> Can you try the patch below?  From my cursory reading of the dm code
> it can have tio->clone allocated for a while before it sets up the ->q
> pointer for it:
> 
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index f8c7ca3..ee74764 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -1089,7 +1089,7 @@ static void free_rq_clone(struct request *clone)
>  
>  	blk_rq_unprep_clone(clone);
>  
> -	if (clone->q->mq_ops)
> +	if (clone->q && clone->q->mq_ops)
>  		tio->ti->type->release_clone_rq(clone);
>  	else if (!md->queue->mq_ops)
>  		/* request_fn queue stacked on request_fn queue(s) */

I'm seeing this same crash on the completion path (when using your
tcm_loop script).  But for Bart's case his stacktrace included
dm_requeue_unmapped_original_request() -- which if called from
map_request() implies clone->q won't have been initialized given
__multipath_map()'s code for setting up the old request_fn case.

Long story short: your fix is right for Bart's crash (but not the ones
I'm seeing with tcm_loop) -- I'll get it queued up with a proper header
attributed to you and cc'ing stable as needed.

Thanks,
Mike

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Kernel v4.1-rc1 + MQ dm-multipath + MQ SRP oops
  2015-04-29 13:34   ` Mike Snitzer
@ 2015-04-29 13:37     ` Christoph Hellwig
  2015-04-29 18:53     ` [PATCH] dm: fix free_rq_clone() NULL pointer when requeueing unmapped request Mike Snitzer
  1 sibling, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2015-04-29 13:37 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Bart Van Assche, device-mapper development, Christoph Hellwig

On Wed, Apr 29, 2015 at 09:34:33AM -0400, Mike Snitzer wrote:
> I'm seeing this same crash on the completion path (when using your
> tcm_loop script).  But for Bart's case his stacktrace included
> dm_requeue_unmapped_original_request() -- which if called from
> map_request() implies clone->q won't have been initialized given
> __multipath_map()'s code for setting up the old request_fn case.
> 
> Long story short: your fix is right for Bart's crash (but not the ones
> I'm seeing with tcm_loop) -- I'll get it queued up with a proper header
> attributed to you and cc'ing stable as needed.

I don't really have a good explanation for your crash then, but no
block driver will zero out ->q on outstanding requests, so I'd look for
the cause for your issue in device mapper.  That allocation of request
from a mempool for dm-mpath is something very special that I don't
think anyone else does.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH] dm: fix free_rq_clone() NULL pointer when requeueing unmapped request
  2015-04-29 13:34   ` Mike Snitzer
  2015-04-29 13:37     ` Christoph Hellwig
@ 2015-04-29 18:53     ` Mike Snitzer
  2015-04-29 19:11       ` Bart Van Assche
  1 sibling, 1 reply; 15+ messages in thread
From: Mike Snitzer @ 2015-04-29 18:53 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Bart Van Assche, device-mapper development, Aaro Koskinen

On Wed, Apr 29 2015 at  9:34am -0400,
Mike Snitzer <snitzer@redhat.com> wrote:

> On Wed, Apr 29 2015 at  9:20am -0400,
> Christoph Hellwig <hch@lst.de> wrote:
> 
> > On Tue, Apr 28, 2015 at 01:52:20PM +0200, Bart Van Assche wrote:
> > > Hello,
> > >
> > > Earlier today I started testing an SRP initiator patch series on top of 
> > > Linux kernel v4.1-rc1. Although that patch series works reliably on top of 
> > > kernel v4.0, a test during which I triggered scsi_remove_host() + relogin 
> > > (for p in /sys/class/srp_remote_ports/*; do echo 1 >$p/delete & done; wait; 
> > > srp_daemon -oaec) triggered the following kernel oops:
> > 
> > Can you try the patch below?  From my cursory reading of the dm code
> > it can have tio->clone allocated for a while before it sets up the ->q
> > pointer for it:
> > 
> > diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> > index f8c7ca3..ee74764 100644
> > --- a/drivers/md/dm.c
> > +++ b/drivers/md/dm.c
> > @@ -1089,7 +1089,7 @@ static void free_rq_clone(struct request *clone)
> >  
> >  	blk_rq_unprep_clone(clone);
> >  
> > -	if (clone->q->mq_ops)
> > +	if (clone->q && clone->q->mq_ops)
> >  		tio->ti->type->release_clone_rq(clone);
> >  	else if (!md->queue->mq_ops)
> >  		/* request_fn queue stacked on request_fn queue(s) */
> 
> I'm seeing this same crash on the completion path (when using your
> tcm_loop script).  But for Bart's case his stacktrace included
> dm_requeue_unmapped_original_request() -- which if called from
> map_request() implies clone->q won't have been initialized given
> __multipath_map()'s code for setting up the old request_fn case.
> 
> Long story short: your fix is right for Bart's crash (but not the ones
> I'm seeing with tcm_loop) -- I'll get it queued up with a proper header
> attributed to you and cc'ing stable as needed.

Actually, here is the proper 4.1-only fix (Bart please verify this works
for you):

From: Mike Snitzer <snitzer@redhat.com>
Date: Wed, 29 Apr 2015 10:48:09 -0400
Subject: dm: fix free_rq_clone() NULL pointer when requeueing unmapped request

Commit 022333427a ("dm: optimize dm_mq_queue_rq to _not_ use kthread if
using pure blk-mq") mistakenly removed free_rq_clone()'s clone->q check
before testing clone->q->mq_ops.  It was an oversight to discontinue
that check for 1 of the 2 use-cases for free_rq_clone():
1) free_rq_clone() called when an unmapped original request is requeued
2) free_rq_clone() called in the request-based IO completion path

The clone->q check made sense for case #1 but not for #2.  However, we
cannot just reinstate the check as it'd mask a serious bug in the IO
completion case #2 -- no in-flight request should have an uninitialized
request_queue (basic block layer refcounting _should_ ensure this).

The NULL pointer seen for case #1 is detailed here:
https://www.redhat.com/archives/dm-devel/2015-April/msg00160.html

Fix this free_rq_clone() NULL pointer by simply checking if the
mapped_device's type is DM_TYPE_MQ_REQUEST_BASED (clone's queue is
blk-mq) rather than checking clone->q->mq_ops.  This avoids the need to
dereference clone->q, but a WARN_ON_ONCE is added to let us know if an
uninitialized clone request is being completed.

Reported-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 3d34b5d..5998c26 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1031,16 +1031,24 @@ static void rq_completed(struct mapped_device *md, int rw, bool run_queue)
 	dm_put(md);
 }
 
-static void free_rq_clone(struct request *clone)
+static void free_rq_clone(struct request *clone, bool must_be_mapped)
 {
 	struct dm_rq_target_io *tio = clone->end_io_data;
 	struct mapped_device *md = tio->md;
 
-	if (clone->q->mq_ops)
+	WARN_ON_ONCE(must_be_mapped && !clone->q);
+
+	if (md->type == DM_TYPE_MQ_REQUEST_BASED)
+		/* stacked on blk-mq queue(s) */
 		tio->ti->type->release_clone_rq(clone);
 	else if (!md->queue->mq_ops)
 		/* request_fn queue stacked on request_fn queue(s) */
 		free_clone_request(md, clone);
+	/*
+	 * NOTE: for the blk-mq queue stacked on request_fn queue(s) case:
+	 * no need to call free_clone_request() because we leverage blk-mq by
+	 * allocating the clone at the end of the blk-mq pdu (see: clone_rq)
+	 */
 
 	if (!md->queue->mq_ops)
 		free_rq_tio(tio);
@@ -1071,7 +1079,7 @@ static void dm_end_request(struct request *clone, int error)
 			rq->sense_len = clone->sense_len;
 	}
 
-	free_rq_clone(clone);
+	free_rq_clone(clone, true);
 	if (!rq->q->mq_ops)
 		blk_end_request_all(rq, error);
 	else
@@ -1090,7 +1098,7 @@ static void dm_unprep_request(struct request *rq)
 	}
 
 	if (clone)
-		free_rq_clone(clone);
+		free_rq_clone(clone, false);
 }
 
 /*
-- 
2.3.2 (Apple Git-55)

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] dm: fix free_rq_clone() NULL pointer when requeueing unmapped request
  2015-04-29 18:53     ` [PATCH] dm: fix free_rq_clone() NULL pointer when requeueing unmapped request Mike Snitzer
@ 2015-04-29 19:11       ` Bart Van Assche
  2015-04-29 19:53         ` Mike Snitzer
  0 siblings, 1 reply; 15+ messages in thread
From: Bart Van Assche @ 2015-04-29 19:11 UTC (permalink / raw)
  To: Mike Snitzer, Christoph Hellwig; +Cc: device-mapper development, Aaro Koskinen

On 04/29/15 20:53, Mike Snitzer wrote:
> Actually, here is the proper 4.1-only fix (Bart please verify this works
> for you):

Hello Mike,

Thanks for the patch. But against which tree has this patch been 
generated ? It doesn't seem to apply on v4.1-rc1:

$ git reset --hard v4.1-rc1
HEAD is now at b787f68 Linux 4.1-rc1
$ patch -p1 < ~/\[PATCH\]\ dm\:\ fix\ free_rq_clone\(\)\ NULL\ pointer\ 
when\ requeueing\ unmapped\ request.eml
(Stripping trailing CRs from patch; use --binary to disable.)
patching file drivers/md/dm.c
Hunk #1 FAILED at 1031.
Hunk #2 succeeded at 1124 (offset 53 lines).
Hunk #3 succeeded at 1143 (offset 53 lines).
1 out of 3 hunks FAILED -- saving rejects to file drivers/md/dm.c.rej

Bart.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: dm: fix free_rq_clone() NULL pointer when requeueing unmapped request
  2015-04-29 19:11       ` Bart Van Assche
@ 2015-04-29 19:53         ` Mike Snitzer
  2015-04-30  9:07           ` Bart Van Assche
  2015-04-30  9:11           ` Aaro Koskinen
  0 siblings, 2 replies; 15+ messages in thread
From: Mike Snitzer @ 2015-04-29 19:53 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: device-mapper development, Christoph Hellwig, Aaro Koskinen

On Wed, Apr 29 2015 at  3:11P -0400,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:

> On 04/29/15 20:53, Mike Snitzer wrote:
> >Actually, here is the proper 4.1-only fix (Bart please verify this works
> >for you):
> 
> Hello Mike,
> 
> Thanks for the patch. But against which tree has this patch been generated ?
> It doesn't seem to apply on v4.1-rc1:
> 
> $ git reset --hard v4.1-rc1
> HEAD is now at b787f68 Linux 4.1-rc1
> $ patch -p1 < ~/\[PATCH\]\ dm\:\ fix\ free_rq_clone\(\)\ NULL\ pointer\
> when\ requeueing\ unmapped\ request.eml
> (Stripping trailing CRs from patch; use --binary to disable.)
> patching file drivers/md/dm.c
> Hunk #1 FAILED at 1031.
> Hunk #2 succeeded at 1124 (offset 53 lines).
> Hunk #3 succeeded at 1143 (offset 53 lines).
> 1 out of 3 hunks FAILED -- saving rejects to file drivers/md/dm.c.rej

It was implemented against my "private" wip2 branch (since rebased):
http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/log/?h=wip2

Anyway, here it is rebased to 4.1-rc1 (BTW, I'm open to dropping the
WARN_ON_ONCE but I need to research further.. if you guys think that
there are perfectly resonable ways to explain why clone->q is NULL in
the IO completion path then I'm all ears):

From: Mike Snitzer <snitzer@redhat.com>
Date: Wed, 29 Apr 2015 10:48:09 -0400
Subject: dm: fix free_rq_clone() NULL pointer when requeueing unmapped request

Commit 022333427a ("dm: optimize dm_mq_queue_rq to _not_ use kthread if
using pure blk-mq") mistakenly removed free_rq_clone()'s clone->q check
before testing clone->q->mq_ops.  It was an oversight to discontinue
that check for 1 of the 2 use-cases for free_rq_clone():
1) free_rq_clone() called when an unmapped original request is requeued
2) free_rq_clone() called in the request-based IO completion path

The clone->q check made sense for case #1 but not for #2.  However, we
cannot just reinstate the check as it'd mask a serious bug in the IO
completion case #2 -- no in-flight request should have an uninitialized
request_queue (basic block layer refcounting _should_ ensure this).

The NULL pointer seen for case #1 is detailed here:
https://www.redhat.com/archives/dm-devel/2015-April/msg00160.html

Fix this free_rq_clone() NULL pointer by simply checking if the
mapped_device's type is DM_TYPE_MQ_REQUEST_BASED (clone's queue is
blk-mq) rather than checking clone->q->mq_ops.  This avoids the need to
dereference clone->q, but a WARN_ON_ONCE is added to let us know if an
uninitialized clone request is being completed.

Reported-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 drivers/md/dm.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 6754bbd..dfb7bde 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1082,18 +1082,26 @@ static void rq_completed(struct mapped_device *md, int rw, bool run_queue)
 	dm_put(md);
 }
 
-static void free_rq_clone(struct request *clone)
+static void free_rq_clone(struct request *clone, bool must_be_mapped)
 {
 	struct dm_rq_target_io *tio = clone->end_io_data;
 	struct mapped_device *md = tio->md;
 
+	WARN_ON_ONCE(must_be_mapped && !clone->q);
+
 	blk_rq_unprep_clone(clone);
 
-	if (clone->q->mq_ops)
+	if (md->type == DM_TYPE_MQ_REQUEST_BASED)
+		/* stacked on blk-mq queue(s) */
 		tio->ti->type->release_clone_rq(clone);
 	else if (!md->queue->mq_ops)
 		/* request_fn queue stacked on request_fn queue(s) */
 		free_clone_request(md, clone);
+	/*
+	 * NOTE: for the blk-mq queue stacked on request_fn queue(s) case:
+	 * no need to call free_clone_request() because we leverage blk-mq by
+	 * allocating the clone at the end of the blk-mq pdu (see: clone_rq)
+	 */
 
 	if (!md->queue->mq_ops)
 		free_rq_tio(tio);
@@ -1124,7 +1132,7 @@ static void dm_end_request(struct request *clone, int error)
 			rq->sense_len = clone->sense_len;
 	}
 
-	free_rq_clone(clone);
+	free_rq_clone(clone, true);
 	if (!rq->q->mq_ops)
 		blk_end_request_all(rq, error);
 	else
@@ -1143,7 +1151,7 @@ static void dm_unprep_request(struct request *rq)
 	}
 
 	if (clone)
-		free_rq_clone(clone);
+		free_rq_clone(clone, false);
 }
 
 /*
-- 
2.3.2 (Apple Git-55)

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: dm: fix free_rq_clone() NULL pointer when requeueing unmapped request
  2015-04-29 19:53         ` Mike Snitzer
@ 2015-04-30  9:07           ` Bart Van Assche
  2015-04-30 12:57             ` Mike Snitzer
  2015-04-30  9:11           ` Aaro Koskinen
  1 sibling, 1 reply; 15+ messages in thread
From: Bart Van Assche @ 2015-04-30  9:07 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: device-mapper development, Christoph Hellwig, Aaro Koskinen

On 04/29/15 21:53, Mike Snitzer wrote:
> On Wed, Apr 29 2015 at  3:11P -0400,
> Bart Van Assche <bart.vanassche@sandisk.com> wrote:
>
>> On 04/29/15 20:53, Mike Snitzer wrote:
>>> Actually, here is the proper 4.1-only fix (Bart please verify this works
>>> for you):
>>
>> Hello Mike,
>>
>> Thanks for the patch. But against which tree has this patch been generated ?
>> It doesn't seem to apply on v4.1-rc1:
>>
>> $ git reset --hard v4.1-rc1
>> HEAD is now at b787f68 Linux 4.1-rc1
>> $ patch -p1 < ~/\[PATCH\]\ dm\:\ fix\ free_rq_clone\(\)\ NULL\ pointer\
>> when\ requeueing\ unmapped\ request.eml
>> (Stripping trailing CRs from patch; use --binary to disable.)
>> patching file drivers/md/dm.c
>> Hunk #1 FAILED at 1031.
>> Hunk #2 succeeded at 1124 (offset 53 lines).
>> Hunk #3 succeeded at 1143 (offset 53 lines).
>> 1 out of 3 hunks FAILED -- saving rejects to file drivers/md/dm.c.rej
>
> It was implemented against my "private" wip2 branch (since rebased):
> http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/log/?h=wip2
>
> Anyway, here it is rebased to 4.1-rc1 (BTW, I'm open to dropping the
> WARN_ON_ONCE but I need to research further.. if you guys think that
> there are perfectly resonable ways to explain why clone->q is NULL in
> the IO completion path then I'm all ears):
>
> From: Mike Snitzer <snitzer@redhat.com>
> Date: Wed, 29 Apr 2015 10:48:09 -0400
> Subject: dm: fix free_rq_clone() NULL pointer when requeueing unmapped request
>
> Commit 022333427a ("dm: optimize dm_mq_queue_rq to _not_ use kthread if
> using pure blk-mq") mistakenly removed free_rq_clone()'s clone->q check
> before testing clone->q->mq_ops.  It was an oversight to discontinue
> that check for 1 of the 2 use-cases for free_rq_clone():
> 1) free_rq_clone() called when an unmapped original request is requeued
> 2) free_rq_clone() called in the request-based IO completion path
>
> The clone->q check made sense for case #1 but not for #2.  However, we
> cannot just reinstate the check as it'd mask a serious bug in the IO
> completion case #2 -- no in-flight request should have an uninitialized
> request_queue (basic block layer refcounting _should_ ensure this).
>
> The NULL pointer seen for case #1 is detailed here:
> https://www.redhat.com/archives/dm-devel/2015-April/msg00160.html
>
> Fix this free_rq_clone() NULL pointer by simply checking if the
> mapped_device's type is DM_TYPE_MQ_REQUEST_BASED (clone's queue is
> blk-mq) rather than checking clone->q->mq_ops.  This avoids the need to
> dereference clone->q, but a WARN_ON_ONCE is added to let us know if an
> uninitialized clone request is being completed.
>
> Reported-by: Bart Van Assche <bart.vanassche@sandisk.com>
> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> ---
>   drivers/md/dm.c | 16 ++++++++++++----
>   1 file changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index 6754bbd..dfb7bde 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -1082,18 +1082,26 @@ static void rq_completed(struct mapped_device *md, int rw, bool run_queue)
>   	dm_put(md);
>   }
>
> -static void free_rq_clone(struct request *clone)
> +static void free_rq_clone(struct request *clone, bool must_be_mapped)
>   {
>   	struct dm_rq_target_io *tio = clone->end_io_data;
>   	struct mapped_device *md = tio->md;
>
> +	WARN_ON_ONCE(must_be_mapped && !clone->q);
> +
>   	blk_rq_unprep_clone(clone);
>
> -	if (clone->q->mq_ops)
> +	if (md->type == DM_TYPE_MQ_REQUEST_BASED)
> +		/* stacked on blk-mq queue(s) */
>   		tio->ti->type->release_clone_rq(clone);
>   	else if (!md->queue->mq_ops)
>   		/* request_fn queue stacked on request_fn queue(s) */
>   		free_clone_request(md, clone);
> +	/*
> +	 * NOTE: for the blk-mq queue stacked on request_fn queue(s) case:
> +	 * no need to call free_clone_request() because we leverage blk-mq by
> +	 * allocating the clone at the end of the blk-mq pdu (see: clone_rq)
> +	 */
>
>   	if (!md->queue->mq_ops)
>   		free_rq_tio(tio);
> @@ -1124,7 +1132,7 @@ static void dm_end_request(struct request *clone, int error)
>   			rq->sense_len = clone->sense_len;
>   	}
>
> -	free_rq_clone(clone);
> +	free_rq_clone(clone, true);
>   	if (!rq->q->mq_ops)
>   		blk_end_request_all(rq, error);
>   	else
> @@ -1143,7 +1151,7 @@ static void dm_unprep_request(struct request *rq)
>   	}
>
>   	if (clone)
> -		free_rq_clone(clone);
> +		free_rq_clone(clone, false);
>   }
>
>   /*

Hello Mike,

This patch survives my SRP initiator tests without triggering any kernel 
warning.

Thanks !

Bart.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: dm: fix free_rq_clone() NULL pointer when requeueing unmapped request
  2015-04-30  9:07           ` Bart Van Assche
@ 2015-04-30 12:57             ` Mike Snitzer
  0 siblings, 0 replies; 15+ messages in thread
From: Mike Snitzer @ 2015-04-30 12:57 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: device-mapper development, Christoph Hellwig, Aaro Koskinen

On Thu, Apr 30 2015 at  5:07am -0400,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:

> On 04/29/15 21:53, Mike Snitzer wrote:
> >On Wed, Apr 29 2015 at  3:11P -0400,
> >Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> >
> >>On 04/29/15 20:53, Mike Snitzer wrote:
> >>>Actually, here is the proper 4.1-only fix (Bart please verify this works
> >>>for you):
> >>
> >>Hello Mike,
> >>
> >>Thanks for the patch. But against which tree has this patch been generated ?
> >>It doesn't seem to apply on v4.1-rc1:
> >>
> >>$ git reset --hard v4.1-rc1
> >>HEAD is now at b787f68 Linux 4.1-rc1
> >>$ patch -p1 < ~/\[PATCH\]\ dm\:\ fix\ free_rq_clone\(\)\ NULL\ pointer\
> >>when\ requeueing\ unmapped\ request.eml
> >>(Stripping trailing CRs from patch; use --binary to disable.)
> >>patching file drivers/md/dm.c
> >>Hunk #1 FAILED at 1031.
> >>Hunk #2 succeeded at 1124 (offset 53 lines).
> >>Hunk #3 succeeded at 1143 (offset 53 lines).
> >>1 out of 3 hunks FAILED -- saving rejects to file drivers/md/dm.c.rej
> >
> >It was implemented against my "private" wip2 branch (since rebased):
> >http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/log/?h=wip2
> >
> >Anyway, here it is rebased to 4.1-rc1 (BTW, I'm open to dropping the
> >WARN_ON_ONCE but I need to research further.. if you guys think that
> >there are perfectly resonable ways to explain why clone->q is NULL in
> >the IO completion path then I'm all ears):
> >
> >From: Mike Snitzer <snitzer@redhat.com>
> >Date: Wed, 29 Apr 2015 10:48:09 -0400
> >Subject: dm: fix free_rq_clone() NULL pointer when requeueing unmapped request
> >
> >Commit 022333427a ("dm: optimize dm_mq_queue_rq to _not_ use kthread if
> >using pure blk-mq") mistakenly removed free_rq_clone()'s clone->q check
> >before testing clone->q->mq_ops.  It was an oversight to discontinue
> >that check for 1 of the 2 use-cases for free_rq_clone():
> >1) free_rq_clone() called when an unmapped original request is requeued
> >2) free_rq_clone() called in the request-based IO completion path
> >
> >The clone->q check made sense for case #1 but not for #2.  However, we
> >cannot just reinstate the check as it'd mask a serious bug in the IO
> >completion case #2 -- no in-flight request should have an uninitialized
> >request_queue (basic block layer refcounting _should_ ensure this).
> >
> >The NULL pointer seen for case #1 is detailed here:
> >https://www.redhat.com/archives/dm-devel/2015-April/msg00160.html
> >
> >Fix this free_rq_clone() NULL pointer by simply checking if the
> >mapped_device's type is DM_TYPE_MQ_REQUEST_BASED (clone's queue is
> >blk-mq) rather than checking clone->q->mq_ops.  This avoids the need to
> >dereference clone->q, but a WARN_ON_ONCE is added to let us know if an
> >uninitialized clone request is being completed.
> >
> >Reported-by: Bart Van Assche <bart.vanassche@sandisk.com>
> >Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> >---
> >  drivers/md/dm.c | 16 ++++++++++++----
> >  1 file changed, 12 insertions(+), 4 deletions(-)
> >
> >diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> >index 6754bbd..dfb7bde 100644
> >--- a/drivers/md/dm.c
> >+++ b/drivers/md/dm.c
> >@@ -1082,18 +1082,26 @@ static void rq_completed(struct mapped_device *md, int rw, bool run_queue)
> >  	dm_put(md);
> >  }
> >
> >-static void free_rq_clone(struct request *clone)
> >+static void free_rq_clone(struct request *clone, bool must_be_mapped)
> >  {
> >  	struct dm_rq_target_io *tio = clone->end_io_data;
> >  	struct mapped_device *md = tio->md;
> >
> >+	WARN_ON_ONCE(must_be_mapped && !clone->q);
> >+
> >  	blk_rq_unprep_clone(clone);
> >
> >-	if (clone->q->mq_ops)
> >+	if (md->type == DM_TYPE_MQ_REQUEST_BASED)
> >+		/* stacked on blk-mq queue(s) */
> >  		tio->ti->type->release_clone_rq(clone);
> >  	else if (!md->queue->mq_ops)
> >  		/* request_fn queue stacked on request_fn queue(s) */
> >  		free_clone_request(md, clone);
> >+	/*
> >+	 * NOTE: for the blk-mq queue stacked on request_fn queue(s) case:
> >+	 * no need to call free_clone_request() because we leverage blk-mq by
> >+	 * allocating the clone at the end of the blk-mq pdu (see: clone_rq)
> >+	 */
> >
> >  	if (!md->queue->mq_ops)
> >  		free_rq_tio(tio);
> >@@ -1124,7 +1132,7 @@ static void dm_end_request(struct request *clone, int error)
> >  			rq->sense_len = clone->sense_len;
> >  	}
> >
> >-	free_rq_clone(clone);
> >+	free_rq_clone(clone, true);
> >  	if (!rq->q->mq_ops)
> >  		blk_end_request_all(rq, error);
> >  	else
> >@@ -1143,7 +1151,7 @@ static void dm_unprep_request(struct request *rq)
> >  	}
> >
> >  	if (clone)
> >-		free_rq_clone(clone);
> >+		free_rq_clone(clone, false);
> >  }
> >
> >  /*
> 
> Hello Mike,
> 
> This patch survives my SRP initiator tests without triggering any
> kernel warning.

Great.

> Thanks !

No problem, thanks for testing.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: dm: fix free_rq_clone() NULL pointer when requeueing unmapped request
  2015-04-29 19:53         ` Mike Snitzer
  2015-04-30  9:07           ` Bart Van Assche
@ 2015-04-30  9:11           ` Aaro Koskinen
  2015-04-30 12:56             ` Mike Snitzer
  1 sibling, 1 reply; 15+ messages in thread
From: Aaro Koskinen @ 2015-04-30  9:11 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Bart Van Assche, device-mapper development, Christoph Hellwig

Hi,

On Wed, Apr 29, 2015 at 03:53:42PM -0400, Mike Snitzer wrote:
> http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/log/?h=wip2
> 
> Anyway, here it is rebased to 4.1-rc1 (BTW, I'm open to dropping the
> WARN_ON_ONCE but I need to research further.. if you guys think that
> there are perfectly resonable ways to explain why clone->q is NULL in
> the IO completion path then I'm all ears):

This fixes the crash I'm seeing, but the WARN ON is still triggering
on almost (*) every boot. I'm using rootfs where multipathd is built
and started with the default configuration it ships with, and it looks
like this:

[  OK  ] Started Device-Mapper Multipath Device Controller.
[  OK  ] Started Network Service.
         Starting Network Name Resolution...
[  OK  ] Reached target Network.
         Starting GlusterFS, a clustered file-system server...
[   16.562604] device-mapper: multipath service-time: version 0.2.0 loaded
[   16.586067] device-mapper: table: 253:0: multipath: error getting device
[   16.586428] device-mapper: ioctl: error adding target to table
[   16.679048] device-mapper: multipath: Failing path 8:16.
[  OK  ] Started Network Name Resolution.
[*     ] A start job is running for GlusterF...le-system server (13s / 5min 7s)
[...]
[   23.034550] ------------[ cut here ]------------
[   23.035525] WARNING: CPU: 0 PID: 3 at /home/aakoskin/linux/drivers/md/dm.c:1090 free_rq_clone+0xbc/0x130 [dm_mod]()
[...]
[   23.041885] Call Trace:
[   23.042064] [<ffffffff8010bc90>] show_stack+0x78/0x90
[   23.042505] [<ffffffff80133764>] warn_slowpath_common+0xa4/0xe0
[   23.043019] [<ffffffffc000e37c>] free_rq_clone+0xbc/0x130 [dm_mod]
[   23.043412] [<ffffffffc000e830>] dm_softirq_done+0x198/0x2c0 [dm_mod]
[   23.043775] [<ffffffff803388dc>] blk_done_softirq+0xac/0xc0
[   23.044076] [<ffffffff80136894>] __do_softirq+0x174/0x368
[   23.044376] [<ffffffff80136af8>] run_ksoftirqd+0x70/0xa8
[   23.044668] [<ffffffff8015604c>] smpboot_thread_fn+0x1bc/0x1c8
[   23.044980] [<ffffffff80152440>] kthread+0xe0/0xf8
[   23.045247] [<ffffffff80105768>] ret_from_kernel_thread+0x14/0x1c
[   23.045673]
[   23.045824] ---[ end trace e0e5377c5d7b858b ]---
[   23.046326] blk_update_request: I/O error, dev dm-0, sector 0
[   23.056271] blk_update_request: I/O error, dev dm-0, sector 0
[   23.056745] Buffer I/O error on dev dm-0, logical block 0, async page read
[   23.070427] blk_update_request: I/O error, dev dm-0, sector 0
[   23.070833] Buffer I/O error on dev dm-0, logical block 0, async page read

(*) Strange thing is that it only happens when my test bot is booting
the system. With interactive console it's OK without any I/O errors.

A.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: dm: fix free_rq_clone() NULL pointer when requeueing unmapped request
  2015-04-30  9:11           ` Aaro Koskinen
@ 2015-04-30 12:56             ` Mike Snitzer
  0 siblings, 0 replies; 15+ messages in thread
From: Mike Snitzer @ 2015-04-30 12:56 UTC (permalink / raw)
  To: Aaro Koskinen
  Cc: Bart Van Assche, device-mapper development, Christoph Hellwig

On Thu, Apr 30 2015 at  5:11am -0400,
Aaro Koskinen <aaro.koskinen@nokia.com> wrote:

> Hi,
> 
> On Wed, Apr 29, 2015 at 03:53:42PM -0400, Mike Snitzer wrote:
> > http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/log/?h=wip2
> > 
> > Anyway, here it is rebased to 4.1-rc1 (BTW, I'm open to dropping the
> > WARN_ON_ONCE but I need to research further.. if you guys think that
> > there are perfectly resonable ways to explain why clone->q is NULL in
> > the IO completion path then I'm all ears):
> 
> This fixes the crash I'm seeing, but the WARN ON is still triggering
> on almost (*) every boot. I'm using rootfs where multipathd is built
> and started with the default configuration it ships with, and it looks
> like this:

Can you show multipath -ll for the device in question?  Are you saying
that you're using multipath for the root device?

Do you have the scsi_dh module that the device uses getting preloaded
at boot? (e.g. add "rdloaddriver=scsi_dh_alua" to the grub kernel
commandline).  Alternatively the relevant scsi_dh can just be built-in
to the kernel, that way it'll always get attached when the SCSI device
scan occurs.

> [  OK  ] Started Device-Mapper Multipath Device Controller.
> [  OK  ] Started Network Service.
>          Starting Network Name Resolution...
> [  OK  ] Reached target Network.
>          Starting GlusterFS, a clustered file-system server...
> [   16.562604] device-mapper: multipath service-time: version 0.2.0 loaded
> [   16.586067] device-mapper: table: 253:0: multipath: error getting device
> [   16.586428] device-mapper: ioctl: error adding target to table
> [   16.679048] device-mapper: multipath: Failing path 8:16.
> [  OK  ] Started Network Name Resolution.
> [*     ] A start job is running for GlusterF...le-system server (13s / 5min 7s)
> [...]
> [   23.034550] ------------[ cut here ]------------
> [   23.035525] WARNING: CPU: 0 PID: 3 at /home/aakoskin/linux/drivers/md/dm.c:1090 free_rq_clone+0xbc/0x130 [dm_mod]()
> [...]
> [   23.041885] Call Trace:
> [   23.042064] [<ffffffff8010bc90>] show_stack+0x78/0x90
> [   23.042505] [<ffffffff80133764>] warn_slowpath_common+0xa4/0xe0
> [   23.043019] [<ffffffffc000e37c>] free_rq_clone+0xbc/0x130 [dm_mod]
> [   23.043412] [<ffffffffc000e830>] dm_softirq_done+0x198/0x2c0 [dm_mod]
> [   23.043775] [<ffffffff803388dc>] blk_done_softirq+0xac/0xc0
> [   23.044076] [<ffffffff80136894>] __do_softirq+0x174/0x368
> [   23.044376] [<ffffffff80136af8>] run_ksoftirqd+0x70/0xa8
> [   23.044668] [<ffffffff8015604c>] smpboot_thread_fn+0x1bc/0x1c8
> [   23.044980] [<ffffffff80152440>] kthread+0xe0/0xf8
> [   23.045247] [<ffffffff80105768>] ret_from_kernel_thread+0x14/0x1c
> [   23.045673]
> [   23.045824] ---[ end trace e0e5377c5d7b858b ]---
> [   23.046326] blk_update_request: I/O error, dev dm-0, sector 0
> [   23.056271] blk_update_request: I/O error, dev dm-0, sector 0
> [   23.056745] Buffer I/O error on dev dm-0, logical block 0, async page read
> [   23.070427] blk_update_request: I/O error, dev dm-0, sector 0
> [   23.070833] Buffer I/O error on dev dm-0, logical block 0, async page read
> 
> (*) Strange thing is that it only happens when my test bot is booting
> the system. With interactive console it's OK without any I/O errors.
> 
> A.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2015-04-30 12:57 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-04-28 11:52 Kernel v4.1-rc1 + MQ dm-multipath + MQ SRP oops Bart Van Assche
2015-04-28 13:52 ` Mike Snitzer
2015-04-28 21:54   ` Mike Snitzer
2015-04-29 13:24     ` Christoph Hellwig
2015-04-29 13:43       ` Mike Snitzer
2015-04-29 13:20 ` Christoph Hellwig
2015-04-29 13:34   ` Mike Snitzer
2015-04-29 13:37     ` Christoph Hellwig
2015-04-29 18:53     ` [PATCH] dm: fix free_rq_clone() NULL pointer when requeueing unmapped request Mike Snitzer
2015-04-29 19:11       ` Bart Van Assche
2015-04-29 19:53         ` Mike Snitzer
2015-04-30  9:07           ` Bart Van Assche
2015-04-30 12:57             ` Mike Snitzer
2015-04-30  9:11           ` Aaro Koskinen
2015-04-30 12:56             ` Mike Snitzer

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.