From: yizhan@redhat.com (Yi Zhang)
Subject: nvmeof rdma regression issue on 4.14.0-rc1 (or maybe mlx4?)
Date: Sun, 24 Sep 2017 17:28:30 +0800
Message-ID: <e8aa9f2e-14b1-23f0-f672-07a913842c9d@redhat.com>
In-Reply-To: <47493aa0-4cad-721b-4ea2-c3b2293340aa@grimberg.me>
> Is it possible that ib_dereg_mr failed?
>
It seems not, and the system eventually panicked. Here is the log:
[ 104.373784] nvme nvme0: new ctrl: NQN
"nqn.2014-08.org.nvmexpress.discovery", addr 172.31.0.90:4420
[ 104.564001] nvme nvme0: creating 40 I/O queues.
[ 105.070022] nvme nvme0: new ctrl: NQN "testnqn", addr 172.31.0.90:4420
[ 144.135070] nvme nvme0: rescanning
[ 204.383678] nvme nvme0: Reconnecting in 10 seconds...
[ 214.506489] nvme nvme0: Connect rejected: status 8 (invalid service ID).
[ 214.513996] nvme nvme0: rdma_resolve_addr wait failed (-104).
[ 214.520426] nvme nvme0: Failed reconnect attempt 1
[ 214.525788] nvme nvme0: Reconnecting in 10 seconds...
[ 224.733962] nvme nvme0: Connect rejected: status 8 (invalid service ID).
[ 224.741464] nvme nvme0: rdma_resolve_addr wait failed (-104).
[ 224.747898] nvme nvme0: Failed reconnect attempt 2
[ 224.753301] nvme nvme0: Reconnecting in 10 seconds...
[ 234.973834] nvme nvme0: Connect rejected: status 8 (invalid service ID).
[ 234.981335] nvme nvme0: rdma_resolve_addr wait failed (-104).
[ 234.987768] nvme nvme0: Failed reconnect attempt 3
[ 234.993150] nvme nvme0: Reconnecting in 10 seconds...
[ 245.233395] nvme nvme0: creating 40 I/O queues.
[ 245.238480] DMAR: ERROR: DMA PTE for vPFN 0xe109b already set (to
10098cc002 not 103b85e003)
[ 245.247940] ------------[ cut here ]------------
[ 245.253110] WARNING: CPU: 38 PID: 6 at
drivers/iommu/intel-iommu.c:2305 __domain_mapping+0x367/0x380
[ 245.263329] Modules linked in: nvme_rdma nvme_fabrics nvme_core
sch_mqprio ebtable_filter ebtables ip6table_filter ip6_tables
iptable_filter bridge 8021q garp mrp stp llc rpcrdma ib_isert
iscsi_target_mod ibd
[ 245.342493] mgag200 i2c_algo_bit drm_kms_helper syscopyarea
sysfillrect sysimgblt fb_sys_fops ttm drm mlx4_core tg3 ahci libahci ptp
libata crc32c_intel i2c_core pps_core devlink dm_mirror dm_region_hash dmd
[ 245.364191] CPU: 38 PID: 6 Comm: kworker/u368:0 Not tainted
4.14.0-rc1+ #7
[ 245.371880] Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS
1.6.2 01/08/2016
[ 245.380265] Workqueue: ib_addr process_one_req [ib_core]
[ 245.386211] task: ffff88018cb245c0 task.stack: ffffc9000009c000
[ 245.392836] RIP: 0010:__domain_mapping+0x367/0x380
[ 245.398194] RSP: 0018:ffffc9000009fa98 EFLAGS: 00010202
[ 245.404039] RAX: 0000000000000004 RBX: 000000103b85e003 RCX:
0000000000000000
[ 245.412018] RDX: 0000000000000000 RSI: ffff88103eace038 RDI:
ffff88103eace038
[ 245.420001] RBP: ffffc9000009faf8 R08: 0000000000000000 R09:
0000000000000000
[ 245.427983] R10: 00000000000002f7 R11: 000000000103b85e R12:
ffff881009bc74d8
[ 245.436711] R13: 0000000000000001 R14: 0000000000000001 R15:
00000000000e109b
[ 245.445419] FS: 0000000000000000(0000) GS:ffff88103eac0000(0000)
knlGS:0000000000000000
[ 245.455199] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 245.462357] CR2: 000014940a9b7140 CR3: 00000010119b5000 CR4:
00000000001606e0
[ 245.471074] Call Trace:
[ 245.474549] __intel_map_single+0xeb/0x180
[ 245.479868] intel_alloc_coherent+0xb5/0x130
[ 245.485388] mlx4_buf_alloc+0xe5/0x1c0 [mlx4_core]
[ 245.491482] mlx4_ib_alloc_cq_buf.isra.9+0x38/0xd0 [mlx4_ib]
[ 245.498540] mlx4_ib_create_cq+0x223/0x450 [mlx4_ib]
[ 245.504822] ib_alloc_cq+0x49/0x170 [ib_core]
[ 245.510413] nvme_rdma_cm_handler+0x3a2/0x7ab [nvme_rdma]
[ 245.517179] ? cma_acquire_dev+0x1e3/0x3b0 [rdma_cm]
[ 245.523456] addr_handler+0xa4/0x1c0 [rdma_cm]
[ 245.529147] process_one_req+0x8d/0x120 [ib_core]
[ 245.535132] process_one_work+0x149/0x360
[ 245.540334] worker_thread+0x4d/0x3c0
[ 245.545145] kthread+0x109/0x140
[ 245.549462] ? rescuer_thread+0x380/0x380
[ 245.554654] ? kthread_park+0x60/0x60
[ 245.559456] ret_from_fork+0x25/0x30
[ 245.564153] Code: fe aa 81 4c 89 5d a0 4c 89 4d a8 e8 87 e1 c0 ff 8b
05 fe 6e 87 00 4c 8b 4d a8 4c 8b 5d a0 85 c0 74 09 83 e8 01 89 05 e9 6e
87 00 <0f> ff e9 b8 fd ff ff e8 8d c7 ba ff 0f 1f 00 66 2e 0f 1f 8
[ 245.586712] ---[ end trace 56749c1831388ff8 ]---
[ 245.592920] mlx4_core 0000:04:00.0: dma_pool_free mlx4_cmd,
cccccccccccccccc/ccd80eccccccf203 (bad dma)
[ 245.604179] mlx4_core 0000:04:00.0: dma_pool_free mlx4_cmd,
cccccccccccccccc/cccccccccccccccc (bad dma)
[ 245.615647] general protection fault: 0000 [#1] SMP
[ 245.621836] Modules linked in: nvme_rdma nvme_fabrics nvme_core
sch_mqprio ebtable_filter ebtables ip6table_filter ip6_tables
iptable_filter bridge 8021q garp mrp stp llc rpcrdma ib_isert
iscsi_target_mod ibd
[ 245.706171] mgag200 i2c_algo_bit drm_kms_helper syscopyarea
sysfillrect sysimgblt fb_sys_fops ttm drm mlx4_core tg3 ahci libahci ptp
libata crc32c_intel i2c_core pps_core devlink dm_mirror dm_region_hash dmd
[ 245.729344] CPU: 38 PID: 6 Comm: kworker/u368:0 Tainted: G W
4.14.0-rc1+ #7
[ 245.739128] Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS
1.6.2 01/08/2016
[ 245.748234] Workqueue: ib_addr process_one_req [ib_core]
[ 245.754905] task: ffff88018cb245c0 task.stack: ffffc9000009c000
[ 245.762256] RIP: 0010:prefetch_freepointer.isra.65+0x11/0x20
[ 245.769313] RSP: 0018:ffffc9000009fcc0 EFLAGS: 00010286
[ 245.775881] RAX: 0000000000000000 RBX: cccccccccccccccc RCX:
0000000000001793
[ 245.784591] RDX: 0000000000001792 RSI: cccccccccccccccc RDI:
ffff88018fc07aa0
[ 245.793294] RBP: ffffc9000009fcc0 R08: 000000000001ed40 R09:
ffff8810098cccc0
[ 245.802002] R10: ffffffff818a99e0 R11: 00000000010098cd R12:
00000000014080c0
[ 245.810706] R13: ffffffffa07bd1e0 R14: ffff88018fc07a80 R15:
ffff88018fc07a80
[ 245.819409] FS: 0000000000000000(0000) GS:ffff88103eac0000(0000)
knlGS:0000000000000000
[ 245.829184] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 245.836342] CR2: 000014940a9b7140 CR3: 00000010119b5000 CR4:
00000000001606e0
[ 245.845056] Call Trace:
[ 245.848524] kmem_cache_alloc_trace+0xa0/0x1c0
[ 245.854220] nvme_rdma_cm_handler+0x4e0/0x7ab [nvme_rdma]
[ 245.860990] addr_handler+0xa4/0x1c0 [rdma_cm]
[ 245.866694] process_one_req+0x8d/0x120 [ib_core]
[ 245.872687] process_one_work+0x149/0x360
[ 245.877899] worker_thread+0x4d/0x3c0
[ 245.882720] kthread+0x109/0x140
[ 245.887051] ? rescuer_thread+0x380/0x380
[ 245.892244] ? kthread_park+0x60/0x60
[ 245.897054] ret_from_fork+0x25/0x30
[ 245.901760] Code: 31 d2 e8 b3 ea ff ff 5b 41 5c 5d c3 0f 1f 40 00 66
2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 85 f6 48 89 e5 74 0a 48
63 07 <48> 8b 04 06 0f 18 08 5d c3 66 0f 1f 44 00 00 0f 1f 44 00 0
[ 245.924349] RIP: prefetch_freepointer.isra.65+0x11/0x20 RSP:
ffffc9000009fcc0
[ 245.933145] ---[ end trace 56749c1831388ff9 ]---
[ 245.942680] Kernel panic - not syncing: Fatal exception
[ 245.950207] Kernel Offset: disabled
[ 245.958566] ---[ end Kernel panic - not syncing: Fatal exception
[ 245.966082] ------------[ cut here ]------------
[ 245.972014] WARNING: CPU: 38 PID: 6 at kernel/sched/core.c:1179
set_task_cpu+0x191/0x1a0
[ 245.981822] Modules linked in: nvme_rdma nvme_fabrics nvme_core
sch_mqprio ebtable_filter ebtables ip6table_filter ip6_tables
iptable_filter bridge 8021q garp mrp stp llc rpcrdma ib_isert
iscsi_target_mod ibd
[ 246.066533] mgag200 i2c_algo_bit drm_kms_helper syscopyarea
sysfillrect sysimgblt fb_sys_fops ttm drm mlx4_core tg3 ahci libahci ptp
libata crc32c_intel i2c_core pps_core devlink dm_mirror dm_region_hash dmd
[ 246.089836] CPU: 38 PID: 6 Comm: kworker/u368:0 Tainted: G D
W 4.14.0-rc1+ #7
[ 246.099683] Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS
1.6.2 01/08/2016
[ 246.108849] Workqueue: ib_addr process_one_req [ib_core]
[ 246.115566] task: ffff88018cb245c0 task.stack: ffffc9000009c000
[ 246.122948] RIP: 0010:set_task_cpu+0x191/0x1a0
[ 246.128668] RSP: 0018:ffff88103eac3c38 EFLAGS: 00010046
[ 246.135255] RAX: 0000000000000100 RBX: ffff88207bf445c0 RCX:
0000000000000001
[ 246.143978] RDX: 0000000000000001 RSI: 0000000000000001 RDI:
ffff88207bf445c0
[ 246.152699] RBP: ffff88103eac3c58 R08: 0000000000000001 R09:
0000000000000000
[ 246.161418] R10: 0000000000000001 R11: 0000000003e236eb R12:
ffff88207bf4516c
[ 246.170137] R13: 0000000000000001 R14: 0000000000000001 R15:
000000000001b900
[ 246.178854] FS: 0000000000000000(0000) GS:ffff88103eac0000(0000)
knlGS:0000000000000000
[ 246.188644] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 246.195812] CR2: 000014940a9b7140 CR3: 00000010119b5000 CR4:
00000000001606e0
[ 246.204540] Call Trace:
[ 246.208027] <IRQ>
[ 246.211016] try_to_wake_up+0x166/0x470
[ 246.216036] default_wake_function+0x12/0x20
[ 246.221537] __wake_up_common+0x8a/0x160
[ 246.226641] __wake_up_locked+0x16/0x20
[ 246.231643] ep_poll_callback+0xd0/0x300
[ 246.236727] __wake_up_common+0x8a/0x160
[ 246.241817] __wake_up_common_lock+0x7e/0xc0
[ 246.247291] __wake_up+0x13/0x20
[ 246.251596] wake_up_klogd_work_func+0x40/0x60
[ 246.257265] irq_work_run_list+0x4d/0x70
[ 246.262353] ? tick_sched_do_timer+0x70/0x70
[ 246.267830] irq_work_tick+0x40/0x50
[ 246.272530] update_process_times+0x42/0x60
[ 246.277912] tick_sched_handle+0x2d/0x60
[ 246.282987] tick_sched_timer+0x39/0x70
[ 246.287945] __hrtimer_run_queues+0xe5/0x230
[ 246.293371] hrtimer_interrupt+0xa8/0x1a0
[ 246.298509] smp_apic_timer_interrupt+0x5f/0x130
[ 246.304322] apic_timer_interrupt+0x9d/0xb0
[ 246.309640] </IRQ>
[ 246.312633] RIP: 0010:panic+0x1fd/0x245
[ 246.317554] RSP: 0018:ffffc9000009fb18 EFLAGS: 00000246 ORIG_RAX:
ffffffffffffff10
[ 246.326659] RAX: 0000000000000034 RBX: 0000000000000200 RCX:
0000000000000006
[ 246.335268] RDX: 0000000000000000 RSI: 0000000000000086 RDI:
ffff88103eace030
[ 246.343856] RBP: ffffc9000009fb88 R08: 0000000000000000 R09:
0000000000000877
[ 246.352424] R10: 00000000000003ff R11: 0000000000000001 R12:
ffffffff81a3e1d8
[ 246.360975] R13: 0000000000000000 R14: 0000000000000000 R15:
ffff88018fc07a80
[ 246.369508] ? panic+0x1f6/0x245
[ 246.373657] oops_end+0xb8/0xd0
[ 246.377676] die+0x42/0x50
[ 246.381194] do_general_protection+0xd2/0x160
[ 246.386540] ? nvme_rdma_cm_handler+0x4e0/0x7ab [nvme_rdma]
[ 246.393238] general_protection+0x22/0x30
[ 246.398181] RIP: 0010:prefetch_freepointer.isra.65+0x11/0x20
[ 246.404964] RSP: 0018:ffffc9000009fcc0 EFLAGS: 00010286
[ 246.411258] RAX: 0000000000000000 RBX: cccccccccccccccc RCX:
0000000000001793
[ 246.419692] RDX: 0000000000001792 RSI: cccccccccccccccc RDI:
ffff88018fc07aa0
[ 246.428115] RBP: ffffc9000009fcc0 R08: 000000000001ed40 R09:
ffff8810098cccc0
[ 246.436543] R10: ffffffff818a99e0 R11: 00000000010098cd R12:
00000000014080c0
[ 246.444970] R13: ffffffffa07bd1e0 R14: ffff88018fc07a80 R15:
ffff88018fc07a80
[ 246.453402] ? nvme_rdma_cm_handler+0x4e0/0x7ab [nvme_rdma]
[ 246.460087] kmem_cache_alloc_trace+0xa0/0x1c0
[ 246.465511] nvme_rdma_cm_handler+0x4e0/0x7ab [nvme_rdma]
[ 246.472004] addr_handler+0xa4/0x1c0 [rdma_cm]
[ 246.477424] process_one_req+0x8d/0x120 [ib_core]
[ 246.483128] process_one_work+0x149/0x360
[ 246.488045] worker_thread+0x4d/0x3c0
[ 246.492577] kthread+0x109/0x140
[ 246.496620] ? rescuer_thread+0x380/0x380
[ 246.501540] ? kthread_park+0x60/0x60
[ 246.506070] ret_from_fork+0x25/0x30
[ 246.510496] Code: ff 80 8b ac 08 00 00 04 e9 23 ff ff ff 0f ff e9 bf
fe ff ff f7 83 84 00 00 00 fd ff ff ff 0f 84 c9 fe ff ff 0f ff e9 c2 fe
ff ff <0f> ff e9 d1 fe ff ff 0f 1f 84 00 00 00 00 00 0f 1f 44 00 0
[ 246.532545] ---[ end trace 56749c1831388ffa ]---
> can you please apply the following patch and report if you see a warning?
> --
> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> index 92a03ff5fb4d..ef50b58b0bb6 100644
> --- a/drivers/nvme/host/rdma.c
> +++ b/drivers/nvme/host/rdma.c
> @@ -274,7 +274,7 @@ static int nvme_rdma_reinit_request(void *data,
> struct request *rq)
> struct nvme_rdma_request *req = blk_mq_rq_to_pdu(rq);
> int ret = 0;
>
> - ib_dereg_mr(req->mr);
> + WARN_ON_ONCE(ib_dereg_mr(req->mr));
>
> req->mr = ib_alloc_mr(dev->pd, IB_MR_TYPE_MEM_REG,
> ctrl->max_fr_pages);
> --
>
Thread overview: 16+ messages
[not found] <1735134433.8514119.1505997532669.JavaMail.zimbra@redhat.com>
2017-09-21 12:47 ` nvmeof rdma regression issue on 4.14.0-rc1 Yi Zhang
2017-09-21 14:44 ` nvmeof rdma regression issue on 4.14.0-rc1 (or maybe mlx4?) Christoph Hellwig
2017-09-24 7:38 ` Sagi Grimberg
2017-09-24 9:28 ` Yi Zhang [this message]
[not found] ` <20170924103426.GB25094@mtr-leonro.local>
2017-09-25 5:20 ` Yi Zhang
2017-09-25 7:06 ` Sagi Grimberg
2017-09-25 11:01 ` Yi Zhang
2017-09-25 11:15 ` Sagi Grimberg
2017-10-02 12:51 ` Sagi Grimberg
2017-10-12 8:24 ` Yi Zhang
2017-10-16 10:21 ` Sagi Grimberg
2017-10-19 6:33 ` Yi Zhang
2017-10-19 6:55 ` Sagi Grimberg
2017-10-19 8:23 ` Yi Zhang
2017-10-19 9:44 ` Sagi Grimberg
2017-10-19 11:13 ` Yi Zhang