* [resend] Crash in nvmet_req_init() - null req->rsp pointer
[not found] ` <d35d93e6-2e7b-a681-039b-75edaea7a9a2@opengridcomputing.com>
@ 2018-08-20 20:47 ` Sagi Grimberg
2018-08-22 15:08 ` Steve Wise
2018-08-27 18:24 ` Steve Wise
0 siblings, 2 replies; 6+ messages in thread
From: Sagi Grimberg @ 2018-08-20 20:47 UTC (permalink / raw)
> Resending in plain text...
>
> ----
>
> Hey guys,
>
> I'm debugging an nvmet_rdma crash on the linux-4.14.52 stable kernel
> code.  Under heavy load, including 80 nvmf devices, after 13 hours of
> running, I see an Oops [1] when the target is processing a new ingress
> nvme command.  It crashes in nvmet_req_init() because req->rsp is NULL:
>
>    493   bool nvmet_req_init(struct nvmet_req *req, struct nvmet_cq *cq,
>    494                  struct nvmet_sq *sq, struct nvmet_fabrics_ops *ops)
>    495   {
>    496           u8 flags = req->cmd->common.flags;
>    497           u16 status;
>    498
>    499           req->cq = cq;
>    500           req->sq = sq;
>    501           req->ops = ops;
>    502           req->sg = NULL;
>    503           req->sg_cnt = 0;
>    504           req->rsp->status = 0; <-- HERE
>
> The nvme command opcode is nvme_cmd_write.  The nvmet_rdma_queue state
> is NVMET_RDMA_Q_LIVE.  The nvmet_req looks valid [2], i.e. not garbage.
> But it seems very bad that req->rsp is NULL! :)
>
> Any thoughts?  I didn't see anything like this in recent nvmf fixes...
Is it possible that you ran out of rsps and got a corrupted rsp?
How about trying out this patch to add more information:
--
diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index e7f43d1e1779..890d9c45ca33 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -66,6 +66,7 @@ struct nvmet_rdma_rsp {
 	struct nvmet_req	req;
 
+	bool			allocated;
 	u8			n_rdma;
 	u32			flags;
 	u32			invalidate_rkey;
@@ -174,11 +175,20 @@ nvmet_rdma_get_rsp(struct nvmet_rdma_queue *queue)
 	unsigned long flags;
 
 	spin_lock_irqsave(&queue->rsps_lock, flags);
-	rsp = list_first_entry(&queue->free_rsps,
+	rsp = list_first_entry_or_null(&queue->free_rsps,
 				struct nvmet_rdma_rsp, free_list);
-	list_del(&rsp->free_list);
+	if (rsp) {
+		list_del(&rsp->free_list);
+		rsp->allocated = false;
+	}
 	spin_unlock_irqrestore(&queue->rsps_lock, flags);
 
+	if (!rsp) {
+		pr_debug("dynamically allocated rsp\n");
+		rsp = kmalloc(sizeof(*rsp), GFP_KERNEL);
+		rsp->allocated = true;
+	}
+
 	return rsp;
 }
 
@@ -187,6 +197,11 @@ nvmet_rdma_put_rsp(struct nvmet_rdma_rsp *rsp)
 {
 	unsigned long flags;
 
+	if (rsp->allocated) {
+		kfree(rsp);
+		return;
+	}
+
 	spin_lock_irqsave(&rsp->queue->rsps_lock, flags);
 	list_add_tail(&rsp->free_list, &rsp->queue->free_rsps);
 	spin_unlock_irqrestore(&rsp->queue->rsps_lock, flags);
--
>
> Thanks,
>
> Steve.
>
> [1]  Oops stack trace:
>
> [45259.588347] BUG: unable to handle kernel NULL pointer dereference at
> 000000000000000e
> [45259.597112] IP: nvmet_req_init+0x3d/0x120 [nvmet]
> [45259.602687] PGD 0 P4D 0
> [45259.606083] Oops: 0002 [#1] SMP PTI
> [45259.610421] Modules linked in: tcp_lp t4_tom(OE) toecore(OE)
> nvmet_rdma nvmet rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi
> scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp
> ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm
> iw_cxgb4(OE) ib_core cxgb4(OE) dm_mirror dm_region_hash dm_log
> intel_rapl vfat nls_utf8 fat isofs x86_pkg_temp_thermal intel_powerclamp
> coretemp kvm_intel kvm loop irqbypass crct10dif_pclmul crc32_pclmul
> ghash_clmulni_intel pcbc aesni_intel crypto_simd glue_helper dm_mod
> iTCO_wdt iTCO_vendor_support pcspkr lpc_ich joydev shpchp i2c_i801 sg
> ipmi_si wmi ipmi_devintf ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd
> grace sunrpc ip_tables ext4 mbcache jbd2 ast drm_kms_helper syscopyarea
> sysfillrect sysimgblt fb_sys_fops sd_mod ttm igb drm ptp ahci
> [45259.687761]  libahci pps_core libata crc32c_intel dca i2c_algo_bit
> i2c_core nvme nvme_core
> [45259.697067] CPU: 15 PID: 4259 Comm: kworker/15:1H Tainted:
> G           OE   4.14.52 #1
> [45259.706042] Hardware name: Supermicro X10SRA-F/X10SRA-F, BIOS 2.0b
> 06/12/2017
> [45259.714204] Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
> [45259.720873] task: ffff8bb54c2c0000 task.stack: ffffadd60a334000
> [45259.727802] RIP: 0010:nvmet_req_init+0x3d/0x120 [nvmet]
> [45259.734140] RSP: 0000:ffffadd60a337dc8 EFLAGS: 00010246
> [45259.740373] RAX: 0000000000000040 RBX: ffff8bb3ecf4f748 RCX:
> 0000000000000000
> [45259.748508] RDX: 0000000000000000 RSI: ffff8bb3ecf4f830 RDI:
> ffff8bb3ecf4f748
> [45259.756662] RBP: ffff8bb3ecf4f838 R08: 0000000000000000 R09:
> ffffadd60a337d4c
> [45259.764818] R10: 0000000000000000 R11: 0000000000001000 R12:
> ffff8bb3ecf4f748
> [45259.773006] R13: 0000000000000000 R14: 0000000000000010 R15:
> ffff8bb498ec7400
> [45259.781187] FS:  0000000000000000(0000) GS:ffff8bb5575c0000(0000)
> knlGS:0000000000000000
> [45259.790256] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [45259.796985] CR2: 000000000000000e CR3: 000000070980a003 CR4:
> 00000000003606e0
> [45259.805113] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [45259.813256] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> 0000000000000400
> [45259.821382] Call Trace:
> [45259.824794]  nvmet_rdma_handle_command+0xab/0x2a0 [nvmet_rdma]
> [45259.831603]  __ib_process_cq+0x71/0xc0 [ib_core]
> [45259.837219]  ib_cq_poll_work+0x24/0x60 [ib_core]
> [45259.842829]  process_one_work+0x14f/0x360
> [45259.847800]  worker_thread+0x47/0x3e0
> [45259.852470]  kthread+0xfc/0x130
> [45259.856538]  ? max_active_store+0x80/0x80
> [45259.861476]  ? __kthread_parkme+0x90/0x90
> [45259.866427]  ret_from_fork+0x35/0x40
> [45259.870910] Code: 40 01 48 89 77 18 48 89 53 10 48 8b 57 08 48 89 8f
> 58 01 00 00 31 c9 48 c7 47 28 00 00 00 00 c7 87 38 01 00 00 00 00 00 00
> a8 03 <66> 89 4a 0e 75 51 83 e0 c0 3c 40 75 4a 48 8b 47 10 48 8b 10 48
> [45259.891681] RIP: nvmet_req_init+0x3d/0x120 [nvmet] RSP: ffffadd60a337dc8
> [45259.899369] CR2: 000000000000000e
> [45259.903653] ---[ end trace 3bf559cbf4cd1ab5 ]---
>
> [2]  various structs:
>
> crash> nvmet_req ffff8bb3ecf4f748
> struct nvmet_req {
>   cmd = 0xffff8bb4cfd64b40,
>   rsp = 0x0,
>   sq = 0xffff8bb3ecf4f838,
>   cq = 0xffff8bb3ecf4f830,
>   ns = 0xb5000000b92,
>   sg = 0x0,
>   inline_bio = {
>     bi_next = 0x3c0,
>     bi_disk = 0x1000f0,
>     bi_opf = 0,
>     bi_flags = 24216,
>     bi_ioprio = 41863,
>     bi_write_hint = 0,
>     bi_status = 0 '\000',
>     bi_partno = 0 '\000',
>     bi_phys_segments = 9,
>     bi_seg_front_size = 0,
>     bi_seg_back_size = 0,
>     bi_iter = {
>       bi_sector = 0,
>       bi_size = 15728648,
>       bi_idx = 13500636,
>       bi_done = 272,
>       bi_bvec_done = 704
>     },
>     __bi_remaining = {
>       counter = 1426761728
>     },
>     bi_end_io = 0xffff8bb4550aa8ce,
>     bi_private = 0x100001050,
>     bi_ioc = 0x1,
>     bi_css = 0x0,
>     {
>       bi_integrity = 0x0
>     },
>     bi_vcnt = 0,
>     bi_max_vecs = 0,
>     __bi_cnt = {
>       counter = 0
>     },
>     bi_io_vec = 0x0,
>     bi_pool = 0x0,
>     bi_inline_vecs = 0xffff8bb3ecf4f800
>   },
>   inline_bvec = {{
>       bv_page = 0xffff8bb498ec4c00,
>       bv_len = 2832033792,
>       bv_offset = 4294937516
>     }, {
>       bv_page = 0xffff8bb46af86800,
>       bv_len = 7,
>       bv_offset = 0
>     }, {
>       bv_page = 0xffff8bb4039c9900,
>       bv_len = 0,
>       bv_offset = 1
>     }, {
>       bv_page = 0x200003,
>       bv_len = 3463016448,
>       bv_offset = 4294937524
>     }, {
>       bv_page = 0x8000000000000001,
>       bv_len = 2822996336,
>       bv_offset = 16928
>     }, {
>       bv_page = 0xffffffffc0c530f0 <nvmet_sq_free>,
>       bv_len = 0,
>       bv_offset = 0
>     }, {
>       bv_page = 0x0,
>       bv_len = 0,
>       bv_offset = 0
>     }, {
>       bv_page = 0x0,
>       bv_len = 2097155,
>       bv_offset = 19
>     }},
>   sg_cnt = 0,
>   data_len = 0,
>   port = 0xffff8baca8cd6800,
>   execute = 0xffff8bb3ecf4f890,
>   ops = 0xffffffffc08a3100 <nvmet_rdma_ops>
> }
> crash> nvmet_rdma_cmd 0xffff8bb50b83d200
> struct nvmet_rdma_cmd {
>   sge = {{
>       addr = 42141633344,
>       length = 64,
>       lkey = 0
>     }, {
>       addr = 40797528064,
>       length = 4096,
>       lkey = 0
>     }},
>   cqe = {
>     done = 0xffffffffc08a05f0 <nvmet_rdma_recv_done>
>   },
>   wr = {
>     next = 0x0,
>     {
>       wr_id = 18446616208431370784,
>       wr_cqe = 0xffff8bb50b83d220
>     },
>     sg_list = 0xffff8bb50b83d200,
>     num_sge = 2
>   },
>   inline_sg = {
>     page_link = 18446694492170806146,
>     offset = 0,
>     length = 4096,
>     dma_address = 0,
>     dma_length = 0
>   },
>   inline_page = 0xffffd2e7e5fee380,
>   nvme_cmd = 0xffff8bb4cfd64b40,
>   queue = 0xffff8bb3ecf4f800
> }
> crash> nvme_command.rw 0xffff8bb4cfd64b40
>     rw = {
>       opcode = 1 '\001',
>       flags = 64 '@',
>       command_id = 27,
>       nsid = 38,
>       rsvd2 = 0,
>       metadata = 0,
>       dptr = {
>         {
>           prp1 = 30590726144,
>           prp2 = 5692787825389355008
>         },
>         sgl = {
>           addr = 30590726144,
>           length = 2449489920,
>           rsvd = <incomplete sequence \330>,
>           type = 79 'O'
>         },
>         ksgl = {
>           addr = 30590726144,
>           length = "\000@",
>           key = "\222", <incomplete sequence \330>,
>           type = 79 'O'
>         }
>       },
>       slba = 1001032,
>       length = 31,
>       control = 0,
>       dsmgmt = 0,
>       reftag = 0,
>       apptag = 0,
>       appmask = 0
>     }
> crash> nvmet_rdma_queue 0xffff8bb3ecf4f800
> struct nvmet_rdma_queue {
>   cm_id = 0xffff8bb498ec4c00,
>   port = 0xffff8baca8cd6800,
>   cq = 0xffff8bb46af86800,
>   sq_wr_avail = {
>     counter = 7
>   },
>   dev = 0xffff8bb4039c9900,
>   state_lock = {
>     {
>       rlock = {
>         raw_lock = {
>           val = {
>             counter = 0
>           }
>         }
>       }
>     }
>   },
>   state = NVMET_RDMA_Q_LIVE,
>   nvme_cq = {
>     qid = 3,
>     size = 32
>   },
>   nvme_sq = {
>     ctrl = 0xffff8bb4ce697000,
>     ref = {
>       count = {
>         counter = -9223372036854775807
>       },
>       percpu_count_ptr = 72708029383024,
>       release = 0xffffffffc0c530f0 <nvmet_sq_free>,
>       confirm_switch = 0x0,
>       force_atomic = false,
>       rcu = {
>         next = 0x0,
>         func = 0x0
>       }
>     },
>     qid = 3,
>     size = 32,
>     sqhd = 19,
>     free_done = {
>       done = 0,
>       wait = {
>         lock = {
>           {
>             rlock = {
>               raw_lock = {
>                 val = {
>                   counter = 0
>                 }
>               }
>             }
>           }
>         },
>         head = {
>           next = 0xffff8baca8cd6800,
>           prev = 0xffff8bb3ecf4f890
>         }
>       }
>     },
>     confirm_done = {
>       done = 3230281984,
>       wait = {
>         lock = {
>           {
>             rlock = {
>               raw_lock = {
>                 val = {
>                   counter = 0
>                 }
>               }
>             }
>           }
>         },
>         head = {
>           next = 0xffff8bb3ecf4f8b0,
>           prev = 0xffff8bb3ecf4f8b0
>         }
>       }
>     }
>   },
>   rsps = 0xffff8bb4cd9c0000,
>   free_rsps = {
>     next = 0xdead000000000100,
>     prev = 0xdead000000000200
>   },
>   rsps_lock = {
>     {
>       rlock = {
>         raw_lock = {
>           val = {
>             counter = 0
>           }
>         }
>       }
>     }
>   },
>   cmds = 0xffff8bb50b83d000,
>   release_work = {
>     data = {
>       counter = 68719476704
>     },
>     entry = {
>       next = 0xffff8bb3ecf4f8f0,
>       prev = 0xffff8bb3ecf4f8f0
>     },
>     func = 0xffffffffc089f0d0 <nvmet_rdma_release_queue_work>
>   },
>   rsp_wait_list = {
>     next = 0xffff8bb3ecf4f908,
>     prev = 0xffff8bb3ecf4f908
>   },
>   rsp_wr_wait_list = {
>     next = 0xffff8bb3ecf4f918,
>     prev = 0xffff8bb3ecf4f918
>   },
>   rsp_wr_wait_lock = {
>     {
>       rlock = {
>         raw_lock = {
>           val = {
>             counter = 0
>           }
>         }
>       }
>     }
>   },
>   idx = 151,
>   host_qid = 3,
>   recv_queue_size = 32,
>   send_queue_size = 32,
>   queue_list = {
>     next = 0xffff8bb464686540,
>     prev = 0xffff8bb3ecf4d140
>   }
> }
> crash> ib_wc ffff8bb498ec7400
> struct ib_wc {
>   {
>     wr_id = 18446616208431370784,
>     wr_cqe = 0xffff8bb50b83d220
>   },
>   status = IB_WC_SUCCESS,
>   opcode = IB_WC_RECV,
>   vendor_err = 0,
>   byte_len = 64,
>   qp = 0xffff8bb4d9759c00,
>   ex = {
>     imm_data = 282903233,
>     invalidate_rkey = 282903233
>   },
>   src_qp = 283165377,
>   slid = 283427521,
>   wc_flags = 0,
>   pkey_index = 49857,
>   sl = 236 '\354',
>   dlid_path_bits = 16 '\020',
>   port_num = 193 '\301',
>   smac = "\302\360\020\301\302", <incomplete sequence \364>,
>   vlan_id = 49857,
>   network_hdr_type = 248 '\370'
> }
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [resend] Crash in nvmet_req_init() - null req->rsp pointer
2018-08-20 20:47 ` [resend] Crash in nvmet_req_init() - null req->rsp pointer Sagi Grimberg
@ 2018-08-22 15:08 ` Steve Wise
2018-08-27 18:24 ` Steve Wise
1 sibling, 0 replies; 6+ messages in thread
From: Steve Wise @ 2018-08-22 15:08 UTC (permalink / raw)
On 8/20/2018 3:47 PM, Sagi Grimberg wrote:
>
>> Resending in plain text...
>>
>> ----
>>
>> Hey guys,
>>
>> I'm debugging an nvmet_rdma crash on the linux-4.14.52 stable kernel
>> code.  Under heavy load, including 80 nvmf devices, after 13 hours of
>> running, I see an Oops [1] when the target is processing a new ingress
>> nvme command.  It crashes in nvmet_req_init() because req->rsp is NULL:
>>
>>    493   bool nvmet_req_init(struct nvmet_req *req, struct nvmet_cq *cq,
>>    494                  struct nvmet_sq *sq, struct nvmet_fabrics_ops *ops)
>>    495   {
>>    496           u8 flags = req->cmd->common.flags;
>>    497           u16 status;
>>    498
>>    499           req->cq = cq;
>>    500           req->sq = sq;
>>    501           req->ops = ops;
>>    502           req->sg = NULL;
>>    503           req->sg_cnt = 0;
>>    504           req->rsp->status = 0; <-- HERE
>>
>> The nvme command opcode is nvme_cmd_write.  The nvmet_rdma_queue state
>> is NVMET_RDMA_Q_LIVE.  The nvmet_req looks valid [2], i.e. not garbage.
>> But it seems very bad that req->rsp is NULL! :)
>>
>> Any thoughts?  I didn't see anything like this in recent nvmf fixes...
>
> Is it possible that you ran out of rsps and got a corrupted rsp?
That is what it looks like: the nvmet_rdma_rsp pointer returned
from nvmet_rdma_get_rsp() was not within the bounds of the response
struct array allocated for that queue. I wasn't sure how that could
happen; I would have thought list_first_entry() would return NULL if the
list was empty, but I guess not.
>
> How about trying out this patch to add more information:
I'll try this out. Thanks Sagi!
> --
> diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
> index e7f43d1e1779..890d9c45ca33 100644
> --- a/drivers/nvme/target/rdma.c
> +++ b/drivers/nvme/target/rdma.c
> @@ -66,6 +66,7 @@ struct nvmet_rdma_rsp {
>
>  	struct nvmet_req	req;
>
> +	bool			allocated;
>  	u8			n_rdma;
>  	u32			flags;
>  	u32			invalidate_rkey;
> @@ -174,11 +175,20 @@ nvmet_rdma_get_rsp(struct nvmet_rdma_queue *queue)
>  	unsigned long flags;
>
>  	spin_lock_irqsave(&queue->rsps_lock, flags);
> -	rsp = list_first_entry(&queue->free_rsps,
> +	rsp = list_first_entry_or_null(&queue->free_rsps,
>  				struct nvmet_rdma_rsp, free_list);
> -	list_del(&rsp->free_list);
> +	if (rsp) {
> +		list_del(&rsp->free_list);
> +		rsp->allocated = false;
> +	}
>  	spin_unlock_irqrestore(&queue->rsps_lock, flags);
>
> +	if (!rsp) {
> +		pr_debug("dynamically allocated rsp\n");
> +		rsp = kmalloc(sizeof(*rsp), GFP_KERNEL);
> +		rsp->allocated = true;
> +	}
> +
>  	return rsp;
>  }
>
> @@ -187,6 +197,11 @@ nvmet_rdma_put_rsp(struct nvmet_rdma_rsp *rsp)
>  {
>  	unsigned long flags;
>
> +	if (rsp->allocated) {
> +		kfree(rsp);
> +		return;
> +	}
> +
>  	spin_lock_irqsave(&rsp->queue->rsps_lock, flags);
>  	list_add_tail(&rsp->free_list, &rsp->queue->free_rsps);
>  	spin_unlock_irqrestore(&rsp->queue->rsps_lock, flags);
> --
>
* Crash in nvmet_req_init() - null req->rsp pointer
2018-08-20 20:47 ` [resend] Crash in nvmet_req_init() - null req->rsp pointer Sagi Grimberg
2018-08-22 15:08 ` Steve Wise
@ 2018-08-27 18:24 ` Steve Wise
2018-08-27 20:29 ` Steve Wise
1 sibling, 1 reply; 6+ messages in thread
From: Steve Wise @ 2018-08-27 18:24 UTC (permalink / raw)
On 8/20/2018 3:47 PM, Sagi Grimberg wrote:
>
>> Resending in plain text...
>>
>> ----
>>
>> Hey guys,
>>
>> I'm debugging an nvmet_rdma crash on the linux-4.14.52 stable kernel
>> code.  Under heavy load, including 80 nvmf devices, after 13 hours of
>> running, I see an Oops [1] when the target is processing a new ingress
>> nvme command.  It crashes in nvmet_req_init() because req->rsp is NULL:
>>
>>    493   bool nvmet_req_init(struct nvmet_req *req, struct nvmet_cq *cq,
>>    494                  struct nvmet_sq *sq, struct nvmet_fabrics_ops *ops)
>>    495   {
>>    496           u8 flags = req->cmd->common.flags;
>>    497           u16 status;
>>    498
>>    499           req->cq = cq;
>>    500           req->sq = sq;
>>    501           req->ops = ops;
>>    502           req->sg = NULL;
>>    503           req->sg_cnt = 0;
>>    504           req->rsp->status = 0; <-- HERE
>>
>> The nvme command opcode is nvme_cmd_write.  The nvmet_rdma_queue state
>> is NVMET_RDMA_Q_LIVE.  The nvmet_req looks valid [2], i.e. not garbage.
>> But it seems very bad that req->rsp is NULL! :)
>>
>> Any thoughts?  I didn't see anything like this in recent nvmf fixes...
>
> Is it possible that you ran out of rsps and got a corrupted rsp?
>
> How about trying out this patch to add more information:
> --
Hey Sagi, it hits the empty rsp list path often with your debug patch.
I added code to BUG_ON() after 10 times and I have a crash dump I'm
looking at.
Isn't the rsp list supposed to be sized such that it will never be empty
when a new rsp is needed? I wonder if there is a leak.
Steve.
* Crash in nvmet_req_init() - null req->rsp pointer
2018-08-27 18:24 ` Steve Wise
@ 2018-08-27 20:29 ` Steve Wise
2018-08-31 0:31 ` Sagi Grimberg
0 siblings, 1 reply; 6+ messages in thread
From: Steve Wise @ 2018-08-27 20:29 UTC (permalink / raw)
On 8/27/2018 1:24 PM, Steve Wise wrote:
>
>
> On 8/20/2018 3:47 PM, Sagi Grimberg wrote:
>>
>>> Resending in plain text...
>>>
>>> ----
>>>
>>> Hey guys,
>>>
>>> I'm debugging an nvmet_rdma crash on the linux-4.14.52 stable kernel
>>> code.  Under heavy load, including 80 nvmf devices, after 13 hours of
>>> running, I see an Oops [1] when the target is processing a new ingress
>>> nvme command.  It crashes in nvmet_req_init() because req->rsp is NULL:
>>>
>>>    493   bool nvmet_req_init(struct nvmet_req *req, struct nvmet_cq *cq,
>>>    494                  struct nvmet_sq *sq, struct nvmet_fabrics_ops *ops)
>>>    495   {
>>>    496           u8 flags = req->cmd->common.flags;
>>>    497           u16 status;
>>>    498
>>>    499           req->cq = cq;
>>>    500           req->sq = sq;
>>>    501           req->ops = ops;
>>>    502           req->sg = NULL;
>>>    503           req->sg_cnt = 0;
>>>    504           req->rsp->status = 0; <-- HERE
>>>
>>> The nvme command opcode is nvme_cmd_write.  The nvmet_rdma_queue state
>>> is NVMET_RDMA_Q_LIVE.  The nvmet_req looks valid [2], i.e. not garbage.
>>> But it seems very bad that req->rsp is NULL! :)
>>>
>>> Any thoughts?  I didn't see anything like this in recent nvmf fixes...
>>
>> Is it possible that you ran out of rsps and got a corrupted rsp?
>>
>> How about trying out this patch to add more information:
>> --
>
> Hey Sagi, it hits the empty rsp list path often with your debug patch.
> I added code to BUG_ON() after 10 times and I have a crash dump I'm
> looking at.
>
> Isn't the rsp list supposed to be sized such that it will never be empty
> when a new rsp is needed? I wonder if there is a leak.
>
> Steve.
>
I do see that during this heavy load, the rdma send queue "full"
condition gets hit often:
static bool nvmet_rdma_execute_command(struct nvmet_rdma_rsp *rsp)
{
	struct nvmet_rdma_queue *queue = rsp->queue;

	if (unlikely(atomic_sub_return(1 + rsp->n_rdma,
			&queue->sq_wr_avail) < 0)) {
		pr_debug("IB send queue full (needed %d): queue %u cntlid %u\n",
			1 + rsp->n_rdma, queue->idx,
			queue->nvme_sq.ctrl->cntlid);
		atomic_add(1 + rsp->n_rdma, &queue->sq_wr_avail);
		return false;
	}
...
So commands are getting added to the wr_wait list:
static void nvmet_rdma_handle_command(struct nvmet_rdma_queue *queue,
		struct nvmet_rdma_rsp *cmd)
{
...
	if (unlikely(!nvmet_rdma_execute_command(cmd))) {
		spin_lock(&queue->rsp_wr_wait_lock);
		list_add_tail(&cmd->wait_list, &queue->rsp_wr_wait_list);
		spin_unlock(&queue->rsp_wr_wait_lock);
	}
...
Perhaps there's some bug in the wr_wait_list processing of deferred
commands? I don't see anything though.
* Crash in nvmet_req_init() - null req->rsp pointer
2018-08-27 20:29 ` Steve Wise
@ 2018-08-31 0:31 ` Sagi Grimberg
2018-08-31 13:01 ` Steve Wise
0 siblings, 1 reply; 6+ messages in thread
From: Sagi Grimberg @ 2018-08-31 0:31 UTC (permalink / raw)
>> Hey Sagi, it hits the empty rsp list path often with your debug patch.
>> I added code to BUG_ON() after 10 times and I have a crash dump I'm
>> looking at.
>>
>> Isn't the rsp list supposed to be sized such that it will never be empty
>> when a new rsp is needed? I wonder if there is a leak.
Doesn't look like it from my scan..
> I do see that during this heavy load, the rdma send queue "full"
> condition gets hit often:
>
> static bool nvmet_rdma_execute_command(struct nvmet_rdma_rsp *rsp)
> {
> 	struct nvmet_rdma_queue *queue = rsp->queue;
>
> 	if (unlikely(atomic_sub_return(1 + rsp->n_rdma,
> 			&queue->sq_wr_avail) < 0)) {
> 		pr_debug("IB send queue full (needed %d): queue %u cntlid %u\n",
> 			1 + rsp->n_rdma, queue->idx,
> 			queue->nvme_sq.ctrl->cntlid);
> 		atomic_add(1 + rsp->n_rdma, &queue->sq_wr_avail);
> 		return false;
> 	}
>
> ...
>
> So commands are getting added to the wr_wait list:
>
> static void nvmet_rdma_handle_command(struct nvmet_rdma_queue *queue,
> 		struct nvmet_rdma_rsp *cmd)
> {
> ...
> 	if (unlikely(!nvmet_rdma_execute_command(cmd))) {
> 		spin_lock(&queue->rsp_wr_wait_lock);
> 		list_add_tail(&cmd->wait_list, &queue->rsp_wr_wait_list);
> 		spin_unlock(&queue->rsp_wr_wait_lock);
> 	}
> ...
>
>
> Perhaps there's some bug in the wr_wait_list processing of deferred
> commands? I don't see anything though.
I assume this could happen if under heavy load the device send
completions are slower than the rate at which incoming commands arrive
(perhaps the device and/or sw).
Because we post the recv before sending the response back, there is
a window where the host can send us a new command before the send
completion has arrived; this is why we allocate more rsps to begin with.
However, I think that nothing prevents the gap from growing under heavy
load until we exhaust the 2x rsps.
So perhaps this is something we actually need to account for...
* Crash in nvmet_req_init() - null req->rsp pointer
2018-08-31 0:31 ` Sagi Grimberg
@ 2018-08-31 13:01 ` Steve Wise
0 siblings, 0 replies; 6+ messages in thread
From: Steve Wise @ 2018-08-31 13:01 UTC (permalink / raw)
>
>
> >> Hey Sagi, it hits the empty rsp list path often with your debug patch.
> >> I added code to BUG_ON() after 10 times and I have a crash dump I'm
> >> looking at.
> >>
> >> Isn't the rsp list supposed to be sized such that it will never be empty
> >> when a new rsp is needed? I wonder if there is a leak.
>
> Doesn't look like it from my scan..
>
> > I do see that during this heavy load, the rdma send queue "full"
> > condition gets hit often:
> >
> > static bool nvmet_rdma_execute_command(struct nvmet_rdma_rsp *rsp)
> > {
> > 	struct nvmet_rdma_queue *queue = rsp->queue;
> >
> > 	if (unlikely(atomic_sub_return(1 + rsp->n_rdma,
> > 			&queue->sq_wr_avail) < 0)) {
> > 		pr_debug("IB send queue full (needed %d): queue %u cntlid %u\n",
> > 			1 + rsp->n_rdma, queue->idx,
> > 			queue->nvme_sq.ctrl->cntlid);
> > 		atomic_add(1 + rsp->n_rdma, &queue->sq_wr_avail);
> > 		return false;
> > 	}
> >
> > ...
> >
> > So commands are getting added to the wr_wait list:
> >
> > static void nvmet_rdma_handle_command(struct nvmet_rdma_queue *queue,
> > 		struct nvmet_rdma_rsp *cmd)
> > {
> > ...
> > 	if (unlikely(!nvmet_rdma_execute_command(cmd))) {
> > 		spin_lock(&queue->rsp_wr_wait_lock);
> > 		list_add_tail(&cmd->wait_list, &queue->rsp_wr_wait_list);
> > 		spin_unlock(&queue->rsp_wr_wait_lock);
> > 	}
> > ...
> >
> >
> > Perhaps there's some bug in the wr_wait_list processing of deferred
> > commands? I don't see anything though.
>
> I assume this could happen if under heavy load the device send
> completions are slower than the rate at which incoming commands arrive
> (perhaps the device and/or sw).
>
> Because we post the recv before sending the response back, there is
> a window where the host can send us a new command before the send
> completion has arrived; this is why we allocate more rsps to begin with.
>
> However, I think that nothing prevents the gap from growing under heavy
> load until we exhaust the 2x rsps.
>
> So perhaps this is something we actually need to account for...
Thanks for the explanation. Yes, I believe we do. Will you post the formal patch? If it is the same as the one I already confirmed, you can add my Tested-by tag.
Thanks,
Steve.
end of thread, other threads:[~2018-08-31 13:01 UTC | newest]
Thread overview: 6+ messages
[not found] <1eb5a94a-07ed-1c2a-47fe-cc9a4bf32466@opengridcomputing.com>
[not found] ` <d35d93e6-2e7b-a681-039b-75edaea7a9a2@opengridcomputing.com>
2018-08-20 20:47 ` [resend] Crash in nvmet_req_init() - null req->rsp pointer Sagi Grimberg
2018-08-22 15:08 ` Steve Wise
2018-08-27 18:24 ` Steve Wise
2018-08-27 20:29 ` Steve Wise
2018-08-31 0:31 ` Sagi Grimberg
2018-08-31 13:01 ` Steve Wise