linux-nvme.lists.infradead.org archive mirror
* [resend] Crash in nvmet_req_init() - null req->rsp pointer
       [not found] ` <d35d93e6-2e7b-a681-039b-75edaea7a9a2@opengridcomputing.com>
@ 2018-08-20 20:47   ` Sagi Grimberg
  2018-08-22 15:08     ` Steve Wise
  2018-08-27 18:24     ` Steve Wise
  0 siblings, 2 replies; 6+ messages in thread
From: Sagi Grimberg @ 2018-08-20 20:47 UTC (permalink / raw)



> Resending in plain text...
> 
> ----
> 
> Hey guys,
> 
> I'm debugging a nvmet_rdma crash on the linux-4.14.52 stable kernel
> code.  Under heavy load, including 80 nvmf devices, after 13 hours of
> running, I see an Oops [1] when the target is processing a new ingress
> nvme command.  It crashes in nvmet_req_init() because req->rsp is NULL:
> 
>    493   bool nvmet_req_init(struct nvmet_req *req, struct nvmet_cq *cq,
>    494                   struct nvmet_sq *sq, struct nvmet_fabrics_ops *ops)
>    495   {
>    496           u8 flags = req->cmd->common.flags;
>    497           u16 status;
>    498
>    499           req->cq = cq;
>    500           req->sq = sq;
>    501           req->ops = ops;
>    502           req->sg = NULL;
>    503           req->sg_cnt = 0;
>    504           req->rsp->status = 0; <-- HERE
> 
> The nvme command opcode is nvme_cmd_write.  The nvmet_rdma_queue state
> is NVMET_RDMA_Q_LIVE.  The nvmet_req looks valid [2], i.e. not garbage.
> But it seems very bad that req->rsp is NULL! :)
> 
> Any thoughts?  I didn't see anything like this in recent nvmf fixes...

Is it possible that you ran out of rsps and got a corrupted rsp?

How about trying out this patch to add more information:
--
diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index e7f43d1e1779..890d9c45ca33 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -66,6 +66,7 @@ struct nvmet_rdma_rsp {

         struct nvmet_req        req;

+       bool                    allocated;
         u8                      n_rdma;
         u32                     flags;
         u32                     invalidate_rkey;
@@ -174,11 +175,20 @@ nvmet_rdma_get_rsp(struct nvmet_rdma_queue *queue)
         unsigned long flags;

         spin_lock_irqsave(&queue->rsps_lock, flags);
-       rsp = list_first_entry(&queue->free_rsps,
+       rsp = list_first_entry_or_null(&queue->free_rsps,
                                 struct nvmet_rdma_rsp, free_list);
-       list_del(&rsp->free_list);
+       if (rsp) {
+               list_del(&rsp->free_list);
+               rsp->allocated = false;
+       }
         spin_unlock_irqrestore(&queue->rsps_lock, flags);

+       if (!rsp) {
+               pr_debug("dynamically allocated rsp\n");
+               rsp = kmalloc(sizeof(*rsp), GFP_KERNEL);
+               rsp->allocated = true;
+       }
+
         return rsp;
  }

@@ -187,6 +197,11 @@ nvmet_rdma_put_rsp(struct nvmet_rdma_rsp *rsp)
  {
         unsigned long flags;

+       if (rsp->allocated) {
+               kfree(rsp);
+               return;
+       }
+
         spin_lock_irqsave(&rsp->queue->rsps_lock, flags);
         list_add_tail(&rsp->free_list, &rsp->queue->free_rsps);
         spin_unlock_irqrestore(&rsp->queue->rsps_lock, flags);
--

> 
> Thanks,
> 
> Steve.
> 
> [1]  Oops stack trace:
> 
> [45259.588347] BUG: unable to handle kernel NULL pointer dereference at
> 000000000000000e
> [45259.597112] IP: nvmet_req_init+0x3d/0x120 [nvmet]
> [45259.602687] PGD 0 P4D 0
> [45259.606083] Oops: 0002 [#1] SMP PTI
> [45259.610421] Modules linked in: tcp_lp t4_tom(OE) toecore(OE)
> nvmet_rdma nvmet rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi
> scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp
> ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm
> iw_cxgb4(OE) ib_core cxgb4(OE) dm_mirror dm_region_hash dm_log
> intel_rapl vfat nls_utf8 fat isofs x86_pkg_temp_thermal intel_powerclamp
> coretemp kvm_intel kvm loop irqbypass crct10dif_pclmul crc32_pclmul
> ghash_clmulni_intel pcbc aesni_intel crypto_simd glue_helper dm_mod
> iTCO_wdt iTCO_vendor_support pcspkr lpc_ich joydev shpchp i2c_i801 sg
> ipmi_si wmi ipmi_devintf ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd
> grace sunrpc ip_tables ext4 mbcache jbd2 ast drm_kms_helper syscopyarea
> sysfillrect sysimgblt fb_sys_fops sd_mod ttm igb drm ptp ahci
> [45259.687761]  libahci pps_core libata crc32c_intel dca i2c_algo_bit
> i2c_core nvme nvme_core
> [45259.697067] CPU: 15 PID: 4259 Comm: kworker/15:1H Tainted:
> G           OE   4.14.52 #1
> [45259.706042] Hardware name: Supermicro X10SRA-F/X10SRA-F, BIOS 2.0b
> 06/12/2017
> [45259.714204] Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
> [45259.720873] task: ffff8bb54c2c0000 task.stack: ffffadd60a334000
> [45259.727802] RIP: 0010:nvmet_req_init+0x3d/0x120 [nvmet]
> [45259.734140] RSP: 0000:ffffadd60a337dc8 EFLAGS: 00010246
> [45259.740373] RAX: 0000000000000040 RBX: ffff8bb3ecf4f748 RCX:
> 0000000000000000
> [45259.748508] RDX: 0000000000000000 RSI: ffff8bb3ecf4f830 RDI:
> ffff8bb3ecf4f748
> [45259.756662] RBP: ffff8bb3ecf4f838 R08: 0000000000000000 R09:
> ffffadd60a337d4c
> [45259.764818] R10: 0000000000000000 R11: 0000000000001000 R12:
> ffff8bb3ecf4f748
> [45259.773006] R13: 0000000000000000 R14: 0000000000000010 R15:
> ffff8bb498ec7400
> [45259.781187] FS:  0000000000000000(0000) GS:ffff8bb5575c0000(0000)
> knlGS:0000000000000000
> [45259.790256] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [45259.796985] CR2: 000000000000000e CR3: 000000070980a003 CR4:
> 00000000003606e0
> [45259.805113] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [45259.813256] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> 0000000000000400
> [45259.821382] Call Trace:
> [45259.824794]  nvmet_rdma_handle_command+0xab/0x2a0 [nvmet_rdma]
> [45259.831603]  __ib_process_cq+0x71/0xc0 [ib_core]
> [45259.837219]  ib_cq_poll_work+0x24/0x60 [ib_core]
> [45259.842829]  process_one_work+0x14f/0x360
> [45259.847800]  worker_thread+0x47/0x3e0
> [45259.852470]  kthread+0xfc/0x130
> [45259.856538]  ? max_active_store+0x80/0x80
> [45259.861476]  ? __kthread_parkme+0x90/0x90
> [45259.866427]  ret_from_fork+0x35/0x40
> [45259.870910] Code: 40 01 48 89 77 18 48 89 53 10 48 8b 57 08 48 89 8f
> 58 01 00 00 31 c9 48 c7 47 28 00 00 00 00 c7 87 38 01 00 00 00 00 00 00
> a8 03 <66> 89 4a 0e 75 51 83 e0 c0 3c 40 75 4a 48 8b 47 10 48 8b 10 48
> [45259.891681] RIP: nvmet_req_init+0x3d/0x120 [nvmet] RSP: ffffadd60a337dc8
> [45259.899369] CR2: 000000000000000e
> [45259.903653] ---[ end trace 3bf559cbf4cd1ab5 ]---
> 
> [2]  various structs:
> 
> crash> nvmet_req ffff8bb3ecf4f748
> struct nvmet_req {
>   cmd = 0xffff8bb4cfd64b40,
>   rsp = 0x0,
>   sq = 0xffff8bb3ecf4f838,
>   cq = 0xffff8bb3ecf4f830,
>   ns = 0xb5000000b92,
>   sg = 0x0,
>   inline_bio = {
>     bi_next = 0x3c0,
>     bi_disk = 0x1000f0,
>     bi_opf = 0,
>     bi_flags = 24216,
>     bi_ioprio = 41863,
>     bi_write_hint = 0,
>     bi_status = 0 '\000',
>     bi_partno = 0 '\000',
>     bi_phys_segments = 9,
>     bi_seg_front_size = 0,
>     bi_seg_back_size = 0,
>     bi_iter = {
>       bi_sector = 0,
>       bi_size = 15728648,
>       bi_idx = 13500636,
>       bi_done = 272,
>       bi_bvec_done = 704
>     },
>     __bi_remaining = {
>       counter = 1426761728
>     },
>     bi_end_io = 0xffff8bb4550aa8ce,
>     bi_private = 0x100001050,
>     bi_ioc = 0x1,
>     bi_css = 0x0,
>     {
>       bi_integrity = 0x0
>     },
>     bi_vcnt = 0,
>     bi_max_vecs = 0,
>     __bi_cnt = {
>       counter = 0
>     },
>     bi_io_vec = 0x0,
>     bi_pool = 0x0,
>     bi_inline_vecs = 0xffff8bb3ecf4f800
>   },
>   inline_bvec = {{
>       bv_page = 0xffff8bb498ec4c00,
>       bv_len = 2832033792,
>       bv_offset = 4294937516
>     }, {
>       bv_page = 0xffff8bb46af86800,
>       bv_len = 7,
>       bv_offset = 0
>     }, {
>       bv_page = 0xffff8bb4039c9900,
>       bv_len = 0,
>       bv_offset = 1
>     }, {
>       bv_page = 0x200003,
>       bv_len = 3463016448,
>       bv_offset = 4294937524
>     }, {
>       bv_page = 0x8000000000000001,
>       bv_len = 2822996336,
>       bv_offset = 16928
>     }, {
>       bv_page = 0xffffffffc0c530f0 <nvmet_sq_free>,
>       bv_len = 0,
>       bv_offset = 0
>     }, {
>       bv_page = 0x0,
>       bv_len = 0,
>       bv_offset = 0
>     }, {
>       bv_page = 0x0,
>       bv_len = 2097155,
>       bv_offset = 19
>     }},
>   sg_cnt = 0,
>   data_len = 0,
>   port = 0xffff8baca8cd6800,
>   execute = 0xffff8bb3ecf4f890,
>   ops = 0xffffffffc08a3100 <nvmet_rdma_ops>
> }
> crash> nvmet_rdma_cmd 0xffff8bb50b83d200
> struct nvmet_rdma_cmd {
>   sge = {{
>       addr = 42141633344,
>       length = 64,
>       lkey = 0
>     }, {
>       addr = 40797528064,
>       length = 4096,
>       lkey = 0
>     }},
>   cqe = {
>     done = 0xffffffffc08a05f0 <nvmet_rdma_recv_done>
>   },
>   wr = {
>     next = 0x0,
>     {
>       wr_id = 18446616208431370784,
>       wr_cqe = 0xffff8bb50b83d220
>     },
>     sg_list = 0xffff8bb50b83d200,
>     num_sge = 2
>   },
>   inline_sg = {
>     page_link = 18446694492170806146,
>     offset = 0,
>     length = 4096,
>     dma_address = 0,
>     dma_length = 0
>   },
>   inline_page = 0xffffd2e7e5fee380,
>   nvme_cmd = 0xffff8bb4cfd64b40,
>   queue = 0xffff8bb3ecf4f800
> }
> crash> nvme_command.rw 0xffff8bb4cfd64b40
>     rw = {
>       opcode = 1 '\001',
>       flags = 64 '@',
>       command_id = 27,
>       nsid = 38,
>       rsvd2 = 0,
>       metadata = 0,
>       dptr = {
>         {
>           prp1 = 30590726144,
>           prp2 = 5692787825389355008
>         },
>         sgl = {
>           addr = 30590726144,
>           length = 2449489920,
>           rsvd = <incomplete sequence \330>,
>           type = 79 'O'
>         },
>         ksgl = {
>           addr = 30590726144,
>           length = "\000@",
>           key = "\222", <incomplete sequence \330>,
>           type = 79 'O'
>         }
>       },
>       slba = 1001032,
>       length = 31,
>       control = 0,
>       dsmgmt = 0,
>       reftag = 0,
>       apptag = 0,
>       appmask = 0
>     }
> crash> nvmet_rdma_queue 0xffff8bb3ecf4f800
> struct nvmet_rdma_queue {
>   cm_id = 0xffff8bb498ec4c00,
>   port = 0xffff8baca8cd6800,
>   cq = 0xffff8bb46af86800,
>   sq_wr_avail = {
>     counter = 7
>   },
>   dev = 0xffff8bb4039c9900,
>   state_lock = {
>     {
>       rlock = {
>         raw_lock = {
>           val = {
>             counter = 0
>           }
>         }
>       }
>     }
>   },
>   state = NVMET_RDMA_Q_LIVE,
>   nvme_cq = {
>     qid = 3,
>     size = 32
>   },
>   nvme_sq = {
>     ctrl = 0xffff8bb4ce697000,
>     ref = {
>       count = {
>         counter = -9223372036854775807
>       },
>       percpu_count_ptr = 72708029383024,
>       release = 0xffffffffc0c530f0 <nvmet_sq_free>,
>       confirm_switch = 0x0,
>       force_atomic = false,
>       rcu = {
>         next = 0x0,
>         func = 0x0
>       }
>     },
>     qid = 3,
>     size = 32,
>     sqhd = 19,
>     free_done = {
>       done = 0,
>       wait = {
>         lock = {
>           {
>             rlock = {
>               raw_lock = {
>                 val = {
>                   counter = 0
>                 }
>               }
>             }
>           }
>         },
>         head = {
>           next = 0xffff8baca8cd6800,
>           prev = 0xffff8bb3ecf4f890
>         }
>       }
>     },
>     confirm_done = {
>       done = 3230281984,
>       wait = {
>         lock = {
>           {
>             rlock = {
>               raw_lock = {
>                 val = {
>                   counter = 0
>                 }
>               }
>             }
>           }
>         },
>         head = {
>           next = 0xffff8bb3ecf4f8b0,
>           prev = 0xffff8bb3ecf4f8b0
>         }
>       }
>     }
>   },
>   rsps = 0xffff8bb4cd9c0000,
>   free_rsps = {
>     next = 0xdead000000000100,
>     prev = 0xdead000000000200
>   },
>   rsps_lock = {
>     {
>       rlock = {
>         raw_lock = {
>           val = {
>             counter = 0
>           }
>         }
>       }
>     }
>   },
>   cmds = 0xffff8bb50b83d000,
>   release_work = {
>     data = {
>       counter = 68719476704
>     },
>     entry = {
>       next = 0xffff8bb3ecf4f8f0,
>       prev = 0xffff8bb3ecf4f8f0
>     },
>     func = 0xffffffffc089f0d0 <nvmet_rdma_release_queue_work>
>   },
>   rsp_wait_list = {
>     next = 0xffff8bb3ecf4f908,
>     prev = 0xffff8bb3ecf4f908
>   },
>   rsp_wr_wait_list = {
>     next = 0xffff8bb3ecf4f918,
>     prev = 0xffff8bb3ecf4f918
>   },
>   rsp_wr_wait_lock = {
>     {
>       rlock = {
>         raw_lock = {
>           val = {
>             counter = 0
>           }
>         }
>       }
>     }
>   },
>   idx = 151,
>   host_qid = 3,
>   recv_queue_size = 32,
>   send_queue_size = 32,
>   queue_list = {
>     next = 0xffff8bb464686540,
>     prev = 0xffff8bb3ecf4d140
>   }
> }
> crash> ib_wc ffff8bb498ec7400
> struct ib_wc {
>   {
>     wr_id = 18446616208431370784,
>     wr_cqe = 0xffff8bb50b83d220
>   },
>   status = IB_WC_SUCCESS,
>   opcode = IB_WC_RECV,
>   vendor_err = 0,
>   byte_len = 64,
>   qp = 0xffff8bb4d9759c00,
>   ex = {
>     imm_data = 282903233,
>     invalidate_rkey = 282903233
>   },
>   src_qp = 283165377,
>   slid = 283427521,
>   wc_flags = 0,
>   pkey_index = 49857,
>   sl = 236 '\354',
>   dlid_path_bits = 16 '\020',
>   port_num = 193 '\301',
>   smac = "\302\360\020\301\302", <incomplete sequence \364>,
>   vlan_id = 49857,
>   network_hdr_type = 248 '\370'
> }
> 
> 
> 

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [resend] Crash in nvmet_req_init() - null req->rsp pointer
  2018-08-20 20:47   ` [resend] Crash in nvmet_req_init() - null req->rsp pointer Sagi Grimberg
@ 2018-08-22 15:08     ` Steve Wise
  2018-08-27 18:24     ` Steve Wise
  1 sibling, 0 replies; 6+ messages in thread
From: Steve Wise @ 2018-08-22 15:08 UTC (permalink / raw)




On 8/20/2018 3:47 PM, Sagi Grimberg wrote:
> 
> Is it possible that you ran out of rsps and got a corrupted rsp?

That is what it looks like: the nvmet_rdma_rsp pointer returned
from nvmet_rdma_get_rsp() was not within the bounds of the response
struct array allocated for that queue.  I wasn't sure how that could
happen.  I would have thought list_first_entry() would return NULL if
the list was empty, but apparently not.


> 
> How about trying out this patch to add more information:

I'll try this out.  Thanks Sagi!


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Crash in nvmet_req_init() - null req->rsp pointer
  2018-08-20 20:47   ` [resend] Crash in nvmet_req_init() - null req->rsp pointer Sagi Grimberg
  2018-08-22 15:08     ` Steve Wise
@ 2018-08-27 18:24     ` Steve Wise
  2018-08-27 20:29       ` Steve Wise
  1 sibling, 1 reply; 6+ messages in thread
From: Steve Wise @ 2018-08-27 18:24 UTC (permalink / raw)




On 8/20/2018 3:47 PM, Sagi Grimberg wrote:
> 
> 
> Is it possible that you ran out of rsps and got a corrupted rsp?
> 
> How about trying out this patch to add more information:
> -- 

Hey Sagi, it hits the empty rsp list path often with your debug patch.
I added code to BUG_ON() after 10 times and I have a crash dump I'm
looking at.

Isn't the rsp list supposed to be sized such that it will never be empty
when a new rsp is needed?  I wonder if there is a leak.

Steve.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Crash in nvmet_req_init() - null req->rsp pointer
  2018-08-27 18:24     ` Steve Wise
@ 2018-08-27 20:29       ` Steve Wise
  2018-08-31  0:31         ` Sagi Grimberg
  0 siblings, 1 reply; 6+ messages in thread
From: Steve Wise @ 2018-08-27 20:29 UTC (permalink / raw)




On 8/27/2018 1:24 PM, Steve Wise wrote:
> 
> 
> Hey Sagi, it hits the empty rsp list path often with your debug patch.
> I added code to BUG_ON() after 10 times and I have a crash dump I'm
> looking at.
> 
> Isn't the rsp list supposed to be sized such that it will never be empty
> when a new rsp is needed?  I wonder if there is a leak.
> 
> Steve.
> 

I do see that during this heavy load, the rdma send queue "full"
condition gets hit often:

static bool nvmet_rdma_execute_command(struct nvmet_rdma_rsp *rsp)
{
        struct nvmet_rdma_queue *queue = rsp->queue;

        if (unlikely(atomic_sub_return(1 + rsp->n_rdma,
                        &queue->sq_wr_avail) < 0)) {
                pr_debug("IB send queue full (needed %d): queue %u cntlid %u\n",
                                1 + rsp->n_rdma, queue->idx,
                                queue->nvme_sq.ctrl->cntlid);
                atomic_add(1 + rsp->n_rdma, &queue->sq_wr_avail);
                return false;
        }

...

So commands are getting added to the wr_wait list:

static void nvmet_rdma_handle_command(struct nvmet_rdma_queue *queue,
                struct nvmet_rdma_rsp *cmd)
{
...
        if (unlikely(!nvmet_rdma_execute_command(cmd))) {
                spin_lock(&queue->rsp_wr_wait_lock);
                list_add_tail(&cmd->wait_list, &queue->rsp_wr_wait_list);
                spin_unlock(&queue->rsp_wr_wait_lock);
        }
...


Perhaps there's some bug in the wr_wait_list processing of deferred
commands?  I don't see anything though.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Crash in nvmet_req_init() - null req->rsp pointer
  2018-08-27 20:29       ` Steve Wise
@ 2018-08-31  0:31         ` Sagi Grimberg
  2018-08-31 13:01           ` Steve Wise
  0 siblings, 1 reply; 6+ messages in thread
From: Sagi Grimberg @ 2018-08-31  0:31 UTC (permalink / raw)



>> Hey Sagi, it hits the empty rsp list path often with your debug patch.
>> I added code to BUG_ON() after 10 times and I have a crash dump I'm
>> looking at.
>>
>> Isn't the rsp list supposed to be sized such that it will never be empty
>> when a new rsp is needed?  I wonder if there is a leak.

It doesn't look like it from my scan..

> I do see that during this heavy load, the rdma send queue "full"
> condition gets hit often:
> 
> static bool nvmet_rdma_execute_command(struct nvmet_rdma_rsp *rsp)
> {
>          struct nvmet_rdma_queue *queue = rsp->queue;
> 
>          if (unlikely(atomic_sub_return(1 + rsp->n_rdma,
>                          &queue->sq_wr_avail) < 0)) {
>                  pr_debug("IB send queue full (needed %d): queue %u
> cntlid %u\n",
>                                  1 + rsp->n_rdma, queue->idx,
>                                  queue->nvme_sq.ctrl->cntlid);
>                  atomic_add(1 + rsp->n_rdma, &queue->sq_wr_avail);
>                  return false;
>          }
> 
> ...
> 
> So commands are getting added to the wr_wait list:
> 
> static void nvmet_rdma_handle_command(struct nvmet_rdma_queue *queue,
>                  struct nvmet_rdma_rsp *cmd)
> {
> ...
>          if (unlikely(!nvmet_rdma_execute_command(cmd))) {
>                  spin_lock(&queue->rsp_wr_wait_lock);
>                  list_add_tail(&cmd->wait_list, &queue->rsp_wr_wait_list);
>                  spin_unlock(&queue->rsp_wr_wait_lock);
>          }
> ...
> 
> 
> Perhaps there's some bug in the wr_wait_list processing of deferred
> commands?  I don't see anything though.

I assume this could happen if under heavy load the device send
completions are slower than the rate at which incoming commands
arrive (perhaps due to the device and/or software).

Because we post the recv before sending the response back, there is
a window where the host can send us a new command before the send
completion has arrived; this is why we allocate more.

However, I think nothing prevents the gap from growing under heavy
load until we exhaust the 2x rsps.

So perhaps this is something we actually need to account for...

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Crash in nvmet_req_init() - null req->rsp pointer
  2018-08-31  0:31         ` Sagi Grimberg
@ 2018-08-31 13:01           ` Steve Wise
  0 siblings, 0 replies; 6+ messages in thread
From: Steve Wise @ 2018-08-31 13:01 UTC (permalink / raw)


> 
> 
> >> Hey Sagi, it hits the empty rsp list path often with your debug patch.
> >> I added code to BUG_ON() after 10 times and I have a crash dump I'm
> >> looking at.
> >>
> >> Isn't the rsp list supposed to be sized such that it will never be empty
> >> when a new rsp is needed?  I wonder if there is a leak.
> 
> It doesn't look like it from my scan..
> 
> 
> I assume this could happen if under heavy load the device send
> completions are slower than the rate at which incoming commands
> arrive (perhaps due to the device and/or software).
> 
> Because we post the recv before sending the response back, there is
> a window where the host can send us a new command before the send
> completion has arrived; this is why we allocate more.
> 
> However, I think nothing prevents the gap from growing under heavy
> load until we exhaust the 2x rsps.
> 
> So perhaps this is something we actually need to account for...

Thanks for the explanation.  Yes, I believe we do.  Will you post the
formal patch?  If it is the same as the one I already confirmed, you
can add my Tested-by tag.

Thanks,

Steve.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2018-08-31 13:01 UTC | newest]

Thread overview: 6+ messages
-- links below jump to the message on this page --
     [not found] <1eb5a94a-07ed-1c2a-47fe-cc9a4bf32466@opengridcomputing.com>
     [not found] ` <d35d93e6-2e7b-a681-039b-75edaea7a9a2@opengridcomputing.com>
2018-08-20 20:47   ` [resend] Crash in nvmet_req_init() - null req->rsp pointer Sagi Grimberg
2018-08-22 15:08     ` Steve Wise
2018-08-27 18:24     ` Steve Wise
2018-08-27 20:29       ` Steve Wise
2018-08-31  0:31         ` Sagi Grimberg
2018-08-31 13:01           ` Steve Wise
