* target crash / host hang with nvme-all.3 branch of nvme-fabrics @ 2016-06-16 14:53 Steve Wise 2016-06-16 14:57 ` Christoph Hellwig 0 siblings, 1 reply; 50+ messages in thread From: Steve Wise @ 2016-06-16 14:53 UTC (permalink / raw) Hey Christoph and Sagi, We have a setup using chelsio T580s and HGST SSDs, and they crash using nvmf-all.3 under fio load. Have either of you seen these signatures so far? I think this is a regression from the pre-public code base... The target side gets a keep-alive timeout and then Oopes: [11436.603807] nvmet: ctrl 1 keep-alive timer (15 seconds) expired! [11436.609866] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050 [11436.617764] IP: [<ffffffffa09c6dff>] nvmet_rdma_delete_ctrl+0x6f/0x100 [nvmet_rdma] [11436.625467] PGD 0 [11436.627509] Oops: 0000 [#1] SMP [11436.630657] Modules linked in: oprofile nvme_loop nvme_fabrics brd null_blk nvmet_rdma nvmet nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 iptable_filter ip_tables tun bridge stp llc intel_rapl iosf_mbi sb_edac edac_core rpcrdma x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm sunrpc ib_isert iscsi_target_mod snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec ib_iser libiscsi scsi_transport_iscsi ib_srpt eeepc_wmi target_core_mod asus_wmi irqbypass crct10dif_pclmul sparse_keymap crc32_pclmul snd_hda_core crc32c_intel rfkill ib_srp ghash_clmulni_intel scsi_transport_srp video snd_hwdep aesni_intel snd_seq lrw gf128mul glue_helper ablk_helper ib_ipoib cryptd snd_seq_device snd_pcm iTCO_wdt mei_me rdma_ucm acpi_pad iTCO_vendor_support pcspkr ipmi_si mxm_wmi ib_ucm acpi_power_meter serio_raw ipmi_msghandler ib_uverbs mei snd_timer snd shpchp soundcore i2c_i801 ib_umad wmi rdma_cm lpc_ich mfd_core ib_cm iw_cm uinput xfs libcrc32c mlx4_ib mlx4_en iw_cxgb4 ib_core usb_storage sd_mod sr_mod cdrom ast drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm igb mlx4_core ahci libahci ptp cxgb4 libata nvme pps_core nvme_core dca i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: oprofile] [11436.763684] CPU: 0 PID: 18956 Comm: kworker/0:0 Not tainted 4.7.0-rc2+ #1 [11436.770482] Hardware name: ASUSTeK COMPUTER INC. 
Z10PE-D16 WS/Z10PE-D16 WS, BIOS 3101 11/04/2015 [11436.779296] Workqueue: events nvmet_keep_alive_timer [nvmet] [11436.784986] task: ffff880853990000 ti: ffff880806aa0000 task.ti: ffff880806aa0000 [11436.792479] RIP: 0010:[<ffffffffa09c6dff>] [<ffffffffa09c6dff>] nvmet_rdma_delete_ctrl+0x6f/0x100 [nvmet_rdma] [11436.802615] RSP: 0018:ffff880806aa3de0 EFLAGS: 00010287 [11436.807944] RAX: 0000000000000000 RBX: ffff8808527d2e00 RCX: 0000000000000001 [11436.815090] RDX: ffff8808527d2e00 RSI: ffff880804d15120 RDI: ffff880804d14720 [11436.822977] RBP: ffff880806aa3df8 R08: ffffffffa09ca000 R09: 0000000000000000 [11436.830876] R10: 0000000000000005 R11: 00000000000005af R12: ffff8808527d1f20 [11436.838797] R13: ffff8807fb44c400 R14: ffff88085fc1c200 R15: 0000000000000000 [11436.846734] FS: 0000000000000000(0000) GS:ffff88085fc00000(0000) knlGS:0000000000000000 [11436.855638] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [11436.862223] CR2: 0000000000000050 CR3: 0000000001c06000 CR4: 00000000001406f0 [11436.870207] Stack: [11436.873079] ffff8807fb44c4c8 ffff8807fdbeb180 ffff88085fc176c0 ffff880806aa3e10 [11436.881457] ffffffffa0a8f0c3 ffff8807fb44c4c8 ffff880806aa3e58 ffffffff8109b842 [11436.889835] 00000000fdbeb1b0 0000000000000000 ffff88085fc176e0 ffff8807fdbeb1b0 [11436.898277] Call Trace: [11436.901663] [<ffffffffa0a8f0c3>] nvmet_keep_alive_timer+0x33/0x40 [nvmet] [11436.909510] [<ffffffff8109b842>] process_one_work+0x152/0x400 [11436.916314] [<ffffffff8109c135>] worker_thread+0x125/0x4b0 [11436.922865] [<ffffffff8109c010>] ? rescuer_thread+0x380/0x380 [11436.929691] [<ffffffff810a1c68>] kthread+0xd8/0xf0 [11436.935583] [<ffffffff816b013f>] ret_from_fork+0x1f/0x40 The host side is stuck here and maybe this triggers the target keepalive timer? This stack doesn't look like it is in the NVMF path though. CPUs 6 and 7 are stuck: [ 3211.677301] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 20.0.0.34:4420 [ 3211.679215] nvme nvme0: creating 12 I/O queues. [ 3211.709939] nvme nvme0: new ctrl: NQN "testnqn", addr 20.0.0.34:4420 ... [11209.649435] nvme nvme0: RECV for CQE 0xffff88083da56890 failed with status WR flushed (5) [11209.649467] nvme nvme0: reconnecting in 10 seconds [11210.265801] NMI watchdog: BUG: soft lockup - CPU#6 stuck for 22s! 
[fio:708] [11210.265806] Modules linked in: oprofile nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache nvme_rdma nvme_fabrics nvme nvme_core xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 iptable_filter ip_tables tun bridge stp llc rpcrdma sunrpc ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp ib_srp scsi_transport_srp ib_ipoib kvm_intel kvm rdma_ucm ib_ucm ib_uverbs snd_hda_intel snd_hda_codec ib_umad rdma_cm snd_hda_core ib_cm iw_cm irqbypass snd_hwdep crct10dif_pclmul crc32_pclmul snd_seq crc32c_intel ghash_clmulni_intel snd_seq_device snd_pcm [11210.265867] aesni_intel iTCO_wdt lrw gf128mul iTCO_vendor_support snd_timer glue_helper snd ablk_helper cryptd mei_me lpc_ich serio_raw pcspkr mei ipmi_si shpchp i2c_i801 soundcore mfd_core wmi ipmi_msghandler tpm_infineon uinput xfs libcrc32c mlx4_ib mlx4_en sd_mod iw_cxgb4 ib_core sr_mod cdrom usb_storage ast drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm isci drm libsas mlx4_core igb ahci libahci scsi_transport_sas cxgb4 libata ptp pps_core dca i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: oprofile] [11210.265905] CPU: 6 PID: 708 Comm: fio Not tainted 4.7.0-rc2+ #1 [11210.265909] Hardware name: ASUSTeK COMPUTER INC. Z9PE-D16 Series/Z9PE-D16 Series, BIOS 5506 06/03/2015 [11210.265911] task: ffff88105267c140 ti: ffff881050314000 task.ti: ffff881050314000 [11210.265912] RIP: 0010:[<ffffffff81103845>] [<ffffffff81103845>] smp_call_function_single+0xd5/0x120 [11210.265921] RSP: 0000:ffff881050317a30 EFLAGS: 00000202 [11210.265922] RAX: 0000000000000000 RBX: 0000000000000008 RCX: ffff880805deed50 [11210.265923] RDX: ffff88107fc98f80 RSI: ffff881050317a40 RDI: ffff881050317a40 [11210.265925] RBP: ffff881050317a78 R08: 0000000000000100 R09: 0000000000000000 [11210.265926] R10: ffff881051285018 R11: 0000000000000000 R12: ffffffff810710d0 [11210.265927] R13: 0000000000000006 R14: 0000000000000008 R15: ffff8810520b0368 [11210.265929] FS: 00007f7281ca2740(0000) GS:ffff88107fc00000(0000) knlGS:0000000000000000 [11210.265930] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [11210.265931] CR2: 000000000071d230 CR3: 000000104d5ef000 CR4: 00000000001406e0 [11210.265933] Stack: [11210.265934] 0000000000000400 ffff8810158e9000 ffff880805deed50 ffffffff810710d0 [11210.265937] ffff881050317ad8 0000000000000003 0000000041548ca1 ffffffff810710d0 [11210.265939] ffff881050317ad8 ffff881050317ac0 ffffffff81103c5c 0000000000000006 [11210.265941] Call Trace: [11210.265949] [<ffffffff810710d0>] ? do_flush_tlb_all+0x50/0x50 [11210.265952] [<ffffffff810710d0>] ? do_flush_tlb_all+0x50/0x50 [11210.265957] [<ffffffff81103c5c>] smp_call_function_many+0x20c/0x250 [11210.265959] [<ffffffff81071571>] native_flush_tlb_others+0x131/0x1a0 [11210.265962] [<ffffffff81071834>] flush_tlb_page+0x54/0x90 [11210.265967] [<ffffffff811ce548>] ptep_clear_flush+0x48/0x60 [11210.265971] [<ffffffff811c95e7>] try_to_unmap_one+0x167/0x5f0 [11210.265973] [<ffffffff811c80cf>] rmap_walk_anon+0xef/0x210 [11210.265976] [<ffffffff811ca2c5>] rmap_walk+0x45/0x60 [11210.265978] [<ffffffff811ca679>] try_to_unmap+0xb9/0x150 [11210.265981] [<ffffffff811c9480>] ? page_remove_rmap+0x220/0x220 [11210.265983] [<ffffffff811c7ea0>] ? 
invalid_page_referenced_vma+0x90/0x90 [11210.265986] [<ffffffff811c9d20>] ? page_get_anon_vma+0x90/0x90 [11210.265988] [<ffffffff811c7d00>] ? invalid_mkclean_vma+0x20/0x20 [11210.265993] [<ffffffff811f2e1d>] migrate_pages+0x76d/0x8f0 [11210.265995] [<ffffffff811f0ad0>] ? compound_head+0x20/0x20 [11210.265997] [<ffffffff811f38a5>] migrate_misplaced_page+0x125/0x1b0 [11210.266009] [<ffffffff811bc1b2>] handle_pte_fault+0x642/0x1760 [11210.266015] [<ffffffff81310f46>] ? blk_flush_plug_list+0xb6/0x220 [11210.266017] [<ffffffff8131147c>] ? blk_finish_plug+0x2c/0x40 [11210.266019] [<ffffffff811be74b>] handle_mm_fault+0x2bb/0x660 [11210.266022] [<ffffffff8106a3ae>] __do_page_fault+0x1ce/0x4a0 [11210.266025] [<ffffffff8106a6b0>] do_page_fault+0x30/0x80 [11210.266031] [<ffffffff81003bb2>] ? do_syscall_64+0x102/0x110 [11210.266035] [<ffffffff816b2188>] page_fault+0x28/0x30 [11210.266036] Code: 00 00 00 75 68 48 83 c4 38 5b 41 5c 5d c3 48 8d 75 c8 48 89 d1 89 df 4c 89 e2 e8 17 fe ff ff f6 45 e0 01 74 08 f3 90 f6 45 e0 01 <75> f8 eb c7 8b 05 71 55 ed 00 85 c0 75 8b 80 3d 4d ce c1 00 00 [11222.265928] NMI watchdog: BUG: soft lockup - CPU#7 stuck for 22s! [fio:705] [11222.265932] Modules linked in: oprofile nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache nvme_rdma nvme_fabrics nvme nvme_core xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 iptable_filter ip_tables tun bridge stp llc rpcrdma sunrpc ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp ib_srp scsi_transport_srp ib_ipoib kvm_intel kvm rdma_ucm ib_ucm ib_uverbs snd_hda_intel snd_hda_codec ib_umad rdma_cm snd_hda_core ib_cm iw_cm irqbypass snd_hwdep crct10dif_pclmul crc32_pclmul snd_seq crc32c_intel ghash_clmulni_intel snd_seq_device snd_pcm [11222.265982] aesni_intel iTCO_wdt lrw gf128mul iTCO_vendor_support snd_timer glue_helper snd ablk_helper cryptd mei_me lpc_ich serio_raw pcspkr mei ipmi_si shpchp i2c_i801 soundcore mfd_core wmi ipmi_msghandler tpm_infineon uinput xfs libcrc32c mlx4_ib mlx4_en sd_mod iw_cxgb4 ib_core sr_mod cdrom usb_storage ast drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm isci drm libsas mlx4_core igb ahci libahci scsi_transport_sas cxgb4 libata ptp pps_core dca i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: oprofile] [11222.266013] CPU: 7 PID: 705 Comm: fio Tainted: G L 4.7.0-rc2+ #1 [11222.266015] Hardware name: ASUSTeK COMPUTER INC. 
Z9PE-D16 Series/Z9PE-D16 Series, BIOS 5506 06/03/2015 [11222.266018] task: ffff88085c320000 ti: ffff880806388000 task.ti: ffff880806388000 [11222.266020] RIP: 0010:[<ffffffff810cc399>] [<ffffffff810cc399>] native_queued_spin_lock_slowpath+0x179/0x1a0 [11222.266026] RSP: 0000:ffff88080638bb18 EFLAGS: 00000202 [11222.266028] RAX: 0000000000000101 RBX: ffffea0020280580 RCX: 0000000000000001 [11222.266029] RDX: 0000000000000101 RSI: 0000000000000001 RDI: ffffea0041503af0 [11222.266030] RBP: ffff88080638bb18 R08: 0000000000000101 R09: 00000010540eb000 [11222.266032] R10: ffff881051285018 R11: 0000000000000000 R12: ffff88080638bba8 [11222.266033] R13: 0000000000000000 R14: ffff8810540eb8d0 R15: ffffea0041503af0 [11222.266035] FS: 00007f7281ca2740(0000) GS:ffff88107fc40000(0000) knlGS:0000000000000000 [11222.266036] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [11222.266037] CR2: 000000000071a980 CR3: 0000000850d57000 CR4: 00000000001406e0 [11222.266039] Stack: [11222.266040] ffff88080638bb28 ffffffff81187380 ffff88080638bb38 ffffffff816afcc0 [11222.266042] ffff88080638bb70 ffffffff811c87f4 0000000000000302 ffffea0020280580 [11222.266045] ffff8810520b0000 000000000071a000 ffff88105181d450 ffff88080638bbe0 [11222.266047] Call Trace: [11222.266055] [<ffffffff81187380>] queued_spin_lock_slowpath+0xb/0xf [11222.266060] [<ffffffff816afcc0>] _raw_spin_lock+0x20/0x30 [11222.266066] [<ffffffff811c87f4>] __page_check_address+0xe4/0x1d0 [11222.266069] [<ffffffff811c951d>] try_to_unmap_one+0x9d/0x5f0 [11222.266072] [<ffffffff811c80cf>] rmap_walk_anon+0xef/0x210 [11222.266074] [<ffffffff811ca2c5>] rmap_walk+0x45/0x60 [11222.266077] [<ffffffff811ca679>] try_to_unmap+0xb9/0x150 [11222.266079] [<ffffffff811c9480>] ? page_remove_rmap+0x220/0x220 [11222.266082] [<ffffffff811c7ea0>] ? invalid_page_referenced_vma+0x90/0x90 [11222.266084] [<ffffffff811c9d20>] ? page_get_anon_vma+0x90/0x90 [11222.266087] [<ffffffff811c7d00>] ? invalid_mkclean_vma+0x20/0x20 [11222.266090] [<ffffffff811f2e1d>] migrate_pages+0x76d/0x8f0 [11222.266092] [<ffffffff811f0ad0>] ? compound_head+0x20/0x20 [11222.266095] [<ffffffff811f38a5>] migrate_misplaced_page+0x125/0x1b0 [11222.266097] [<ffffffff811bc1b2>] handle_pte_fault+0x642/0x1760 [11222.266100] [<ffffffff811be74b>] handle_mm_fault+0x2bb/0x660 [11222.266104] [<ffffffff81212c53>] ? __fput+0x193/0x220 [11222.266108] [<ffffffff8106a3ae>] __do_page_fault+0x1ce/0x4a0 [11222.266110] [<ffffffff8106a6b0>] do_page_fault+0x30/0x80 [11222.266116] [<ffffffff81003ba5>] ? do_syscall_64+0xf5/0x110 [11222.266118] [<ffffffff816b2188>] page_fault+0x28/0x30 [11222.266120] Code: d0 66 31 c0 39 c1 74 e7 4d 85 c9 c6 07 01 74 2b 41 c7 41 08 01 00 00 00 e9 51 ff ff ff 83 fa 01 75 07 e9 ac fe ff ff f3 90 8b 07 <84> c0 75 f8 b8 01 00 00 00 66 89 07 5d c3 f3 90 4d 8b 08 4d 85 ^ permalink raw reply [flat|nested] 50+ messages in thread
* target crash / host hang with nvme-all.3 branch of nvme-fabrics 2016-06-16 14:53 target crash / host hang with nvme-all.3 branch of nvme-fabrics Steve Wise @ 2016-06-16 14:57 ` Christoph Hellwig 2016-06-16 15:10 ` Christoph Hellwig ` (2 more replies) 0 siblings, 3 replies; 50+ messages in thread From: Christoph Hellwig @ 2016-06-16 14:57 UTC (permalink / raw) On Thu, Jun 16, 2016@09:53:45AM -0500, Steve Wise wrote: > [11436.603807] nvmet: ctrl 1 keep-alive timer (15 seconds) expired! > [11436.609866] BUG: unable to handle kernel NULL pointer dereference at > 0000000000000050 > [11436.617764] IP: [<ffffffffa09c6dff>] nvmet_rdma_delete_ctrl+0x6f/0x100 Can you check using gdb where in the code this is? This is the obvious crash we'll need to fix first. Then we'll need to figure out why the keep alive timer times out under this workload. ^ permalink raw reply [flat|nested] 50+ messages in thread
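A quick note on how one might answer this: if the module was built with debug info, gdb can map the faulting offset from the oops (nvmet_rdma_delete_ctrl+0x6f) straight to a source line. A minimal sketch, assuming an in-tree build (adjust the path to wherever nvmet-rdma.ko was actually produced):

$ gdb drivers/nvme/target/nvmet-rdma.ko
(gdb) list *(nvmet_rdma_delete_ctrl+0x6f)

The same offset can also be fed to objdump -dS on the .ko, which is essentially what the disassembly posted further down in the thread shows.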
* target crash / host hang with nvme-all.3 branch of nvme-fabrics 2016-06-16 14:57 ` Christoph Hellwig @ 2016-06-16 15:10 ` Christoph Hellwig 2016-06-16 15:17 ` Steve Wise 2016-06-16 19:11 ` Sagi Grimberg 2016-06-16 15:24 ` Steve Wise 2016-06-16 15:56 ` Steve Wise 2 siblings, 2 replies; 50+ messages in thread From: Christoph Hellwig @ 2016-06-16 15:10 UTC (permalink / raw) I think nvmet_rdma_delete_ctrl is getting the exclusion vs other callers of __nvmet_rdma_queue_disconnect wrong as we rely on a queue that is undergoing deletion to not be on any list. Additionally it also checks the cntlid instead of the pointer, which would be harmful if multiple subsystems have the same cntlid. Does the following patch help? diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c index b1c6e5b..9ae65a7 100644 --- a/drivers/nvme/target/rdma.c +++ b/drivers/nvme/target/rdma.c @@ -1293,19 +1293,21 @@ static int nvmet_rdma_cm_handler(struct rdma_cm_id *cm_id, static void nvmet_rdma_delete_ctrl(struct nvmet_ctrl *ctrl) { - struct nvmet_rdma_queue *queue, *next; - static LIST_HEAD(del_list); + struct nvmet_rdma_queue *queue, *found = NULL; mutex_lock(&nvmet_rdma_queue_mutex); - list_for_each_entry_safe(queue, next, - &nvmet_rdma_queue_list, queue_list) { - if (queue->nvme_sq.ctrl->cntlid == ctrl->cntlid) - list_move_tail(&queue->queue_list, &del_list); + list_for_each_entry(queue, &nvmet_rdma_queue_list, queue_list) { + if (queue->nvme_sq.ctrl == ctrl) { + list_del_init(&queue->queue_list); + found = queue; + break; + } } + mutex_unlock(&nvmet_rdma_queue_mutex); - list_for_each_entry_safe(queue, next, &del_list, queue_list) - nvmet_rdma_queue_disconnect(queue); + if (found) + __nvmet_rdma_queue_disconnect(queue); } static int nvmet_rdma_add_port(struct nvmet_port *port) ^ permalink raw reply related [flat|nested] 50+ messages in thread
* target crash / host hang with nvme-all.3 branch of nvme-fabrics 2016-06-16 15:10 ` Christoph Hellwig @ 2016-06-16 15:17 ` Steve Wise 2016-06-16 19:11 ` Sagi Grimberg 1 sibling, 0 replies; 50+ messages in thread From: Steve Wise @ 2016-06-16 15:17 UTC (permalink / raw) Hey Yoichi, can you please try this out on your setup? I'm still trying to reproduce this on mine. Thanks! Steve. > I think nvmet_rdma_delete_ctrl is getting the exlusion vs other calls > or __nvmet_rdma_queue_disconnect wrong as we rely on a queue that > is undergoing deletion to not be on any list. Additionally it also > check the cntlid instead of the pointer, which would be harmful if > multiple subsystems have the same cntlid. > > Does the following patch help? > > diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c > index b1c6e5b..9ae65a7 100644 > --- a/drivers/nvme/target/rdma.c > +++ b/drivers/nvme/target/rdma.c > @@ -1293,19 +1293,21 @@ static int nvmet_rdma_cm_handler(struct > rdma_cm_id *cm_id, > > static void nvmet_rdma_delete_ctrl(struct nvmet_ctrl *ctrl) > { > - struct nvmet_rdma_queue *queue, *next; > - static LIST_HEAD(del_list); > + struct nvmet_rdma_queue *queue, *found = NULL; > > mutex_lock(&nvmet_rdma_queue_mutex); > - list_for_each_entry_safe(queue, next, > - &nvmet_rdma_queue_list, queue_list) { > - if (queue->nvme_sq.ctrl->cntlid == ctrl->cntlid) > - list_move_tail(&queue->queue_list, &del_list); > + list_for_each_entry(queue, &nvmet_rdma_queue_list, queue_list) { > + if (queue->nvme_sq.ctrl == ctrl) { > + list_del_init(&queue->queue_list); > + found = queue; > + break; > + } > } > + > mutex_unlock(&nvmet_rdma_queue_mutex); > > - list_for_each_entry_safe(queue, next, &del_list, queue_list) > - nvmet_rdma_queue_disconnect(queue); > + if (found) > + __nvmet_rdma_queue_disconnect(queue); > } > > static int nvmet_rdma_add_port(struct nvmet_port *port) ^ permalink raw reply [flat|nested] 50+ messages in thread
* target crash / host hang with nvme-all.3 branch of nvme-fabrics 2016-06-16 15:10 ` Christoph Hellwig 2016-06-16 15:17 ` Steve Wise @ 2016-06-16 19:11 ` Sagi Grimberg 2016-06-16 20:38 ` Christoph Hellwig 1 sibling, 1 reply; 50+ messages in thread From: Sagi Grimberg @ 2016-06-16 19:11 UTC (permalink / raw) > I think nvmet_rdma_delete_ctrl is getting the exlusion vs other calls > or __nvmet_rdma_queue_disconnect wrong as we rely on a queue that > is undergoing deletion to not be on any list. How do we rely on that? __nvmet_rdma_queue_disconnect callers are responsible for queue_list deletion and queue the release. I don't see where are we getting it wrong. Additionally it also > check the cntlid instead of the pointer, which would be harmful if > multiple subsystems have the same cntlid. That's true, we need to compare pointers... > > Does the following patch help? > > diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c > index b1c6e5b..9ae65a7 100644 > --- a/drivers/nvme/target/rdma.c > +++ b/drivers/nvme/target/rdma.c > @@ -1293,19 +1293,21 @@ static int nvmet_rdma_cm_handler(struct rdma_cm_id *cm_id, > > static void nvmet_rdma_delete_ctrl(struct nvmet_ctrl *ctrl) > { > - struct nvmet_rdma_queue *queue, *next; > - static LIST_HEAD(del_list); > + struct nvmet_rdma_queue *queue, *found = NULL; > > mutex_lock(&nvmet_rdma_queue_mutex); > - list_for_each_entry_safe(queue, next, > - &nvmet_rdma_queue_list, queue_list) { > - if (queue->nvme_sq.ctrl->cntlid == ctrl->cntlid) > - list_move_tail(&queue->queue_list, &del_list); > + list_for_each_entry(queue, &nvmet_rdma_queue_list, queue_list) { > + if (queue->nvme_sq.ctrl == ctrl) { > + list_del_init(&queue->queue_list); > + found = queue; > + break; > + } > } > + > mutex_unlock(&nvmet_rdma_queue_mutex); > > - list_for_each_entry_safe(queue, next, &del_list, queue_list) > - nvmet_rdma_queue_disconnect(queue); > + if (found) > + __nvmet_rdma_queue_disconnect(queue); > } > > static int nvmet_rdma_add_port(struct nvmet_port *port) > Umm, this looks wrong to me. delete_controller should delete _all_ the ctrl queues (which will usually involve more than 1), what about all the other queues? what am I missing? ^ permalink raw reply [flat|nested] 50+ messages in thread
* target crash / host hang with nvme-all.3 branch of nvme-fabrics 2016-06-16 19:11 ` Sagi Grimberg @ 2016-06-16 20:38 ` Christoph Hellwig 2016-06-16 21:37 ` Sagi Grimberg 0 siblings, 1 reply; 50+ messages in thread From: Christoph Hellwig @ 2016-06-16 20:38 UTC (permalink / raw) On Thu, Jun 16, 2016@10:11:30PM +0300, Sagi Grimberg wrote: >> I think nvmet_rdma_delete_ctrl is getting the exlusion vs other calls >> or __nvmet_rdma_queue_disconnect wrong as we rely on a queue that >> is undergoing deletion to not be on any list. > > How do we rely on that? __nvmet_rdma_queue_disconnect callers are > responsible for queue_list deletion and queue the release. I don't > see where are we getting it wrong. Thread 1: Moves the queues off nvmet_rdma_queue_list and onto the local list in nvmet_rdma_delete_ctrl Thread 2: Gets into nvmet_rdma_cm_handler -> nvmet_rdma_queue_disconnect for one of the queues now on the local list. list_empty(&queue->queue_list) evaluates to false because the queue is on the local list, and now we have thread 1 and 2 racing for disconnecting the queue. >> static int nvmet_rdma_add_port(struct nvmet_port *port) >> > > Umm, this looks wrong to me. delete_controller should delete _all_ > the ctrl queues (which will usually involve more than 1), what about > all the other queues? what am I missing? Yes, it should - see the patch I just posted. ^ permalink raw reply [flat|nested] 50+ messages in thread
* target crash / host hang with nvme-all.3 branch of nvme-fabrics 2016-06-16 20:38 ` Christoph Hellwig @ 2016-06-16 21:37 ` Sagi Grimberg 2016-06-16 21:40 ` Sagi Grimberg 2016-06-21 16:01 ` Christoph Hellwig 0 siblings, 2 replies; 50+ messages in thread From: Sagi Grimberg @ 2016-06-16 21:37 UTC (permalink / raw) >> How do we rely on that? __nvmet_rdma_queue_disconnect callers are >> responsible for queue_list deletion and queue the release. I don't >> see where are we getting it wrong. > > Thread 1: > > Moves the queues off nvmet_rdma_queue_list and and onto the > local list in nvmet_rdma_delete_ctrl > > Thread 2: > > Gets into nvmet_rdma_cm_handler -> nvmet_rdma_queue_disconnect for one > of the queues now on the local list. list_empty(&queue->queue_list) evaluates > to false because the queue is on the local list, and now we have thread 1 > and 2 racing for disconnecting the queue. But the list removal and list_empty evaluation is still under a mutex, isn't that sufficient to avoid the race? ^ permalink raw reply [flat|nested] 50+ messages in thread
* target crash / host hang with nvme-all.3 branch of nvme-fabrics 2016-06-16 21:37 ` Sagi Grimberg @ 2016-06-16 21:40 ` Sagi Grimberg 2016-06-21 16:01 ` Christoph Hellwig 1 sibling, 0 replies; 50+ messages in thread From: Sagi Grimberg @ 2016-06-16 21:40 UTC (permalink / raw) >>> How do we rely on that? __nvmet_rdma_queue_disconnect callers are >>> responsible for queue_list deletion and queue the release. I don't >>> see where are we getting it wrong. >> >> Thread 1: >> >> Moves the queues off nvmet_rdma_queue_list and and onto the >> local list in nvmet_rdma_delete_ctrl >> >> Thread 2: >> >> Gets into nvmet_rdma_cm_handler -> nvmet_rdma_queue_disconnect for one >> of the queues now on the local list. list_empty(&queue->queue_list) >> evaluates >> to false because the queue is on the local list, and now we have thread 1 >> and 2 racing for disconnecting the queue. > > But the list removal and list_empty evaluation is still under a mutex, > isn't that sufficient to avoid the race? And we also have a mutual exclusion point inside __nvmet_rdma_queue_disconnect with queue->state_lock... ^ permalink raw reply [flat|nested] 50+ messages in thread
* target crash / host hang with nvme-all.3 branch of nvme-fabrics 2016-06-16 21:37 ` Sagi Grimberg 2016-06-16 21:40 ` Sagi Grimberg @ 2016-06-21 16:01 ` Christoph Hellwig 2016-06-22 10:22 ` Sagi Grimberg 1 sibling, 1 reply; 50+ messages in thread From: Christoph Hellwig @ 2016-06-21 16:01 UTC (permalink / raw) On Fri, Jun 17, 2016@12:37:18AM +0300, Sagi Grimberg wrote: >> to false because the queue is on the local list, and now we have thread 1 >> and 2 racing for disconnecting the queue. > > But the list removal and list_empty evaluation is still under a mutex, > isn't that sufficient to avoid the race? If only one side takes the lock it's not very helpful. We can execute nvmet_rdma_queue_disconnect from the CM handler at the same time that the queue is on the to-be-removed list, which creates two issues: a) we manipulate local del_list without any knowledge of the thread calling nvmet_rdma_delete_ctrl, leading to potential list corruption, and b) we can call into __nvmet_rdma_queue_disconnect concurrently. As you pointed out we still have the per-queue state_lock inside __nvmet_rdma_queue_disconnect so b) probably is harmless at the moment as long as the queue hasn't been freed yet by one of the racing threads, which is fairly unlikely. Either way - using list_empty to check if something is still alive due to being linked in a list and using a local dispose list simply don't mix. Both are useful patterns on their own, but should not be mixed. ^ permalink raw reply [flat|nested] 50+ messages in thread
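To make the clash concrete, here is a minimal, self-contained sketch of the two patterns described above; struct entry, the list names, and the use of kfree() as the teardown step are hypothetical stand-ins, not the actual nvmet_rdma code:

#include <linux/list.h>
#include <linux/mutex.h>
#include <linux/slab.h>
#include <linux/types.h>

/* Hypothetical entry type standing in for the real queue structure. */
struct entry {
	struct list_head node;
};

static LIST_HEAD(global_list);
static DEFINE_MUTEX(list_lock);

/* Pattern 1: splice matching entries onto a local dispose list, tear them down later. */
static void dispose_all(void)
{
	LIST_HEAD(local);
	struct entry *e, *n;

	mutex_lock(&list_lock);
	list_for_each_entry_safe(e, n, &global_list, node)
		list_move_tail(&e->node, &local);	/* e->node is linked again, now into 'local' */
	mutex_unlock(&list_lock);

	list_for_each_entry_safe(e, n, &local, node)	/* iterates 'local' with no lock held */
		kfree(e);
}

/* Pattern 2: treat list_empty() as "nobody else is already tearing this entry down". */
static void event_handler(struct entry *e)
{
	bool mine = false;

	mutex_lock(&list_lock);
	if (!list_empty(&e->node)) {		/* still true while e sits on the local list above */
		list_del_init(&e->node);	/* silently edits dispose_all()'s local list */
		mine = true;
	}
	mutex_unlock(&list_lock);

	if (mine)
		kfree(e);			/* double free, racing with dispose_all() */
}

Because pattern 1 re-links the entry into its private list, pattern 2's list_empty() test still passes, both sides believe they own the entry, and the unlocked iteration in dispose_all() can be modified underneath it, which is exactly the a) and b) issues above. The restart-loop version of the patch later in the thread avoids this by never parking queues on a private list: each matching queue is unlinked under the mutex, the mutex is dropped, the queue is disconnected, and the scan starts over.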
* target crash / host hang with nvme-all.3 branch of nvme-fabrics 2016-06-21 16:01 ` Christoph Hellwig @ 2016-06-22 10:22 ` Sagi Grimberg 0 siblings, 0 replies; 50+ messages in thread From: Sagi Grimberg @ 2016-06-22 10:22 UTC (permalink / raw) >>> to false because the queue is on the local list, and now we have thread 1 >>> and 2 racing for disconnecting the queue. >> >> But the list removal and list_empty evaluation is still under a mutex, >> isn't that sufficient to avoid the race? > > If only one side takes the lock it's not very helpful. We can > execute nvmet_rdma_queue_disconnect from the CM handler at the > same time that the queue is on the to-be-removed list, which creates > two issues: a) we manipulate local del_list without any knowledge > of the thread calling nvmet_rdma_delete_ctrl, leading to potential > list corruption It's not racy because the two call sites are mutually exclusive (calling nvmet_rdma_queue_disconnect which is protected by a mutex) and the iteration on del_list is safe, but I'm not sure how safe it is because it's a local list and another context may manipulate it. However, note that this is not the case here. The report did not include a CM thread in the mix. > , and b) we can call into > concurrently. As you pointed out we still have the per-queue > state_lock inside __nvmet_rdma_queue_disconnect so b) probably > is harmless at the moment as long as the queue hasn't been freed > yet by one of the racing threads, which is fairly unlikely. I don't think we can end up with a use-after-free condition; we terminate both the CM events and the list removal so I don't see how that can happen. > Either way - using list_empty to check if something is still alive due > to being linked in a list and using a local dispose list simply don't > mix. Both are useful patterns on their own, but should not be mixed. I guess you are correct. I just don't like lock manipulations inside the list iterations; it can be an opening for future bugs as it requires a very strict understanding of what is going on... ^ permalink raw reply [flat|nested] 50+ messages in thread
* target crash / host hang with nvme-all.3 branch of nvme-fabrics 2016-06-16 14:57 ` Christoph Hellwig 2016-06-16 15:10 ` Christoph Hellwig @ 2016-06-16 15:24 ` Steve Wise 2016-06-16 16:41 ` Steve Wise 2016-06-16 15:56 ` Steve Wise 2 siblings, 1 reply; 50+ messages in thread From: Steve Wise @ 2016-06-16 15:24 UTC (permalink / raw) > > On Thu, Jun 16, 2016@09:53:45AM -0500, Steve Wise wrote: > > [11436.603807] nvmet: ctrl 1 keep-alive timer (15 seconds) expired! > > [11436.609866] BUG: unable to handle kernel NULL pointer dereference at > > 0000000000000050 > > [11436.617764] IP: [<ffffffffa09c6dff>] nvmet_rdma_delete_ctrl+0x6f/0x100 > > Can you check using gdb where in the code this is? > > This is the obvious crash we'll need to fix first. Then we'll need to > figure out why the keep alive timer times out under this workload. > While Yoichi is gathering this on his setup, I'm trying to reproduce it on mine. I hit a similar crash by loading up a fio job, and then bringing down the interface of the port used on the host node, let the target timer expire, then bring the host interface back up. The target freed the queues, and eventually the host reconnected, and the test continued. But shortly after that I hit this on the target. It looks related: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 IP: [<ffffffffa0203b69>] nvmet_rdma_queue_disconnect+0x49/0x90 [nvmet_rdma] PGD 102f0d1067 PUD 102ccc5067 PMD 0 Oops: 0002 [#1] SMP Modules linked in: iw_cxgb4 ib_isert iscsi_target_mod target_core_user uio target_core_pscsi target_core_file target_core_iblock target_core_mod udp_tunnel ip6_udp_tunnel rdma_ucm cxgb4 nvmet_rdma rdma_cm iw_cm nvmet null_blk configfs ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT nf_reject_ipv4 xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge autofs4 8021q garp stp llc ipmi_devintf cachefiles fscache ib_ipoib ib_cm ib_uverbs ib_umad iw_nes libcrc32c iw_cxgb3 cxgb3 mdio ib_qib rdmavt mlx4_en ib_mthca dm_mirror dm_region_hash dm_log vhost_net macvtap macvlan vhost tun kvm_intel kvm irqbypass uinput iTCO_wdt iTCO_vendor_support mxm_wmi pcspkr mlx4_ib ib_core ipv6 mlx4_core dm_mod i2c_i801 sg lpc_ich mfd_core acpi_cpufreq nvme nvme_core ioatdma igb dca i2c_algo_bit i2c_core ptp pps_core wmi ext4(E) mbcache(E) jbd2(E) sd_mod(E) ahci(E) libahci(E) [last unloaded: ib_rxe] CPU: 5 PID: 106 Comm: kworker/5:1 Tainted: G E 4.7.0-rc2-nvmf-all.3+rxe+ #83 Hardware name: Supermicro X9DR3-F/X9DR3-F, BIOS 3.2a 07/09/2015 Workqueue: events nvmet_keep_alive_timer [nvmet] task: ffff88103f3e8e00 ti: ffff88103f3ec000 task.ti: ffff88103f3ec000 RIP: 0010:[<ffffffffa0203b69>] [<ffffffffa0203b69>] nvmet_rdma_queue_disconnect+0x49/0x90 [nvmet_rdma] RSP: 0018:ffff88103f3efb98 EFLAGS: 00010282 RAX: ffff88102ebe4320 RBX: ffff88102ebe4200 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff88103f3e8e80 RDI: ffffffffa02061e0 RBP: ffff88103f3efbd8 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000d28 R11: 0000000000000001 R12: ffff88107f355c40 R13: ffffe8ffffb41a00 R14: 0000000000000000 R15: ffffe8ffffb41a05 FS: 0000000000000000(0000) GS:ffff88107f340000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000008 CR3: 000000102f0f2000 CR4: 00000000000406e0 Stack: ffff88102ebe4200 ffff88103f3e8e80 ffff88102ebe4200 ffffffffffffff10 0000000000000000 0000000000000010 0000000000000292 ffff88102ebe4200 ffff88103f3efc18 ffffffffa0203c9e ffffffffa0206210 
0000000000000001 Call Trace: [<ffffffffa0203c9e>] nvmet_rdma_delete_ctrl+0xee/0x120 [nvmet_rdma] [<ffffffffa01d4237>] nvmet_keep_alive_timer+0x37/0x40 [nvmet] [<ffffffff8107cb5b>] process_one_work+0x17b/0x510 ^ permalink raw reply [flat|nested] 50+ messages in thread
* target crash / host hang with nvme-all.3 branch of nvme-fabrics 2016-06-16 15:24 ` Steve Wise @ 2016-06-16 16:41 ` Steve Wise 0 siblings, 0 replies; 50+ messages in thread From: Steve Wise @ 2016-06-16 16:41 UTC (permalink / raw) > > > > On Thu, Jun 16, 2016@09:53:45AM -0500, Steve Wise wrote: > > > [11436.603807] nvmet: ctrl 1 keep-alive timer (15 seconds) expired! > > > [11436.609866] BUG: unable to handle kernel NULL pointer dereference at > > > 0000000000000050 > > > [11436.617764] IP: [<ffffffffa09c6dff>] nvmet_rdma_delete_ctrl+0x6f/0x100 > > > > Can you check using gdb where in the code this is? > > > > This is the obvious crash we'll need to fix first. Then we'll need to > > figure out why the keep alive timer times out under this workload. > > > > While Yoichi is gathering this on his setup, I'm trying to reproduce it on mine. > I hit a similar crash by loading up a fio job, and then bringing down the > interface of the port used on the host node, let the target timer expire, then > bring the host interface back up. The target freed the queues, and eventually > the host reconnected, and the test continued. But shortly after that I hit this > on the target. It looks related: > > BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 > IP: [<ffffffffa0203b69>] nvmet_rdma_queue_disconnect+0x49/0x90 [nvmet_rdma] > PGD 102f0d1067 PUD 102ccc5067 PMD 0 > Oops: 0002 [#1] SMP Your patch you sent out seems to resolve my crash. We'll see if Yoichi has the same results. Steve. ^ permalink raw reply [flat|nested] 50+ messages in thread
* target crash / host hang with nvme-all.3 branch of nvme-fabrics 2016-06-16 14:57 ` Christoph Hellwig 2016-06-16 15:10 ` Christoph Hellwig 2016-06-16 15:24 ` Steve Wise @ 2016-06-16 15:56 ` Steve Wise 2016-06-16 19:55 ` Sagi Grimberg 2 siblings, 1 reply; 50+ messages in thread From: Steve Wise @ 2016-06-16 15:56 UTC (permalink / raw) > On Thu, Jun 16, 2016@09:53:45AM -0500, Steve Wise wrote: > > [11436.603807] nvmet: ctrl 1 keep-alive timer (15 seconds) expired! > > [11436.609866] BUG: unable to handle kernel NULL pointer dereference at > > 0000000000000050 > > [11436.617764] IP: [<ffffffffa09c6dff>] nvmet_rdma_delete_ctrl+0x6f/0x100 > > Can you check using gdb where in the code this is? nvmet_rdma_delete_ctrl(): /root/nvmef/nvme-fabrics/drivers/nvme/target/rdma.c:1302 &nvmet_rdma_queue_list, queue_list) { if (queue->nvme_sq.ctrl->cntlid == ctrl->cntlid) df6: 48 8b 40 38 mov 0x38(%rax),%rax dfa: 41 0f b7 4d 50 movzwl 0x50(%r13),%ecx dff: 66 39 48 50 cmp %cx,0x50(%rax) <=========== here e03: 75 cd jne dd2 <nvmet_rdma_delete_ctrl+0x42> ^ permalink raw reply [flat|nested] 50+ messages in thread
* target crash / host hang with nvme-all.3 branch of nvme-fabrics 2016-06-16 15:56 ` Steve Wise @ 2016-06-16 19:55 ` Sagi Grimberg 2016-06-16 19:59 ` Steve Wise ` (3 more replies) 0 siblings, 4 replies; 50+ messages in thread From: Sagi Grimberg @ 2016-06-16 19:55 UTC (permalink / raw) >> On Thu, Jun 16, 2016@09:53:45AM -0500, Steve Wise wrote: >>> [11436.603807] nvmet: ctrl 1 keep-alive timer (15 seconds) expired! >>> [11436.609866] BUG: unable to handle kernel NULL pointer dereference at >>> 0000000000000050 >>> [11436.617764] IP: [<ffffffffa09c6dff>] nvmet_rdma_delete_ctrl+0x6f/0x100 >> >> Can you check using gdb where in the code this is? > > > nvmet_rdma_delete_ctrl(): > /root/nvmef/nvme-fabrics/drivers/nvme/target/rdma.c:1302 > &nvmet_rdma_queue_list, queue_list) { > if (queue->nvme_sq.ctrl->cntlid == ctrl->cntlid) > df6: 48 8b 40 38 mov 0x38(%rax),%rax > dfa: 41 0f b7 4d 50 movzwl 0x50(%r13),%ecx > dff: 66 39 48 50 cmp %cx,0x50(%rax) <=========== > here > e03: 75 cd jne dd2 <nvmet_rdma_delete_ctrl+0x42> Umm, I think this might be happening because we get to delete_ctrl when one of our queues has a NULL ctrl. This means that either: 1. we never got a chance to initialize it, or 2. we already freed it. (1) doesn't seem possible as we have a very short window (that we're better off eliminating) between when we start the keep-alive timer (in alloc_ctrl) and the time we assign the sq->ctrl (install_queue). (2) doesn't seem likely either to me at least as from what I followed, delete_ctrl should be mutual exclusive with other deletions, moreover, I didn't see an indication in the logs that any other deletions are happening. Steve, is this something that started happening recently? does the 4.6-rc3 tag suffer from the same phenomenon? ^ permalink raw reply [flat|nested] 50+ messages in thread
* target crash / host hang with nvme-all.3 branch of nvme-fabrics 2016-06-16 19:55 ` Sagi Grimberg @ 2016-06-16 19:59 ` Steve Wise 2016-06-16 20:07 ` Sagi Grimberg 2016-06-16 20:01 ` Steve Wise ` (2 subsequent siblings) 3 siblings, 1 reply; 50+ messages in thread From: Steve Wise @ 2016-06-16 19:59 UTC (permalink / raw) > > >> On Thu, Jun 16, 2016@09:53:45AM -0500, Steve Wise wrote: > >>> [11436.603807] nvmet: ctrl 1 keep-alive timer (15 seconds) expired! > >>> [11436.609866] BUG: unable to handle kernel NULL pointer dereference at > >>> 0000000000000050 > >>> [11436.617764] IP: [<ffffffffa09c6dff>] nvmet_rdma_delete_ctrl+0x6f/0x100 > >> > >> Can you check using gdb where in the code this is? > > > > > > nvmet_rdma_delete_ctrl(): > > /root/nvmef/nvme-fabrics/drivers/nvme/target/rdma.c:1302 > > &nvmet_rdma_queue_list, queue_list) { > > if (queue->nvme_sq.ctrl->cntlid == ctrl->cntlid) > > df6: 48 8b 40 38 mov 0x38(%rax),%rax > > dfa: 41 0f b7 4d 50 movzwl 0x50(%r13),%ecx > > dff: 66 39 48 50 cmp %cx,0x50(%rax) <=========== > > here > > e03: 75 cd jne dd2 <nvmet_rdma_delete_ctrl+0x42> > > Umm, I think this might be happening because we get to delete_ctrl when > one of our queues has a NULL ctrl. This means that either: > 1. we never got a chance to initialize it, or > 2. we already freed it. > > (1) doesn't seem possible as we have a very short window (that we're > better off eliminating) between when we start the keep-alive timer (in > alloc_ctrl) and the time we assign the sq->ctrl (install_queue). > > (2) doesn't seem likely either to me at least as from what I followed, > delete_ctrl should be mutual exclusive with other deletions, moreover, > I didn't see an indication in the logs that any other deletions are > happening. > > Steve, is this something that started happening recently? does the > 4.6-rc3 tag suffer from the same phenomenon? I'll try and reproduce this on the older code, but the keep-alive timer fired for some other reason, so I'm not sure the target side keep-alive has been tested until now. But it is easy to test over iWARP, just do this while a heavy fio is running: ifconfig ethX down; sleep 15; ifconfig ethX <ipaddr>/<mask> up ^ permalink raw reply [flat|nested] 50+ messages in thread
* target crash / host hang with nvme-all.3 branch of nvme-fabrics 2016-06-16 19:59 ` Steve Wise @ 2016-06-16 20:07 ` Sagi Grimberg 2016-06-16 20:12 ` Steve Wise 2016-06-16 20:35 ` Steve Wise 0 siblings, 2 replies; 50+ messages in thread From: Sagi Grimberg @ 2016-06-16 20:07 UTC (permalink / raw) >> >> Umm, I think this might be happening because we get to delete_ctrl when >> one of our queues has a NULL ctrl. This means that either: >> 1. we never got a chance to initialize it, or >> 2. we already freed it. >> >> (1) doesn't seem possible as we have a very short window (that we're >> better off eliminating) between when we start the keep-alive timer (in >> alloc_ctrl) and the time we assign the sq->ctrl (install_queue). >> >> (2) doesn't seem likely either to me at least as from what I followed, >> delete_ctrl should be mutual exclusive with other deletions, moreover, >> I didn't see an indication in the logs that any other deletions are >> happening. >> >> Steve, is this something that started happening recently? does the >> 4.6-rc3 tag suffer from the same phenomenon? > > I'll try and reproduce this on the older code, but the keep-alive timer fired > for some other reason, My assumption was that it fired because it didn't get a keep-alive from the host which is exactly what it's supposed to do? > so I'm not sure the target side keep-alive has been > tested until now. I tested it, and IIRC the original patch had Ming's tested-by tag. > But it is easy to test over iWARP, just do this while a heavy > fio is running: > > ifconfig ethX down; sleep 15; ifconfig ethX <ipaddr>/<mask> up So this is related to I/O load then? Does it happen when you just do it without any I/O? (or small load)? ^ permalink raw reply [flat|nested] 50+ messages in thread
* target crash / host hang with nvme-all.3 branch of nvme-fabrics 2016-06-16 20:07 ` Sagi Grimberg @ 2016-06-16 20:12 ` Steve Wise 2016-06-16 20:27 ` Ming Lin 2016-06-16 20:35 ` Steve Wise 1 sibling, 1 reply; 50+ messages in thread From: Steve Wise @ 2016-06-16 20:12 UTC (permalink / raw) > >> > >> Umm, I think this might be happening because we get to delete_ctrl when > >> one of our queues has a NULL ctrl. This means that either: > >> 1. we never got a chance to initialize it, or > >> 2. we already freed it. > >> > >> (1) doesn't seem possible as we have a very short window (that we're > >> better off eliminating) between when we start the keep-alive timer (in > >> alloc_ctrl) and the time we assign the sq->ctrl (install_queue). > >> > >> (2) doesn't seem likely either to me at least as from what I followed, > >> delete_ctrl should be mutual exclusive with other deletions, moreover, > >> I didn't see an indication in the logs that any other deletions are > >> happening. > >> > >> Steve, is this something that started happening recently? does the > >> 4.6-rc3 tag suffer from the same phenomenon? > > > > I'll try and reproduce this on the older code, but the keep-alive timer fired > > for some other reason, > > My assumption was that it fired because it didn't get a keep-alive from > the host which is exactly what it's supposed to do? > Yes, in the original email I started this thread with, I show that on the host, 2 cpus were stuck, and I surmise that the host node was stuck NVMF-wise and thus the target timer kicked and crashed the target. > > so I'm not sure the target side keep-alive has been > > tested until now. > > I tested it, and IIRC the original patch had Ming's tested-by tag. > How did you test it? > > But it is easy to test over iWARP, just do this while a heavy > > fio is running: > > > > ifconfig ethX down; sleep 15; ifconfig ethX <ipaddr>/<mask> up > > So this is related to I/O load then? Does it happen when > you just do it without any I/O? (or small load)? I'll try this. Note there are two sets of crashes discussed in this thread: the one Yoichi saw on his nodes where the host hung causing the target keep-alive to fire and crash. That is the crash with stack traces I included in the original email starting this thread. And then there is a repeatable crash on my setup, which looks the same, that happens when I bring the interface down long enough to kick the keep-alive. Since I can reproduce the latter easily I'm continuing with this debug. Here is the fio command I use: fio --bs=1k --time_based --runtime=2000 --numjobs=8 --name=TEST-1k-8g-20-8-32 --direct=1 --iodepth=32 -rw=randread --randrepeat=0 --norandommap --loops=1 --exitall --ioengine=libaio --filename=/dev/nvme1n1 ^ permalink raw reply [flat|nested] 50+ messages in thread
* target crash / host hang with nvme-all.3 branch of nvme-fabrics 2016-06-16 20:12 ` Steve Wise @ 2016-06-16 20:27 ` Ming Lin 2016-06-16 20:28 ` Steve Wise 0 siblings, 1 reply; 50+ messages in thread From: Ming Lin @ 2016-06-16 20:27 UTC (permalink / raw) On Thu, Jun 16, 2016@1:12 PM, Steve Wise <swise@opengridcomputing.com> wrote: > > >> >> >> >> Umm, I think this might be happening because we get to delete_ctrl when >> >> one of our queues has a NULL ctrl. This means that either: >> >> 1. we never got a chance to initialize it, or >> >> 2. we already freed it. >> >> >> >> (1) doesn't seem possible as we have a very short window (that we're >> >> better off eliminating) between when we start the keep-alive timer (in >> >> alloc_ctrl) and the time we assign the sq->ctrl (install_queue). >> >> >> >> (2) doesn't seem likely either to me at least as from what I followed, >> >> delete_ctrl should be mutual exclusive with other deletions, moreover, >> >> I didn't see an indication in the logs that any other deletions are >> >> happening. >> >> >> >> Steve, is this something that started happening recently? does the >> >> 4.6-rc3 tag suffer from the same phenomenon? >> > >> > I'll try and reproduce this on the older code, but the keep-alive timer > fired >> > for some other reason, >> >> My assumption was that it fired because it didn't get a keep-alive from >> the host which is exactly what it's supposed to do? >> > > Yes, in the original email I started this thread with, I show that on the host, > 2 cpus were stuck, and I surmise that the host node was stuck NVMF-wise and thus > the target timer kicked and crashed the target. > >> > so I'm not sure the target side keep-alive has been >> > tested until now. >> >> I tested it, and IIRC the original patch had Ming's tested-by tag. >> > > How did you test it? > >> > But it is easy to test over iWARP, just do this while a heavy >> > fio is running: >> > >> > ifconfig ethX down; sleep 15; ifconfig ethX <ipaddr>/<mask> up >> >> So this is related to I/O load then? Does it happen when >> you just do it without any I/O? (or small load)? > > I'll try this. > > Note there are two sets of crashes discussed in this thread: the one Yoichi saw > on his nodes where the host hung causing the target keep-alive to fire and > crash. That is the crash with stack traces I included in the original email > starting this thread. And then there is a repeatable crash on my setup, which > looks the same, that happens when I bring the interface down long enough to kick > the keep-alive. Since I can reproduce the latter easily I'm continuing with > this debug. > > Here is the fio command I use: > > fio --bs=1k --time_based --runtime=2000 --numjobs=8 --name=TEST-1k-8g-20-8-32 > --direct=1 --iodepth=32 -rw=randread --randrepeat=0 --norandommap --loops=1 > --exitall --ioengine=libaio --filename=/dev/nvme1n1 Hi Steve, Just to follow, does Christoph's patch fix the crash? ^ permalink raw reply [flat|nested] 50+ messages in thread
* target crash / host hang with nvme-all.3 branch of nvme-fabrics 2016-06-16 20:27 ` Ming Lin @ 2016-06-16 20:28 ` Steve Wise 2016-06-16 20:34 ` 'Christoph Hellwig' 0 siblings, 1 reply; 50+ messages in thread From: Steve Wise @ 2016-06-16 20:28 UTC (permalink / raw) > On Thu, Jun 16, 2016 at 1:12 PM, Steve Wise <swise at opengridcomputing.com> > wrote: > > > > > >> >> > >> >> Umm, I think this might be happening because we get to delete_ctrl when > >> >> one of our queues has a NULL ctrl. This means that either: > >> >> 1. we never got a chance to initialize it, or > >> >> 2. we already freed it. > >> >> > >> >> (1) doesn't seem possible as we have a very short window (that we're > >> >> better off eliminating) between when we start the keep-alive timer (in > >> >> alloc_ctrl) and the time we assign the sq->ctrl (install_queue). > >> >> > >> >> (2) doesn't seem likely either to me at least as from what I followed, > >> >> delete_ctrl should be mutual exclusive with other deletions, moreover, > >> >> I didn't see an indication in the logs that any other deletions are > >> >> happening. > >> >> > >> >> Steve, is this something that started happening recently? does the > >> >> 4.6-rc3 tag suffer from the same phenomenon? > >> > > >> > I'll try and reproduce this on the older code, but the keep-alive timer > > fired > >> > for some other reason, > >> > >> My assumption was that it fired because it didn't get a keep-alive from > >> the host which is exactly what it's supposed to do? > >> > > > > Yes, in the original email I started this thread with, I show that on the host, > > 2 cpus were stuck, and I surmise that the host node was stuck NVMF-wise and thus > > the target timer kicked and crashed the target. > > > >> > so I'm not sure the target side keep-alive has been > >> > tested until now. > >> > >> I tested it, and IIRC the original patch had Ming's tested-by tag. > >> > > > > How did you test it? > > > >> > But it is easy to test over iWARP, just do this while a heavy > >> > fio is running: > >> > > >> > ifconfig ethX down; sleep 15; ifconfig ethX <ipaddr>/<mask> up > >> > >> So this is related to I/O load then? Does it happen when > >> you just do it without any I/O? (or small load)? > > > > I'll try this. > > > > Note there are two sets of crashes discussed in this thread: the one Yoichi saw > > on his nodes where the host hung causing the target keep-alive to fire and > > crash. That is the crash with stack traces I included in the original email > > starting this thread. And then there is a repeatable crash on my setup, which > > looks the same, that happens when I bring the interface down long enough to kick > > the keep-alive. Since I can reproduce the latter easily I'm continuing with > > this debug. > > > > Here is the fio command I use: > > > > fio --bs=1k --time_based --runtime=2000 --numjobs=8 --name=TEST-1k-8g-20-8-32 > > --direct=1 --iodepth=32 -rw=randread --randrepeat=0 --norandommap --loops=1 > > --exitall --ioengine=libaio --filename=/dev/nvme1n1 > > Hi Steve, > > Just to follow, does Christoph's patch fix the crash? It does. ^ permalink raw reply [flat|nested] 50+ messages in thread
* target crash / host hang with nvme-all.3 branch of nvme-fabrics 2016-06-16 20:28 ` Steve Wise @ 2016-06-16 20:34 ` 'Christoph Hellwig' 2016-06-16 20:49 ` Steve Wise 2016-06-27 17:26 ` Ming Lin 0 siblings, 2 replies; 50+ messages in thread From: 'Christoph Hellwig' @ 2016-06-16 20:34 UTC (permalink / raw) On Thu, Jun 16, 2016@03:28:06PM -0500, Steve Wise wrote: > > Just to follow, does Christoph's patch fix the crash? > > It does. Unfortunately I think it's still wrong because it will only delete a single queue per controller. We'll probably need something like this instead, which does the same thing but also has a retry loop for additional queues: diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c index b1c6e5b..425b55c 100644 --- a/drivers/nvme/target/rdma.c +++ b/drivers/nvme/target/rdma.c @@ -1293,19 +1293,20 @@ static int nvmet_rdma_cm_handler(struct rdma_cm_id *cm_id, static void nvmet_rdma_delete_ctrl(struct nvmet_ctrl *ctrl) { - struct nvmet_rdma_queue *queue, *next; - static LIST_HEAD(del_list); + struct nvmet_rdma_queue *queue; +restart: mutex_lock(&nvmet_rdma_queue_mutex); - list_for_each_entry_safe(queue, next, - &nvmet_rdma_queue_list, queue_list) { - if (queue->nvme_sq.ctrl->cntlid == ctrl->cntlid) - list_move_tail(&queue->queue_list, &del_list); + list_for_each_entry(queue, &nvmet_rdma_queue_list, queue_list) { + if (queue->nvme_sq.ctrl == ctrl) { + list_del_init(&queue->queue_list); + mutex_unlock(&nvmet_rdma_queue_mutex); + + __nvmet_rdma_queue_disconnect(queue); + goto restart; + } } mutex_unlock(&nvmet_rdma_queue_mutex); - - list_for_each_entry_safe(queue, next, &del_list, queue_list) - nvmet_rdma_queue_disconnect(queue); } static int nvmet_rdma_add_port(struct nvmet_port *port) ^ permalink raw reply related [flat|nested] 50+ messages in thread
* target crash / host hang with nvme-all.3 branch of nvme-fabrics 2016-06-16 20:34 ` 'Christoph Hellwig' @ 2016-06-16 20:49 ` Steve Wise 2016-06-16 21:06 ` Steve Wise 2016-06-27 17:26 ` Ming Lin 1 sibling, 1 reply; 50+ messages in thread From: Steve Wise @ 2016-06-16 20:49 UTC (permalink / raw) > Unfortunately I think it's still wrong because it will only delete > a single queue per controller. We'll probably need something > like this instead, which does the same think but also has a retry > loop for additional queues: > > > diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c > index b1c6e5b..425b55c 100644 > --- a/drivers/nvme/target/rdma.c > +++ b/drivers/nvme/target/rdma.c > @@ -1293,19 +1293,20 @@ static int nvmet_rdma_cm_handler(struct > rdma_cm_id *cm_id, > > static void nvmet_rdma_delete_ctrl(struct nvmet_ctrl *ctrl) > { > - struct nvmet_rdma_queue *queue, *next; > - static LIST_HEAD(del_list); > + struct nvmet_rdma_queue *queue; > > +restart: > mutex_lock(&nvmet_rdma_queue_mutex); > - list_for_each_entry_safe(queue, next, > - &nvmet_rdma_queue_list, queue_list) { > - if (queue->nvme_sq.ctrl->cntlid == ctrl->cntlid) > - list_move_tail(&queue->queue_list, &del_list); > + list_for_each_entry(queue, &nvmet_rdma_queue_list, queue_list) { > + if (queue->nvme_sq.ctrl == ctrl) { > + list_del_init(&queue->queue_list); > + mutex_unlock(&nvmet_rdma_queue_mutex); > + > + __nvmet_rdma_queue_disconnect(queue); > + goto restart; > + } > } > mutex_unlock(&nvmet_rdma_queue_mutex); > - > - list_for_each_entry_safe(queue, next, &del_list, queue_list) > - nvmet_rdma_queue_disconnect(queue); > } > > static int nvmet_rdma_add_port(struct nvmet_port *port) > This patch works. Tested-by: Steve Wise <swise at opengridcomputing.com> ^ permalink raw reply [flat|nested] 50+ messages in thread
* target crash / host hang with nvme-all.3 branch of nvme-fabrics 2016-06-16 20:49 ` Steve Wise @ 2016-06-16 21:06 ` Steve Wise 2016-06-16 21:42 ` Sagi Grimberg ` (2 more replies) 0 siblings, 3 replies; 50+ messages in thread From: Steve Wise @ 2016-06-16 21:06 UTC (permalink / raw) > > Unfortunately I think it's still wrong because it will only delete > > a single queue per controller. We'll probably need something > > like this instead, which does the same think but also has a retry > > loop for additional queues: > > > > > > diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c > > index b1c6e5b..425b55c 100644 > > --- a/drivers/nvme/target/rdma.c > > +++ b/drivers/nvme/target/rdma.c > > @@ -1293,19 +1293,20 @@ static int nvmet_rdma_cm_handler(struct > > rdma_cm_id *cm_id, > > > > static void nvmet_rdma_delete_ctrl(struct nvmet_ctrl *ctrl) > > { > > - struct nvmet_rdma_queue *queue, *next; > > - static LIST_HEAD(del_list); > > + struct nvmet_rdma_queue *queue; > > > > +restart: > > mutex_lock(&nvmet_rdma_queue_mutex); > > - list_for_each_entry_safe(queue, next, > > - &nvmet_rdma_queue_list, queue_list) { > > - if (queue->nvme_sq.ctrl->cntlid == ctrl->cntlid) > > - list_move_tail(&queue->queue_list, &del_list); > > + list_for_each_entry(queue, &nvmet_rdma_queue_list, queue_list) { > > + if (queue->nvme_sq.ctrl == ctrl) { > > + list_del_init(&queue->queue_list); > > + mutex_unlock(&nvmet_rdma_queue_mutex); > > + > > + __nvmet_rdma_queue_disconnect(queue); > > + goto restart; > > + } > > } > > mutex_unlock(&nvmet_rdma_queue_mutex); > > - > > - list_for_each_entry_safe(queue, next, &del_list, queue_list) > > - nvmet_rdma_queue_disconnect(queue); > > } > > > > static int nvmet_rdma_add_port(struct nvmet_port *port) > > > > This patch works. > > Tested-by: Steve Wise <swise at opengridcomputing.com> > > hrm... Forcing more reconnects, I just hit this. 
It looks different from the other issue: general protection fault: 0000 [#1] SMP Modules linked in: rdma_ucm iw_cxgb4 cxgb4 nvmet_rdma rdma_cm iw_cm nvmet null_blk configfs ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT nf_reject_ ipv4 xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge autofs4 8021q garp stp llc ipmi_devintf cachefiles fscache ib_ipoib ib_cm ib_uverbs ib_umad iw_nes libcrc32c iw_cxgb3 cxgb3 mdio ib_qib rdmavt mlx4_en i b_mthca dm_mirror dm_region_hash dm_log vhost_net macvtap macvlan vhost tun kvm_intel kvm irqbypass uinput iTCO_wdt iTCO_vendor_support mxm_wmi pcspkr mlx4_ib ib_core ipv6 mlx4_core dm_mod i2c_i801 sg lpc_ich mfd_cor e nvme nvme_core acpi_cpufreq ioatdma igb dca i2c_algo_bit i2c_core ptp pps_core wmi ext4(E) mbcache(E) jbd2(E) sd_mod(E) ahci(E) libahci(E) [last unloaded: cxgb4] CPU: 3 PID: 19213 Comm: kworker/3:10 Tainted: G E 4.7.0-rc2-nvmf-all.3+rxe+ #84 Hardware name: Supermicro X9DR3-F/X9DR3-F, BIOS 3.2a 07/09/2015 Workqueue: events nvmet_rdma_release_queue_work [nvmet_rdma] task: ffff88103d68cf00 ti: ffff880fdf7a4000 task.ti: ffff880fdf7a4000 RIP: 0010:[<ffffffffa01ef5b7>] [<ffffffffa01ef5b7>] nvmet_rdma_free_rsps+0x67/0xb0 [nvmet_rdma] RSP: 0018:ffff880fdf7a7bb8 EFLAGS: 00010202 RAX: dead000000000100 RBX: 000000000000001f RCX: 0000000000000001 RDX: dead000000000200 RSI: ffff880fdd884290 RDI: dead000000000200 RBP: ffff880fdf7a7bf8 R08: dead000000000100 R09: ffff88103c768140 R10: ffff88103c7682c0 R11: ffff88103c768340 R12: 00000000000044c8 R13: ffff88103db39c00 R14: 0000000000000100 R15: ffff88103e29cec0 FS: 0000000000000000(0000) GS:ffff88107f2c0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000001016b00 CR3: 000000103bcb7000 CR4: 00000000000406e0 Stack: ffff880fdd8a23f8 00000000ffac1a05 ffff880fdf7a7bf8 ffff88103db39c00 ffff88103c64cc00 ffffe8ffffac1a00 0000000000000000 ffffe8ffffac1a05 ffff880fdf7a7c18 ffffffffa01ef652 0000000000000246 ffff88103e29cec0 Call Trace: [<ffffffffa01ef652>] nvmet_rdma_free_queue+0x52/0xa0 [nvmet_rdma] [<ffffffffa01ef6d3>] nvmet_rdma_release_queue_work+0x33/0x70 [nvmet_rdma] [<ffffffff8107cb5b>] process_one_work+0x17b/0x510 [<ffffffff8161495c>] ? __schedule+0x23c/0x630 [<ffffffff810c6c4c>] ? del_timer_sync+0x4c/0x60 [<ffffffff8107da0b>] ? maybe_create_worker+0x8b/0x110 [<ffffffff81614eb0>] ? schedule+0x40/0xb0 [<ffffffff8107dbf6>] worker_thread+0x166/0x580 [<ffffffff8161495c>] ? __schedule+0x23c/0x630 [<ffffffff8108e162>] ? default_wake_function+0x12/0x20 [<ffffffff8109fc26>] ? __wake_up_common+0x56/0x90 [<ffffffff8107da90>] ? maybe_create_worker+0x110/0x110 [<ffffffff81614eb0>] ? schedule+0x40/0xb0 [<ffffffff8107da90>] ? maybe_create_worker+0x110/0x110 [<ffffffff8108255c>] kthread+0xcc/0xf0 [<ffffffff8108cade>] ? schedule_tail+0x1e/0xc0 [<ffffffff816186cf>] ret_from_fork+0x1f/0x40 [<ffffffff81082490>] ? kthread_freezable_should_stop+0x70/0x70 Code: b8 00 01 00 00 00 00 ad de 48 bf 00 02 00 00 00 00 ad de 83 c3 01 49 81 c4 38 02 00 00 48 8b 86 28 02 00 00 48 8b 96 30 02 00 00 <48> 89 50 08 48 89 45 c0 48 89 02 48 89 be 30 02 00 00 4c 89 ff RIP [<ffffffffa01ef5b7>] nvmet_rdma_free_rsps+0x67/0xb0 [nvmet_rdma] RSP <ffff880fdf7a7bb8> ---[ end trace a30265f72371b5ce ]--- ^ permalink raw reply [flat|nested] 50+ messages in thread
* target crash / host hang with nvme-all.3 branch of nvme-fabrics 2016-06-16 21:06 ` Steve Wise @ 2016-06-16 21:42 ` Sagi Grimberg 2016-06-16 21:47 ` Ming Lin 2016-06-16 21:46 ` Steve Wise 2016-06-27 22:29 ` Ming Lin 2 siblings, 1 reply; 50+ messages in thread From: Sagi Grimberg @ 2016-06-16 21:42 UTC (permalink / raw) > hrm... > > Forcing more reconnects, I just hit this. It looks different from the other > issue: > > general protection fault: 0000 [#1] SMP > Modules linked in: rdma_ucm iw_cxgb4 cxgb4 nvmet_rdma rdma_cm iw_cm nvmet > null_blk configfs ip6table_filter ip6_tables ebtable_nat ebtables > nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT nf_reject_ > ipv4 xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge autofs4 8021q > garp stp llc ipmi_devintf cachefiles fscache ib_ipoib ib_cm ib_uverbs ib_umad > iw_nes libcrc32c iw_cxgb3 cxgb3 mdio ib_qib rdmavt mlx4_en i b_mthca dm_mirror > dm_region_hash dm_log vhost_net macvtap macvlan vhost tun kvm_intel kvm > irqbypass uinput iTCO_wdt iTCO_vendor_support mxm_wmi pcspkr mlx4_ib ib_core > ipv6 mlx4_core dm_mod i2c_i801 sg lpc_ich mfd_cor e nvme nvme_core > acpi_cpufreq ioatdma igb dca i2c_algo_bit i2c_core ptp pps_core wmi ext4(E) > mbcache(E) jbd2(E) sd_mod(E) ahci(E) libahci(E) [last unloaded: cxgb4] > CPU: 3 PID: 19213 Comm: kworker/3:10 Tainted: G E > 4.7.0-rc2-nvmf-all.3+rxe+ #84 > Hardware name: Supermicro X9DR3-F/X9DR3-F, BIOS 3.2a 07/09/2015 > Workqueue: events nvmet_rdma_release_queue_work [nvmet_rdma] > task: ffff88103d68cf00 ti: ffff880fdf7a4000 task.ti: ffff880fdf7a4000 > RIP: 0010:[<ffffffffa01ef5b7>] [<ffffffffa01ef5b7>] > nvmet_rdma_free_rsps+0x67/0xb0 [nvmet_rdma] > RSP: 0018:ffff880fdf7a7bb8 EFLAGS: 00010202 > RAX: dead000000000100 RBX: 000000000000001f RCX: 0000000000000001 > RDX: dead000000000200 RSI: ffff880fdd884290 RDI: dead000000000200 > RBP: ffff880fdf7a7bf8 R08: dead000000000100 R09: ffff88103c768140 > R10: ffff88103c7682c0 R11: ffff88103c768340 R12: 00000000000044c8 > R13: ffff88103db39c00 R14: 0000000000000100 R15: ffff88103e29cec0 > FS: 0000000000000000(0000) GS:ffff88107f2c0000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000000001016b00 CR3: 000000103bcb7000 CR4: 00000000000406e0 > Stack: > ffff880fdd8a23f8 00000000ffac1a05 ffff880fdf7a7bf8 ffff88103db39c00 > ffff88103c64cc00 ffffe8ffffac1a00 0000000000000000 ffffe8ffffac1a05 > ffff880fdf7a7c18 ffffffffa01ef652 0000000000000246 ffff88103e29cec0 > Call Trace: > [<ffffffffa01ef652>] nvmet_rdma_free_queue+0x52/0xa0 [nvmet_rdma] > [<ffffffffa01ef6d3>] nvmet_rdma_release_queue_work+0x33/0x70 [nvmet_rdma] This looks more like a double-free/use-after-free condition... I'll try to reproduce this next week. ^ permalink raw reply [flat|nested] 50+ messages in thread
* target crash / host hang with nvme-all.3 branch of nvme-fabrics 2016-06-16 21:42 ` Sagi Grimberg @ 2016-06-16 21:47 ` Ming Lin 2016-06-16 21:53 ` Steve Wise 0 siblings, 1 reply; 50+ messages in thread From: Ming Lin @ 2016-06-16 21:47 UTC (permalink / raw) On Thu, Jun 16, 2016@2:42 PM, Sagi Grimberg <sagi@grimberg.me> wrote: > >> hrm... >> >> Forcing more reconnects, I just hit this. It looks different from the >> other >> issue: >> >> general protection fault: 0000 [#1] SMP >> Modules linked in: rdma_ucm iw_cxgb4 cxgb4 nvmet_rdma rdma_cm iw_cm nvmet >> null_blk configfs ip6table_filter ip6_tables ebtable_nat ebtables >> nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT >> nf_reject_ >> ipv4 xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge autofs4 >> 8021q >> garp stp llc ipmi_devintf cachefiles fscache ib_ipoib ib_cm ib_uverbs >> ib_umad >> iw_nes libcrc32c iw_cxgb3 cxgb3 mdio ib_qib rdmavt mlx4_en i b_mthca >> dm_mirror >> dm_region_hash dm_log vhost_net macvtap macvlan vhost tun kvm_intel kvm >> irqbypass uinput iTCO_wdt iTCO_vendor_support mxm_wmi pcspkr mlx4_ib >> ib_core >> ipv6 mlx4_core dm_mod i2c_i801 sg lpc_ich mfd_cor e nvme nvme_core >> acpi_cpufreq ioatdma igb dca i2c_algo_bit i2c_core ptp pps_core wmi >> ext4(E) >> mbcache(E) jbd2(E) sd_mod(E) ahci(E) libahci(E) [last unloaded: cxgb4] >> CPU: 3 PID: 19213 Comm: kworker/3:10 Tainted: G E >> 4.7.0-rc2-nvmf-all.3+rxe+ #84 >> Hardware name: Supermicro X9DR3-F/X9DR3-F, BIOS 3.2a 07/09/2015 >> Workqueue: events nvmet_rdma_release_queue_work [nvmet_rdma] >> task: ffff88103d68cf00 ti: ffff880fdf7a4000 task.ti: ffff880fdf7a4000 >> RIP: 0010:[<ffffffffa01ef5b7>] [<ffffffffa01ef5b7>] >> nvmet_rdma_free_rsps+0x67/0xb0 [nvmet_rdma] >> RSP: 0018:ffff880fdf7a7bb8 EFLAGS: 00010202 >> RAX: dead000000000100 RBX: 000000000000001f RCX: 0000000000000001 >> RDX: dead000000000200 RSI: ffff880fdd884290 RDI: dead000000000200 >> RBP: ffff880fdf7a7bf8 R08: dead000000000100 R09: ffff88103c768140 >> R10: ffff88103c7682c0 R11: ffff88103c768340 R12: 00000000000044c8 >> R13: ffff88103db39c00 R14: 0000000000000100 R15: ffff88103e29cec0 >> FS: 0000000000000000(0000) GS:ffff88107f2c0000(0000) >> knlGS:0000000000000000 >> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> CR2: 0000000001016b00 CR3: 000000103bcb7000 CR4: 00000000000406e0 >> Stack: >> ffff880fdd8a23f8 00000000ffac1a05 ffff880fdf7a7bf8 ffff88103db39c00 >> ffff88103c64cc00 ffffe8ffffac1a00 0000000000000000 ffffe8ffffac1a05 >> ffff880fdf7a7c18 ffffffffa01ef652 0000000000000246 ffff88103e29cec0 >> Call Trace: >> [<ffffffffa01ef652>] nvmet_rdma_free_queue+0x52/0xa0 [nvmet_rdma] >> [<ffffffffa01ef6d3>] nvmet_rdma_release_queue_work+0x33/0x70 >> [nvmet_rdma] > > > This looks more like a double-free/use-after-free condition... > I'll try to reproduce this next week. Interesting. I'll also try to reproduce it. ^ permalink raw reply [flat|nested] 50+ messages in thread
* target crash / host hang with nvme-all.3 branch of nvme-fabrics 2016-06-16 21:47 ` Ming Lin @ 2016-06-16 21:53 ` Steve Wise 0 siblings, 0 replies; 50+ messages in thread From: Steve Wise @ 2016-06-16 21:53 UTC (permalink / raw) > > On Thu, Jun 16, 2016@2:42 PM, Sagi Grimberg <sagi@grimberg.me> wrote: > > > >> hrm... > >> > >> Forcing more reconnects, I just hit this. It looks different from the > >> other > >> issue: > >> > >> general protection fault: 0000 [#1] SMP > >> Modules linked in: rdma_ucm iw_cxgb4 cxgb4 nvmet_rdma rdma_cm iw_cm > nvmet > >> null_blk configfs ip6table_filter ip6_tables ebtable_nat ebtables > >> nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT > >> nf_reject_ > >> ipv4 xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge autofs4 > >> 8021q > >> garp stp llc ipmi_devintf cachefiles fscache ib_ipoib ib_cm ib_uverbs > >> ib_umad > >> iw_nes libcrc32c iw_cxgb3 cxgb3 mdio ib_qib rdmavt mlx4_en i b_mthca > >> dm_mirror > >> dm_region_hash dm_log vhost_net macvtap macvlan vhost tun kvm_intel kvm > >> irqbypass uinput iTCO_wdt iTCO_vendor_support mxm_wmi pcspkr mlx4_ib > >> ib_core > >> ipv6 mlx4_core dm_mod i2c_i801 sg lpc_ich mfd_cor e nvme nvme_core > >> acpi_cpufreq ioatdma igb dca i2c_algo_bit i2c_core ptp pps_core wmi > >> ext4(E) > >> mbcache(E) jbd2(E) sd_mod(E) ahci(E) libahci(E) [last unloaded: cxgb4] > >> CPU: 3 PID: 19213 Comm: kworker/3:10 Tainted: G E > >> 4.7.0-rc2-nvmf-all.3+rxe+ #84 > >> Hardware name: Supermicro X9DR3-F/X9DR3-F, BIOS 3.2a 07/09/2015 > >> Workqueue: events nvmet_rdma_release_queue_work [nvmet_rdma] > >> task: ffff88103d68cf00 ti: ffff880fdf7a4000 task.ti: ffff880fdf7a4000 > >> RIP: 0010:[<ffffffffa01ef5b7>] [<ffffffffa01ef5b7>] > >> nvmet_rdma_free_rsps+0x67/0xb0 [nvmet_rdma] > >> RSP: 0018:ffff880fdf7a7bb8 EFLAGS: 00010202 > >> RAX: dead000000000100 RBX: 000000000000001f RCX: 0000000000000001 > >> RDX: dead000000000200 RSI: ffff880fdd884290 RDI: dead000000000200 > >> RBP: ffff880fdf7a7bf8 R08: dead000000000100 R09: ffff88103c768140 > >> R10: ffff88103c7682c0 R11: ffff88103c768340 R12: 00000000000044c8 > >> R13: ffff88103db39c00 R14: 0000000000000100 R15: ffff88103e29cec0 > >> FS: 0000000000000000(0000) GS:ffff88107f2c0000(0000) > >> knlGS:0000000000000000 > >> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > >> CR2: 0000000001016b00 CR3: 000000103bcb7000 CR4: 00000000000406e0 > >> Stack: > >> ffff880fdd8a23f8 00000000ffac1a05 ffff880fdf7a7bf8 ffff88103db39c00 > >> ffff88103c64cc00 ffffe8ffffac1a00 0000000000000000 ffffe8ffffac1a05 > >> ffff880fdf7a7c18 ffffffffa01ef652 0000000000000246 ffff88103e29cec0 > >> Call Trace: > >> [<ffffffffa01ef652>] nvmet_rdma_free_queue+0x52/0xa0 [nvmet_rdma] > >> [<ffffffffa01ef6d3>] nvmet_rdma_release_queue_work+0x33/0x70 > >> [nvmet_rdma] > > > > > > This looks more like a double-free/use-after-free condition... > > I'll try to reproduce this next week. > > Interesting. I'll also try to reproduce it. > Just keep doing this while a device is under fio load. On my setup, eth3 port1 of my T580, and is connected via a 40GbE switch and not point-to-point. This might matter or not, I don't know. And I'm doing the down/sleep/up on the host machine. If you don't hit it, try variying the sleep time to 5, 10, and 20 seconds. ifconfig eth3 down ; sleep 15; ifconfig eth3 up ^ permalink raw reply [flat|nested] 50+ messages in thread
* target crash / host hang with nvme-all.3 branch of nvme-fabrics 2016-06-16 21:06 ` Steve Wise 2016-06-16 21:42 ` Sagi Grimberg @ 2016-06-16 21:46 ` Steve Wise 2016-06-27 22:29 ` Ming Lin 2 siblings, 0 replies; 50+ messages in thread From: Steve Wise @ 2016-06-16 21:46 UTC (permalink / raw) > hrm... > > Forcing more reconnects, I just hit this. It looks different from the other > issue: > > general protection fault: 0000 [#1] SMP > Modules linked in: rdma_ucm iw_cxgb4 cxgb4 nvmet_rdma rdma_cm iw_cm > nvmet > null_blk configfs ip6table_filter ip6_tables ebtable_nat ebtables > nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT nf_reject_ > ipv4 xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge autofs4 8021q > garp stp llc ipmi_devintf cachefiles fscache ib_ipoib ib_cm ib_uverbs ib_umad > iw_nes libcrc32c iw_cxgb3 cxgb3 mdio ib_qib rdmavt mlx4_en i b_mthca > dm_mirror > dm_region_hash dm_log vhost_net macvtap macvlan vhost tun kvm_intel kvm > irqbypass uinput iTCO_wdt iTCO_vendor_support mxm_wmi pcspkr mlx4_ib > ib_core > ipv6 mlx4_core dm_mod i2c_i801 sg lpc_ich mfd_cor e nvme nvme_core > acpi_cpufreq ioatdma igb dca i2c_algo_bit i2c_core ptp pps_core wmi ext4(E) > mbcache(E) jbd2(E) sd_mod(E) ahci(E) libahci(E) [last unloaded: cxgb4] > CPU: 3 PID: 19213 Comm: kworker/3:10 Tainted: G E > 4.7.0-rc2-nvmf-all.3+rxe+ #84 > Hardware name: Supermicro X9DR3-F/X9DR3-F, BIOS 3.2a 07/09/2015 > Workqueue: events nvmet_rdma_release_queue_work [nvmet_rdma] > task: ffff88103d68cf00 ti: ffff880fdf7a4000 task.ti: ffff880fdf7a4000 > RIP: 0010:[<ffffffffa01ef5b7>] [<ffffffffa01ef5b7>] > nvmet_rdma_free_rsps+0x67/0xb0 [nvmet_rdma] > RSP: 0018:ffff880fdf7a7bb8 EFLAGS: 00010202 > RAX: dead000000000100 RBX: 000000000000001f RCX: 0000000000000001 > RDX: dead000000000200 RSI: ffff880fdd884290 RDI: dead000000000200 > RBP: ffff880fdf7a7bf8 R08: dead000000000100 R09: ffff88103c768140 > R10: ffff88103c7682c0 R11: ffff88103c768340 R12: 00000000000044c8 > R13: ffff88103db39c00 R14: 0000000000000100 R15: ffff88103e29cec0 > FS: 0000000000000000(0000) GS:ffff88107f2c0000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000000001016b00 CR3: 000000103bcb7000 CR4: 00000000000406e0 > Stack: > ffff880fdd8a23f8 00000000ffac1a05 ffff880fdf7a7bf8 ffff88103db39c00 > ffff88103c64cc00 ffffe8ffffac1a00 0000000000000000 ffffe8ffffac1a05 > ffff880fdf7a7c18 ffffffffa01ef652 0000000000000246 ffff88103e29cec0 > Call Trace: > [<ffffffffa01ef652>] nvmet_rdma_free_queue+0x52/0xa0 [nvmet_rdma] > [<ffffffffa01ef6d3>] nvmet_rdma_release_queue_work+0x33/0x70 [nvmet_rdma] > [<ffffffff8107cb5b>] process_one_work+0x17b/0x510 > [<ffffffff8161495c>] ? __schedule+0x23c/0x630 > [<ffffffff810c6c4c>] ? del_timer_sync+0x4c/0x60 > [<ffffffff8107da0b>] ? maybe_create_worker+0x8b/0x110 > [<ffffffff81614eb0>] ? schedule+0x40/0xb0 > [<ffffffff8107dbf6>] worker_thread+0x166/0x580 > [<ffffffff8161495c>] ? __schedule+0x23c/0x630 > [<ffffffff8108e162>] ? default_wake_function+0x12/0x20 > [<ffffffff8109fc26>] ? __wake_up_common+0x56/0x90 > [<ffffffff8107da90>] ? maybe_create_worker+0x110/0x110 > [<ffffffff81614eb0>] ? schedule+0x40/0xb0 > [<ffffffff8107da90>] ? maybe_create_worker+0x110/0x110 > [<ffffffff8108255c>] kthread+0xcc/0xf0 > [<ffffffff8108cade>] ? schedule_tail+0x1e/0xc0 > [<ffffffff816186cf>] ret_from_fork+0x1f/0x40 > [<ffffffff81082490>] ? 
kthread_freezable_should_stop+0x70/0x70 > Code: b8 00 01 00 00 00 00 ad de 48 bf 00 02 00 00 00 00 ad de 83 c3 01 49 81 c4 > 38 02 00 00 48 8b 86 28 02 00 00 48 8b 96 30 02 00 00 <48> 89 50 08 48 89 45 c0 > 48 89 02 48 89 be 30 02 00 00 4c 89 ff > RIP [<ffffffffa01ef5b7>] nvmet_rdma_free_rsps+0x67/0xb0 [nvmet_rdma] > RSP <ffff880fdf7a7bb8> > ---[ end trace a30265f72371b5ce ]--- > It crashed in list_del(). I assume rsp was garbage since it was a GPF. ---- static void nvmet_rdma_free_rsps(struct nvmet_rdma_queue *queue) { struct nvmet_rdma_device *ndev = queue->dev; int i, nr_rsps = queue->recv_queue_size * 2; for (i = 0; i < nr_rsps; i++) { struct nvmet_rdma_rsp *rsp = &queue->rsps[i]; list_del(&rsp->free_list); <----- HERE nvmet_rdma_free_rsp(ndev, rsp); } kfree(queue->rsps); } --- Here is the assembler: static inline void __list_del(struct list_head * prev, struct list_head * next) { next->prev = prev; 15b7: 48 89 50 08 mov %rdx,0x8(%rax) And RAX/RDX are list poison values, I think: RAX: dead000000000100 RDX: dead000000000200 So rsp was already deleted off its list? Or has rsp been freed? ^ permalink raw reply [flat|nested] 50+ messages in thread
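For reference, the register values above line up with a second list_del() on the same node: RAX = dead000000000100 and RDX = dead000000000200 match the kernel's LIST_POISON1/LIST_POISON2 constants (assuming the usual x86_64 illegal-pointer delta), which list_del() writes into ->next/->prev once it has unlinked an entry, and the faulting "mov %rdx,0x8(%rax)" is exactly the next->prev store of a later list_del() run on an already-poisoned node. Below is a minimal stand-alone sketch of that failure mode; it is not the driver code, just a cut-down list_del() with the same poisoning behaviour:

---
/* cc -o double_del_demo double_del_demo.c && ./double_del_demo */
#include <stdio.h>

#define LIST_POISON1 ((void *)0xdead000000000100ULL)	/* written to ->next */
#define LIST_POISON2 ((void *)0xdead000000000200ULL)	/* written to ->prev */

struct list_head { struct list_head *next, *prev; };

static void list_del(struct list_head *entry)
{
	/* __list_del(): next->prev = prev; prev->next = next; */
	entry->next->prev = entry->prev;	/* faults if ->next already holds poison */
	entry->prev->next = entry->next;
	entry->next = LIST_POISON1;
	entry->prev = LIST_POISON2;
}

int main(void)
{
	struct list_head head, node;

	/* two-element circular list: head <-> node */
	head.next = head.prev = &node;
	node.next = node.prev = &head;

	list_del(&node);	/* first delete is fine; node is now poisoned */
	printf("node.next=%p node.prev=%p\n", node.next, node.prev);

	/*
	 * A second list_del(&node) would store through LIST_POISON1 + 8,
	 * i.e. the dead0000000001xx access in the GPF above.  Since that
	 * address is non-canonical on x86_64, it surfaces as a general
	 * protection fault rather than a page fault.
	 */
	return 0;
}
---

So the closing questions in the mail above are the right ones: the poison values say the rsp's free_list had already been unlinked (or the memory recycled) before the teardown loop got to it.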
* target crash / host hang with nvme-all.3 branch of nvme-fabrics 2016-06-16 21:06 ` Steve Wise 2016-06-16 21:42 ` Sagi Grimberg 2016-06-16 21:46 ` Steve Wise @ 2016-06-27 22:29 ` Ming Lin 2016-06-28 9:14 ` 'Christoph Hellwig' 2 siblings, 1 reply; 50+ messages in thread From: Ming Lin @ 2016-06-27 22:29 UTC (permalink / raw) On Thu, 2016-06-16@16:06 -0500, Steve Wise wrote: > > > Unfortunately I think it's still wrong because it will only delete > > > a single queue per controller. We'll probably need something > > > like this instead, which does the same think but also has a retry > > > loop for additional queues: > > > > > > > > > diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c > > > index b1c6e5b..425b55c 100644 > > > --- a/drivers/nvme/target/rdma.c > > > +++ b/drivers/nvme/target/rdma.c > > > @@ -1293,19 +1293,20 @@ static int nvmet_rdma_cm_handler(struct > > > rdma_cm_id *cm_id, > > > > > > static void nvmet_rdma_delete_ctrl(struct nvmet_ctrl *ctrl) > > > { > > > - struct nvmet_rdma_queue *queue, *next; > > > - static LIST_HEAD(del_list); > > > + struct nvmet_rdma_queue *queue; > > > > > > +restart: > > > mutex_lock(&nvmet_rdma_queue_mutex); > > > - list_for_each_entry_safe(queue, next, > > > - &nvmet_rdma_queue_list, queue_list) { > > > - if (queue->nvme_sq.ctrl->cntlid == ctrl->cntlid) > > > - list_move_tail(&queue->queue_list, &del_list); > > > + list_for_each_entry(queue, &nvmet_rdma_queue_list, queue_list) { > > > + if (queue->nvme_sq.ctrl == ctrl) { > > > + list_del_init(&queue->queue_list); > > > + mutex_unlock(&nvmet_rdma_queue_mutex); > > > + > > > + __nvmet_rdma_queue_disconnect(queue); > > > + goto restart; > > > + } > > > } > > > mutex_unlock(&nvmet_rdma_queue_mutex); > > > - > > > - list_for_each_entry_safe(queue, next, &del_list, queue_list) > > > - nvmet_rdma_queue_disconnect(queue); > > > } > > > > > > static int nvmet_rdma_add_port(struct nvmet_port *port) > > > > > > > This patch works. > > > > Tested-by: Steve Wise <swise at opengridcomputing.com> > > > > > > hrm... > > Forcing more reconnects, I just hit this. 
It looks different from the other > issue: > > general protection fault: 0000 [#1] SMP > Modules linked in: rdma_ucm iw_cxgb4 cxgb4 nvmet_rdma rdma_cm iw_cm nvmet > null_blk configfs ip6table_filter ip6_tables ebtable_nat ebtables > nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT nf_reject_ > ipv4 xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge autofs4 8021q > garp stp llc ipmi_devintf cachefiles fscache ib_ipoib ib_cm ib_uverbs ib_umad > iw_nes libcrc32c iw_cxgb3 cxgb3 mdio ib_qib rdmavt mlx4_en i b_mthca dm_mirror > dm_region_hash dm_log vhost_net macvtap macvlan vhost tun kvm_intel kvm > irqbypass uinput iTCO_wdt iTCO_vendor_support mxm_wmi pcspkr mlx4_ib ib_core > ipv6 mlx4_core dm_mod i2c_i801 sg lpc_ich mfd_cor e nvme nvme_core > acpi_cpufreq ioatdma igb dca i2c_algo_bit i2c_core ptp pps_core wmi ext4(E) > mbcache(E) jbd2(E) sd_mod(E) ahci(E) libahci(E) [last unloaded: cxgb4] > CPU: 3 PID: 19213 Comm: kworker/3:10 Tainted: G E > 4.7.0-rc2-nvmf-all.3+rxe+ #84 > Hardware name: Supermicro X9DR3-F/X9DR3-F, BIOS 3.2a 07/09/2015 > Workqueue: events nvmet_rdma_release_queue_work [nvmet_rdma] > task: ffff88103d68cf00 ti: ffff880fdf7a4000 task.ti: ffff880fdf7a4000 > RIP: 0010:[<ffffffffa01ef5b7>] [<ffffffffa01ef5b7>] > nvmet_rdma_free_rsps+0x67/0xb0 [nvmet_rdma] > RSP: 0018:ffff880fdf7a7bb8 EFLAGS: 00010202 > RAX: dead000000000100 RBX: 000000000000001f RCX: 0000000000000001 > RDX: dead000000000200 RSI: ffff880fdd884290 RDI: dead000000000200 > RBP: ffff880fdf7a7bf8 R08: dead000000000100 R09: ffff88103c768140 > R10: ffff88103c7682c0 R11: ffff88103c768340 R12: 00000000000044c8 > R13: ffff88103db39c00 R14: 0000000000000100 R15: ffff88103e29cec0 > FS: 0000000000000000(0000) GS:ffff88107f2c0000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000000001016b00 CR3: 000000103bcb7000 CR4: 00000000000406e0 > Stack: > ffff880fdd8a23f8 00000000ffac1a05 ffff880fdf7a7bf8 ffff88103db39c00 > ffff88103c64cc00 ffffe8ffffac1a00 0000000000000000 ffffe8ffffac1a05 > ffff880fdf7a7c18 ffffffffa01ef652 0000000000000246 ffff88103e29cec0 > Call Trace: > [<ffffffffa01ef652>] nvmet_rdma_free_queue+0x52/0xa0 [nvmet_rdma] > [<ffffffffa01ef6d3>] nvmet_rdma_release_queue_work+0x33/0x70 [nvmet_rdma] > [<ffffffff8107cb5b>] process_one_work+0x17b/0x510 > [<ffffffff8161495c>] ? __schedule+0x23c/0x630 > [<ffffffff810c6c4c>] ? del_timer_sync+0x4c/0x60 > [<ffffffff8107da0b>] ? maybe_create_worker+0x8b/0x110 > [<ffffffff81614eb0>] ? schedule+0x40/0xb0 > [<ffffffff8107dbf6>] worker_thread+0x166/0x580 > [<ffffffff8161495c>] ? __schedule+0x23c/0x630 > [<ffffffff8108e162>] ? default_wake_function+0x12/0x20 > [<ffffffff8109fc26>] ? __wake_up_common+0x56/0x90 > [<ffffffff8107da90>] ? maybe_create_worker+0x110/0x110 > [<ffffffff81614eb0>] ? schedule+0x40/0xb0 > [<ffffffff8107da90>] ? maybe_create_worker+0x110/0x110 > [<ffffffff8108255c>] kthread+0xcc/0xf0 > [<ffffffff8108cade>] ? schedule_tail+0x1e/0xc0 > [<ffffffff816186cf>] ret_from_fork+0x1f/0x40 > [<ffffffff81082490>] ? 
kthread_freezable_should_stop+0x70/0x70 > Code: b8 00 01 00 00 00 00 ad de 48 bf 00 02 00 00 00 00 ad de 83 c3 01 49 81 c4 > 38 02 00 00 48 8b 86 28 02 00 00 48 8b 96 30 02 00 00 <48> 89 50 08 48 89 45 c0 > 48 89 02 48 89 be 30 02 00 00 4c 89 ff > RIP [<ffffffffa01ef5b7>] nvmet_rdma_free_rsps+0x67/0xb0 [nvmet_rdma] > RSP <ffff880fdf7a7bb8> > ---[ end trace a30265f72371b5ce ]--- Hi Steve, Now I can reproduce this with Chelsio card and below script: root at host:~# cat loop.sh #!/bin/bash ETH=eth3 while [ 1 ] ; do ifconfig $ETH down ; sleep $(( 10 + ($RANDOM & 0x7) )); ifconfig $ETH up ;sleep $(( 10 + ($RANDOM & 0x7) )) done And below patch fixed it. Could you also help to test it? The root cause is the "rsp" was already removed from the free_list by nvmet_rdma_get_rsp(). diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c index 425b55c..627942c 100644 --- a/drivers/nvme/target/rdma.c +++ b/drivers/nvme/target/rdma.c @@ -425,7 +425,15 @@ static void nvmet_rdma_free_rsps(struct nvmet_rdma_queue *queue) for (i = 0; i < nr_rsps; i++) { struct nvmet_rdma_rsp *rsp = &queue->rsps[i]; - list_del(&rsp->free_list); + /* + * Don't call "list_del(&rsp->free_list)", because: + * It could be already removed from the free list by + * nvmet_rdma_get_rsp(), or it's on the queue::rsp_wait_list + * + * It's safe we just free it because at this point the queue + * was already disconnected so nvmet_rdma_get_rsp() won't be + * called any more. + */ nvmet_rdma_free_rsp(ndev, rsp); } kfree(queue->rsps); ^ permalink raw reply related [flat|nested] 50+ messages in thread
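The situation the new comment describes can be reduced to a few lines of plain C. The sketch below is not the driver code (struct rsp, free_rsps and the list helpers are made up for illustration); it just shows why a teardown loop that walks the backing array cannot unconditionally list_del() every entry once a response has been handed out and not returned:

---
/* cc -o rsp_pool_demo rsp_pool_demo.c && ./rsp_pool_demo */
#include <stdio.h>

struct list_head { struct list_head *next, *prev; };

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_del(struct list_head *e)
{
	e->next->prev = e->prev;
	e->prev->next = e->next;
	e->next = e->prev = NULL;	/* stand-in for LIST_POISON1/2 */
}

struct rsp {				/* stand-in for struct nvmet_rdma_rsp */
	struct list_head free_list;
	int idx;
};

#define NR_RSPS 4

int main(void)
{
	struct list_head free_rsps = { &free_rsps, &free_rsps };
	struct rsp rsps[NR_RSPS];
	int i;

	for (i = 0; i < NR_RSPS; i++) {
		rsps[i].idx = i;
		list_add_tail(&rsps[i].free_list, &free_rsps);
	}

	/* get_rsp(): a command arrives, rsp 2 leaves the pool and is now
	 * in flight (or parked on something like rsp_wait_list). */
	list_del(&rsps[2].free_list);

	/* free_rsps()-style teardown: unlinking is safe only for entries
	 * that are still on the free list. */
	for (i = 0; i < NR_RSPS; i++) {
		if (!rsps[i].free_list.next) {
			printf("rsp %d never came back; a blind list_del() here is the GPF above\n", i);
			continue;
		}
		list_del(&rsps[i].free_list);
	}
	return 0;
}
---

Ming's patch above resolves this by freeing by array index and never touching free_list; whether the invariant itself should have held at this point, i.e. whether any rsp is allowed to still be outstanding when the queue is freed, is what the follow-ups below turn to.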
* target crash / host hang with nvme-all.3 branch of nvme-fabrics 2016-06-27 22:29 ` Ming Lin @ 2016-06-28 9:14 ` 'Christoph Hellwig' 2016-06-28 14:15 ` Steve Wise 0 siblings, 1 reply; 50+ messages in thread From: 'Christoph Hellwig' @ 2016-06-28 9:14 UTC (permalink / raw) On Mon, Jun 27, 2016@03:29:42PM -0700, Ming Lin wrote: > root at host:~# cat loop.sh > #!/bin/bash > > ETH=eth3 > > while [ 1 ] ; do > ifconfig $ETH down ; sleep $(( 10 + ($RANDOM & 0x7) )); ifconfig $ETH up ;sleep $(( 10 + ($RANDOM & 0x7) )) > done Can you send a patch for the nvmf-selftests branch to add this test? Thanks! > diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c > index 425b55c..627942c 100644 > --- a/drivers/nvme/target/rdma.c > +++ b/drivers/nvme/target/rdma.c > @@ -425,7 +425,15 @@ static void nvmet_rdma_free_rsps(struct nvmet_rdma_queue *queue) > for (i = 0; i < nr_rsps; i++) { > struct nvmet_rdma_rsp *rsp = &queue->rsps[i]; > > - list_del(&rsp->free_list); > + /* > + * Don't call "list_del(&rsp->free_list)", because: > + * It could be already removed from the free list by > + * nvmet_rdma_get_rsp(), or it's on the queue::rsp_wait_list > + * > + * It's safe we just free it because at this point the queue > + * was already disconnected so nvmet_rdma_get_rsp() won't be > + * called any more. > + */ > nvmet_rdma_free_rsp(ndev, rsp); > } > kfree(queue->rsps); That seems like another symptom of not flushing unsignalled requests. At the time we call nvmet_rdma_free_rsps none of the rsp structures should be in use. ^ permalink raw reply [flat|nested] 50+ messages in thread
* target crash / host hang with nvme-all.3 branch of nvme-fabrics 2016-06-28 9:14 ` 'Christoph Hellwig' @ 2016-06-28 14:15 ` Steve Wise 2016-06-28 15:51 ` 'Christoph Hellwig' 0 siblings, 1 reply; 50+ messages in thread From: Steve Wise @ 2016-06-28 14:15 UTC (permalink / raw) > > diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c > > index 425b55c..627942c 100644 > > --- a/drivers/nvme/target/rdma.c > > +++ b/drivers/nvme/target/rdma.c > > @@ -425,7 +425,15 @@ static void nvmet_rdma_free_rsps(struct > nvmet_rdma_queue *queue) > > for (i = 0; i < nr_rsps; i++) { > > struct nvmet_rdma_rsp *rsp = &queue->rsps[i]; > > > > - list_del(&rsp->free_list); > > + /* > > + * Don't call "list_del(&rsp->free_list)", because: > > + * It could be already removed from the free list by > > + * nvmet_rdma_get_rsp(), or it's on the queue::rsp_wait_list > > + * > > + * It's safe we just free it because at this point the queue > > + * was already disconnected so nvmet_rdma_get_rsp() won't be > > + * called any more. > > + */ > > nvmet_rdma_free_rsp(ndev, rsp); > > } > > kfree(queue->rsps); > > That seems like another symptom of not flushing unsignalled requests. I'm not so sure. I don't see where nvmet leaves unsignaled wrs on the SQ. It either posts chains via RDMA-RW and the last in the chain is always signaled (I think), or it posts signaled IO responses. > At the time we call nvmet_rdma_free_rsps none of the rsp structures > should be in use. ^ permalink raw reply [flat|nested] 50+ messages in thread
* target crash / host hang with nvme-all.3 branch of nvme-fabrics 2016-06-28 14:15 ` Steve Wise @ 2016-06-28 15:51 ` 'Christoph Hellwig' 2016-06-28 16:31 ` Steve Wise 0 siblings, 1 reply; 50+ messages in thread From: 'Christoph Hellwig' @ 2016-06-28 15:51 UTC (permalink / raw) On Tue, Jun 28, 2016@09:15:22AM -0500, Steve Wise wrote: > I'm not so sure. I don't see where nvmet leaves unsignaled wrs on the SQ. > It either posts chains via RDMA-RW and the last in the chain is always > signaled (I think), or it posts signaled IO responses. Indeed. So we need to figure out where we don't release a rsp. ^ permalink raw reply [flat|nested] 50+ messages in thread
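One way to chase that down, offered here only as a hypothetical debug aid and not a proposed fix: count hand-outs and returns and complain at teardown when they don't balance, so a leaked rsp shows up as a warning with a stack at queue teardown rather than as a GPF much later. The outstanding_rsps field and the helper names are invented for this sketch; only the nvmet_rdma_get_rsp()/nvmet_rdma_put_rsp()/nvmet_rdma_free_rsps() call sites are from code already quoted in the thread.

---
/* hypothetical debug field added to struct nvmet_rdma_queue */
	atomic_t		outstanding_rsps;

/* call from nvmet_rdma_get_rsp() after taking an rsp off the free list */
static inline void nvmet_rdma_debug_get(struct nvmet_rdma_queue *queue)
{
	atomic_inc(&queue->outstanding_rsps);
}

/* call from nvmet_rdma_put_rsp() after linking the rsp back */
static inline void nvmet_rdma_debug_put(struct nvmet_rdma_queue *queue)
{
	atomic_dec(&queue->outstanding_rsps);
}

/* call from nvmet_rdma_free_rsps() before the backing array goes away */
static inline void nvmet_rdma_debug_check(struct nvmet_rdma_queue *queue)
{
	WARN(atomic_read(&queue->outstanding_rsps),
	     "tearing down queue with %d rsp(s) still outstanding\n",
	     atomic_read(&queue->outstanding_rsps));
}
---

The stack that trips the WARN then says which teardown ran while a response was still out, which is exactly the question being asked here.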
* target crash / host hang with nvme-all.3 branch of nvme-fabrics 2016-06-28 15:51 ` 'Christoph Hellwig' @ 2016-06-28 16:31 ` Steve Wise 2016-06-28 16:49 ` Ming Lin 0 siblings, 1 reply; 50+ messages in thread From: Steve Wise @ 2016-06-28 16:31 UTC (permalink / raw) > On Tue, Jun 28, 2016@09:15:22AM -0500, Steve Wise wrote: > > I'm not so sure. I don't see where nvmet leaves unsignaled wrs on the SQ. > > It either posts chains via RDMA-RW and the last in the chain is always > > signaled (I think), or it posts signaled IO responses. > > Indeed. So we need to figure out where we don't release a rsp. > Hey Ming, For what it's worth, the change you proposed in this thread isn't working for me. I see maybe one or two successful recoveries, then the target gets stuck. I see several workq threads stuck destroying various qps, and one thread stuck draining a qp. If this change is not the proper fix, then I'm not going to debug this further. ^ permalink raw reply [flat|nested] 50+ messages in thread
* target crash / host hang with nvme-all.3 branch of nvme-fabrics 2016-06-28 16:31 ` Steve Wise @ 2016-06-28 16:49 ` Ming Lin 2016-06-28 19:20 ` Steve Wise 0 siblings, 1 reply; 50+ messages in thread From: Ming Lin @ 2016-06-28 16:49 UTC (permalink / raw) On Tue, 2016-06-28@11:31 -0500, Steve Wise wrote: > > On Tue, Jun 28, 2016@09:15:22AM -0500, Steve Wise wrote: > > > I'm not so sure. I don't see where nvmet leaves unsignaled wrs on the SQ. > > > It either posts chains via RDMA-RW and the last in the chain is always > > > signaled (I think), or it posts signaled IO responses. > > > > Indeed. So we need to figure out where we don't release a rsp. > > > > Hey Ming, > > For what its worth, the change you proposed in this thread isn't working for me. > I see maybe one or two recoveries successful, then the target gets stuck. I see > several workq threads stuck destroying various qps, one thread stuck draining a > qp. If this change is not the proper fix, then I'm not going to debug this > further. I didn't see this during overnight test. Possibly another bug. Could you post the stuck call stack? I assume you are still doing below tests on host: run fio test Then, while [ 1 ] ; do ifconfig $ETH down ; sleep $(( 10 + ($RANDOM & 0x7) )); ifconfig $ETH up ;sleep $(( 10 + ($RANDOM & 0x7) )) done ^ permalink raw reply [flat|nested] 50+ messages in thread
* target crash / host hang with nvme-all.3 branch of nvme-fabrics 2016-06-28 16:49 ` Ming Lin @ 2016-06-28 19:20 ` Steve Wise 2016-06-28 19:43 ` Steve Wise 0 siblings, 1 reply; 50+ messages in thread From: Steve Wise @ 2016-06-28 19:20 UTC (permalink / raw) > > Hey Ming, > > > > For what its worth, the change you proposed in this thread isn't working for me. > > I see maybe one or two recoveries successful, then the target gets stuck. I see > > several workq threads stuck destroying various qps, one thread stuck draining a > > qp. If this change is not the proper fix, then I'm not going to debug this > > further. > > I didn't see this during overnight test. Possibly another bug. > Could you post the stuck call stack? > below... > I assume you are still doing below tests on host: > yes. It could be another bug, but it happens so quickly I assumed it was introduced by your change. Maybe not... Here are the (several) stuck threads. They're stuck destroying the iwarp qps, which usually means the lower level connections are stuck somehow. This is a familiar stall point when things go bad on the cxgb4 connections below the QP. But the link Is up and running, so it seems like a deadlock somewhere. I'm using a ram disk for the target. Perhaps before I was using a real nvme device. I'll try that too and see if I still hit this deadlock/stall... INFO: task kworker/3:0:28 blocked for more than 120 seconds. Tainted: G E 4.7.0-rc2-nvmf-all.3-debug+ #97 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kworker/3:0 D ffff88102fc73988 0 28 2 0x10000000 Workqueue: events nvmet_rdma_release_queue_work [nvmet_rdma] ffff88102fc73988 ffff88102fc54340 ffff88102f900340 ffff881028a75d90 ffffffff811b45c9 0000000000000000 ffff88101afd4b88 ffff880f00000000 0000000000000292 ffff881000000000 0000000000000002 ffff881000000001 Call Trace: [<ffffffff811b45c9>] ? cache_free_debugcheck+0x1b9/0x290 [<ffffffff81623c90>] schedule+0x40/0xb0 [<ffffffffa02b7aa8>] c4iw_destroy_qp+0x148/0x290 [iw_cxgb4] [<ffffffffa02b93bb>] ? c4iw_pblpool_free+0x7b/0xb0 [iw_cxgb4] [<ffffffffa02b10f6>] ? c4iw_dereg_mr+0x106/0x1d0 [iw_cxgb4] [<ffffffff810a1f20>] ? woken_wake_function+0x20/0x20 [<ffffffffa05feacb>] ? ib_mr_pool_destroy+0x9b/0xb0 [ib_core] [<ffffffff811798bd>] ? pcpu_free_area+0x15d/0x1d0 [<ffffffffa05f49d0>] ib_destroy_qp+0x120/0x1a0 [ib_core] [<ffffffffa0241021>] rdma_destroy_qp+0x31/0x50 [rdma_cm] [<ffffffffa0251926>] nvmet_rdma_free_queue+0x36/0xa0 [nvmet_rdma] [<ffffffffa02519c3>] nvmet_rdma_release_queue_work+0x33/0x70 [nvmet_rdma] [<ffffffff8107df1b>] process_one_work+0x18b/0x520 [<ffffffff8162372f>] ? __schedule+0x23f/0x640 [<ffffffff8162806f>] ? _raw_spin_lock_irq+0xf/0x20 [<ffffffff8107f550>] ? maybe_create_worker+0x90/0x110 [<ffffffff81623c90>] ? schedule+0x40/0xb0 [<ffffffff8107f736>] worker_thread+0x166/0x580 [<ffffffff8162372f>] ? __schedule+0x23f/0x640 [<ffffffff81090142>] ? default_wake_function+0x12/0x20 [<ffffffff810a1de5>] ? __wake_up_common+0x55/0x80 [<ffffffff8107f5d0>] ? maybe_create_worker+0x110/0x110 [<ffffffff81623c90>] ? schedule+0x40/0xb0 [<ffffffff8107f5d0>] ? maybe_create_worker+0x110/0x110 [<ffffffff810843de>] kthread+0xde/0x100 [<ffffffff8108e59e>] ? schedule_tail+0x1e/0xc0 [<ffffffff8162838f>] ret_from_fork+0x1f/0x40 [<ffffffff81084300>] ? __init_kthread_worker+0x40/0x40 INFO: task kworker/3:1:124 blocked for more than 120 seconds. Tainted: G E 4.7.0-rc2-nvmf-all.3-debug+ #97 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
kworker/3:1 D ffff88102ea93988 0 124 2 0x10000000 Workqueue: events nvmet_rdma_release_queue_work [nvmet_rdma] ffff88102ea93988 ffff88102f900340 ffff88101d8fa340 ffff881028a75d90 ffffffff811b45c9 0000000000000000 ffff88101afd4b88 ffff880f00000000 0000000000000292 ffff881000000000 0000000000000002 ffff881000000001 Call Trace: [<ffffffff811b45c9>] ? cache_free_debugcheck+0x1b9/0x290 [<ffffffff81623c90>] schedule+0x40/0xb0 [<ffffffffa02b7aa8>] c4iw_destroy_qp+0x148/0x290 [iw_cxgb4] [<ffffffffa02b93bb>] ? c4iw_pblpool_free+0x7b/0xb0 [iw_cxgb4] [<ffffffffa02b10f6>] ? c4iw_dereg_mr+0x106/0x1d0 [iw_cxgb4] [<ffffffff810b640c>] ? wake_up_klogd+0x3c/0x40 [<ffffffff810a1f20>] ? woken_wake_function+0x20/0x20 [<ffffffffa05feacb>] ? ib_mr_pool_destroy+0x9b/0xb0 [ib_core] [<ffffffff811798bd>] ? pcpu_free_area+0x15d/0x1d0 [<ffffffffa05f49d0>] ib_destroy_qp+0x120/0x1a0 [ib_core] [<ffffffffa0241021>] rdma_destroy_qp+0x31/0x50 [rdma_cm] [<ffffffffa0251926>] nvmet_rdma_free_queue+0x36/0xa0 [nvmet_rdma] [<ffffffffa02519c3>] nvmet_rdma_release_queue_work+0x33/0x70 [nvmet_rdma] [<ffffffff8107df1b>] process_one_work+0x18b/0x520 [<ffffffff810ca283>] ? try_to_del_timer_sync+0x53/0x70 [<ffffffff8162806f>] ? _raw_spin_lock_irq+0xf/0x20 [<ffffffff8107f550>] ? maybe_create_worker+0x90/0x110 [<ffffffff8162504b>] ? __mutex_unlock_slowpath+0xbb/0x160 [<ffffffff8107f736>] worker_thread+0x166/0x580 [<ffffffff8162372f>] ? __schedule+0x23f/0x640 [<ffffffff81090142>] ? default_wake_function+0x12/0x20 [<ffffffff810a1de5>] ? __wake_up_common+0x55/0x80 [<ffffffff8107f5d0>] ? maybe_create_worker+0x110/0x110 [<ffffffff81623c90>] ? schedule+0x40/0xb0 [<ffffffff8107f5d0>] ? maybe_create_worker+0x110/0x110 [<ffffffff810843de>] kthread+0xde/0x100 [<ffffffff8108e59e>] ? schedule_tail+0x1e/0xc0 [<ffffffff8162838f>] ret_from_fork+0x1f/0x40 [<ffffffff81084300>] ? __init_kthread_worker+0x40/0x40 INFO: task kworker/u32:4:475 blocked for more than 120 seconds. Tainted: G E 4.7.0-rc2-nvmf-all.3-debug+ #97 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kworker/u32:4 D ffff8810286876b8 0 475 2 0x10000000 Workqueue: iw_cm_wq cm_work_handler [iw_cm] ffff8810286876b8 ffff881028fd2340 ffff8810300be340 ffff881028a75d90 ffff88102fc03e00 ffff881000000000 0000000000000046 ffffffff00000000 0000000000000000 ffff881000000000 ffffffff81119f54 ffff881000000001 Call Trace: [<ffffffff81119f54>] ? ring_buffer_unlock_commit+0x24/0xb0 [<ffffffff81623c90>] schedule+0x40/0xb0 [<ffffffff81131826>] ? trace_event_buffer_commit+0x146/0x1d0 [<ffffffff8162753f>] schedule_timeout+0x16f/0x1f0 [<ffffffff81091eb6>] ? trace_event_raw_event_sched_wakeup_template+0x96/0xe0 [<ffffffff81624c93>] wait_for_completion+0xa3/0x120 [<ffffffff81090130>] ? try_to_wake_up+0x230/0x230 [<ffffffffa02b4670>] c4iw_drain_sq+0x50/0x60 [iw_cxgb4] [<ffffffffa05f503a>] ib_drain_sq+0x1a/0x30 [ib_core] [<ffffffffa05f5066>] ib_drain_qp+0x16/0x40 [ib_core] [<ffffffffa02511aa>] __nvmet_rdma_queue_disconnect+0x5a/0xc0 [nvmet_rdma] [<ffffffffa0252651>] nvmet_rdma_cm_handler+0xe1/0x1d0 [nvmet_rdma] [<ffffffff8162620e>] ? mutex_lock+0x1e/0x40 [<ffffffffa023e27e>] ? cma_disable_callback+0x2e/0x60 [rdma_cm] [<ffffffffa0242d12>] cma_iw_handler+0xf2/0x1b0 [rdma_cm] [<ffffffffa02301b3>] cm_close_handler+0x93/0xc0 [iw_cm] [<ffffffffa0232177>] process_event+0xd7/0xf0 [iw_cm] [<ffffffffa02322de>] cm_work_handler+0x14e/0x1d0 [iw_cm] [<ffffffff81119e9d>] ? rb_commit+0x10d/0x1a0 [<ffffffff8107f1e6>] ? 
trace_event_raw_event_workqueue_execute_start+0x66/0xa0 [<ffffffff8107df1b>] process_one_work+0x18b/0x520 [<ffffffff8162372f>] ? __schedule+0x23f/0x640 [<ffffffff8162806f>] ? _raw_spin_lock_irq+0xf/0x20 [<ffffffff8107f550>] ? maybe_create_worker+0x90/0x110 [<ffffffff812eac7f>] ? __delay+0xf/0x20 [<ffffffff81623c90>] ? schedule+0x40/0xb0 [<ffffffff8107f736>] worker_thread+0x166/0x580 [<ffffffff8162372f>] ? __schedule+0x23f/0x640 [<ffffffff81090142>] ? default_wake_function+0x12/0x20 [<ffffffff810a1de5>] ? __wake_up_common+0x55/0x80 [<ffffffff8107f5d0>] ? maybe_create_worker+0x110/0x110 [<ffffffff81623c90>] ? schedule+0x40/0xb0 [<ffffffff8107f5d0>] ? maybe_create_worker+0x110/0x110 [<ffffffff810843de>] kthread+0xde/0x100 [<ffffffff8108e59e>] ? schedule_tail+0x1e/0xc0 [<ffffffff8162838f>] ret_from_fork+0x1f/0x40 [<ffffffff81084300>] ? __init_kthread_worker+0x40/0x40 INFO: task kworker/3:2:9626 blocked for more than 120 seconds. Tainted: G E 4.7.0-rc2-nvmf-all.3-debug+ #97 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kworker/3:2 D ffff881006fe7988 0 9626 2 0x10000080 Workqueue: events nvmet_rdma_release_queue_work [nvmet_rdma] ffff881006fe7988 ffff881028f34340 ffff88102e6bc340 ffff881028a75d90 ffffffff811b45c9 0000000000000000 ffff88101afd4b88 ffff880f00000000 0000000000000292 ffff881000000000 0000000000000002 ffff881000000001 Call Trace: [<ffffffff811b45c9>] ? cache_free_debugcheck+0x1b9/0x290 [<ffffffff81623c90>] schedule+0x40/0xb0 [<ffffffffa02b7aa8>] c4iw_destroy_qp+0x148/0x290 [iw_cxgb4] [<ffffffffa02b93bb>] ? c4iw_pblpool_free+0x7b/0xb0 [iw_cxgb4] [<ffffffffa02b10f6>] ? c4iw_dereg_mr+0x106/0x1d0 [iw_cxgb4] [<ffffffff810b640c>] ? wake_up_klogd+0x3c/0x40 [<ffffffff810a1f20>] ? woken_wake_function+0x20/0x20 [<ffffffffa05feacb>] ? ib_mr_pool_destroy+0x9b/0xb0 [ib_core] [<ffffffff811798bd>] ? pcpu_free_area+0x15d/0x1d0 [<ffffffffa05f49d0>] ib_destroy_qp+0x120/0x1a0 [ib_core] [<ffffffffa0241021>] rdma_destroy_qp+0x31/0x50 [rdma_cm] [<ffffffffa0251926>] nvmet_rdma_free_queue+0x36/0xa0 [nvmet_rdma] [<ffffffffa02519c3>] nvmet_rdma_release_queue_work+0x33/0x70 [nvmet_rdma] [<ffffffff8107df1b>] process_one_work+0x18b/0x520 [<ffffffff8162372f>] ? __schedule+0x23f/0x640 [<ffffffff81119f54>] ? ring_buffer_unlock_commit+0x24/0xb0 [<ffffffff81120391>] ? trace_buffer_unlock_commit_regs+0x61/0x80 [<ffffffff81623c90>] ? schedule+0x40/0xb0 [<ffffffff8107f736>] worker_thread+0x166/0x580 [<ffffffff8162372f>] ? __schedule+0x23f/0x640 [<ffffffff81090142>] ? default_wake_function+0x12/0x20 [<ffffffff810a1de5>] ? __wake_up_common+0x55/0x80 [<ffffffff8107f5d0>] ? maybe_create_worker+0x110/0x110 [<ffffffff81623c90>] ? schedule+0x40/0xb0 [<ffffffff8107f5d0>] ? maybe_create_worker+0x110/0x110 [<ffffffff810843de>] kthread+0xde/0x100 [<ffffffff8108e59e>] ? schedule_tail+0x1e/0xc0 [<ffffffff8162838f>] ret_from_fork+0x1f/0x40 [<ffffffff81084300>] ? __init_kthread_worker+0x40/0x40 INFO: task kworker/3:3:9853 blocked for more than 120 seconds. Tainted: G E 4.7.0-rc2-nvmf-all.3-debug+ #97 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
kworker/3:3 D ffff880f780fb988 0 9853 2 0x10000080 Workqueue: events nvmet_rdma_release_queue_work [nvmet_rdma] ffff880f780fb988 ffff8810118b0340 ffff8810284a6340 ffff881028a75d90 ffff881076c01280 0000000000000000 ffff88101afd4b88 ffff880f00000000 0000000000000292 ffff880f00000000 0000000000000002 ffff881000000001 Call Trace: [<ffffffff81623c90>] schedule+0x40/0xb0 [<ffffffffa02b7aa8>] c4iw_destroy_qp+0x148/0x290 [iw_cxgb4] [<ffffffffa02b93bb>] ? c4iw_pblpool_free+0x7b/0xb0 [iw_cxgb4] [<ffffffffa02b10f6>] ? c4iw_dereg_mr+0x106/0x1d0 [iw_cxgb4] [<ffffffff810a1f20>] ? woken_wake_function+0x20/0x20 [<ffffffffa05feacb>] ? ib_mr_pool_destroy+0x9b/0xb0 [ib_core] [<ffffffff811798bd>] ? pcpu_free_area+0x15d/0x1d0 [<ffffffffa05f49d0>] ib_destroy_qp+0x120/0x1a0 [ib_core] [<ffffffffa0241021>] rdma_destroy_qp+0x31/0x50 [rdma_cm] [<ffffffffa0251926>] nvmet_rdma_free_queue+0x36/0xa0 [nvmet_rdma] [<ffffffffa02519c3>] nvmet_rdma_release_queue_work+0x33/0x70 [nvmet_rdma] [<ffffffff8107df1b>] process_one_work+0x18b/0x520 [<ffffffff810ca283>] ? try_to_del_timer_sync+0x53/0x70 [<ffffffff8162806f>] ? _raw_spin_lock_irq+0xf/0x20 [<ffffffff8107f550>] ? maybe_create_worker+0x90/0x110 [<ffffffff8162504b>] ? __mutex_unlock_slowpath+0xbb/0x160 [<ffffffff8107f736>] worker_thread+0x166/0x580 [<ffffffff8162372f>] ? __schedule+0x23f/0x640 [<ffffffff81090142>] ? default_wake_function+0x12/0x20 [<ffffffff810a1de5>] ? __wake_up_common+0x55/0x80 [<ffffffff8107f5d0>] ? maybe_create_worker+0x110/0x110 [<ffffffff81623c90>] ? schedule+0x40/0xb0 [<ffffffff8107f5d0>] ? maybe_create_worker+0x110/0x110 [<ffffffff810843de>] kthread+0xde/0x100 [<ffffffff810a23e3>] ? __wake_up+0x53/0x70 [<ffffffff8108e59e>] ? schedule_tail+0x1e/0xc0 [<ffffffff8162838f>] ret_from_fork+0x1f/0x40 [<ffffffff81084300>] ? __init_kthread_worker+0x40/0x40 INFO: task kworker/3:4:9854 blocked for more than 120 seconds. Tainted: G E 4.7.0-rc2-nvmf-all.3-debug+ #97 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kworker/3:4 D ffff880ff1997988 0 9854 2 0x10000080 Workqueue: events nvmet_rdma_release_queue_work [nvmet_rdma] ffff880ff1997988 ffff88102561b340 ffff88102e6bc340 ffff881028a75d90 ffffffff811b45c9 0000000000000000 ffff88101afd4b88 ffff880f00000000 0000000000000292 ffff880f00000000 0000000000000002 ffff881000000001 Call Trace: [<ffffffff811b45c9>] ? cache_free_debugcheck+0x1b9/0x290 [<ffffffff81623c90>] schedule+0x40/0xb0 [<ffffffffa02b7aa8>] c4iw_destroy_qp+0x148/0x290 [iw_cxgb4] [<ffffffffa02b93bb>] ? c4iw_pblpool_free+0x7b/0xb0 [iw_cxgb4] [<ffffffffa02b10f6>] ? c4iw_dereg_mr+0x106/0x1d0 [iw_cxgb4] [<ffffffff810a1f20>] ? woken_wake_function+0x20/0x20 [<ffffffffa05feacb>] ? ib_mr_pool_destroy+0x9b/0xb0 [ib_core] [<ffffffff811798bd>] ? pcpu_free_area+0x15d/0x1d0 [<ffffffffa05f49d0>] ib_destroy_qp+0x120/0x1a0 [ib_core] [<ffffffffa0241021>] rdma_destroy_qp+0x31/0x50 [rdma_cm] [<ffffffffa0251926>] nvmet_rdma_free_queue+0x36/0xa0 [nvmet_rdma] [<ffffffffa02519c3>] nvmet_rdma_release_queue_work+0x33/0x70 [nvmet_rdma] [<ffffffff8107df1b>] process_one_work+0x18b/0x520 [<ffffffff810ca283>] ? try_to_del_timer_sync+0x53/0x70 [<ffffffff8162806f>] ? _raw_spin_lock_irq+0xf/0x20 [<ffffffff8107f550>] ? maybe_create_worker+0x90/0x110 [<ffffffff8162504b>] ? __mutex_unlock_slowpath+0xbb/0x160 [<ffffffff8107f736>] worker_thread+0x166/0x580 [<ffffffff8162372f>] ? __schedule+0x23f/0x640 [<ffffffff81090142>] ? default_wake_function+0x12/0x20 [<ffffffff810a1de5>] ? __wake_up_common+0x55/0x80 [<ffffffff8107f5d0>] ? 
maybe_create_worker+0x110/0x110 [<ffffffff81623c90>] ? schedule+0x40/0xb0 [<ffffffff8107f5d0>] ? maybe_create_worker+0x110/0x110 [<ffffffff810843de>] kthread+0xde/0x100 [<ffffffff8108e59e>] ? schedule_tail+0x1e/0xc0 [<ffffffff8162838f>] ret_from_fork+0x1f/0x40 [<ffffffff81084300>] ? __init_kthread_worker+0x40/0x40 INFO: task kworker/3:5:9855 blocked for more than 120 seconds. Tainted: G E 4.7.0-rc2-nvmf-all.3-debug+ #97 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kworker/3:5 D ffff880f792d3988 0 9855 2 0x10000080 Workqueue: events nvmet_rdma_release_queue_work [nvmet_rdma] ffff880f792d3988 ffff88102e6bc340 ffff88102a009340 ffff881028a75d90 ffffffff811b45c9 0000000000000000 ffff88101afd4b88 ffff880f00000000 0000000000000292 ffff880f00000000 0000000000000002 ffff881000000001 Call Trace: [<ffffffff811b45c9>] ? cache_free_debugcheck+0x1b9/0x290 [<ffffffff81623c90>] schedule+0x40/0xb0 [<ffffffffa02b7aa8>] c4iw_destroy_qp+0x148/0x290 [iw_cxgb4] [<ffffffffa02b93bb>] ? c4iw_pblpool_free+0x7b/0xb0 [iw_cxgb4] [<ffffffffa02b10f6>] ? c4iw_dereg_mr+0x106/0x1d0 [iw_cxgb4] [<ffffffff810a1f20>] ? woken_wake_function+0x20/0x20 [<ffffffffa05feacb>] ? ib_mr_pool_destroy+0x9b/0xb0 [ib_core] [<ffffffff811798bd>] ? pcpu_free_area+0x15d/0x1d0 [<ffffffffa05f49d0>] ib_destroy_qp+0x120/0x1a0 [ib_core] [<ffffffffa0241021>] rdma_destroy_qp+0x31/0x50 [rdma_cm] [<ffffffffa0251926>] nvmet_rdma_free_queue+0x36/0xa0 [nvmet_rdma] [<ffffffffa02519c3>] nvmet_rdma_release_queue_work+0x33/0x70 [nvmet_rdma] [<ffffffff8107df1b>] process_one_work+0x18b/0x520 [<ffffffff810ca283>] ? try_to_del_timer_sync+0x53/0x70 [<ffffffff8162806f>] ? _raw_spin_lock_irq+0xf/0x20 [<ffffffff8107f550>] ? maybe_create_worker+0x90/0x110 [<ffffffff8162504b>] ? __mutex_unlock_slowpath+0xbb/0x160 [<ffffffff8107f736>] worker_thread+0x166/0x580 [<ffffffff8162372f>] ? __schedule+0x23f/0x640 [<ffffffff81090142>] ? default_wake_function+0x12/0x20 [<ffffffff810a1de5>] ? __wake_up_common+0x55/0x80 [<ffffffff8107f5d0>] ? maybe_create_worker+0x110/0x110 [<ffffffff81623c90>] ? schedule+0x40/0xb0 [<ffffffff8107f5d0>] ? maybe_create_worker+0x110/0x110 [<ffffffff810843de>] kthread+0xde/0x100 [<ffffffff8108e59e>] ? schedule_tail+0x1e/0xc0 [<ffffffff8162838f>] ret_from_fork+0x1f/0x40 [<ffffffff81084300>] ? __init_kthread_worker+0x40/0x40 INFO: task kworker/3:6:9856 blocked for more than 120 seconds. Tainted: G E 4.7.0-rc2-nvmf-all.3-debug+ #97 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kworker/3:6 D ffff880f79363988 0 9856 2 0x10000080 Workqueue: events nvmet_rdma_release_queue_work [nvmet_rdma] ffff880f79363988 ffff88101d8fa340 ffff88101dbba340 ffff881028a75d90 ffffffff811b45c9 0000000000000000 ffff88101afd4b88 ffff880f00000000 0000000000000292 ffff880f00000000 0000000000000002 ffff881000000001 Call Trace: [<ffffffff811b45c9>] ? cache_free_debugcheck+0x1b9/0x290 [<ffffffff81623c90>] schedule+0x40/0xb0 [<ffffffffa02b7aa8>] c4iw_destroy_qp+0x148/0x290 [iw_cxgb4] [<ffffffffa02b93bb>] ? c4iw_pblpool_free+0x7b/0xb0 [iw_cxgb4] [<ffffffffa02b10f6>] ? c4iw_dereg_mr+0x106/0x1d0 [iw_cxgb4] [<ffffffff810b640c>] ? wake_up_klogd+0x3c/0x40 [<ffffffff810a1f20>] ? woken_wake_function+0x20/0x20 [<ffffffffa05feacb>] ? ib_mr_pool_destroy+0x9b/0xb0 [ib_core] [<ffffffff811798bd>] ? 
pcpu_free_area+0x15d/0x1d0 [<ffffffffa05f49d0>] ib_destroy_qp+0x120/0x1a0 [ib_core] [<ffffffffa0241021>] rdma_destroy_qp+0x31/0x50 [rdma_cm] [<ffffffffa0251926>] nvmet_rdma_free_queue+0x36/0xa0 [nvmet_rdma] [<ffffffffa02519c3>] nvmet_rdma_release_queue_work+0x33/0x70 [nvmet_rdma] [<ffffffff8107df1b>] process_one_work+0x18b/0x520 [<ffffffff810ca283>] ? try_to_del_timer_sync+0x53/0x70 [<ffffffff8162806f>] ? _raw_spin_lock_irq+0xf/0x20 [<ffffffff8107f550>] ? maybe_create_worker+0x90/0x110 [<ffffffff8162504b>] ? __mutex_unlock_slowpath+0xbb/0x160 [<ffffffff8107f736>] worker_thread+0x166/0x580 [<ffffffff8162372f>] ? __schedule+0x23f/0x640 [<ffffffff81090142>] ? default_wake_function+0x12/0x20 [<ffffffff810a1de5>] ? __wake_up_common+0x55/0x80 [<ffffffff8107f5d0>] ? maybe_create_worker+0x110/0x110 [<ffffffff81623c90>] ? schedule+0x40/0xb0 [<ffffffff8107f5d0>] ? maybe_create_worker+0x110/0x110 [<ffffffff810843de>] kthread+0xde/0x100 [<ffffffff8108e59e>] ? schedule_tail+0x1e/0xc0 [<ffffffff8162838f>] ret_from_fork+0x1f/0x40 [<ffffffff81084300>] ? __init_kthread_worker+0x40/0x40 INFO: task kworker/3:7:9857 blocked for more than 120 seconds. Tainted: G E 4.7.0-rc2-nvmf-all.3-debug+ #97 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kworker/3:7 D ffff880f780ff988 0 9857 2 0x10000080 Workqueue: events nvmet_rdma_release_queue_work [nvmet_rdma] ffff880f780ff988 ffff8810284a6340 ffff88101dbba340 ffff881028a75d90 ffffffff811b45c9 0000000000000000 ffff88101afd4b88 ffff880f00000000 0000000000000292 ffff880f00000000 0000000000000002 ffff881000000001 Call Trace: [<ffffffff811b45c9>] ? cache_free_debugcheck+0x1b9/0x290 [<ffffffff81623c90>] schedule+0x40/0xb0 [<ffffffffa02b7aa8>] c4iw_destroy_qp+0x148/0x290 [iw_cxgb4] [<ffffffffa02b93bb>] ? c4iw_pblpool_free+0x7b/0xb0 [iw_cxgb4] [<ffffffffa02b10f6>] ? c4iw_dereg_mr+0x106/0x1d0 [iw_cxgb4] [<ffffffff810b640c>] ? wake_up_klogd+0x3c/0x40 [<ffffffff810a1f20>] ? woken_wake_function+0x20/0x20 [<ffffffffa05feacb>] ? ib_mr_pool_destroy+0x9b/0xb0 [ib_core] [<ffffffff811798bd>] ? pcpu_free_area+0x15d/0x1d0 [<ffffffffa05f49d0>] ib_destroy_qp+0x120/0x1a0 [ib_core] [<ffffffffa0241021>] rdma_destroy_qp+0x31/0x50 [rdma_cm] [<ffffffffa0251926>] nvmet_rdma_free_queue+0x36/0xa0 [nvmet_rdma] [<ffffffffa02519c3>] nvmet_rdma_release_queue_work+0x33/0x70 [nvmet_rdma] [<ffffffff8107df1b>] process_one_work+0x18b/0x520 [<ffffffff810ca283>] ? try_to_del_timer_sync+0x53/0x70 [<ffffffff8162806f>] ? _raw_spin_lock_irq+0xf/0x20 [<ffffffff8107f550>] ? maybe_create_worker+0x90/0x110 [<ffffffff8162504b>] ? __mutex_unlock_slowpath+0xbb/0x160 [<ffffffff8107f736>] worker_thread+0x166/0x580 [<ffffffff8162372f>] ? __schedule+0x23f/0x640 [<ffffffff81090142>] ? default_wake_function+0x12/0x20 [<ffffffff810a1de5>] ? __wake_up_common+0x55/0x80 [<ffffffff8107f5d0>] ? maybe_create_worker+0x110/0x110 [<ffffffff81623c90>] ? schedule+0x40/0xb0 [<ffffffff8107f5d0>] ? maybe_create_worker+0x110/0x110 [<ffffffff810843de>] kthread+0xde/0x100 [<ffffffff812d8f94>] ? cfq_completed_request+0x4c4/0x7e0 [<ffffffff8108e59e>] ? schedule_tail+0x1e/0xc0 [<ffffffff8162838f>] ret_from_fork+0x1f/0x40 [<ffffffff81084300>] ? __init_kthread_worker+0x40/0x40 INFO: task kworker/3:8:9858 blocked for more than 120 seconds. Tainted: G E 4.7.0-rc2-nvmf-all.3-debug+ #97 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
kworker/3:8 D ffff880f78877988 0 9858 2 0x10000080 Workqueue: events nvmet_rdma_release_queue_work [nvmet_rdma] ffff880f78877988 ffff88102a009340 ffff880ff18cd340 ffff881028a75d90 ffffffff811b45c9 0000000000000000 ffff88101afd4b88 ffff880f00000000 0000000000000292 ffff880f00000000 0000000000000002 ffff881000000001 Call Trace: [<ffffffff811b45c9>] ? cache_free_debugcheck+0x1b9/0x290 [<ffffffff81623c90>] schedule+0x40/0xb0 [<ffffffffa02b7aa8>] c4iw_destroy_qp+0x148/0x290 [iw_cxgb4] [<ffffffffa02b93bb>] ? c4iw_pblpool_free+0x7b/0xb0 [iw_cxgb4] [<ffffffffa02b10f6>] ? c4iw_dereg_mr+0x106/0x1d0 [iw_cxgb4] [<ffffffff810a1f20>] ? woken_wake_function+0x20/0x20 [<ffffffffa05feacb>] ? ib_mr_pool_destroy+0x9b/0xb0 [ib_core] [<ffffffff811798bd>] ? pcpu_free_area+0x15d/0x1d0 [<ffffffffa05f49d0>] ib_destroy_qp+0x120/0x1a0 [ib_core] [<ffffffffa0241021>] rdma_destroy_qp+0x31/0x50 [rdma_cm] [<ffffffffa0251926>] nvmet_rdma_free_queue+0x36/0xa0 [nvmet_rdma] [<ffffffffa02519c3>] nvmet_rdma_release_queue_work+0x33/0x70 [nvmet_rdma] [<ffffffff8107df1b>] process_one_work+0x18b/0x520 [<ffffffff810ca283>] ? try_to_del_timer_sync+0x53/0x70 [<ffffffff8162806f>] ? _raw_spin_lock_irq+0xf/0x20 [<ffffffff8107f550>] ? maybe_create_worker+0x90/0x110 [<ffffffff8162504b>] ? __mutex_unlock_slowpath+0xbb/0x160 [<ffffffff8107f736>] worker_thread+0x166/0x580 [<ffffffff8162372f>] ? __schedule+0x23f/0x640 [<ffffffff81090142>] ? default_wake_function+0x12/0x20 [<ffffffff810a1de5>] ? __wake_up_common+0x55/0x80 [<ffffffff8107f5d0>] ? maybe_create_worker+0x110/0x110 [<ffffffff81623c90>] ? schedule+0x40/0xb0 [<ffffffff8107f5d0>] ? maybe_create_worker+0x110/0x110 [<ffffffff810843de>] kthread+0xde/0x100 [<ffffffff8108e59e>] ? schedule_tail+0x1e/0xc0 [<ffffffff8162838f>] ret_from_fork+0x1f/0x40 [<ffffffff81084300>] ? __init_kthread_worker+0x40/0x40 ^ permalink raw reply [flat|nested] 50+ messages in thread
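A note on the shape of these stacks: they are all parked either in c4iw_destroy_qp() or in the SQ drain, and a drain is essentially "move the QP to the error state, post one final signalled marker WR, sleep until its completion arrives". The sketch below is only an illustration of that idiom, not the ib_drain_sq()/c4iw_drain_sq() source (iw_cxgb4 supplies its own drain hooks in this kernel), but it shows why nothing ever wakes up if the hardware or firmware never flushes the send queue:

---
struct drain_marker {
	struct ib_cqe		cqe;
	struct completion	done;
};

static void drain_done(struct ib_cq *cq, struct ib_wc *wc)
{
	struct drain_marker *m = container_of(wc->wr_cqe, struct drain_marker, cqe);

	complete(&m->done);
}

static void drain_sq_sketch(struct ib_qp *qp)
{
	struct ib_qp_attr attr = { .qp_state = IB_QPS_ERR };
	struct ib_send_wr wr = {}, *bad_wr;
	struct drain_marker m;

	m.cqe.done = drain_done;
	init_completion(&m.done);
	wr.wr_cqe = &m.cqe;
	wr.send_flags = IB_SEND_SIGNALED;

	if (ib_modify_qp(qp, &attr, IB_QP_STATE))
		return;
	if (ib_post_send(qp, &wr, &bad_wr))
		return;

	/* the flush completion for the marker WR is the wake-up; if the
	 * low-level driver never flushes the SQ, this sleeps forever */
	wait_for_completion(&m.done);
}
---

Once completions stop flowing, each nvmet_rdma_release_queue_work item wedges in its own destroy/drain, which matches the pile-up of kworker/3:* threads above and fits the "looks more like a cxgb4 problem" reading in the follow-up.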
* target crash / host hang with nvme-all.3 branch of nvme-fabrics 2016-06-28 19:20 ` Steve Wise @ 2016-06-28 19:43 ` Steve Wise 2016-06-28 21:04 ` Ming Lin 0 siblings, 1 reply; 50+ messages in thread From: Steve Wise @ 2016-06-28 19:43 UTC (permalink / raw) > I'm using a ram disk for the target. Perhaps before > I was using a real nvme device. I'll try that too and see if I still hit this > deadlock/stall... > Hey Ming, It seems using a real nvme device at the target, vs. a ram device, avoids this new deadlock issue. And I'm running so far w/o the usual touch-after-free crash. Usually I hit it quickly, so it looks like your patch did indeed fix that. So: 1) We need to address Christoph's concern that your fix isn't the ideal/correct solution. How do you want to proceed on that angle? How can I help? 2) The deadlock below is probably some other issue. It looks more like a cxgb4 problem at first glance. I'll look into this one... Steve. ^ permalink raw reply [flat|nested] 50+ messages in thread
* target crash / host hang with nvme-all.3 branch of nvme-fabrics 2016-06-28 19:43 ` Steve Wise @ 2016-06-28 21:04 ` Ming Lin 2016-06-29 14:11 ` Steve Wise 0 siblings, 1 reply; 50+ messages in thread From: Ming Lin @ 2016-06-28 21:04 UTC (permalink / raw) On Tue, 2016-06-28@14:43 -0500, Steve Wise wrote: > > I'm using a ram disk for the target. Perhaps before > > I was using a real nvme device. I'll try that too and see if I still hit this > > deadlock/stall... > > > > Hey Ming, > > Seems using a real nvme device at the target vs a ram device, avoids this new > deadlock issue. And I'm running so-far w/o the usual touch-after-free crash. > Usually I hit it quickly. It looks like your patch did indeed fix that. So: > > 1) We need to address Christoph's concern that your fix isn't the ideal/correct > solution. How do you want to proceed on that angle? How can I help? This one should be more correct. Actually, the rsp was leaked when queue->state is NVMET_RDMA_Q_DISCONNECTING. So we should put it back. It works for me. Could you help to verify? diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c index 425b55c..ee8b85e 100644 --- a/drivers/nvme/target/rdma.c +++ b/drivers/nvme/target/rdma.c @@ -727,6 +727,8 @@ static void nvmet_rdma_recv_done(struct ib_cq *cq, struct ib_wc *wc) spin_lock_irqsave(&queue->state_lock, flags); if (queue->state == NVMET_RDMA_Q_CONNECTING) list_add_tail(&rsp->wait_list, &queue->rsp_wait_list); + else + nvmet_rdma_put_rsp(rsp); spin_unlock_irqrestore(&queue->state_lock, flags); return; } > > 2) the deadlock below is probably some other issue. Looks more like a cxgb4 > problem at first glance. I'll look into this one... > > Steve. > > ^ permalink raw reply related [flat|nested] 50+ messages in thread
* target crash / host hang with nvme-all.3 branch of nvme-fabrics 2016-06-28 21:04 ` Ming Lin @ 2016-06-29 14:11 ` Steve Wise 0 siblings, 0 replies; 50+ messages in thread From: Steve Wise @ 2016-06-29 14:11 UTC (permalink / raw) > This one should be more correct. > Actually, the rsp was leaked when queue->state is > NVMET_RDMA_Q_DISCONNECTING. So we should put it back. > > It works for me. Could you help to verify? works! > > diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c > index 425b55c..ee8b85e 100644 > --- a/drivers/nvme/target/rdma.c > +++ b/drivers/nvme/target/rdma.c > @@ -727,6 +727,8 @@ static void nvmet_rdma_recv_done(struct ib_cq *cq, struct > ib_wc *wc) > spin_lock_irqsave(&queue->state_lock, flags); > if (queue->state == NVMET_RDMA_Q_CONNECTING) > list_add_tail(&rsp->wait_list, &queue->rsp_wait_list); > + else > + nvmet_rdma_put_rsp(rsp); > spin_unlock_irqrestore(&queue->state_lock, flags); > return; > } > ^ permalink raw reply [flat|nested] 50+ messages in thread
* target crash / host hang with nvme-all.3 branch of nvme-fabrics 2016-06-16 20:34 ` 'Christoph Hellwig' 2016-06-16 20:49 ` Steve Wise @ 2016-06-27 17:26 ` Ming Lin 1 sibling, 0 replies; 50+ messages in thread From: Ming Lin @ 2016-06-27 17:26 UTC (permalink / raw) On Thu, 2016-06-16@22:34 +0200, 'Christoph Hellwig' wrote: > On Thu, Jun 16, 2016@03:28:06PM -0500, Steve Wise wrote: > > > Just to follow, does Christoph's patch fix the crash? > > > > It does. > > Unfortunately I think it's still wrong because it will only delete > a single queue per controller. We'll probably need something > like this instead, which does the same think but also has a retry > loop for additional queues: > > > diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c > index b1c6e5b..425b55c 100644 > --- a/drivers/nvme/target/rdma.c > +++ b/drivers/nvme/target/rdma.c > @@ -1293,19 +1293,20 @@ static int nvmet_rdma_cm_handler(struct rdma_cm_id *cm_id, > > static void nvmet_rdma_delete_ctrl(struct nvmet_ctrl *ctrl) > { > - struct nvmet_rdma_queue *queue, *next; > - static LIST_HEAD(del_list); > + struct nvmet_rdma_queue *queue; > > +restart: > mutex_lock(&nvmet_rdma_queue_mutex); > - list_for_each_entry_safe(queue, next, > - &nvmet_rdma_queue_list, queue_list) { > - if (queue->nvme_sq.ctrl->cntlid == ctrl->cntlid) > - list_move_tail(&queue->queue_list, &del_list); > + list_for_each_entry(queue, &nvmet_rdma_queue_list, queue_list) { > + if (queue->nvme_sq.ctrl == ctrl) { > + list_del_init(&queue->queue_list); > + mutex_unlock(&nvmet_rdma_queue_mutex); > + > + __nvmet_rdma_queue_disconnect(queue); > + goto restart; > + } > } > mutex_unlock(&nvmet_rdma_queue_mutex); > - > - list_for_each_entry_safe(queue, next, &del_list, queue_list) > - nvmet_rdma_queue_disconnect(queue); > } > > static int nvmet_rdma_add_port(struct nvmet_port *port) Run below test over weekend on host side(nvmf-all.3), #!/bin/bash while [ 1 ] ; do ifconfig eth5 down ; sleep $(( 10 + ($RANDOM & 0x7) )); ifconfig eth5 up ;sleep $(( 10 + ($RANDOM & 0x7) )) done Then target side hit below crash: [122730.252874] nvmet: creating controller 1 for NQN nqn.2014-08.org.nvmexpress:NVMf:uuid:53ea06bc-e1d0-4d59-a6e9-138684f3662b. [122730.281665] nvmet: adding queue 1 to ctrl 1. [122730.287133] nvmet: adding queue 2 to ctrl 1. [122730.292672] nvmet: adding queue 3 to ctrl 1. [122730.298197] nvmet: adding queue 4 to ctrl 1. [122730.303742] nvmet: adding queue 5 to ctrl 1. [122730.309375] nvmet: adding queue 6 to ctrl 1. [122730.315015] nvmet: adding queue 7 to ctrl 1. [122730.320688] nvmet: adding queue 8 to ctrl 1. [122732.014747] mlx4_en: eth4: Link Down [122745.298422] nvmet: ctrl 1 keep-alive timer (15 seconds) expired! [122745.305601] BUG: unable to handle kernel paging request at 0000010173180018 [122745.313755] IP: [<ffffffffc08bb7fa>] nvmet_rdma_delete_ctrl+0x4a/0xa0 [nvmet_rdma] [122745.322513] PGD 0 [122745.325667] Oops: 0000 [#1] PREEMPT SMP [122745.462435] CPU: 0 PID: 4849 Comm: kworker/0:3 Tainted: G OE 4.7.0-rc2+ #256 [122745.472376] Hardware name: Dell Inc. 
OptiPlex 7010/0773VG, BIOS A12 01/10/2013 [122745.481433] Workqueue: events nvmet_keep_alive_timer [nvmet] [122745.488909] task: ffff880035346a00 ti: ffff8800d1078000 task.ti: ffff8800d1078000 [122745.498246] RIP: 0010:[<ffffffffc08bb7fa>] [<ffffffffc08bb7fa>] nvmet_rdma_delete_ctrl+0x4a/0xa0 [nvmet_rdma] [122745.510170] RSP: 0018:ffff8800d107bdf0 EFLAGS: 00010207 [122745.517384] RAX: 0000010173180100 RBX: 000001017317ffe0 RCX: 0000000000000000 [122745.526464] RDX: 0000010173180100 RSI: ffff88012020dc28 RDI: ffffffffc08bf080 [122745.535566] RBP: ffff8800d107be00 R08: 0000000000000000 R09: ffff8800c7bd7bc0 [122745.544715] R10: 000000000000f000 R11: 0000000000015c68 R12: ffff8800b85c3400 [122745.553873] R13: ffff88012021ac00 R14: 0000000000000000 R15: ffff880120216300 [122745.563025] FS: 0000000000000000(0000) GS:ffff880120200000(0000) knlGS:0000000000000000 [122745.573152] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [122745.580923] CR2: 0000010173180018 CR3: 0000000001c06000 CR4: 00000000001406f0 [122745.590125] Stack: [122745.594187] ffff8800b85c34c8 ffff8800b85c34c8 ffff8800d107be18 ffffffffc090d0be [122745.603771] ffff8800d7cfaf00 ffff8800d107be60 ffffffff81083019 ffff8800d7cfaf30 [122745.613374] 0000000035346a00 ffff880120216320 ffff8800d7cfaf30 ffff880035346a00 [122745.622992] Call Trace: [122745.627593] [<ffffffffc090d0be>] nvmet_keep_alive_timer+0x2e/0x40 [nvmet] [122745.636685] [<ffffffff81083019>] process_one_work+0x159/0x370 [122745.644745] [<ffffffff81083356>] worker_thread+0x126/0x490 [122745.652545] [<ffffffff816f17fe>] ? __schedule+0x1de/0x590 [122745.660217] [<ffffffff81083230>] ? process_one_work+0x370/0x370 [122745.668387] [<ffffffff81088864>] kthread+0xc4/0xe0 [122745.675437] [<ffffffff816f571f>] ret_from_fork+0x1f/0x40 [122745.683025] [<ffffffff810887a0>] ? kthread_create_on_node+0x170/0x170 (gdb) list *nvmet_rdma_delete_ctrl+0x4a 0x82a is in nvmet_rdma_delete_ctrl (/home/mlin/linux-nvmeof/drivers/nvme/target/rdma.c:1301). 1296 struct nvmet_rdma_queue *queue; 1297 1298 restart: 1299 mutex_lock(&nvmet_rdma_queue_mutex); 1300 list_for_each_entry(queue, &nvmet_rdma_queue_list, queue_list) { 1301 if (queue->nvme_sq.ctrl == ctrl) { 1302 list_del_init(&queue->queue_list); 1303 mutex_unlock(&nvmet_rdma_queue_mutex); 1304 1305 __nvmet_rdma_queue_disconnect(queue); ^ permalink raw reply [flat|nested] 50+ messages in thread
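One hedged observation on this trace, since the root cause isn't pinned down yet: the bad queue pointer comes straight out of the queue_list linkage the loop is walking, which usually means another teardown path (for example the CM-event-driven release work) unlinked or freed that queue without the serialization this walker relies on. A common way to make two teardown paths tolerate each other is to funnel the unlink through the shared mutex with list_del_init() and let whichever path gets there first own the disconnect. The helper below is an untested sketch of that shape, with names taken from the quoted code:

---
/* sketch only: returns true for the one caller that actually unlinked
 * the queue and therefore gets to disconnect/free it */
static bool nvmet_rdma_queue_unlink(struct nvmet_rdma_queue *queue)
{
	bool unlinked = false;

	mutex_lock(&nvmet_rdma_queue_mutex);
	if (!list_empty(&queue->queue_list)) {
		list_del_init(&queue->queue_list);
		unlinked = true;
	}
	mutex_unlock(&nvmet_rdma_queue_mutex);

	return unlinked;
}
---

The list_del_init() in the quoted delete_ctrl loop is already half of this; the other half is making sure every path that can free the queue takes the same mutex before the memory is recycled, otherwise the walker can still land on a stale entry.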
* target crash / host hang with nvme-all.3 branch of nvme-fabrics 2016-06-16 20:07 ` Sagi Grimberg 2016-06-16 20:12 ` Steve Wise @ 2016-06-16 20:35 ` Steve Wise 1 sibling, 0 replies; 50+ messages in thread From: Steve Wise @ 2016-06-16 20:35 UTC (permalink / raw)

> So this is related to I/O load then? Does it happen when
> you just do it without any I/O? (or small load)?

I just tried, and it also happens without any traffic over the device on my setup.

^ permalink raw reply [flat|nested] 50+ messages in thread
* target crash / host hang with nvme-all.3 branch of nvme-fabrics 2016-06-16 19:55 ` Sagi Grimberg 2016-06-16 19:59 ` Steve Wise @ 2016-06-16 20:01 ` Steve Wise 2016-06-17 14:05 ` Steve Wise [not found] ` <005f01d1c8a1$5a229240$0e67b6c0$@opengridcomputing.com> 3 siblings, 0 replies; 50+ messages in thread From: Steve Wise @ 2016-06-16 20:01 UTC (permalink / raw) > Umm, I think this might be happening because we get to delete_ctrl when > one of our queues has a NULL ctrl. This means that either: > 1. we never got a chance to initialize it, or > 2. we already freed it. > > (1) doesn't seem possible as we have a very short window (that we're > better off eliminating) between when we start the keep-alive timer (in > alloc_ctrl) and the time we assign the sq->ctrl (install_queue). > > (2) doesn't seem likely either to me at least as from what I followed, > delete_ctrl should be mutual exclusive with other deletions, moreover, > I didn't see an indication in the logs that any other deletions are > happening. > > Steve, is this something that started happening recently? does the > 4.6-rc3 tag suffer from the same phenomenon? In case it matters, here is my target config file. I'm only connecting to 1 device (test-hynix0). But each device is exported over 2 ports of a Chelsio iWARP T580, and 1 port of a MLX IB mlx4 QDR adapter. [root at stevo2 ~]# cat /etc/nvmet.json { "hosts": [], "ports": [ { "addr": { "adrfam": "ipv4", "traddr": "10.0.1.14", "treq": "not specified", "trsvcid": "4420", "trtype": "rdma" }, "portid": 1, "referrals": [], "subsystems": [ "test-ram0", "test-ram1", "test-ram2", "test-ram3", "test-hynix0", "test-nullb0", "test-nullb1" ] }, { "addr": { "adrfam": "ipv4", "traddr": "10.0.2.14", "treq": "not specified", "trsvcid": "4420", "trtype": "rdma" }, "portid": 2, "referrals": [], "subsystems": [ "test-ram0", "test-ram1", "test-ram2", "test-ram3", "test-hynix0", "test-nullb0", "test-nullb1" ] }, { "addr": { "adrfam": "ipv4", "traddr": "10.0.7.14", "treq": "not specified", "trsvcid": "4420", "trtype": "rdma" }, "portid": 7, "referrals": [], "subsystems": [ "test-ram0", "test-ram1", "test-ram2", "test-ram3", "test-hynix0", "test-nullb0", "test-nullb1" ] } ], "subsystems": [ { "allowed_hosts": [], "attr": { "allow_any_host": "1" }, "namespaces": [ { "device": { "nguid": "00000000-0000-0000-0000-123400000000", "path": "/dev/ram0" }, "enable": 1, "nsid": 1 } ], "nqn": "test-ram0" }, { "allowed_hosts": [], "attr": { "allow_any_host": "1" }, "namespaces": [ { "device": { "nguid": "00000000-0000-0000-0000-567800000000", "path": "/dev/ram1" }, "enable": 1, "nsid": 1 } ], "nqn": "test-ram1" }, { "allowed_hosts": [], "attr": { "allow_any_host": "1" }, "namespaces": [ { "device": { "nguid": "00000000-0000-0000-0000-9abc00000000", "path": "/dev/ram2" }, "enable": 1, "nsid": 1 } ], "nqn": "test-ram2" }, { "allowed_hosts": [], "attr": { "allow_any_host": "1" }, "namespaces": [ { "device": { "nguid": "00000000-0000-0000-0000-def100000000", "path": "/dev/ram3" }, "enable": 1, "nsid": 1 } ], "nqn": "test-ram3" }, { "allowed_hosts": [], "attr": { "allow_any_host": "1" }, "namespaces": [ { "device": { "nguid": "5aa133f4-7c41-f141-b469-901e58461e7b", "path": "/dev/nvme0n1" }, "enable": 1, "nsid": 1 } ], "nqn": "test-hynix0" }, { "allowed_hosts": [], "attr": { "allow_any_host": "1" }, "namespaces": [ { "device": { "nguid": "569232db-0f5a-8d4f-a077-7935117e2f3f", "path": "/dev/nullb0" }, "enable": 1, "nsid": 1 } ], "nqn": "test-nullb0" }, { "allowed_hosts": [], "attr": { "allow_any_host": "1" }, 
"namespaces": [ { "device": { "nguid": "9eb7de4b-3ee9-4641-a6e3-e2569414e5a0", "path": "/dev/nullb1" }, "enable": 1, "nsid": 1 } ], "nqn": "test-nullb1" } ] } ^ permalink raw reply [flat|nested] 50+ messages in thread
* target crash / host hang with nvme-all.3 branch of nvme-fabrics 2016-06-16 19:55 ` Sagi Grimberg 2016-06-16 19:59 ` Steve Wise 2016-06-16 20:01 ` Steve Wise @ 2016-06-17 14:05 ` Steve Wise [not found] ` <005f01d1c8a1$5a229240$0e67b6c0$@opengridcomputing.com> 3 siblings, 0 replies; 50+ messages in thread From: Steve Wise @ 2016-06-17 14:05 UTC (permalink / raw) > > Steve, is this something that started happening recently? does the > 4.6-rc3 tag suffer from the same phenomenon? Where is this tag? ^ permalink raw reply [flat|nested] 50+ messages in thread
[parent not found: <005f01d1c8a1$5a229240$0e67b6c0$@opengridcomputing.com>]
* target crash / host hang with nvme-all.3 branch of nvme-fabrics [not found] ` <005f01d1c8a1$5a229240$0e67b6c0$@opengridcomputing.com> @ 2016-06-17 14:16 ` Steve Wise 2016-06-17 17:20 ` Ming Lin 0 siblings, 1 reply; 50+ messages in thread From: Steve Wise @ 2016-06-17 14:16 UTC (permalink / raw)

>
> >
> > Steve, is this something that started happening recently? does the
> > 4.6-rc3 tag suffer from the same phenomenon?
>
> Where is this tag?

Never mind.  I found it (needed 'git pull -t' to pull the tags from the gitlab nvmef repo).

^ permalink raw reply [flat|nested] 50+ messages in thread
* target crash / host hang with nvme-all.3 branch of nvme-fabrics 2016-06-17 14:16 ` Steve Wise @ 2016-06-17 17:20 ` Ming Lin 2016-06-19 11:57 ` Sagi Grimberg 0 siblings, 1 reply; 50+ messages in thread From: Ming Lin @ 2016-06-17 17:20 UTC (permalink / raw)

On Fri, Jun 17, 2016@7:16 AM, Steve Wise <swise@opengridcomputing.com> wrote:
>>
>> >
>> > Steve, is this something that started happening recently? does the
>> > 4.6-rc3 tag suffer from the same phenomenon?
>>
>> Where is this tag?
>
> Never mind.  I found it (needed 'git pull -t' to pull the tags from the gitlab
> nvmef repo).

I ran this overnight:

#!/bin/bash

while [ 1 ] ; do
	ifconfig eth5 down ; sleep 15; ifconfig eth5 up; sleep 15
done

Although the crash was not reproduced, it triggered another OOM bug.

[38335.167460] nvmet: ctrl 241 keep-alive timer (15 seconds) expired!
[38365.954235] nvmet: ctrl 243 keep-alive timer (15 seconds) expired!
[38392.132613] nvmet: ctrl 241 keep-alive timer (15 seconds) expired!
[38423.943463] nvmet: ctrl 243 keep-alive timer (15 seconds) expired!
[38455.178284] nvmet: ctrl 244 keep-alive timer (15 seconds) expired!
[38486.093079] nvmet: ctrl 243 keep-alive timer (15 seconds) expired!
[38517.231876] nvmet: ctrl 244 keep-alive timer (15 seconds) expired!
[38543.122210] nvmet: ctrl 243 keep-alive timer (15 seconds) expired!
[38574.933079] nvmet: ctrl 244 keep-alive timer (15 seconds) expired!
[38605.975884] nvmet: ctrl 245 keep-alive timer (15 seconds) expired!
[38636.954661] nvmet: ctrl 244 keep-alive timer (15 seconds) expired!
[38668.189494] nvmet: ctrl 245 keep-alive timer (15 seconds) expired!
[38699.072284] nvmet: ctrl 244 keep-alive timer (15 seconds) expired!
[38725.154623] nvmet: ctrl 245 keep-alive timer (15 seconds) expired!
[38756.901490] nvmet: ctrl 244 keep-alive timer (15 seconds) expired!
[38762.153397] 96 and 0 pages still available in the bound and unbound GPU page lists.
[38762.161985] Out of memory: Kill process 1270 (java) score 2 or sacrifice child
[38762.169829] Killed process 1270 (java) total-vm:2829756kB, anon-rss:0kB, file-rss:2696kB, shmem-rss:0kB
[38770.880447] 96 and 0 pages still available in the bound and unbound GPU page lists.
[38770.888688] Out of memory: Kill process 1409 (libvirtd) score 0 or sacrifice child
[38770.896618] Killed process 1409 (libvirtd) total-vm:372592kB, anon-rss:0kB, file-rss:1664kB, shmem-rss:0kB
[38770.912433] 96 and 0 pages still available in the bound and unbound GPU page lists.
[38770.920867] 96 and 0 pages still available in the bound and unbound GPU page lists.
[38770.929214] 96 and 0 pages still available in the bound and unbound GPU page lists.
[38770.937788] 96 and 0 pages still available in the bound and unbound GPU page lists.
[38774.191627] 96 and 0 pages still available in the bound and unbound GPU page lists.
[38774.199950] Out of memory: Kill process 11265 (java) score 0 or sacrifice child
[38774.207787] Killed process 11265 (java) total-vm:35600kB, anon-rss:48kB, file-rss:4784kB, shmem-rss:0kB
[38788.104282] nvmet: ctrl 246 keep-alive timer (15 seconds) expired!
[38803.999289] nvmet: invalid command 0x0 on unconnected queue.
[38806.514069] 96 and 0 pages still available in the bound and unbound GPU page lists.
[38947.267367] Out of memory: Kill process 1153 (iscsid) score 0 or sacrifice child
[38947.274738] Killed process 1153 (iscsid) total-vm:5796kB, anon-rss:1124kB, file-rss:2464kB, shmem-rss:16kB
[38948.804421] 96 and 0 pages still available in the bound and unbound GPU page lists.
[38948.812215] Kernel panic - not syncing: Out of memory and no killable processes...
[38948.812215]
[38948.821215] CPU: 1 PID: 1 Comm: init Tainted: G OE 4.7.0-rc2+ #256
[38948.828400] Hardware name: Dell Inc. OptiPlex 7010/0773VG, BIOS A12 01/10/2013
[38948.835585] 0000000000000000 ffff88011827faa8 ffffffff8133104a ffffffff81a1b758
[38948.843001] ffffffff81c0d540 ffff88011827fb20 ffffffff811218fc ffff880000000008
[38948.850413] ffff88011827fb30 ffff88011827fad0 000000000000000a ffff8800d6870d40
[38948.857824] Call Trace:
[38948.860260] [<ffffffff8133104a>] dump_stack+0x63/0x89
[38948.865373] [<ffffffff811218fc>] panic+0xcd/0x21e
[38948.870140] [<ffffffff81128249>] out_of_memory+0x449/0x470
[38948.875683] [<ffffffff8112c7de>] __alloc_pages_nodemask+0xa4e/0xc20
[38948.882004] [<ffffffff8109adc0>] ? put_prev_entity+0x30/0x6e0
[38948.887807] [<ffffffff81172830>] alloc_pages_vma+0xb0/0x210
[38948.893438] [<ffffffff81163514>] __read_swap_cache_async+0x134/0x1e0
[38948.899846] [<ffffffff811635d2>] read_swap_cache_async+0x12/0x30
[38948.905908] [<ffffffff81163739>] swapin_readahead+0x149/0x1b0
[38948.911710] [<ffffffff81123600>] ? find_get_entry+0x60/0xa0
[38948.917339] [<ffffffff81124767>] ? pagecache_get_page+0x27/0x250
[38948.923401] [<ffffffff8115092b>] handle_mm_fault+0x14fb/0x1cf0
[38948.929291] [<ffffffff81076fa6>] ? recalc_sigpending+0x16/0x50
[38948.935181] [<ffffffff81059bc3>] __do_page_fault+0x1a3/0x4e0
[38948.940898] [<ffffffff81059f22>] do_page_fault+0x22/0x30
[38948.946269] [<ffffffff816f6e18>] page_fault+0x28/0x30
[38948.951383] Kernel Offset: disabled
[38948.960747] ---[ end Kernel panic - not syncing: Out of memory and no killable processes...
[38948.960747]

^ permalink raw reply [flat|nested] 50+ messages in thread
* target crash / host hang with nvme-all.3 branch of nvme-fabrics 2016-06-17 17:20 ` Ming Lin @ 2016-06-19 11:57 ` Sagi Grimberg 2016-06-21 14:18 ` Steve Wise [not found] ` <006e01d1cbc7$d0d9cc40$728d64c0$@opengridcomputing.com> 0 siblings, 2 replies; 50+ messages in thread From: Sagi Grimberg @ 2016-06-19 11:57 UTC (permalink / raw) On 17/06/16 20:20, Ming Lin wrote: > On Fri, Jun 17, 2016@7:16 AM, Steve Wise <swise@opengridcomputing.com> wrote: >>> >>>> >>>> Steve, is this something that started happening recently? does the >>>> 4.6-rc3 tag suffer from the same phenomenon? >>> >>> Where is this tag? >> >> Never mind. I found it (needed 'git pull -t' for pull the tags from the gitlab >> nvmef repo). > > I run this overnight, > > #!/bin/bash > > while [ 1 ] ; do > ifconfig eth5 down ; sleep 15; ifconfig eth5 up; sleep 15 > done > > Although the crash is not reproduced, but it triggers another OOM bug. Which code-base is this? it looks like this code is just leaking queues. Obviously something changed... ^ permalink raw reply [flat|nested] 50+ messages in thread
* target crash / host hang with nvme-all.3 branch of nvme-fabrics 2016-06-19 11:57 ` Sagi Grimberg @ 2016-06-21 14:18 ` Steve Wise 2016-06-21 17:33 ` Ming Lin [not found] ` <006e01d1cbc7$d0d9cc40$728d64c0$@opengridcomputing.com> 1 sibling, 1 reply; 50+ messages in thread From: Steve Wise @ 2016-06-21 14:18 UTC (permalink / raw) > On 17/06/16 20:20, Ming Lin wrote: > > On Fri, Jun 17, 2016 at 7:16 AM, Steve Wise <swise at opengridcomputing.com> > wrote: > >>> > >>>> > >>>> Steve, is this something that started happening recently? does the > >>>> 4.6-rc3 tag suffer from the same phenomenon? > >>> > >>> Where is this tag? > >> > >> Never mind. I found it (needed 'git pull -t' for pull the tags from the gitlab > >> nvmef repo). > > > > I run this overnight, > > > > #!/bin/bash > > > > while [ 1 ] ; do > > ifconfig eth5 down ; sleep 15; ifconfig eth5 up; sleep 15 > > done > > > > Although the crash is not reproduced, but it triggers another OOM bug. > > Which code-base is this? it looks like this code is just leaking queues. > Obviously something changed... > Yoichi has hit what is apparently the same OOM bug: Kernel panic - not syncing: Out of memory and no killable processes... He hit it with NVMF on the host, _and_ with NVME/PCI with a local SSD. This is all with using nvmf-all.3 + Christoph's fix for the target queue deletion crash (http://lists.infradead.org/pipermail/linux-nvme/2016-June/005075.html). Ming, any luck on isolating this? I'm going to enable kernel memory leak detection and see if I can figure this out. Steve. ^ permalink raw reply [flat|nested] 50+ messages in thread
* target crash / host hang with nvme-all.3 branch of nvme-fabrics 2016-06-21 14:18 ` Steve Wise @ 2016-06-21 17:33 ` Ming Lin 2016-06-21 17:59 ` Steve Wise 0 siblings, 1 reply; 50+ messages in thread From: Ming Lin @ 2016-06-21 17:33 UTC (permalink / raw) On Tue, Jun 21, 2016@7:18 AM, Steve Wise <swise@opengridcomputing.com> wrote: > > Ming, any luck on isolating this? I'm going to enable kernel memory leak > detection and see if I can figure this out. Hi Steve, I'm off most days this week. Will continue debugging next week if it's still not fixed. ^ permalink raw reply [flat|nested] 50+ messages in thread
* target crash / host hang with nvme-all.3 branch of nvme-fabrics 2016-06-21 17:33 ` Ming Lin @ 2016-06-21 17:59 ` Steve Wise 0 siblings, 0 replies; 50+ messages in thread From: Steve Wise @ 2016-06-21 17:59 UTC (permalink / raw)

> On Tue, Jun 21, 2016 at 7:18 AM, Steve Wise <swise at opengridcomputing.com>
> wrote:
> >
> > Ming, any luck on isolating this?  I'm going to enable kernel memory leak
> > detection and see if I can figure this out.
>
> Hi Steve,
>
> I'm off most days this week.
> Will continue debugging next week if it's still not fixed.

I'll take that as a challenge to figure it out before you get back! :)

Happy vacation!

Steve.

^ permalink raw reply [flat|nested] 50+ messages in thread
[parent not found: <006e01d1cbc7$d0d9cc40$728d64c0$@opengridcomputing.com>]
* target crash / host hang with nvme-all.3 branch of nvme-fabrics [not found] ` <006e01d1cbc7$d0d9cc40$728d64c0$@opengridcomputing.com> @ 2016-06-22 13:42 ` Steve Wise 2016-06-27 14:19 ` Steve Wise 0 siblings, 1 reply; 50+ messages in thread From: Steve Wise @ 2016-06-22 13:42 UTC (permalink / raw) > > Which code-base is this? it looks like this code is just leaking queues. > > Obviously something changed... > > > > Yoichi has hit what is apparently the same OOM bug: > > Kernel panic - not syncing: Out of memory and no killable processes... > > He hit it with NVMF on the host, _and_ with NVME/PCI with a local SSD. > This is all with using nvmf-all.3 + Christoph's fix for the target queue > deletion crash > (http://lists.infradead.org/pipermail/linux-nvme/2016-June/005075.html). > Yoichi can reproduce this with linux-4.7-rc4, so the issue isn't related to the nvme/nvmf changes in nvmf-all.3. Also, I have been unable to reproduce it so far on my setup. But Yoichi can reproduce it readily. Steve. ^ permalink raw reply [flat|nested] 50+ messages in thread
* target crash / host hang with nvme-all.3 branch of nvme-fabrics 2016-06-22 13:42 ` Steve Wise @ 2016-06-27 14:19 ` Steve Wise 2016-06-28 8:50 ` 'Christoph Hellwig' 0 siblings, 1 reply; 50+ messages in thread From: Steve Wise @ 2016-06-27 14:19 UTC (permalink / raw) > > Yoichi has hit what is apparently the same OOM bug: > > > > Kernel panic - not syncing: Out of memory and no killable processes... > > > > He hit it with NVMF on the host, _and_ with NVME/PCI with a local SSD. > > This is all with using nvmf-all.3 + Christoph's fix for the target queue > > deletion crash > > (http://lists.infradead.org/pipermail/linux-nvme/2016-June/005075.html). > > > > Yoichi can reproduce this with linux-4.7-rc4, so the issue isn't related to the > nvme/nvmf changes in nvmf-all.3. Also, I have been unable to reproduce it so > far on my setup. But Yoichi can reproduce it readily. > Hey Christoph, It appears this OOM issue is resolved in linux-4.7-rc5. Does it make sense to publish a rebased nvmf-all? Thanks, Steve. ^ permalink raw reply [flat|nested] 50+ messages in thread
* target crash / host hang with nvme-all.3 branch of nvme-fabrics 2016-06-27 14:19 ` Steve Wise @ 2016-06-28 8:50 ` 'Christoph Hellwig' 2016-07-04 9:57 ` Yoichi Hayakawa 0 siblings, 1 reply; 50+ messages in thread From: 'Christoph Hellwig' @ 2016-06-28 8:50 UTC (permalink / raw) On Mon, Jun 27, 2016@09:19:26AM -0500, Steve Wise wrote: > It appears this OOM issue is resolved in linux-4.7-rc5. Does it make sense to > publish a rebased nvmf-all? I'd rather get everything into Jens' tree... Either way - I suspect the fix you need is this one: http://git.kernel.dk/cgit/linux-block/commit/block/blk-lib.c?h=for-linus&id=05bd92dddc595d74ea645e793c1f3bd4b1fc251a but that won't be in the for-next tree either. ^ permalink raw reply [flat|nested] 50+ messages in thread
* target crash / host hang with nvme-all.3 branch of nvme-fabrics 2016-06-28 8:50 ` 'Christoph Hellwig' @ 2016-07-04 9:57 ` Yoichi Hayakawa 0 siblings, 0 replies; 50+ messages in thread From: Yoichi Hayakawa @ 2016-07-04 9:57 UTC (permalink / raw) nvmf-all.3 + this patch hits the issue. [ 475.017390] NMI watchdog: Watchdog detected hard LOCKUP on cpu 1dModules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 iptable_filter ip_tables tun bridge stp llc rpcrdma sunrpc ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core intel_rapl iosf_mbi sb_edac ib_srp edac_core snd_hwdep scsi_transport_srp ib_ipoib snd_seq rdma_ucm ib_ucm x86_pkg_temp_thermal ib_uverbs intel_powerclamp coretemp ib_umad rdma_cm kvm_intel snd_seq_device snd_pcm kvm snd_timer ib_cm iw_cm snd irqbypass usb_storage eeepc_wmi asus_wmi sparse_keymap rfkill iTCO_wdt crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel iTCO_vendor_support lrw gf128mul glue_helper video serio_raw mxm_wmi ablk_helper soundcore pcspkr cryptd ipmi_si l[ 475.017391] CPU: 1 PID: 3802 Comm: fio Not tainted 4.7.0-rc2+ #2 [ 475.017391] Hardware name: ASUSTeK COMPUTER INC. Z10PE-D16 WS/Z10PE-D16 WS, BIOS 3101 11/04/2015 [ 475.017391] 0000000000000086 00000000a5918d2f ffff88085fc45bb0 ffffffff8133c50f [ 475.017392] 0000000000000000 0000000000000000 ffff88085fc45bc8 ffffffff8113b148 [ 475.017392] ffff88085f81f000 ffff88085fc45c00 ffffffff81174fec 0000000000000001 [ 475.017392] Call Trace: [ 475.017393] <NMI> [<ffffffff8133c50f>] dump_stack+0x63/0x84 [ 475.017393] [<ffffffff8113b148>] watchdog_overflow_callback+0xc8/0xf0 [ 475.017393] [<ffffffff81174fec>] __perf_event_overflow+0x7c/0x1f0 [ 475.017393] [<ffffffff8117f024>] perf_event_overflow+0x14/0x20 [ 475.017394] [<ffffffff8100c33a>] intel_pmu_handle_irq+0x1da/0x480 [ 475.017394] [<ffffffff8133e686>] ? ioremap_page_range+0x296/0x410 [ 475.017394] [<ffffffff811cb4ac>] ? vunmap_page_range+0x1dc/0x310 [ 475.017395] [<ffffffff811cb5f1>] ? unmap_kernel_range_noflush+0x11/0x20 [ 475.017395] [<ffffffff813eecb6>] ? ghes_copy_tofrom_phys+0x116/0x1f0 [ 475.017395] [<ffffffff81052f3f>] ? native_apic_wait_icr_idle+0x1f/0x30 [ 475.017396] [<ffffffff810054bd>] perf_event_nmi_handler+0x2d/0x50 [ 475.017396] [<ffffffff810311a1>] nmi_handle+0x61/0x110 [ 475.017396] [<ffffffff81031704>] default_do_nmi+0x44/0x120 [ 475.017397] [<ffffffff810318cb>] do_nmi+0xeb/0x160 [ 475.017397] [<ffffffff816b24f1>] end_repeat_nmi+0x1a/0x1e [ 475.017397] [<ffffffff8118ccd7>] ? mempool_free_slab+0x17/0x20 [ 475.017397] [<ffffffff810cc397>] ? native_queued_spin_lock_slowpath+0x177/0x1a0 [ 475.017398] [<ffffffff810cc397>] ? native_queued_spin_lock_slowpath+0x177/0x1a0 [ 475.017398] [<ffffffff810cc397>] ? native_queued_spin_lock_slowpath+0x177/0x1a0 [ 475.017398] <<EOE>> <IRQ> [<ffffffff81187380>] queued_spin_lock_slowpath+0xb/0xf [ 475.017399] [<ffffffff816afe77>] _raw_spin_lock_irqsave+0x37/0x40 [ 475.017399] [<ffffffff8125d96f>] aio_complete+0x6f/0x300 [ 475.017399] [<ffffffff813072d3>] ? 
bio_put+0x23/0x30 [ 475.017400] [<ffffffff8124da8d>] dio_complete+0xad/0x140 [ 475.017400] [<ffffffff8124dd1b>] dio_bio_end_aio+0x6b/0xf0 [ 475.017400] [<ffffffff81307335>] bio_endio+0x55/0x60 [ 475.017400] [<ffffffff8130fb97>] blk_update_request+0x87/0x300 [ 475.017401] [<ffffffff8131919a>] blk_mq_end_request+0x1a/0x70 [ 475.017401] [<ffffffffa033bfc3>] nvme_complete_rq+0x73/0x1a0 [nvme] [ 475.017401] [<ffffffff813182f0>] ? blkdev_issue_zeroout+0x1e0/0x1e0 [ 475.017402] [<ffffffff81318303>] __blk_mq_complete_request_remote+0x13/0x20 [ 475.017402] [<ffffffff8110346b>] flush_smp_call_function_queue+0x5b/0x160 [ 475.017402] [<ffffffff81103f53>] generic_smp_call_function_single_interrupt+0x13/0x60 [ 475.017403] [<ffffffff81050317>] smp_call_function_single_interrupt+0x27/0x40 [ 475.017403] [<ffffffff816b135c>] call_function_single_interrupt+0x8c/0xa0 [ 475.017403] <EOI> [<ffffffff812025c9>] ? mem_cgroup_migrate+0xd9/0x140 [ 475.017404] [<ffffffff811f205a>] migrate_page_copy+0x26a/0x550 [ 475.017404] [<ffffffff8125d822>] aio_migratepage+0x142/0x1b0 [ 475.017404] [<ffffffff811f23eb>] move_to_new_page+0x5b/0x1f0 [ 475.017404] [<ffffffff811ca2b9>] ? rmap_walk+0x39/0x60 [ 475.017405] [<ffffffff811ca679>] ? try_to_unmap+0xb9/0x150 [ 475.017405] [<ffffffff811c9480>] ? page_remove_rmap+0x220/0x220 [ 475.017405] [<ffffffff811c7ea0>] ? invalid_page_referenced_vma+0x90/0x90 [ 475.017406] [<ffffffff811f2f19>] migrate_pages+0x869/0x8f0 [ 475.017406] [<ffffffff811f0ad0>] ? compound_head+0x20/0x20 [ 475.017406] [<ffffffff811f38a5>] migrate_misplaced_page+0x125/0x1b0 [ 475.017407] [<ffffffff811bc1b2>] handle_pte_fault+0x642/0x1760 [ 475.017407] [<ffffffff811be74b>] handle_mm_fault+0x2bb/0x660 [ 475.017407] [<ffffffff810b5379>] ? task_numa_work+0x259/0x320 [ 475.017407] [<ffffffff8106a3ae>] __do_page_fault+0x1ce/0x4a0 [ 475.017408] [<ffffffff8106a6b0>] do_page_fault+0x30/0x80 [ 475.017408] [<ffffffff81003ba5>] ? do_syscall_64+0xf5/0x110 [ 475.017408] [<ffffffff816b2188>] page_fault+0x28/0x30 [ 475.017409] Kernel panic - not syncing: Hard LOCKUP [ 475.017409] CPU: 1 PID: 3802 Comm: fio Not tainted 4.7.0-rc2+ #2 [ 475.017409] Hardware name: ASUSTeK COMPUTER INC. Z10PE-D16 WS/Z10PE-D16 WS, BIOS 3101 11/04/2015 [ 475.017410] 0000000000000086 00000000a5918d2f ffff88085fc45b20 ffffffff8133c50f [ 475.017410] ffffffff81a202d2 0000000000000000 ffff88085fc45ba0 ffffffff81186dc6 [ 475.017410] 0000000000000010 ffff88085fc45bb0 ffff88085fc45b50 00000000a5918d2f [ 475.017411] Call Trace: [ 475.017411] <NMI> [<ffffffff8133c50f>] dump_stack+0x63/0x84 [ 475.017411] [<ffffffff81186dc6>] panic+0xe2/0x233 [ 475.017411] [<ffffffff8108267f>] nmi_panic+0x3f/0x40 [ 475.017412] [<ffffffff8113b161>] watchdog_overflow_callback+0xe1/0xf0 [ 475.017412] [<ffffffff81174fec>] __perf_event_overflow+0x7c/0x1f0 [ 475.017412] [<ffffffff8117f024>] perf_event_overflow+0x14/0x20 [ 475.017413] [<ffffffff8100c33a>] intel_pmu_handle_irq+0x1da/0x480 [ 475.017413] [<ffffffff8133e686>] ? ioremap_page_range+0x296/0x410 [ 475.017413] [<ffffffff811cb4ac>] ? vunmap_page_range+0x1dc/0x310 [ 475.017414] [<ffffffff811cb5f1>] ? unmap_kernel_range_noflush+0x11/0x20 [ 475.017414] [<ffffffff813eecb6>] ? ghes_copy_tofrom_phys+0x116/0x1f0 [ 475.017414] [<ffffffff81052f3f>] ? 
native_apic_wait_icr_idle+0x1f/0x30 [ 475.017414] [<ffffffff810054bd>] perf_event_nmi_handler+0x2d/0x50 [ 475.017415] [<ffffffff810311a1>] nmi_handle+0x61/0x110 [ 475.017415] [<ffffffff81031704>] default_do_nmi+0x44/0x120 [ 475.017415] [<ffffffff810318cb>] do_nmi+0xeb/0x160 [ 475.017416] [<ffffffff816b24f1>] end_repeat_nmi+0x1a/0x1e [ 475.017416] [<ffffffff8118ccd7>] ? mempool_free_slab+0x17/0x20 [ 475.017416] [<ffffffff810cc397>] ? native_queued_spin_lock_slowpath+0x177/0x1a0 [ 475.017417] [<ffffffff810cc397>] ? native_queued_spin_lock_slowpath+0x177/0x1a0 [ 475.017417] [<ffffffff810cc397>] ? native_queued_spin_lock_slowpath+0x177/0x1a0 [ 475.017417] <<EOE>> <IRQ> [<ffffffff81187380>] queued_spin_lock_slowpath+0xb/0xf [ 475.017418] [<ffffffff816afe77>] _raw_spin_lock_irqsave+0x37/0x40 [ 475.017418] [<ffffffff8125d96f>] aio_complete+0x6f/0x300 [ 475.017418] [<ffffffff813072d3>] ? bio_put+0x23/0x30 [ 475.017418] [<ffffffff8124da8d>] dio_complete+0xad/0x140 [ 475.017419] [<ffffffff8124dd1b>] dio_bio_end_aio+0x6b/0xf0 [ 475.017419] [<ffffffff81307335>] bio_endio+0x55/0x60 [ 475.017419] [<ffffffff8130fb97>] blk_update_request+0x87/0x300 [ 475.017420] [<ffffffff8131919a>] blk_mq_end_request+0x1a/0x70 [ 475.017420] [<ffffffffa033bfc3>] nvme_complete_rq+0x73/0x1a0 [nvme] [ 475.017420] [<ffffffff813182f0>] ? blkdev_issue_zeroout+0x1e0/0x1e0 [ 475.017420] [<ffffffff81318303>] __blk_mq_complete_request_remote+0x13/0x20 [ 475.017421] [<ffffffff8110346b>] flush_smp_call_function_queue+0x5b/0x160 [ 475.017421] [<ffffffff81103f53>] generic_smp_call_function_single_interrupt+0x13/0x60 [ 475.017421] [<ffffffff81050317>] smp_call_function_single_interrupt+0x27/0x40 [ 475.017422] [<ffffffff816b135c>] call_function_single_interrupt+0x8c/0xa0 [ 475.017422] <EOI> [<ffffffff812025c9>] ? mem_cgroup_migrate+0xd9/0x140 [ 475.017422] [<ffffffff811f205a>] migrate_page_copy+0x26a/0x550 [ 475.017423] [<ffffffff8125d822>] aio_migratepage+0x142/0x1b0 [ 475.017423] [<ffffffff811f23eb>] move_to_new_page+0x5b/0x1f0 [ 475.017423] [<ffffffff811ca2b9>] ? rmap_walk+0x39/0x60 [ 475.017424] [<ffffffff811ca679>] ? try_to_unmap+0xb9/0x150 [ 475.017424] [<ffffffff811c9480>] ? page_remove_rmap+0x220/0x220 [ 475.017424] [<ffffffff811c7ea0>] ? invalid_page_referenced_vma+0x90/0x90 [ 475.017424] [<ffffffff811f2f19>] migrate_pages+0x869/0x8f0 [ 475.017425] [<ffffffff811f0ad0>] ? compound_head+0x20/0x20 [ 475.017425] [<ffffffff811f38a5>] migrate_misplaced_page+0x125/0x1b0 [ 475.017425] [<ffffffff811bc1b2>] handle_pte_fault+0x642/0x1760 [ 475.017426] [<ffffffff811be74b>] handle_mm_fault+0x2bb/0x660 [ 475.017426] [<ffffffff810b5379>] ? task_numa_work+0x259/0x320 [ 475.017426] [<ffffffff8106a3ae>] __do_page_fault+0x1ce/0x4a0 [ 475.017427] [<ffffffff8106a6b0>] do_page_fault+0x30/0x80 [ 475.017427] [<ffffffff81003ba5>] ? do_syscall_64+0xf5/0x110 [ 475.017427] [<ffffffff816b2188>] page_fault+0x28/0x30 [ 475.017483] Kernel Offset: disabled # diff -u block/blk-lib.c ~/blk-lib.c --- block/blk-lib.c 2016-07-04 18:52:58.655123151 +0900 +++ /root/blk-lib.c 2016-07-04 18:52:31.154899845 +0900 @@ -118,6 +118,7 @@ ret = submit_bio_wait(bio); if (ret == -EOPNOTSUPP) ret = 0; + bio_put(bio); } blk_finish_plug(&plug); @@ -171,8 +172,10 @@ } } - if (bio) + if (bio) { ret = submit_bio_wait(bio); + bio_put(bio); + } return ret != -EOPNOTSUPP ? 
ret : 0; } EXPORT_SYMBOL(blkdev_issue_write_same); @@ -212,8 +215,11 @@ } } - if (bio) - return submit_bio_wait(bio); + if (bio) { + ret = submit_bio_wait( bio); + bio_put(bio); + return ret; + } return 0; } -----Original Message----- From: 'Christoph Hellwig' [mailto:hch@infradead.org] Sent: Tuesday, June 28, 2016 5:50 PM To: SWise OGC <swise at opengridcomputing.com> Cc: 'Sagi Grimberg' <sagi at grimberg.me>; 'Ming Lin' <mlin at kernel.org>; 'Christoph Hellwig' <hch at infradead.org>; Yoichi Hayakawa <yoichi at chelsio.com>; linux-nvme at lists.infradead.org Subject: Re: target crash / host hang with nvme-all.3 branch of nvme-fabrics On Mon, Jun 27, 2016@09:19:26AM -0500, Steve Wise wrote: > It appears this OOM issue is resolved in linux-4.7-rc5. Does it make sense to > publish a rebased nvmf-all? I'd rather get everything into Jens' tree... Either way - I suspect the fix you need is this one: http://git.kernel.dk/cgit/linux-block/commit/block/blk-lib.c?h=for-linus&id=05bd92dddc595d74ea645e793c1f3bd4b1fc251a but that won't be in the for-next tree either. ^ permalink raw reply [flat|nested] 50+ messages in thread
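[The blk-lib.c diff quoted in the message above, like the upstream commit Christoph pointed to, revolves around one ownership rule: submit_bio_wait() completes the I/O synchronously but does not drop the caller's reference, so a loop that allocates a bio per iteration without a matching bio_put() leaks one bio per pass until the box runs out of memory. Below is a minimal userspace sketch of that rule using a hypothetical refcounted request type; it is only an illustration of the leak pattern, not block-layer code.]

/*
 * Userspace illustration of "submit-and-wait does not consume the
 * caller's reference": the caller must drop its own reference after
 * the synchronous submit, or every loop iteration leaks one object.
 */
#include <stdio.h>
#include <stdlib.h>

struct request {
	int refcount;
	int sector;
};

static struct request *request_alloc(int sector)
{
	struct request *req = calloc(1, sizeof(*req));

	req->refcount = 1;      /* this reference is owned by the caller */
	req->sector = sector;
	return req;
}

static void request_put(struct request *req)
{
	if (--req->refcount == 0)
		free(req);
}

/* Completes the request synchronously; does NOT drop the caller's reference. */
static int submit_request_wait(struct request *req)
{
	printf("completed request for sector %d\n", req->sector);
	return 0;
}

static int issue_range(int start, int count)
{
	int ret = 0;

	for (int i = 0; i < count; i++) {
		struct request *req = request_alloc(start + i);

		ret = submit_request_wait(req);
		request_put(req);       /* without this, one object leaks per iteration */
		if (ret)
			break;
	}
	return ret;
}

int main(void)
{
	return issue_range(0, 8);
}

[In the diff above, the equivalent of request_put() is the bio_put() added after each submit_bio_wait() call.]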
end of thread, other threads:[~2016-07-04 9:57 UTC | newest] Thread overview: 50+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2016-06-16 14:53 target crash / host hang with nvme-all.3 branch of nvme-fabrics Steve Wise 2016-06-16 14:57 ` Christoph Hellwig 2016-06-16 15:10 ` Christoph Hellwig 2016-06-16 15:17 ` Steve Wise 2016-06-16 19:11 ` Sagi Grimberg 2016-06-16 20:38 ` Christoph Hellwig 2016-06-16 21:37 ` Sagi Grimberg 2016-06-16 21:40 ` Sagi Grimberg 2016-06-21 16:01 ` Christoph Hellwig 2016-06-22 10:22 ` Sagi Grimberg 2016-06-16 15:24 ` Steve Wise 2016-06-16 16:41 ` Steve Wise 2016-06-16 15:56 ` Steve Wise 2016-06-16 19:55 ` Sagi Grimberg 2016-06-16 19:59 ` Steve Wise 2016-06-16 20:07 ` Sagi Grimberg 2016-06-16 20:12 ` Steve Wise 2016-06-16 20:27 ` Ming Lin 2016-06-16 20:28 ` Steve Wise 2016-06-16 20:34 ` 'Christoph Hellwig' 2016-06-16 20:49 ` Steve Wise 2016-06-16 21:06 ` Steve Wise 2016-06-16 21:42 ` Sagi Grimberg 2016-06-16 21:47 ` Ming Lin 2016-06-16 21:53 ` Steve Wise 2016-06-16 21:46 ` Steve Wise 2016-06-27 22:29 ` Ming Lin 2016-06-28 9:14 ` 'Christoph Hellwig' 2016-06-28 14:15 ` Steve Wise 2016-06-28 15:51 ` 'Christoph Hellwig' 2016-06-28 16:31 ` Steve Wise 2016-06-28 16:49 ` Ming Lin 2016-06-28 19:20 ` Steve Wise 2016-06-28 19:43 ` Steve Wise 2016-06-28 21:04 ` Ming Lin 2016-06-29 14:11 ` Steve Wise 2016-06-27 17:26 ` Ming Lin 2016-06-16 20:35 ` Steve Wise 2016-06-16 20:01 ` Steve Wise 2016-06-17 14:05 ` Steve Wise [not found] ` <005f01d1c8a1$5a229240$0e67b6c0$@opengridcomputing.com> 2016-06-17 14:16 ` Steve Wise 2016-06-17 17:20 ` Ming Lin 2016-06-19 11:57 ` Sagi Grimberg 2016-06-21 14:18 ` Steve Wise 2016-06-21 17:33 ` Ming Lin 2016-06-21 17:59 ` Steve Wise [not found] ` <006e01d1cbc7$d0d9cc40$728d64c0$@opengridcomputing.com> 2016-06-22 13:42 ` Steve Wise 2016-06-27 14:19 ` Steve Wise 2016-06-28 8:50 ` 'Christoph Hellwig' 2016-07-04 9:57 ` Yoichi Hayakawa