Linux-NVME Archive on lore.kernel.org
From: swise@opengridcomputing.com (Steve Wise)
Subject: nvme-fabrics: crash at nvme connect-all
Date: Fri, 10 Jun 2016 11:22:23 -0500
Message-ID: <020b01d1c334$45077f50$cf167df0$@opengridcomputing.com>
In-Reply-To: <01c601d1c32a$59576ec0$0c064c40$@opengridcomputing.com>

> > Add the hack into iw_cxgb4 to force alloc_mr failures after 200 allocations
> > (or whatever value you need to make it happen).  Then on the same machine,
> > export a target device, load nvme-rdma and discover/connect to that target
> > device with nvme.  It will crash.
> >
> > Unfortunately, with the 4.7-rc2 base I'm using, I get no vmcore dump.  I'm
> > not sure why...
> >
> 
> Previously I was using Doug's rdma rxe branch + sagi's rxe fixes + rebased on nvmf-
> all.2.   To simplify, I have now gone to just straight nvmf-all.2.  Also, I separated the
> host and target to different nodes and reproduced the problem.  It's the host side
> that is crashing.  Same GPF with RIP:
> 
> RIP: 0010:[<ffffffff810d04c3>]  [<ffffffff810d04c3>]
> get_next_timer_interrupt+0x183/0x210
> 
> Steve.

I enabled lots of kernel memory debugging and now hit this.  Perhaps a clue?  Freeing an active timer list widget?

nvme nvme1: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 10.0.1.14:4420
nvme nvme1: creating 16 I/O queues.
nvme nvme1: Connect rejected, no private data.
nvme nvme1: rdma_resolve_addr wait failed (-104).
nvme nvme1: failed to initialize i/o queue: -104
------------[ cut here ]------------
WARNING: CPU: 1 PID: 10440 at lib/debugobjects.c:263 debug_print_object+0x8e/0xb0
ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x20
Modules linked in: nvme_rdma nvme_fabrics rdma_ucm rdma_cm iw_cm configfs iw_cxgb4 cxgb4 ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT nf_reject_ipv4 xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge autofs4 8021q garp stp llc cachefiles fscache ib_ipoib ib_cm ib_uverbs ib_umad iw_nes libcrc32c iw_cxgb3 cxgb3 mdio ib_qib rdmavt mlx4_en ib_mthca dm_mirror dm_region_hash dm_log vhost_net macvtap macvlan vhost tun kvm irqbypass uinput iTCO_wdt iTCO_vendor_support pcspkr mlx4_ib ib_core ipv6 mlx4_core dm_mod sg lpc_ich mfd_core i2c_i801 nvme nvme_core igb dca ptp pps_core acpi_cpufreq ext4(E) mbcache(E) jbd2(E) sd_mod(E) nouveau(E) ttm(E) drm_kms_helper(E) drm(E) fb_sys_fops(E) sysimgblt(E) sysfillrect(E) syscopyarea(E) i2c_algo_bit(E) i2c_core(E) mxm_wmi(E) video(E) ahci(E) libahci(E) wmi(E) [last unloaded: cxgb4]
CPU: 1 PID: 10440 Comm: nvme Tainted: G            E   4.7.0-rc2-nvmf-all.2+ #42
Hardware name: Supermicro X9DR3-F/X9DR3-F, BIOS 3.2a 07/09/2015
 0000000000000000 ffff881027a13a18 ffffffff812f032d ffffffff8130e65e
 ffff881027a13a78 ffff881027a13a78 0000000000000000 ffff881027a13a68
 ffffffff8106694d 0000031800000001 000001072aad7ce8 dead000000000200
Call Trace:
 [<ffffffff812f032d>] dump_stack+0x51/0x74
 [<ffffffff8130e65e>] ? debug_print_object+0x8e/0xb0
 [<ffffffff8106694d>] __warn+0xfd/0x120
 [<ffffffff81066a29>] warn_slowpath_fmt+0x49/0x50
 [<ffffffff81182d72>] ? kfree_const+0x22/0x30
 [<ffffffff8130e65e>] debug_print_object+0x8e/0xb0
 [<ffffffff81080850>] ? __queue_work+0x520/0x520
 [<ffffffff8130ecbe>] __debug_check_no_obj_freed+0x1ee/0x270
 [<ffffffff8130ed57>] debug_check_no_obj_freed+0x17/0x20
 [<ffffffff811c3aac>] kfree+0x9c/0x120
 [<ffffffff81182d72>] ? kfree_const+0x22/0x30
 [<ffffffff812f2f3c>] ? kobject_cleanup+0x9c/0x1b0
 [<ffffffffa04cc696>] nvme_rdma_free_ctrl+0xa6/0xc0 [nvme_rdma]
 [<ffffffffa06fcc36>] nvme_free_ctrl+0x46/0x60 [nvme_core]
 [<ffffffffa06feb2b>] nvme_put_ctrl+0x1b/0x20 [nvme_core]
 [<ffffffffa04cf1a2>] nvme_rdma_create_ctrl+0x412/0x4f0 [nvme_rdma]
 [<ffffffffa04c5d02>] nvmf_create_ctrl+0x182/0x210 [nvme_fabrics]
 [<ffffffffa04c5e3c>] nvmf_dev_write+0xac/0x110 [nvme_fabrics]
 [<ffffffff811d9c24>] __vfs_write+0x34/0x120
 [<ffffffff81002515>] ? trace_event_raw_event_sys_enter+0xb5/0x130
 [<ffffffff811d9dc9>] vfs_write+0xb9/0x130
 [<ffffffff811f9592>] ? __fdget_pos+0x12/0x50
 [<ffffffff811da9b9>] SyS_write+0x59/0xc0
 [<ffffffff81002d6d>] do_syscall_64+0x6d/0x160
 [<ffffffff81642e7c>] entry_SYSCALL64_slow_path+0x25/0x25
---[ end trace 7f80ebccfc6bd15d ]---

Thread overview: 37+ messages
2016-06-09  9:18 nvme-fabrics: crash at nvme connect-all Marta Rybczynska
2016-06-09  9:29 ` Sagi Grimberg
2016-06-09 10:07   ` Marta Rybczynska
2016-06-09 11:09     ` Sagi Grimberg
2016-06-09 12:12       ` Marta Rybczynska
2016-06-09 12:30         ` Sagi Grimberg
2016-06-09 13:27           ` Steve Wise
2016-06-09 13:36             ` Steve Wise
2016-06-09 13:48               ` Sagi Grimberg
2016-06-09 14:09                 ` Steve Wise
2016-06-09 14:22                   ` Steve Wise
2016-06-09 14:29                     ` Steve Wise
2016-06-09 15:04                       ` Marta Rybczynska
2016-06-09 15:40                         ` Steve Wise
2016-06-09 15:48                           ` Steve Wise
2016-06-10  9:03                             ` Marta Rybczynska
2016-06-10 13:40                               ` Steve Wise
2016-06-10 13:42                                 ` Marta Rybczynska
2016-06-10 13:49                                   ` Steve Wise
2016-06-09 13:25   ` Christoph Hellwig
2016-06-09 13:24 ` Christoph Hellwig
2016-06-09 15:37   ` Marta Rybczynska
2016-06-09 20:25     ` Steve Wise
2016-06-09 20:35       ` Ming Lin
2016-06-09 21:06         ` Steve Wise
2016-06-09 22:26           ` Ming Lin
2016-06-09 22:40             ` Steve Wise
     [not found]             ` <055801d1c29f$e164c000$a42e4000$@opengridcomputing.com>
2016-06-10 15:11               ` Steve Wise
2016-06-10 16:22                 ` Steve Wise [this message]
2016-06-10 18:43                   ` Ming Lin
2016-06-10 19:17                     ` Steve Wise
2016-06-10 20:00                       ` Ming Lin
2016-06-10 20:15                         ` Steve Wise
2016-06-10 20:18                           ` Ming Lin
2016-06-10 21:14                             ` Steve Wise
2016-06-10 21:20                               ` Ming Lin
2016-06-10 21:25                                 ` Steve Wise
