Linux-NVME Archive on lore.kernel.org
From: swise@opengridcomputing.com (Steve Wise)
Subject: nvme-fabrics: crash at nvme connect-all
Date: Thu, 9 Jun 2016 17:40:08 -0500
Message-ID: <055701d1c29f$e0919180$a1b4b480$@opengridcomputing.com>
In-Reply-To: <CAF1ivSYBtsuvm-UO6osPAVT-krpF7iXqWy_8LheyWDDoAdWL1A@mail.gmail.com>



> -----Original Message-----
> From: Ming Lin [mailto:mlin at kernel.org]
> Sent: Thursday, June 9, 2016 5:26 PM
> To: Steve Wise <swise at opengridcomputing.com>
> Cc: keith busch <keith.busch at intel.com>; ming l <ming.l at ssi.samsung.com>;
> Sagi Grimberg <sagi at grimberg.me>; Marta Rybczynska <mrybczyn at kalray.eu>;
> Jens Axboe <axboe at fb.com>; linux-nvme at lists.infradead.org;
> Christoph Hellwig <hch at infradead.org>;
> james p freyensee <james.p.freyensee at intel.com>;
> armenx baloyan <armenx.baloyan at intel.com>
> Subject: Re: nvme-fabrics: crash at nvme connect-all
> 
> On Thu, Jun 9, 2016 at 2:06 PM, Steve Wise
> <swise@opengridcomputing.com> wrote:
> 
> > Yes, I get the same crash after reproducing it twice.  At least the RIP is exactly the same:
> >
> > get_next_timer_interrupt+0x183/0x210
> >
> > The rest of the stack looked a little different but still had tick_nohz stuff in it.
> >
> > Does this look correct ("freeing queue 17" twice)?
> >
> > nvmet: creating controller 1 for NQN nqn.2014-08.org.nvmexpress:NVMf:uuid:6e01fbc9-49fb-4998-9522-df85a95f9ff7.
> > nvme nvme1: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 10.0.1.14:4420
> > nvmet_rdma: freeing queue 17
> > nvmet: creating controller 1 for NQN nqn.2014-08.org.nvmexpress:NVMf:uuid:6e01fbc9-49fb-4998-9522-df85a95f9ff7.
> > nvme nvme1: creating 16 I/O queues.
> > rdma_rw_init_mrs: failed to allocated 128 MRs
> > failed to init MR pool ret= -12
> > nvmet_rdma: failed to create_qp ret= -12
> > nvmet_rdma: nvmet_rdma_alloc_queue: creating RDMA queue failed (-12).
> > nvme nvme1: Connect rejected, no private data.
> > nvme nvme1: rdma_resolve_addr wait failed (-104).
> > nvme nvme1: failed to initialize i/o queue: -104
> > nvmet_rdma: freeing queue 17
> > general protection fault: 0000 [#1] SMP
> 
> I'll get a Chelsio card to try.
> 
> What's the step to reproduce?

Add the hack into iw_cxgb4 to force alloc_mr failures after 200 allocations (or whatever value you need to make it happen).  Then on the same machine, export a target device, load nvme-rdma and discover/connect to that target device with nvme.  It will crash.
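
The iw_cxgb4 hack itself is not included in this message. As a rough illustration of the "fail after N allocations" idea only, here is a minimal user-space sketch. Every name in it (struct fail_after, should_fail, demo_alloc_mr) is hypothetical and is not taken from the driver; malloc() stands in for the real MR allocation, and the injected error is -ENOMEM, which on Linux is the -12 seen in the log above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical fault injector: the first `limit` calls succeed,
 * every call after that is forced to fail. */
struct fail_after {
	unsigned long limit;	/* allocations allowed before failures start */
	unsigned long count;	/* allocations attempted so far */
};

/* Returns true once more than `limit` attempts have been made. */
static bool should_fail(struct fail_after *inj)
{
	assert(inj != NULL);
	return inj->count++ >= inj->limit;
}

/* Stand-in for a driver's MR allocation path (NOT the real iw_cxgb4 code).
 * On success returns 0 and stores a buffer in *out.
 * On injected or genuine allocation failure returns -ENOMEM and sets *out to NULL. */
static int demo_alloc_mr(struct fail_after *inj, size_t size, void **out)
{
	assert(out != NULL);
	*out = NULL;

	if (should_fail(inj))
		return -ENOMEM;	/* injected failure */

	*out = malloc(size);
	return *out ? 0 : -ENOMEM;
}
```

With limit set to 200, calls 1 through 200 go through to malloc() and call 201 onward returns -ENOMEM without allocating. The caller's error path then runs on every later attempt, which is the kind of path the reproduction steps above are meant to exercise.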

Unfortunately, with the 4.7-rc2 base I'm using, I get no vmcore dump.  I'm not sure why...
