From: swise@opengridcomputing.com (Steve Wise)
Subject: nvme-fabrics: crash at nvme connect-all
Date: Thu, 9 Jun 2016 16:06:49 -0500
Message-ID: <04e301d1c292$d6c34430$8449cc90$@opengridcomputing.com>
In-Reply-To: <CAF1ivSb2fvjEzCxWXnrxv_i74SRm2qxWZ-RiKpEaGOx-Dk3f1A@mail.gmail.com>

> >
> > I can force a crash with this patch:
> >
> > diff --git a/drivers/infiniband/hw/cxgb4/mem.c b/drivers/infiniband/hw/cxgb4/mem.c
> > index 55d0651..bbc1422 100644
> > --- a/drivers/infiniband/hw/cxgb4/mem.c
> > +++ b/drivers/infiniband/hw/cxgb4/mem.c
> > @@ -619,6 +619,10 @@ struct ib_mr *c4iw_alloc_mr(struct ib_pd *pd,
> >         u32 stag = 0;
> >         int ret = 0;
> >         int length = roundup(max_num_sg * sizeof(u64), 32);
> > +       static int foo;
> > +
> > +       if (foo++ > 200)
> > +               return ERR_PTR(-ENOMEM);
> >
> >         php = to_c4iw_pd(pd);
> >         rhp = php->rhp;
> >
> >
> > Crash:
> >
> > rdma_rw_init_mrs: failed to allocated 128 MRs
> > failed to init MR pool ret= -12
> > nvmet_rdma: failed to create_qp ret= -12
> > nvmet_rdma: nvmet_rdma_alloc_queue: creating RDMA queue failed (-12).
> > nvme nvme1: Connect rejected, no private data.
> > nvme nvme1: rdma_resolve_addr wait failed (-104).
> > nvme nvme1: failed to initialize i/o queue: -104
> > nvmet_rdma: freeing queue 17
> > general protection fault: 0000 [#1] SMP
> 
> > RIP: 0010:[<ffffffff810d04c3>]  [<ffffffff810d04c3>] get_next_timer_interrupt+0x183/0x210
> > RSP: 0018:ffff88107f243e68  EFLAGS: 00010002
> > RAX: 00000000fffe39b8 RBX: 0000000000000001 RCX: 00000000fffe39b8
> > RDX: 6b6b6b6b6b6b6b6b RSI: 0000000000000039 RDI: 0000000000000036
> > RBP: ffff88107f243eb8 R08: ffff88107f24f488 R09: 0000000000fffe36
> > R10: ffff88107f243e70 R11: ffff88107f243e88 R12: 0000002a89f289c0
> > R13: 00000000fffe35d0 R14: ffff88107f24ec40 R15: 0000000000000040
> > FS:  0000000000000000(0000) GS:ffff88107f240000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: ffffffffff600400 CR3: 000000103af92000 CR4: 00000000000406e0
> > Stack:
> >  ffff88107f24f488 ffff88107f24f688 ffff88107f24f888 ffff88107f24fa88
> >  ffff88107ec39698 ffff88107f250180 00000000fffe35d0 ffff88107f24c700
> >  0000002a89f30293 0000002a89f289c0 ffff88107f243f38 ffffffff810e2ac4
> > Call Trace:
> >  <IRQ>
> >  [<ffffffff810e2ac4>] tick_nohz_stop_sched_tick+0x1b4/0x2c0
> >  [<ffffffff810986a5>] ? sched_clock_cpu+0xc5/0xd0
> >  [<ffffffff810e2c73>] __tick_nohz_idle_enter+0xa3/0x140
> >  [<ffffffff810e2d38>] tick_nohz_irq_exit+0x28/0x40
> >  [<ffffffff8106c0a5>] irq_exit+0x95/0xb0
> >  [<ffffffff81642c76>] smp_apic_timer_interrupt+0x46/0x60
> >  [<ffffffff8164134f>] apic_timer_interrupt+0x7f/0x90
> >  <EOI>
> >  [<ffffffff810a7d2a>] ? cpu_idle_loop+0xda/0x250
> >  [<ffffffff810a7e13>] ? cpu_idle_loop+0x1c3/0x250
> >  [<ffffffff810a7ec1>] cpu_startup_entry+0x21/0x30
> >  [<ffffffff81044ce8>] start_secondary+0x78/0x80
> 
> The stack looks weird; nothing in it is nvme-related.
> I guess it is a random crash.
> 
> Could you reproduce it again and see whether you get a different call stack?

I reproduced it twice more and got the same crash both times. At least the RIP is exactly the same:

get_next_timer_interrupt+0x183/0x210

The rest of the stack differed slightly, but it still contained tick_nohz frames.

Does this look correct? Note that "freeing queue 17" appears twice:

nvmet: creating controller 1 for NQN nqn.2014-08.org.nvmexpress:NVMf:uuid:6e01fbc9-49fb-4998-9522-df85a95f9ff7.
nvme nvme1: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 10.0.1.14:4420
nvmet_rdma: freeing queue 17
nvmet: creating controller 1 for NQN nqn.2014-08.org.nvmexpress:NVMf:uuid:6e01fbc9-49fb-4998-9522-df85a95f9ff7.
nvme nvme1: creating 16 I/O queues.
rdma_rw_init_mrs: failed to allocated 128 MRs
failed to init MR pool ret= -12
nvmet_rdma: failed to create_qp ret= -12
nvmet_rdma: nvmet_rdma_alloc_queue: creating RDMA queue failed (-12).
nvme nvme1: Connect rejected, no private data.
nvme nvme1: rdma_resolve_addr wait failed (-104).
nvme nvme1: failed to initialize i/o queue: -104
nvmet_rdma: freeing queue 17
general protection fault: 0000 [#1] SMP
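The quoted patch forces the failure with a function-local "static int foo" that counts calls to c4iw_alloc_mr(). Because "foo++ > 200" compares the old value, the first 201 allocations succeed and every later one returns -ENOMEM, since the counter is never reset. A minimal user-space sketch of the same counter-based fault injection is below. It is illustrative only: fake_alloc_mr(), alloc_pool(), struct mr and FAIL_AFTER are hypothetical names, and a NULL return plus an error code stands in for the kernel's ERR_PTR(-ENOMEM).

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical threshold, matching the 200 used in the quoted patch. */
#define FAIL_AFTER 200

/* Hypothetical stand-in for struct ib_mr. */
struct mr {
	int id;
};

/*
 * Hypothetical stand-in for c4iw_alloc_mr(). The static counter persists
 * across calls, like "static int foo" in the patch. Since the post-increment
 * returns the old value, calls 1..FAIL_AFTER+1 succeed and every call after
 * that fails with -ENOMEM.
 */
static struct mr *fake_alloc_mr(int *err)
{
	static int calls;

	if (calls++ > FAIL_AFTER) {
		*err = -ENOMEM;	/* injected failure */
		return NULL;
	}

	struct mr *mr = malloc(sizeof(*mr));
	if (!mr) {
		*err = -ENOMEM;	/* genuine allocation failure */
		return NULL;
	}
	mr->id = calls;
	*err = 0;
	return mr;
}

/*
 * Hypothetical pool setup: allocate up to n MRs and stop at the first
 * failure. Returns how many were allocated; *err holds the failing code.
 * The caller owns (and must free) the entries that were allocated.
 */
static int alloc_pool(struct mr **pool, int n, int *err)
{
	int i;

	*err = 0;
	for (i = 0; i < n; i++) {
		pool[i] = fake_alloc_mr(err);
		if (!pool[i])
			break;
	}
	return i;
}
```

On a fresh run, asking alloc_pool() for 300 entries yields 201 MRs and an error of -ENOMEM, and any later call keeps failing, so every queue set up after the threshold also fails to get its MRs.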

Thread overview: 37+ messages
2016-06-09  9:18 nvme-fabrics: crash at nvme connect-all Marta Rybczynska
2016-06-09  9:29 ` Sagi Grimberg
2016-06-09 10:07   ` Marta Rybczynska
2016-06-09 11:09     ` Sagi Grimberg
2016-06-09 12:12       ` Marta Rybczynska
2016-06-09 12:30         ` Sagi Grimberg
2016-06-09 13:27           ` Steve Wise
2016-06-09 13:36             ` Steve Wise
2016-06-09 13:48               ` Sagi Grimberg
2016-06-09 14:09                 ` Steve Wise
2016-06-09 14:22                   ` Steve Wise
2016-06-09 14:29                     ` Steve Wise
2016-06-09 15:04                       ` Marta Rybczynska
2016-06-09 15:40                         ` Steve Wise
2016-06-09 15:48                           ` Steve Wise
2016-06-10  9:03                             ` Marta Rybczynska
2016-06-10 13:40                               ` Steve Wise
2016-06-10 13:42                                 ` Marta Rybczynska
2016-06-10 13:49                                   ` Steve Wise
2016-06-09 13:25   ` Christoph Hellwig
2016-06-09 13:24 ` Christoph Hellwig
2016-06-09 15:37   ` Marta Rybczynska
2016-06-09 20:25     ` Steve Wise
2016-06-09 20:35       ` Ming Lin
2016-06-09 21:06         ` Steve Wise [this message]
2016-06-09 22:26           ` Ming Lin
2016-06-09 22:40             ` Steve Wise
     [not found]             ` <055801d1c29f$e164c000$a42e4000$@opengridcomputing.com>
2016-06-10 15:11               ` Steve Wise
2016-06-10 16:22                 ` Steve Wise
2016-06-10 18:43                   ` Ming Lin
2016-06-10 19:17                     ` Steve Wise
2016-06-10 20:00                       ` Ming Lin
2016-06-10 20:15                         ` Steve Wise
2016-06-10 20:18                           ` Ming Lin
2016-06-10 21:14                             ` Steve Wise
2016-06-10 21:20                               ` Ming Lin
2016-06-10 21:25                                 ` Steve Wise