From: swise@opengridcomputing.com (Steve Wise)
Subject: target crash / host hang with nvme-all.3 branch of nvme-fabrics
Date: Thu, 16 Jun 2016 16:06:17 -0500	[thread overview]
Message-ID: <020201d1c812$ec94b430$c5be1c90$@opengridcomputing.com> (raw)
In-Reply-To: <01e701d1c810$91d851c0$b588f540$@opengridcomputing.com>

> > Unfortunately I think it's still wrong because it will only delete
> > a single queue per controller.  We'll probably need something
> > like this instead, which does the same thing but also has a retry
> > loop for additional queues:
> >
> >
> > diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
> > index b1c6e5b..425b55c 100644
> > --- a/drivers/nvme/target/rdma.c
> > +++ b/drivers/nvme/target/rdma.c
> > @@ -1293,19 +1293,20 @@ static int nvmet_rdma_cm_handler(struct rdma_cm_id *cm_id,
> >
> >  static void nvmet_rdma_delete_ctrl(struct nvmet_ctrl *ctrl)
> >  {
> > -	struct nvmet_rdma_queue *queue, *next;
> > -	static LIST_HEAD(del_list);
> > +	struct nvmet_rdma_queue *queue;
> >
> > +restart:
> >  	mutex_lock(&nvmet_rdma_queue_mutex);
> > -	list_for_each_entry_safe(queue, next,
> > -			&nvmet_rdma_queue_list, queue_list) {
> > -		if (queue->nvme_sq.ctrl->cntlid == ctrl->cntlid)
> > -			list_move_tail(&queue->queue_list, &del_list);
> > +	list_for_each_entry(queue, &nvmet_rdma_queue_list, queue_list) {
> > +		if (queue->nvme_sq.ctrl == ctrl) {
> > +			list_del_init(&queue->queue_list);
> > +			mutex_unlock(&nvmet_rdma_queue_mutex);
> > +
> > +			__nvmet_rdma_queue_disconnect(queue);
> > +			goto restart;
> > +		}
> >  	}
> >  	mutex_unlock(&nvmet_rdma_queue_mutex);
> > -
> > -	list_for_each_entry_safe(queue, next, &del_list, queue_list)
> > -		nvmet_rdma_queue_disconnect(queue);
> >  }
> >
> >  static int nvmet_rdma_add_port(struct nvmet_port *port)
> >
> 
> This patch works.
> 
> Tested-by: Steve Wise <swise@opengridcomputing.com>
> 
> 
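
As an aside, the fix above is the usual unlock-and-restart deletion
pattern: the per-queue disconnect must run without nvmet_rdma_queue_mutex
held, and once the mutex is dropped the list may have changed under us,
so the scan restarts from the head. Stripped of the nvmet specifics, a
minimal sketch of the pattern looks roughly like this (item,
slow_teardown and friends are illustrative names, not actual nvmet
symbols):

#include <linux/list.h>
#include <linux/mutex.h>

struct owner;					/* opaque, illustrative */

struct item {
	struct list_head	list;
	struct owner		*owner;
};

static LIST_HEAD(item_list);
static DEFINE_MUTEX(item_mutex);

static void slow_teardown(struct item *it);	/* may sleep */

static void delete_all_for_owner(struct owner *o)
{
	struct item *it;

restart:
	mutex_lock(&item_mutex);
	list_for_each_entry(it, &item_list, list) {
		if (it->owner == o) {
			/* Unlink while still holding the lock so the
			 * entry cannot be found a second time. */
			list_del_init(&it->list);
			mutex_unlock(&item_mutex);

			/* The teardown may sleep, so run it unlocked;
			 * the list can change meanwhile, so rescan
			 * from the head instead of continuing. */
			slow_teardown(it);
			goto restart;
		}
	}
	mutex_unlock(&item_mutex);
}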

hrm...

Forcing more reconnects, I just hit this.  It looks different from the other
issue:

general protection fault: 0000 [#1] SMP
Modules linked in: rdma_ucm iw_cxgb4 cxgb4 nvmet_rdma rdma_cm iw_cm nvmet
null_blk configfs ip6table_filter ip6_tables ebtable_nat ebtables
nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT
nf_reject_ipv4 xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge
autofs4 8021q garp stp llc ipmi_devintf cachefiles fscache ib_ipoib ib_cm
ib_uverbs ib_umad iw_nes libcrc32c iw_cxgb3 cxgb3 mdio ib_qib rdmavt mlx4_en
ib_mthca dm_mirror dm_region_hash dm_log vhost_net macvtap macvlan vhost tun
kvm_intel kvm irqbypass uinput iTCO_wdt iTCO_vendor_support mxm_wmi pcspkr
mlx4_ib ib_core ipv6 mlx4_core dm_mod i2c_i801 sg lpc_ich mfd_core nvme
nvme_core acpi_cpufreq ioatdma igb dca i2c_algo_bit i2c_core ptp pps_core wmi
ext4(E) mbcache(E) jbd2(E) sd_mod(E) ahci(E) libahci(E) [last unloaded: cxgb4]
CPU: 3 PID: 19213 Comm: kworker/3:10 Tainted: G            E   4.7.0-rc2-nvmf-all.3+rxe+ #84
Hardware name: Supermicro X9DR3-F/X9DR3-F, BIOS 3.2a 07/09/2015
Workqueue: events nvmet_rdma_release_queue_work [nvmet_rdma]
task: ffff88103d68cf00 ti: ffff880fdf7a4000 task.ti: ffff880fdf7a4000
RIP: 0010:[<ffffffffa01ef5b7>]  [<ffffffffa01ef5b7>] nvmet_rdma_free_rsps+0x67/0xb0 [nvmet_rdma]
RSP: 0018:ffff880fdf7a7bb8  EFLAGS: 00010202
RAX: dead000000000100 RBX: 000000000000001f RCX: 0000000000000001
RDX: dead000000000200 RSI: ffff880fdd884290 RDI: dead000000000200
RBP: ffff880fdf7a7bf8 R08: dead000000000100 R09: ffff88103c768140
R10: ffff88103c7682c0 R11: ffff88103c768340 R12: 00000000000044c8
R13: ffff88103db39c00 R14: 0000000000000100 R15: ffff88103e29cec0
FS:  0000000000000000(0000) GS:ffff88107f2c0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000001016b00 CR3: 000000103bcb7000 CR4: 00000000000406e0
Stack:
 ffff880fdd8a23f8 00000000ffac1a05 ffff880fdf7a7bf8 ffff88103db39c00
 ffff88103c64cc00 ffffe8ffffac1a00 0000000000000000 ffffe8ffffac1a05
 ffff880fdf7a7c18 ffffffffa01ef652 0000000000000246 ffff88103e29cec0
Call Trace:
 [<ffffffffa01ef652>] nvmet_rdma_free_queue+0x52/0xa0 [nvmet_rdma]
 [<ffffffffa01ef6d3>] nvmet_rdma_release_queue_work+0x33/0x70 [nvmet_rdma]
 [<ffffffff8107cb5b>] process_one_work+0x17b/0x510
 [<ffffffff8161495c>] ? __schedule+0x23c/0x630
 [<ffffffff810c6c4c>] ? del_timer_sync+0x4c/0x60
 [<ffffffff8107da0b>] ? maybe_create_worker+0x8b/0x110
 [<ffffffff81614eb0>] ? schedule+0x40/0xb0
 [<ffffffff8107dbf6>] worker_thread+0x166/0x580
 [<ffffffff8161495c>] ? __schedule+0x23c/0x630
 [<ffffffff8108e162>] ? default_wake_function+0x12/0x20
 [<ffffffff8109fc26>] ? __wake_up_common+0x56/0x90
 [<ffffffff8107da90>] ? maybe_create_worker+0x110/0x110
 [<ffffffff81614eb0>] ? schedule+0x40/0xb0
 [<ffffffff8107da90>] ? maybe_create_worker+0x110/0x110
 [<ffffffff8108255c>] kthread+0xcc/0xf0
 [<ffffffff8108cade>] ? schedule_tail+0x1e/0xc0
 [<ffffffff816186cf>] ret_from_fork+0x1f/0x40
 [<ffffffff81082490>] ? kthread_freezable_should_stop+0x70/0x70
Code: b8 00 01 00 00 00 00 ad de 48 bf 00 02 00 00 00 00 ad de 83 c3 01 49 81 c4
38 02 00 00 48 8b 86 28 02 00 00 48 8b 96 30 02 00 00 <48> 89 50 08 48 89 45 c0
48 89 02 48 89 be 30 02 00 00 4c 89 ff
RIP  [<ffffffffa01ef5b7>] nvmet_rdma_free_rsps+0x67/0xb0 [nvmet_rdma]
 RSP <ffff880fdf7a7bb8>
---[ end trace a30265f72371b5ce ]---
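
One note on decoding this: RAX (dead000000000100) and RDX
(dead000000000200) are LIST_POISON1 and LIST_POISON2, the values
list_del() stores into a deleted entry's next/prev pointers; on x86-64
they are non-canonical addresses, so dereferencing them trips a general
protection fault rather than silently corrupting memory. Seeing both
live in the faulting instruction (the "mov %rdx,0x8(%rax)" in the Code:
bytes looks like next->prev = prev with next already poisoned) is
consistent with an entry being list_del()'d a second time, or used after
it was already unlinked, somewhere under nvmet_rdma_free_rsps(). From
include/linux/list.h, lightly annotated:

static inline void list_del(struct list_head *entry)
{
	__list_del_entry(entry);	/* unlink from its neighbours */
	entry->next = LIST_POISON1;	/* 0xdead000000000100 on x86-64 */
	entry->prev = LIST_POISON2;	/* 0xdead000000000200 on x86-64 */
}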

Thread overview: 50+ messages
2016-06-16 14:53 target crash / host hang with nvme-all.3 branch of nvme-fabrics Steve Wise
2016-06-16 14:57 ` Christoph Hellwig
2016-06-16 15:10   ` Christoph Hellwig
2016-06-16 15:17     ` Steve Wise
2016-06-16 19:11     ` Sagi Grimberg
2016-06-16 20:38       ` Christoph Hellwig
2016-06-16 21:37         ` Sagi Grimberg
2016-06-16 21:40           ` Sagi Grimberg
2016-06-21 16:01           ` Christoph Hellwig
2016-06-22 10:22             ` Sagi Grimberg
2016-06-16 15:24   ` Steve Wise
2016-06-16 16:41     ` Steve Wise
2016-06-16 15:56   ` Steve Wise
2016-06-16 19:55     ` Sagi Grimberg
2016-06-16 19:59       ` Steve Wise
2016-06-16 20:07         ` Sagi Grimberg
2016-06-16 20:12           ` Steve Wise
2016-06-16 20:27             ` Ming Lin
2016-06-16 20:28               ` Steve Wise
2016-06-16 20:34                 ` 'Christoph Hellwig'
2016-06-16 20:49                   ` Steve Wise
2016-06-16 21:06                     ` Steve Wise [this message]
2016-06-16 21:42                       ` Sagi Grimberg
2016-06-16 21:47                         ` Ming Lin
2016-06-16 21:53                           ` Steve Wise
2016-06-16 21:46                       ` Steve Wise
2016-06-27 22:29                       ` Ming Lin
2016-06-28  9:14                         ` 'Christoph Hellwig'
2016-06-28 14:15                           ` Steve Wise
2016-06-28 15:51                             ` 'Christoph Hellwig'
2016-06-28 16:31                               ` Steve Wise
2016-06-28 16:49                                 ` Ming Lin
2016-06-28 19:20                                   ` Steve Wise
2016-06-28 19:43                                     ` Steve Wise
2016-06-28 21:04                                       ` Ming Lin
2016-06-29 14:11                                         ` Steve Wise
2016-06-27 17:26                   ` Ming Lin
2016-06-16 20:35           ` Steve Wise
2016-06-16 20:01       ` Steve Wise
2016-06-17 14:05       ` Steve Wise
     [not found]       ` <005f01d1c8a1$5a229240$0e67b6c0$@opengridcomputing.com>
2016-06-17 14:16         ` Steve Wise
2016-06-17 17:20           ` Ming Lin
2016-06-19 11:57             ` Sagi Grimberg
2016-06-21 14:18               ` Steve Wise
2016-06-21 17:33                 ` Ming Lin
2016-06-21 17:59                   ` Steve Wise
     [not found]               ` <006e01d1cbc7$d0d9cc40$728d64c0$@opengridcomputing.com>
2016-06-22 13:42                 ` Steve Wise
2016-06-27 14:19                   ` Steve Wise
2016-06-28  8:50                     ` 'Christoph Hellwig'
2016-07-04  9:57                       ` Yoichi Hayakawa
