From: swise@opengridcomputing.com (Steve Wise)
Subject: target crash / host hang with nvme-all.3 branch of nvme-fabrics
Date: Thu, 16 Jun 2016 16:46:30 -0500 [thread overview]
Message-ID: <021c01d1c818$8ac01730$a0404590$@opengridcomputing.com> (raw)
In-Reply-To: <020201d1c812$ec94b430$c5be1c90$@opengridcomputing.com>
> hrm...
>
> Forcing more reconnects, I just hit this. It looks different from the other
> issue:
>
> general protection fault: 0000 [#1] SMP
> Modules linked in: rdma_ucm iw_cxgb4 cxgb4 nvmet_rdma rdma_cm iw_cm
> nvmet
> null_blk configfs ip6table_filter ip6_tables ebtable_nat ebtables
> nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT nf_reject_
> ipv4 xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge autofs4 8021q
> garp stp llc ipmi_devintf cachefiles fscache ib_ipoib ib_cm ib_uverbs ib_umad
> iw_nes libcrc32c iw_cxgb3 cxgb3 mdio ib_qib rdmavt mlx4_en i b_mthca
> dm_mirror
> dm_region_hash dm_log vhost_net macvtap macvlan vhost tun kvm_intel kvm
> irqbypass uinput iTCO_wdt iTCO_vendor_support mxm_wmi pcspkr mlx4_ib
> ib_core
> ipv6 mlx4_core dm_mod i2c_i801 sg lpc_ich mfd_cor e nvme nvme_core
> acpi_cpufreq ioatdma igb dca i2c_algo_bit i2c_core ptp pps_core wmi ext4(E)
> mbcache(E) jbd2(E) sd_mod(E) ahci(E) libahci(E) [last unloaded: cxgb4]
> CPU: 3 PID: 19213 Comm: kworker/3:10 Tainted: G E
> 4.7.0-rc2-nvmf-all.3+rxe+ #84
> Hardware name: Supermicro X9DR3-F/X9DR3-F, BIOS 3.2a 07/09/2015
> Workqueue: events nvmet_rdma_release_queue_work [nvmet_rdma]
> task: ffff88103d68cf00 ti: ffff880fdf7a4000 task.ti: ffff880fdf7a4000
> RIP: 0010:[<ffffffffa01ef5b7>] [<ffffffffa01ef5b7>]
> nvmet_rdma_free_rsps+0x67/0xb0 [nvmet_rdma]
> RSP: 0018:ffff880fdf7a7bb8 EFLAGS: 00010202
> RAX: dead000000000100 RBX: 000000000000001f RCX: 0000000000000001
> RDX: dead000000000200 RSI: ffff880fdd884290 RDI: dead000000000200
> RBP: ffff880fdf7a7bf8 R08: dead000000000100 R09: ffff88103c768140
> R10: ffff88103c7682c0 R11: ffff88103c768340 R12: 00000000000044c8
> R13: ffff88103db39c00 R14: 0000000000000100 R15: ffff88103e29cec0
> FS: 0000000000000000(0000) GS:ffff88107f2c0000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000001016b00 CR3: 000000103bcb7000 CR4: 00000000000406e0
> Stack:
> ffff880fdd8a23f8 00000000ffac1a05 ffff880fdf7a7bf8 ffff88103db39c00
> ffff88103c64cc00 ffffe8ffffac1a00 0000000000000000 ffffe8ffffac1a05
> ffff880fdf7a7c18 ffffffffa01ef652 0000000000000246 ffff88103e29cec0
> Call Trace:
> [<ffffffffa01ef652>] nvmet_rdma_free_queue+0x52/0xa0 [nvmet_rdma]
> [<ffffffffa01ef6d3>] nvmet_rdma_release_queue_work+0x33/0x70 [nvmet_rdma]
> [<ffffffff8107cb5b>] process_one_work+0x17b/0x510
> [<ffffffff8161495c>] ? __schedule+0x23c/0x630
> [<ffffffff810c6c4c>] ? del_timer_sync+0x4c/0x60
> [<ffffffff8107da0b>] ? maybe_create_worker+0x8b/0x110
> [<ffffffff81614eb0>] ? schedule+0x40/0xb0
> [<ffffffff8107dbf6>] worker_thread+0x166/0x580
> [<ffffffff8161495c>] ? __schedule+0x23c/0x630
> [<ffffffff8108e162>] ? default_wake_function+0x12/0x20
> [<ffffffff8109fc26>] ? __wake_up_common+0x56/0x90
> [<ffffffff8107da90>] ? maybe_create_worker+0x110/0x110
> [<ffffffff81614eb0>] ? schedule+0x40/0xb0
> [<ffffffff8107da90>] ? maybe_create_worker+0x110/0x110
> [<ffffffff8108255c>] kthread+0xcc/0xf0
> [<ffffffff8108cade>] ? schedule_tail+0x1e/0xc0
> [<ffffffff816186cf>] ret_from_fork+0x1f/0x40
> [<ffffffff81082490>] ? kthread_freezable_should_stop+0x70/0x70
> Code: b8 00 01 00 00 00 00 ad de 48 bf 00 02 00 00 00 00 ad de 83 c3 01 49 81
c4
> 38 02 00 00 48 8b 86 28 02 00 00 48 8b 96 30 02 00 00 <48> 89 50 08 48 89 45
c0
> 48 89 02 48 89 be 30 02 00 00 4c 89 ff
> RIP [<ffffffffa01ef5b7>] nvmet_rdma_free_rsps+0x67/0xb0 [nvmet_rdma]
> RSP <ffff880fdf7a7bb8>
> ---[ end trace a30265f72371b5ce ]---
>
It crashed in list_del(). I assume rsp was garbage since it was a GPF.
----
static void nvmet_rdma_free_rsps(struct nvmet_rdma_queue *queue)
{
struct nvmet_rdma_device *ndev = queue->dev;
int i, nr_rsps = queue->recv_queue_size * 2;
for (i = 0; i < nr_rsps; i++) {
struct nvmet_rdma_rsp *rsp = &queue->rsps[i];
list_del(&rsp->free_list); <----- HERE
nvmet_rdma_free_rsp(ndev, rsp);
}
kfree(queue->rsps);
}
---
Here is the assembler:
static inline void __list_del(struct list_head * prev, struct list_head * next)
{
next->prev = prev;
15b7: 48 89 50 08 mov %rdx,0x8(%rax)
And RAX/RDX are list poison values, I think:
RAX: dead000000000100
RDX: dead000000000200
So rsp was already deleted off its list? Or has rsp been freed?
next prev parent reply other threads:[~2016-06-16 21:46 UTC|newest]
Thread overview: 50+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-06-16 14:53 target crash / host hang with nvme-all.3 branch of nvme-fabrics Steve Wise
2016-06-16 14:57 ` Christoph Hellwig
2016-06-16 15:10 ` Christoph Hellwig
2016-06-16 15:17 ` Steve Wise
2016-06-16 19:11 ` Sagi Grimberg
2016-06-16 20:38 ` Christoph Hellwig
2016-06-16 21:37 ` Sagi Grimberg
2016-06-16 21:40 ` Sagi Grimberg
2016-06-21 16:01 ` Christoph Hellwig
2016-06-22 10:22 ` Sagi Grimberg
2016-06-16 15:24 ` Steve Wise
2016-06-16 16:41 ` Steve Wise
2016-06-16 15:56 ` Steve Wise
2016-06-16 19:55 ` Sagi Grimberg
2016-06-16 19:59 ` Steve Wise
2016-06-16 20:07 ` Sagi Grimberg
2016-06-16 20:12 ` Steve Wise
2016-06-16 20:27 ` Ming Lin
2016-06-16 20:28 ` Steve Wise
2016-06-16 20:34 ` 'Christoph Hellwig'
2016-06-16 20:49 ` Steve Wise
2016-06-16 21:06 ` Steve Wise
2016-06-16 21:42 ` Sagi Grimberg
2016-06-16 21:47 ` Ming Lin
2016-06-16 21:53 ` Steve Wise
2016-06-16 21:46 ` Steve Wise [this message]
2016-06-27 22:29 ` Ming Lin
2016-06-28 9:14 ` 'Christoph Hellwig'
2016-06-28 14:15 ` Steve Wise
2016-06-28 15:51 ` 'Christoph Hellwig'
2016-06-28 16:31 ` Steve Wise
2016-06-28 16:49 ` Ming Lin
2016-06-28 19:20 ` Steve Wise
2016-06-28 19:43 ` Steve Wise
2016-06-28 21:04 ` Ming Lin
2016-06-29 14:11 ` Steve Wise
2016-06-27 17:26 ` Ming Lin
2016-06-16 20:35 ` Steve Wise
2016-06-16 20:01 ` Steve Wise
2016-06-17 14:05 ` Steve Wise
[not found] ` <005f01d1c8a1$5a229240$0e67b6c0$@opengridcomputing.com>
2016-06-17 14:16 ` Steve Wise
2016-06-17 17:20 ` Ming Lin
2016-06-19 11:57 ` Sagi Grimberg
2016-06-21 14:18 ` Steve Wise
2016-06-21 17:33 ` Ming Lin
2016-06-21 17:59 ` Steve Wise
[not found] ` <006e01d1cbc7$d0d9cc40$728d64c0$@opengridcomputing.com>
2016-06-22 13:42 ` Steve Wise
2016-06-27 14:19 ` Steve Wise
2016-06-28 8:50 ` 'Christoph Hellwig'
2016-07-04 9:57 ` Yoichi Hayakawa
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='021c01d1c818$8ac01730$a0404590$@opengridcomputing.com' \
--to=swise@opengridcomputing.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.