All of lore.kernel.org
 help / color / mirror / Atom feed
From: Krishnamraju Eraparaju <krishna2@chelsio.com>
To: Sagi Grimberg <sagi@grimberg.me>
Cc: linux-rdma@vger.kernel.org, bharat@chelsio.com,
	linux-nvme@lists.infradead.org
Subject: Re: Hang at NVME Host caused by Controller reset
Date: Tue, 28 Jul 2020 23:12:27 +0530	[thread overview]
Message-ID: <20200728174224.GA5497@chelsio.com> (raw)
In-Reply-To: <4d87ffbb-24a2-9342-4507-cabd9e3b76c2@grimberg.me>



Sagi,

Yes, Multipath is disabled.
This time, with "nvme-fabrics: allow to queue requests for live queues"
patch applied, I see hang only at blk_queue_enter():
[Jul28 17:25] INFO: task nvme:21119 blocked for more than 122 seconds.
[  +0.000061]       Not tainted 5.8.0-rc7ekr+ #2
[  +0.000052] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[  +0.000059] nvme            D14392 21119   2456 0x00004000
[  +0.000059] Call Trace:
[  +0.000110]  __schedule+0x32b/0x670
[  +0.000108]  schedule+0x45/0xb0
[  +0.000107]  blk_queue_enter+0x1e9/0x250
[  +0.000109]  ? wait_woken+0x70/0x70
[  +0.000110]  blk_mq_alloc_request+0x53/0xc0
[  +0.000111]  nvme_alloc_request+0x61/0x70 [nvme_core]
[  +0.000121]  nvme_submit_user_cmd+0x50/0x310 [nvme_core]
[  +0.000118]  nvme_user_cmd+0x12e/0x1c0 [nvme_core]
[  +0.000163]  ? _copy_to_user+0x22/0x30
[  +0.000113]  blkdev_ioctl+0x100/0x250
[  +0.000115]  block_ioctl+0x34/0x40
[  +0.000110]  ksys_ioctl+0x82/0xc0
[  +0.000109]  __x64_sys_ioctl+0x11/0x20
[  +0.000109]  do_syscall_64+0x3e/0x70
[  +0.000120]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  +0.000112] RIP: 0033:0x7fbe9cdbb67b
[  +0.000110] Code: Bad RIP value.
[  +0.000124] RSP: 002b:00007ffd61ff5778 EFLAGS: 00000246 ORIG_RAX:
0000000000000010
[  +0.000170] RAX: ffffffffffffffda RBX: 0000000000000003 RCX:
00007fbe9cdbb67b
[  +0.000114] RDX: 00007ffd61ff5780 RSI: 00000000c0484e43 RDI:
0000000000000003
[  +0.000113] RBP: 0000000000000000 R08: 0000000000000001 R09:
0000000000000000
[  +0.000115] R10: 0000000000000000 R11: 0000000000000246 R12:
00007ffd61ff7219
[  +0.000123] R13: 0000000000000006 R14: 00007ffd61ff5e30 R15:
000055e09c1854a0
[  +0.000115] Kernel panic - not syncing: hung_task: blocked tasks

You could easily reproduce this by running below, parallelly, for 10min:
 while [ 1 ]; do  nvme write-zeroes /dev/nvme0n1 -s 1 -c 1; done
 while [ 1 ]; do echo 1 > /sys/block/nvme0n1/device/reset_controller;
done
 while [ 1 ]; do ifconfig enp2s0f4 down; sleep 24; ifconfig enp2s0f4 up;
sleep 28; done
 
 Not sure using nvme-write this way is valid or not..
 
 Thanks,
 Krishna.
On Tuesday, July 07/28/20, 2020 at 08:54:18 -0700, Sagi Grimberg wrote:
> 
> 
> On 7/28/20 4:59 AM, Krishnamraju Eraparaju wrote:
> >Sagi,
> >With the given patch, I am no more seeing the freeze_queue_wait hang
> >issue, but I am seeing another hang issue:
> 
> The trace suggest that you are not running with multipath right?
> 
> I think you need the patch:
> [PATCH] nvme-fabrics: allow to queue requests for live queues
> 
> You can find it in linux-nvme

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

WARNING: multiple messages have this Message-ID (diff)
From: Krishnamraju Eraparaju <krishna2@chelsio.com>
To: Sagi Grimberg <sagi@grimberg.me>
Cc: linux-nvme@lists.infradead.org, linux-rdma@vger.kernel.org,
	bharat@chelsio.com
Subject: Re: Hang at NVME Host caused by Controller reset
Date: Tue, 28 Jul 2020 23:12:27 +0530	[thread overview]
Message-ID: <20200728174224.GA5497@chelsio.com> (raw)
In-Reply-To: <4d87ffbb-24a2-9342-4507-cabd9e3b76c2@grimberg.me>



Sagi,

Yes, Multipath is disabled.
This time, with "nvme-fabrics: allow to queue requests for live queues"
patch applied, I see hang only at blk_queue_enter():
[Jul28 17:25] INFO: task nvme:21119 blocked for more than 122 seconds.
[  +0.000061]       Not tainted 5.8.0-rc7ekr+ #2
[  +0.000052] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[  +0.000059] nvme            D14392 21119   2456 0x00004000
[  +0.000059] Call Trace:
[  +0.000110]  __schedule+0x32b/0x670
[  +0.000108]  schedule+0x45/0xb0
[  +0.000107]  blk_queue_enter+0x1e9/0x250
[  +0.000109]  ? wait_woken+0x70/0x70
[  +0.000110]  blk_mq_alloc_request+0x53/0xc0
[  +0.000111]  nvme_alloc_request+0x61/0x70 [nvme_core]
[  +0.000121]  nvme_submit_user_cmd+0x50/0x310 [nvme_core]
[  +0.000118]  nvme_user_cmd+0x12e/0x1c0 [nvme_core]
[  +0.000163]  ? _copy_to_user+0x22/0x30
[  +0.000113]  blkdev_ioctl+0x100/0x250
[  +0.000115]  block_ioctl+0x34/0x40
[  +0.000110]  ksys_ioctl+0x82/0xc0
[  +0.000109]  __x64_sys_ioctl+0x11/0x20
[  +0.000109]  do_syscall_64+0x3e/0x70
[  +0.000120]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  +0.000112] RIP: 0033:0x7fbe9cdbb67b
[  +0.000110] Code: Bad RIP value.
[  +0.000124] RSP: 002b:00007ffd61ff5778 EFLAGS: 00000246 ORIG_RAX:
0000000000000010
[  +0.000170] RAX: ffffffffffffffda RBX: 0000000000000003 RCX:
00007fbe9cdbb67b
[  +0.000114] RDX: 00007ffd61ff5780 RSI: 00000000c0484e43 RDI:
0000000000000003
[  +0.000113] RBP: 0000000000000000 R08: 0000000000000001 R09:
0000000000000000
[  +0.000115] R10: 0000000000000000 R11: 0000000000000246 R12:
00007ffd61ff7219
[  +0.000123] R13: 0000000000000006 R14: 00007ffd61ff5e30 R15:
000055e09c1854a0
[  +0.000115] Kernel panic - not syncing: hung_task: blocked tasks

You could easily reproduce this by running below, parallelly, for 10min:
 while [ 1 ]; do  nvme write-zeroes /dev/nvme0n1 -s 1 -c 1; done
 while [ 1 ]; do echo 1 > /sys/block/nvme0n1/device/reset_controller;
done
 while [ 1 ]; do ifconfig enp2s0f4 down; sleep 24; ifconfig enp2s0f4 up;
sleep 28; done
 
 Not sure using nvme-write this way is valid or not..
 
 Thanks,
 Krishna.
On Tuesday, July 07/28/20, 2020 at 08:54:18 -0700, Sagi Grimberg wrote:
> 
> 
> On 7/28/20 4:59 AM, Krishnamraju Eraparaju wrote:
> >Sagi,
> >With the given patch, I am no more seeing the freeze_queue_wait hang
> >issue, but I am seeing another hang issue:
> 
> The trace suggest that you are not running with multipath right?
> 
> I think you need the patch:
> [PATCH] nvme-fabrics: allow to queue requests for live queues
> 
> You can find it in linux-nvme

  reply	other threads:[~2020-07-28 17:42 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-27 18:19 Hang at NVME Host caused by Controller reset Krishnamraju Eraparaju
2020-07-27 18:19 ` Krishnamraju Eraparaju
2020-07-27 18:47 ` Sagi Grimberg
2020-07-27 18:47   ` Sagi Grimberg
2020-07-28 11:59   ` Krishnamraju Eraparaju
2020-07-28 11:59     ` Krishnamraju Eraparaju
2020-07-28 15:54     ` Sagi Grimberg
2020-07-28 15:54       ` Sagi Grimberg
2020-07-28 17:42       ` Krishnamraju Eraparaju [this message]
2020-07-28 17:42         ` Krishnamraju Eraparaju
2020-07-28 18:35         ` Sagi Grimberg
2020-07-28 18:35           ` Sagi Grimberg
2020-07-28 20:20           ` Sagi Grimberg
2020-07-28 20:20             ` Sagi Grimberg
2020-07-29  8:57             ` Krishnamraju Eraparaju
2020-07-29  8:57               ` Krishnamraju Eraparaju
2020-07-29  9:28               ` Sagi Grimberg
     [not found]                 ` <20200730162056.GA17468@chelsio.com>
2020-07-30 20:59                   ` Sagi Grimberg
2020-07-30 21:32                     ` Krishnamraju Eraparaju

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200728174224.GA5497@chelsio.com \
    --to=krishna2@chelsio.com \
    --cc=bharat@chelsio.com \
    --cc=linux-nvme@lists.infradead.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=sagi@grimberg.me \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.