public inbox for linux-nvme@lists.infradead.org
From: Seth Forshee <sforshee@kernel.org>
To: Chaitanya Kulkarni <chaitanyak@nvidia.com>
Cc: "linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>,
	Sagi Grimberg <sagi@grimberg.me>, Christoph Hellwig <hch@lst.de>
Subject: Re: nvme-tcp request timeouts
Date: Tue, 11 Oct 2022 15:37:40 -0500	[thread overview]
Message-ID: <Y0XUFGXHciDCXRTs@do-x1extreme> (raw)
In-Reply-To: <f72fa6ba-e55a-6bb6-2efe-60458802c870@nvidia.com>

On Tue, Oct 11, 2022 at 08:19:47PM +0000, Chaitanya Kulkarni wrote:
> On 10/11/22 13:14, Seth Forshee wrote:
> > On Tue, Oct 11, 2022 at 07:30:56PM +0000, Chaitanya Kulkarni wrote:
> >> Hi Seth,
> >>
> >> On 10/11/22 08:31, Seth Forshee wrote:
> >>> Hi,
> >>>
> >>> I'm seeing timeouts like the following from nvme-tcp:
> >>>
> >>> [ 6369.513269] nvme nvme5: queue 102: timeout request 0x73 type 4
> >>> [ 6369.513283] nvme nvme5: starting error recovery
> >>> [ 6369.514379] block nvme5n1: no usable path - requeuing I/O
> >>> [ 6369.514385] block nvme5n1: no usable path - requeuing I/O
> >>> [ 6369.514392] block nvme5n1: no usable path - requeuing I/O
> >>> [ 6369.514393] block nvme5n1: no usable path - requeuing I/O
> >>> [ 6369.514401] block nvme5n1: no usable path - requeuing I/O
> >>> [ 6369.514414] block nvme5n1: no usable path - requeuing I/O
> >>> [ 6369.514420] block nvme5n1: no usable path - requeuing I/O
> >>> [ 6369.514427] block nvme5n1: no usable path - requeuing I/O
> >>> [ 6369.514430] block nvme5n1: no usable path - requeuing I/O
> >>> [ 6369.514432] block nvme5n1: no usable path - requeuing I/O
> >>> [ 6369.514926] nvme nvme5: Reconnecting in 10 seconds...
> >>> [ 6379.761015] nvme nvme5: creating 128 I/O queues.
> >>> [ 6379.944389] nvme nvme5: mapped 128/0/0 default/read/poll queues.
> >>> [ 6379.947922] nvme nvme5: Successfully reconnected (1 attempt)
> >>>
> >>> This is with 6.0, using nvmet-tcp on a different machine as the target.
> >>> I've seen this sporadically with several test cases. The fio fio-rand-RW
> >>> example test is a pretty good reproducer when numjobs is increased (I'm
> >>> setting it equal to the number of CPUs in the system).
> >>>
> >>> Let me know what I can do to help debug this. I'm currently adding some
> >>> tracing to the driver to see if I can get an idea of the sequence of
> >>> events that leads to this problem.
> >>>
> >>> Thanks,
> >>> Seth
> >>>
> >>
> >> Can you bisect it? That will help identify the commit causing the
> >> issue.
> > 
> > I don't know of any "good" version right now. I started with a 5.10
> > kernel and saw this, and tested 6.0 and still see it. I found several
> > commits since 5.10 which fix some kind of timeouts:
> > 
> > a0fdd1418007 nvme-tcp: rerun io_work if req_list is not empty
> > 70f437fb4395 nvme-tcp: fix io_work priority inversion
> > 3770a42bb8ce nvme-tcp: fix regression that causes sporadic requests to time out
> > 
> > 5.10 still has timeouts with these backported, so whatever the problem
> > is, it has existed at least that long. I suppose I could go back to older
> > kernels with these backported if that's going to be the best path
> > forward here.
> > 
> > Thanks,
> > Seth
> 
> Can you please share the fio config you are using?

Sure. Note that I can reproduce it with lower numjobs values, but higher
values make it easier, so I set it to the number of CPUs present on the
system I'm using to test.


[global]
name=fio-rand-RW
filename=fio-rand-RW
rw=randrw
rwmixread=60
rwmixwrite=40
bs=4K
direct=0
numjobs=128
time_based
runtime=900

[file1]
size=10G
ioengine=libaio
iodepth=16

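For reference, the controller, queue, and request IDs in the timeout
messages quoted at the top of the thread can be pulled out with a small
sed one-liner, which makes it easier to correlate timeouts across runs.
This is just an illustrative helper, not something from the driver; the
sample line below is copied from the 6.0 log above, and on a live system
you would pipe dmesg through the same expression:

```shell
# Illustrative helper: extract controller, queue, and request ID from an
# nvme-tcp timeout message. Sample line copied from the log quoted above;
# on a live system, replace the echo with `dmesg`.
line='[ 6369.513269] nvme nvme5: queue 102: timeout request 0x73 type 4'
echo "$line" | sed -n \
  's/.* nvme \(nvme[0-9]*\): queue \([0-9]*\): timeout request \(0x[0-9a-f]*\).*/\1 queue=\2 req=\3/p'
# -> nvme5 queue=102 req=0x73
```

Running `dmesg | sed -n '...'` with the same expression prints one such
line per timeout, which can then be sorted/counted per queue.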


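As a starting point for the tracing mentioned up-thread, the nvme
driver's existing generic tracepoints can be enabled without patching.
A sketch only, not the custom instrumentation being added: it assumes
root, tracefs mounted at /sys/kernel/tracing, and the nvme_setup_cmd /
nvme_complete_rq / nvme_sq events present in the 6.0 driver; the
writability check makes it a no-op elsewhere:

```shell
# Sketch: enable the nvme driver's existing trace events (assumes root and
# tracefs at /sys/kernel/tracing; skips cleanly when either is unavailable).
TRACEFS=/sys/kernel/tracing
for ev in nvme_setup_cmd nvme_complete_rq nvme_sq; do
    f="$TRACEFS/events/nvme/$ev/enable"
    if [ -w "$f" ]; then
        echo 1 > "$f" && echo "enabled $ev"
    else
        echo "skipped $ev (tracefs event not writable)"
    fi
done
```

With the events enabled, `cat /sys/kernel/tracing/trace_pipe` shows the
per-request submit/complete sequence around the timeout window.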
Thread overview: 10+ messages
2022-10-11 15:31 nvme-tcp request timeouts Seth Forshee
2022-10-11 19:30 ` Chaitanya Kulkarni
2022-10-11 20:14   ` Seth Forshee
2022-10-11 20:19     ` Chaitanya Kulkarni
2022-10-11 20:37       ` Seth Forshee [this message]
2022-10-12  6:33         ` Sagi Grimberg
2022-10-12 16:55           ` Seth Forshee
2022-10-12 17:30             ` Sagi Grimberg
2022-10-13  4:57               ` Seth Forshee
2022-10-12  7:51         ` Sagi Grimberg
