public inbox for linux-nvme@lists.infradead.org
 help / color / mirror / Atom feed
* nvme-tcp request timeouts
@ 2022-10-11 15:31 Seth Forshee
  2022-10-11 19:30 ` Chaitanya Kulkarni
  0 siblings, 1 reply; 10+ messages in thread
From: Seth Forshee @ 2022-10-11 15:31 UTC (permalink / raw)
  To: Christoph Hellwig, Sagi Grimberg, Chaitanya Kulkarni; +Cc: linux-nvme

Hi,

I'm seeing timeouts like the following from nvme-tcp:

[ 6369.513269] nvme nvme5: queue 102: timeout request 0x73 type 4
[ 6369.513283] nvme nvme5: starting error recovery
[ 6369.514379] block nvme5n1: no usable path - requeuing I/O
[ 6369.514385] block nvme5n1: no usable path - requeuing I/O
[ 6369.514392] block nvme5n1: no usable path - requeuing I/O
[ 6369.514393] block nvme5n1: no usable path - requeuing I/O
[ 6369.514401] block nvme5n1: no usable path - requeuing I/O
[ 6369.514414] block nvme5n1: no usable path - requeuing I/O
[ 6369.514420] block nvme5n1: no usable path - requeuing I/O
[ 6369.514427] block nvme5n1: no usable path - requeuing I/O
[ 6369.514430] block nvme5n1: no usable path - requeuing I/O
[ 6369.514432] block nvme5n1: no usable path - requeuing I/O
[ 6369.514926] nvme nvme5: Reconnecting in 10 seconds...
[ 6379.761015] nvme nvme5: creating 128 I/O queues.
[ 6379.944389] nvme nvme5: mapped 128/0/0 default/read/poll queues.
[ 6379.947922] nvme nvme5: Successfully reconnected (1 attempt)

This is with 6.0, using nvmet-tcp on a different machine as the target.
I've seen this sporadically with several test cases. The fio fio-rand-RW
example test is a pretty good reproducer when numjobs is increased (I'm
setting it equal to the number of CPUs in the system).

Let me know what I can do to help debug this. I'm currently adding some
tracing to the driver to see if I can get an idea of the sequence of
events that leads to this problem.

Thanks,
Seth



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: nvme-tcp request timeouts
  2022-10-11 15:31 nvme-tcp request timeouts Seth Forshee
@ 2022-10-11 19:30 ` Chaitanya Kulkarni
  2022-10-11 20:14   ` Seth Forshee
  0 siblings, 1 reply; 10+ messages in thread
From: Chaitanya Kulkarni @ 2022-10-11 19:30 UTC (permalink / raw)
  To: Seth Forshee
  Cc: linux-nvme@lists.infradead.org, Sagi Grimberg, Chaitanya Kulkarni,
	Christoph Hellwig

Hi Seth,

On 10/11/22 08:31, Seth Forshee wrote:
> Hi,
> 
> I'm seeing timeouts like the following from nvme-tcp:
> 
> [ 6369.513269] nvme nvme5: queue 102: timeout request 0x73 type 4
> [ 6369.513283] nvme nvme5: starting error recovery
> [ 6369.514379] block nvme5n1: no usable path - requeuing I/O
> [ 6369.514385] block nvme5n1: no usable path - requeuing I/O
> [ 6369.514392] block nvme5n1: no usable path - requeuing I/O
> [ 6369.514393] block nvme5n1: no usable path - requeuing I/O
> [ 6369.514401] block nvme5n1: no usable path - requeuing I/O
> [ 6369.514414] block nvme5n1: no usable path - requeuing I/O
> [ 6369.514420] block nvme5n1: no usable path - requeuing I/O
> [ 6369.514427] block nvme5n1: no usable path - requeuing I/O
> [ 6369.514430] block nvme5n1: no usable path - requeuing I/O
> [ 6369.514432] block nvme5n1: no usable path - requeuing I/O
> [ 6369.514926] nvme nvme5: Reconnecting in 10 seconds...
> [ 6379.761015] nvme nvme5: creating 128 I/O queues.
> [ 6379.944389] nvme nvme5: mapped 128/0/0 default/read/poll queues.
> [ 6379.947922] nvme nvme5: Successfully reconnected (1 attempt)
> 
> This is with 6.0, using nvmet-tcp on a different machine as the target.
> I've seen this sporadically with several test cases. The fio fio-rand-RW
> example test is a pretty good reproducer when numjobs is increased (I'm
> setting it equal to the number of CPUs in the system).
> 
> Let me know what I can do to help debug this. I'm currently adding some
> tracing to the driver to see if I can get an idea of the sequence of
> events that leads to this problem.
> 
> Thanks,
> Seth
> 

Can you bisect it? That will help us understand the commit causing the
issue.

-ck



* Re: nvme-tcp request timeouts
  2022-10-11 19:30 ` Chaitanya Kulkarni
@ 2022-10-11 20:14   ` Seth Forshee
  2022-10-11 20:19     ` Chaitanya Kulkarni
  0 siblings, 1 reply; 10+ messages in thread
From: Seth Forshee @ 2022-10-11 20:14 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: linux-nvme@lists.infradead.org, Sagi Grimberg, Christoph Hellwig

On Tue, Oct 11, 2022 at 07:30:56PM +0000, Chaitanya Kulkarni wrote:
> Hi Seth,
> 
> On 10/11/22 08:31, Seth Forshee wrote:
> > Hi,
> > 
> > I'm seeing timeouts like the following from nvme-tcp:
> > 
> > [ 6369.513269] nvme nvme5: queue 102: timeout request 0x73 type 4
> > [ 6369.513283] nvme nvme5: starting error recovery
> > [ 6369.514379] block nvme5n1: no usable path - requeuing I/O
> > [ 6369.514385] block nvme5n1: no usable path - requeuing I/O
> > [ 6369.514392] block nvme5n1: no usable path - requeuing I/O
> > [ 6369.514393] block nvme5n1: no usable path - requeuing I/O
> > [ 6369.514401] block nvme5n1: no usable path - requeuing I/O
> > [ 6369.514414] block nvme5n1: no usable path - requeuing I/O
> > [ 6369.514420] block nvme5n1: no usable path - requeuing I/O
> > [ 6369.514427] block nvme5n1: no usable path - requeuing I/O
> > [ 6369.514430] block nvme5n1: no usable path - requeuing I/O
> > [ 6369.514432] block nvme5n1: no usable path - requeuing I/O
> > [ 6369.514926] nvme nvme5: Reconnecting in 10 seconds...
> > [ 6379.761015] nvme nvme5: creating 128 I/O queues.
> > [ 6379.944389] nvme nvme5: mapped 128/0/0 default/read/poll queues.
> > [ 6379.947922] nvme nvme5: Successfully reconnected (1 attempt)
> > 
> > This is with 6.0, using nvmet-tcp on a different machine as the target.
> > I've seen this sporadically with several test cases. The fio fio-rand-RW
> > example test is a pretty good reproducer when numjobs is increased (I'm
> > setting it equal to the number of CPUs in the system).
> > 
> > Let me know what I can do to help debug this. I'm currently adding some
> > tracing to the driver to see if I can get an idea of the sequence of
> > events that leads to this problem.
> > 
> > Thanks,
> > Seth
> > 
> 
> Can you bisect it? That will help us understand the commit causing the
> issue.

I don't know of any "good" version right now. I started with a 5.10
kernel and saw this, and tested 6.0 and still see it. I found several
commits since 5.10 which fix some kind of timeouts:

a0fdd1418007 nvme-tcp: rerun io_work if req_list is not empty
70f437fb4395 nvme-tcp: fix io_work priority inversion
3770a42bb8ce nvme-tcp: fix regression that causes sporadic requests to time out

5.10 still has timeouts with these backported, so whatever the problem
is, it has existed at least that long. I suppose I could go back to older
kernels with these backported if that's going to be the best path
forward here.

Thanks,
Seth



* Re: nvme-tcp request timeouts
  2022-10-11 20:14   ` Seth Forshee
@ 2022-10-11 20:19     ` Chaitanya Kulkarni
  2022-10-11 20:37       ` Seth Forshee
  0 siblings, 1 reply; 10+ messages in thread
From: Chaitanya Kulkarni @ 2022-10-11 20:19 UTC (permalink / raw)
  To: Seth Forshee
  Cc: linux-nvme@lists.infradead.org, Sagi Grimberg, Christoph Hellwig

On 10/11/22 13:14, Seth Forshee wrote:
> On Tue, Oct 11, 2022 at 07:30:56PM +0000, Chaitanya Kulkarni wrote:
>> Hi Seth,
>>
>> On 10/11/22 08:31, Seth Forshee wrote:
>>> Hi,
>>>
>>> I'm seeing timeouts like the following from nvme-tcp:
>>>
>>> [ 6369.513269] nvme nvme5: queue 102: timeout request 0x73 type 4
>>> [ 6369.513283] nvme nvme5: starting error recovery
>>> [ 6369.514379] block nvme5n1: no usable path - requeuing I/O
>>> [ 6369.514385] block nvme5n1: no usable path - requeuing I/O
>>> [ 6369.514392] block nvme5n1: no usable path - requeuing I/O
>>> [ 6369.514393] block nvme5n1: no usable path - requeuing I/O
>>> [ 6369.514401] block nvme5n1: no usable path - requeuing I/O
>>> [ 6369.514414] block nvme5n1: no usable path - requeuing I/O
>>> [ 6369.514420] block nvme5n1: no usable path - requeuing I/O
>>> [ 6369.514427] block nvme5n1: no usable path - requeuing I/O
>>> [ 6369.514430] block nvme5n1: no usable path - requeuing I/O
>>> [ 6369.514432] block nvme5n1: no usable path - requeuing I/O
>>> [ 6369.514926] nvme nvme5: Reconnecting in 10 seconds...
>>> [ 6379.761015] nvme nvme5: creating 128 I/O queues.
>>> [ 6379.944389] nvme nvme5: mapped 128/0/0 default/read/poll queues.
>>> [ 6379.947922] nvme nvme5: Successfully reconnected (1 attempt)
>>>
>>> This is with 6.0, using nvmet-tcp on a different machine as the target.
>>> I've seen this sporadically with several test cases. The fio fio-rand-RW
>>> example test is a pretty good reproducer when numjobs is increased (I'm
>>> setting it equal to the number of CPUs in the system).
>>>
>>> Let me know what I can do to help debug this. I'm currently adding some
>>> tracing to the driver to see if I can get an idea of the sequence of
>>> events that leads to this problem.
>>>
>>> Thanks,
>>> Seth
>>>
>>
>> Can you bisect it? That will help us understand the commit causing the
>> issue.
> 
> I don't know of any "good" version right now. I started with a 5.10
> kernel and saw this, and tested 6.0 and still see it. I found several
> commits since 5.10 which fix some kind of timeouts:
> 
> a0fdd1418007 nvme-tcp: rerun io_work if req_list is not empty
> 70f437fb4395 nvme-tcp: fix io_work priority inversion
> 3770a42bb8ce nvme-tcp: fix regression that causes sporadic requests to time out
> 
> 5.10 still has timeouts with these backported, so whatever the problem
> is it has existed at least that long. I suppose I could go back to older
> kernels with these backported if that's going to be the best path
> forward here.
> 
> Thanks,
> Seth

Can you please share the fio config you are using ?

-ck



* Re: nvme-tcp request timeouts
  2022-10-11 20:19     ` Chaitanya Kulkarni
@ 2022-10-11 20:37       ` Seth Forshee
  2022-10-12  6:33         ` Sagi Grimberg
  2022-10-12  7:51         ` Sagi Grimberg
  0 siblings, 2 replies; 10+ messages in thread
From: Seth Forshee @ 2022-10-11 20:37 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: linux-nvme@lists.infradead.org, Sagi Grimberg, Christoph Hellwig

On Tue, Oct 11, 2022 at 08:19:47PM +0000, Chaitanya Kulkarni wrote:
> On 10/11/22 13:14, Seth Forshee wrote:
> > On Tue, Oct 11, 2022 at 07:30:56PM +0000, Chaitanya Kulkarni wrote:
> >> Hi Seth,
> >>
> >> On 10/11/22 08:31, Seth Forshee wrote:
> >>> Hi,
> >>>
> >>> I'm seeing timeouts like the following from nvme-tcp:
> >>>
> >>> [ 6369.513269] nvme nvme5: queue 102: timeout request 0x73 type 4
> >>> [ 6369.513283] nvme nvme5: starting error recovery
> >>> [ 6369.514379] block nvme5n1: no usable path - requeuing I/O
> >>> [ 6369.514385] block nvme5n1: no usable path - requeuing I/O
> >>> [ 6369.514392] block nvme5n1: no usable path - requeuing I/O
> >>> [ 6369.514393] block nvme5n1: no usable path - requeuing I/O
> >>> [ 6369.514401] block nvme5n1: no usable path - requeuing I/O
> >>> [ 6369.514414] block nvme5n1: no usable path - requeuing I/O
> >>> [ 6369.514420] block nvme5n1: no usable path - requeuing I/O
> >>> [ 6369.514427] block nvme5n1: no usable path - requeuing I/O
> >>> [ 6369.514430] block nvme5n1: no usable path - requeuing I/O
> >>> [ 6369.514432] block nvme5n1: no usable path - requeuing I/O
> >>> [ 6369.514926] nvme nvme5: Reconnecting in 10 seconds...
> >>> [ 6379.761015] nvme nvme5: creating 128 I/O queues.
> >>> [ 6379.944389] nvme nvme5: mapped 128/0/0 default/read/poll queues.
> >>> [ 6379.947922] nvme nvme5: Successfully reconnected (1 attempt)
> >>>
> >>> This is with 6.0, using nvmet-tcp on a different machine as the target.
> >>> I've seen this sporadically with several test cases. The fio fio-rand-RW
> >>> example test is a pretty good reproducer when numjobs is increased (I'm
> >>> setting it equal to the number of CPUs in the system).
> >>>
> >>> Let me know what I can do to help debug this. I'm currently adding some
> >>> tracing to the driver to see if I can get an idea of the sequence of
> >>> events that leads to this problem.
> >>>
> >>> Thanks,
> >>> Seth
> >>>
> >>
> >> Can you bisect it? That will help us understand the commit causing the
> >> issue.
> > 
> > I don't know of any "good" version right now. I started with a 5.10
> > kernel and saw this, and tested 6.0 and still see it. I found several
> > commits since 5.10 which fix some kind of timeouts:
> > 
> > a0fdd1418007 nvme-tcp: rerun io_work if req_list is not empty
> > 70f437fb4395 nvme-tcp: fix io_work priority inversion
> > 3770a42bb8ce nvme-tcp: fix regression that causes sporadic requests to time out
> > 
> > 5.10 still has timeouts with these backported, so whatever the problem
> > is it has existed at least that long. I suppose I could go back to older
> > kernels with these backported if that's going to be the best path
> > forward here.
> > 
> > Thanks,
> > Seth
> 
> Can you please share the fio config you are using ?

Sure. Note that I can reproduce it with a lower number of numjobs, but
higher numbers make it easier, so I set it to the number of CPUs present
on the system I'm using to test.


[global]
name=fio-rand-RW
filename=fio-rand-RW
rw=randrw
rwmixread=60
rwmixwrite=40
bs=4K
direct=0
numjobs=128
time_based
runtime=900

[file1]
size=10G
ioengine=libaio
iodepth=16
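
One way to scale numjobs to the machine without editing the file each
time (a sketch; it assumes the config above is saved as fio-rand-RW.fio
and that fio is installed):

```shell
# Sketch: rewrite numjobs to the CPU count in a copy of the job file,
# then run that copy.
sed "s/^numjobs=.*/numjobs=$(nproc)/" fio-rand-RW.fio > fio-rand-RW.cpus.fio
fio fio-rand-RW.cpus.fio
```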




* Re: nvme-tcp request timeouts
  2022-10-11 20:37       ` Seth Forshee
@ 2022-10-12  6:33         ` Sagi Grimberg
  2022-10-12 16:55           ` Seth Forshee
  2022-10-12  7:51         ` Sagi Grimberg
  1 sibling, 1 reply; 10+ messages in thread
From: Sagi Grimberg @ 2022-10-12  6:33 UTC (permalink / raw)
  To: Seth Forshee, Chaitanya Kulkarni
  Cc: linux-nvme@lists.infradead.org, Christoph Hellwig

Hey Seth, thanks for reporting.

>>>> Hi Seth,
>>>>
>>>> On 10/11/22 08:31, Seth Forshee wrote:
>>>>> Hi,
>>>>>
>>>>> I'm seeing timeouts like the following from nvme-tcp:
>>>>>
>>>>> [ 6369.513269] nvme nvme5: queue 102: timeout request 0x73 type 4
>>>>> [ 6369.513283] nvme nvme5: starting error recovery
>>>>> [ 6369.514379] block nvme5n1: no usable path - requeuing I/O
>>>>> [ 6369.514385] block nvme5n1: no usable path - requeuing I/O
>>>>> [ 6369.514392] block nvme5n1: no usable path - requeuing I/O
>>>>> [ 6369.514393] block nvme5n1: no usable path - requeuing I/O
>>>>> [ 6369.514401] block nvme5n1: no usable path - requeuing I/O
>>>>> [ 6369.514414] block nvme5n1: no usable path - requeuing I/O
>>>>> [ 6369.514420] block nvme5n1: no usable path - requeuing I/O
>>>>> [ 6369.514427] block nvme5n1: no usable path - requeuing I/O
>>>>> [ 6369.514430] block nvme5n1: no usable path - requeuing I/O
>>>>> [ 6369.514432] block nvme5n1: no usable path - requeuing I/O
>>>>> [ 6369.514926] nvme nvme5: Reconnecting in 10 seconds...
>>>>> [ 6379.761015] nvme nvme5: creating 128 I/O queues.
>>>>> [ 6379.944389] nvme nvme5: mapped 128/0/0 default/read/poll queues.
>>>>> [ 6379.947922] nvme nvme5: Successfully reconnected (1 attempt)
>>>>>
>>>>> This is with 6.0, using nvmet-tcp on a different machine as the target.
>>>>> I've seen this sporadically with several test cases. The fio fio-rand-RW
>>>>> example test is a pretty good reproducer when numjobs is increased (I'm
>>>>> setting it equal to the number of CPUs in the system).
>>>>>
>>>>> Let me know what I can do to help debug this. I'm currently adding some
>>>>> tracing to the driver to see if I can get an idea of the sequence of
>>>>> events that leads to this problem.
>>>>>
>>>>> Thanks,
>>>>> Seth
>>>>>
>>>>
>>>> Can you bisect it? That will help us understand the commit causing the
>>>> issue.
>>>
>>> I don't know of any "good" version right now. I started with a 5.10
>>> kernel and saw this, and tested 6.0 and still see it. I found several
>>> commits since 5.10 which fix some kind of timeouts:
>>>
>>> a0fdd1418007 nvme-tcp: rerun io_work if req_list is not empty
>>> 70f437fb4395 nvme-tcp: fix io_work priority inversion
>>> 3770a42bb8ce nvme-tcp: fix regression that causes sporadic requests to time out
>>>
>>> 5.10 still has timeouts with these backported, so whatever the problem
>>> is it has existed at least that long. I suppose I could go back to older
>>> kernels with these backported if that's going to be the best path
>>> forward here.
>>>
>>> Thanks,
>>> Seth
>>
>> Can you please share the fio config you are using ?
> 
> Sure. Note that I can reproduce it with a lower number of numjobs, but
> higher numbers make it easier, so I set it to the number of CPUs present
> on the system I'm using to test.
> 
> 
> [global]
> name=fio-rand-RW
> filename=fio-rand-RW
> rw=randrw
> rwmixread=60
> rwmixwrite=40
> bs=4K
> direct=0

Does this happen with direct=1?

> numjobs=128
> time_based
> runtime=900
> 
> [file1]
> size=10G
> ioengine=libaio
> iodepth=16
> 

Is it possible that the backend nvmet device is not fast enough?
Are your backend devices nvme? Can you share nvmetcli ls output?

Also, just to understand if things are slow or stuck, perhaps you
can artificially increase the io_timeout to, say, 60s or 120s.
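
For example, something like this (a sketch; needs root. io_timeout is a
nvme_core module parameter, and to my understanding the new value is
picked up when queues are set up, so reconnect the controller after
changing it):

```shell
# Sketch: raise the NVMe I/O timeout to 120 seconds at runtime.
echo 120 > /sys/module/nvme_core/parameters/io_timeout
# Verify the new value:
cat /sys/module/nvme_core/parameters/io_timeout
```

The same knob can be set at module load time (modprobe nvme_core
io_timeout=120) or on the kernel command line (nvme_core.io_timeout=120).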



* Re: nvme-tcp request timeouts
  2022-10-11 20:37       ` Seth Forshee
  2022-10-12  6:33         ` Sagi Grimberg
@ 2022-10-12  7:51         ` Sagi Grimberg
  1 sibling, 0 replies; 10+ messages in thread
From: Sagi Grimberg @ 2022-10-12  7:51 UTC (permalink / raw)
  To: Seth Forshee, Chaitanya Kulkarni
  Cc: linux-nvme@lists.infradead.org, Christoph Hellwig


>>>>> Can you bisect it? That will help us understand the commit causing the
>>>>> issue.
>>>>
>>>> I don't know of any "good" version right now. I started with a 5.10
>>>> kernel and saw this, and tested 6.0 and still see it. I found several
>>>> commits since 5.10 which fix some kind of timeouts:
>>>>
>>>> a0fdd1418007 nvme-tcp: rerun io_work if req_list is not empty
>>>> 70f437fb4395 nvme-tcp: fix io_work priority inversion
>>>> 3770a42bb8ce nvme-tcp: fix regression that causes sporadic requests 
>>>> to time out

btw Seth,

I've just looked at stable v5.10.147 and all of these fixes were taken
into the 5.10 stable branch.

If you happen to use a 5.10 stable kernel (or a modified version of it),
let us know if you spot any missing fixes so we'll make sure they end up
there as well.



* Re: nvme-tcp request timeouts
  2022-10-12  6:33         ` Sagi Grimberg
@ 2022-10-12 16:55           ` Seth Forshee
  2022-10-12 17:30             ` Sagi Grimberg
  0 siblings, 1 reply; 10+ messages in thread
From: Seth Forshee @ 2022-10-12 16:55 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Chaitanya Kulkarni, linux-nvme@lists.infradead.org,
	Christoph Hellwig

On Wed, Oct 12, 2022 at 09:33:19AM +0300, Sagi Grimberg wrote:
> Hey Seth, thanks for reporting.
> 
> > > > > Hi Seth,
> > > > > 
> > > > > On 10/11/22 08:31, Seth Forshee wrote:
> > > > > > Hi,
> > > > > > 
> > > > > > I'm seeing timeouts like the following from nvme-tcp:
> > > > > > 
> > > > > > [ 6369.513269] nvme nvme5: queue 102: timeout request 0x73 type 4
> > > > > > [ 6369.513283] nvme nvme5: starting error recovery
> > > > > > [ 6369.514379] block nvme5n1: no usable path - requeuing I/O
> > > > > > [ 6369.514385] block nvme5n1: no usable path - requeuing I/O
> > > > > > [ 6369.514392] block nvme5n1: no usable path - requeuing I/O
> > > > > > [ 6369.514393] block nvme5n1: no usable path - requeuing I/O
> > > > > > [ 6369.514401] block nvme5n1: no usable path - requeuing I/O
> > > > > > [ 6369.514414] block nvme5n1: no usable path - requeuing I/O
> > > > > > [ 6369.514420] block nvme5n1: no usable path - requeuing I/O
> > > > > > [ 6369.514427] block nvme5n1: no usable path - requeuing I/O
> > > > > > [ 6369.514430] block nvme5n1: no usable path - requeuing I/O
> > > > > > [ 6369.514432] block nvme5n1: no usable path - requeuing I/O
> > > > > > [ 6369.514926] nvme nvme5: Reconnecting in 10 seconds...
> > > > > > [ 6379.761015] nvme nvme5: creating 128 I/O queues.
> > > > > > [ 6379.944389] nvme nvme5: mapped 128/0/0 default/read/poll queues.
> > > > > > [ 6379.947922] nvme nvme5: Successfully reconnected (1 attempt)
> > > > > > 
> > > > > > This is with 6.0, using nvmet-tcp on a different machine as the target.
> > > > > > I've seen this sporadically with several test cases. The fio fio-rand-RW
> > > > > > example test is a pretty good reproducer when numjobs is increased (I'm
> > > > > > setting it equal to the number of CPUs in the system).
> > > > > > 
> > > > > > Let me know what I can do to help debug this. I'm currently adding some
> > > > > > tracing to the driver to see if I can get an idea of the sequence of
> > > > > > events that leads to this problem.
> > > > > > 
> > > > > > Thanks,
> > > > > > Seth
> > > > > > 
> > > > > 
> > > > > Can you bisect it? That will help us understand the commit causing the
> > > > > issue.
> > > > 
> > > > I don't know of any "good" version right now. I started with a 5.10
> > > > kernel and saw this, and tested 6.0 and still see it. I found several
> > > > commits since 5.10 which fix some kind of timeouts:
> > > > 
> > > > a0fdd1418007 nvme-tcp: rerun io_work if req_list is not empty
> > > > 70f437fb4395 nvme-tcp: fix io_work priority inversion
> > > > 3770a42bb8ce nvme-tcp: fix regression that causes sporadic requests to time out
> > > > 
> > > > 5.10 still has timeouts with these backported, so whatever the problem
> > > > is it has existed at least that long. I suppose I could go back to older
> > > > kernels with these backported if that's going to be the best path
> > > > forward here.
> > > > 
> > > > Thanks,
> > > > Seth
> > > 
> > > Can you please share the fio config you are using ?
> > 
> > Sure. Note that I can reproduce it with a lower number of numjobs, but
> > higher numbers make it easier, so I set it to the number of CPUs present
> > on the system I'm using to test.
> > 
> > 
> > [global]
> > name=fio-rand-RW
> > filename=fio-rand-RW
> > rw=randrw
> > rwmixread=60
> > rwmixwrite=40
> > bs=4K
> > direct=0
> 
> Does this happen with direct=1?

So far I haven't been able to reproduce it with direct=1.

> > numjobs=128
> > time_based
> > runtime=900
> > 
> > [file1]
> > size=10G
> > ioengine=libaio
> > iodepth=16
> > 
> 
> Is it possible that the backend nvmet device is not fast enough?
> Are your backend devices nvme? Can you share nvmetcli ls output?

Currently I have a quick-and-dirty setup. The systems I have to test
with don't have a spare nvme device right now but do have a lot of RAM,
so the backend is a loop device backed by a file in tmpfs. But it should
be plenty fast. I'm working on getting a setup with an nvme backend
device.

o- / ......................................................................................................................... [...]
  o- hosts ................................................................................................................... [...]
  | o- hostnqn ............................................................................................................... [...]
  o- ports ................................................................................................................... [...]
  | o- 2 ................................................... [trtype=tcp, traddr=..., trsvcid=4420, inline_data_size=16384]
  |   o- ana_groups .......................................................................................................... [...]
  |   | o- 1 ..................................................................................................... [state=optimized]
  |   o- referrals ........................................................................................................... [...]
  |   o- subsystems .......................................................................................................... [...]
  |     o- testnqn ........................................................................................................... [...]
  o- subsystems .............................................................................................................. [...]
    o- testnqn ............................................................. [version=1.3, allow_any=1, serial=2c2e39e2a551f7febf33]
      o- allowed_hosts ....................................................................................................... [...]
      o- namespaces .......................................................................................................... [...]
        o- 1  [path=/dev/loop0, uuid=8a1561fb-82c3-4e9d-96b9-11c7b590d047, nguid=ef90689c-6c46-d44c-89c1-4067801309a8, grpid=1, enabled]
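
The backing setup is essentially this (a sketch; the sizes and paths
here are illustrative, and it needs root):

```shell
# Sketch: a loop device backed by a file in tmpfs, used as the nvmet
# namespace backend (/dev/loop0 above).
mount -t tmpfs -o size=12g tmpfs /mnt/ramdisk
truncate -s 10G /mnt/ramdisk/backend.img
losetup /dev/loop0 /mnt/ramdisk/backend.img
```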

> Also, just to understand if things are slow or stuck, perhaps you
> can artificially increase the io_timeout to say 60s or 120s.

I could still reproduce it with io_timeout set to 120s.

Thanks,
Seth



* Re: nvme-tcp request timeouts
  2022-10-12 16:55           ` Seth Forshee
@ 2022-10-12 17:30             ` Sagi Grimberg
  2022-10-13  4:57               ` Seth Forshee
  0 siblings, 1 reply; 10+ messages in thread
From: Sagi Grimberg @ 2022-10-12 17:30 UTC (permalink / raw)
  To: Seth Forshee
  Cc: Chaitanya Kulkarni, linux-nvme@lists.infradead.org,
	Christoph Hellwig



On 10/12/22 19:55, Seth Forshee wrote:
> On Wed, Oct 12, 2022 at 09:33:19AM +0300, Sagi Grimberg wrote:
>> Hey Seth, thanks for reporting.
>>
>>>>>> Hi Seth,
>>>>>>
>>>>>> On 10/11/22 08:31, Seth Forshee wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I'm seeing timeouts like the following from nvme-tcp:
>>>>>>>
>>>>>>> [ 6369.513269] nvme nvme5: queue 102: timeout request 0x73 type 4
>>>>>>> [ 6369.513283] nvme nvme5: starting error recovery
>>>>>>> [ 6369.514379] block nvme5n1: no usable path - requeuing I/O
>>>>>>> [ 6369.514385] block nvme5n1: no usable path - requeuing I/O
>>>>>>> [ 6369.514392] block nvme5n1: no usable path - requeuing I/O
>>>>>>> [ 6369.514393] block nvme5n1: no usable path - requeuing I/O
>>>>>>> [ 6369.514401] block nvme5n1: no usable path - requeuing I/O
>>>>>>> [ 6369.514414] block nvme5n1: no usable path - requeuing I/O
>>>>>>> [ 6369.514420] block nvme5n1: no usable path - requeuing I/O
>>>>>>> [ 6369.514427] block nvme5n1: no usable path - requeuing I/O
>>>>>>> [ 6369.514430] block nvme5n1: no usable path - requeuing I/O
>>>>>>> [ 6369.514432] block nvme5n1: no usable path - requeuing I/O
>>>>>>> [ 6369.514926] nvme nvme5: Reconnecting in 10 seconds...
>>>>>>> [ 6379.761015] nvme nvme5: creating 128 I/O queues.
>>>>>>> [ 6379.944389] nvme nvme5: mapped 128/0/0 default/read/poll queues.
>>>>>>> [ 6379.947922] nvme nvme5: Successfully reconnected (1 attempt)
>>>>>>>
>>>>>>> This is with 6.0, using nvmet-tcp on a different machine as the target.
>>>>>>> I've seen this sporadically with several test cases. The fio fio-rand-RW
>>>>>>> example test is a pretty good reproducer when numjobs is increased (I'm
>>>>>>> setting it equal to the number of CPUs in the system).
>>>>>>>
>>>>>>> Let me know what I can do to help debug this. I'm currently adding some
>>>>>>> tracing to the driver to see if I can get an idea of the sequence of
>>>>>>> events that leads to this problem.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Seth
>>>>>>>
>>>>>>
>>>>>> Can you bisect it? That will help us understand the commit causing the
>>>>>> issue.
>>>>>
>>>>> I don't know of any "good" version right now. I started with a 5.10
>>>>> kernel and saw this, and tested 6.0 and still see it. I found several
>>>>> commits since 5.10 which fix some kind of timeouts:
>>>>>
>>>>> a0fdd1418007 nvme-tcp: rerun io_work if req_list is not empty
>>>>> 70f437fb4395 nvme-tcp: fix io_work priority inversion
>>>>> 3770a42bb8ce nvme-tcp: fix regression that causes sporadic requests to time out
>>>>>
>>>>> 5.10 still has timeouts with these backported, so whatever the problem
>>>>> is it has existed at least that long. I suppose I could go back to older
>>>>> kernels with these backported if that's going to be the best path
>>>>> forward here.
>>>>>
>>>>> Thanks,
>>>>> Seth
>>>>
>>>> Can you please share the fio config you are using ?
>>>
>>> Sure. Note that I can reproduce it with a lower number of numjobs, but
>>> higher numbers make it easier, so I set it to the number of CPUs present
>>> on the system I'm using to test.
>>>
>>>
>>> [global]
>>> name=fio-rand-RW
>>> filename=fio-rand-RW
>>> rw=randrw
>>> rwmixread=60
>>> rwmixwrite=40
>>> bs=4K
>>> direct=0
>>
>> Does this happen with direct=1?
> 
> So far I haven't been able to reproduce it with direct=1.

It is probably because with direct=1 the I/Os actually go through as 4k
and the blk-mq tag depth is throttling you. With direct=0, writes are
batched and go down to the disk in larger blocks, and in bursts, which
probably creates longer latencies...

> 
>>> numjobs=128
>>> time_based
>>> runtime=900
>>>
>>> [file1]
>>> size=10G
>>> ioengine=libaio
>>> iodepth=16
>>>
>>
>> Is it possible that the backend nvmet device is not fast enough?
>> Are your backend devices nvme? Can you share nvmetcli ls output?
> 
> Currently I have a quick-and-dirty setup. The systems I have to test
> with don't have a spare nvme device right now but do have a lot of RAM,
> so the backend is a loop device backed by a file in tmpfs. But it should
> be plenty fast. I'm working on getting a setup with an nvme backend
> device.

I see. Note that RAM drives are fast but stall the CPU for the memory
copy, which will prevent the nvme I/O threads from servicing new
requests; this can be magnified with large, heavy bursts...

> 
> o- / ......................................................................................................................... [...]
>    o- hosts ................................................................................................................... [...]
>    | o- hostnqn ............................................................................................................... [...]
>    o- ports ................................................................................................................... [...]
>    | o- 2 ................................................... [trtype=tcp, traddr=..., trsvcid=4420, inline_data_size=16384]
>    |   o- ana_groups .......................................................................................................... [...]
>    |   | o- 1 ..................................................................................................... [state=optimized]
>    |   o- referrals ........................................................................................................... [...]
>    |   o- subsystems .......................................................................................................... [...]
>    |     o- testnqn ........................................................................................................... [...]
>    o- subsystems .............................................................................................................. [...]
>      o- testnqn ............................................................. [version=1.3, allow_any=1, serial=2c2e39e2a551f7febf33]
>        o- allowed_hosts ....................................................................................................... [...]
>        o- namespaces .......................................................................................................... [...]
>          o- 1  [path=/dev/loop0, uuid=8a1561fb-82c3-4e9d-96b9-11c7b590d047, nguid=ef90689c-6c46-d44c-89c1-4067801309a8, grpid=1, enabled]

Ohh, I'd say that would be the culprit...
the loop driver uses only a single queue to access the disk. This means 
that all your 100+ nvme-tcp queues are serializing access on the 
single loop disk queue. This heavy back-pressure bubbles all the way
back to the host and manifests in IO timeouts when large bursts hit...

I can say that loop is not the best way to benchmark performance, and
I'd expect to see such phenomena when attempting to drive high loads
to a loop device...

Maybe you can possibly use a tmpfs file directly instead (nvmet supports
file backends as well).

Or maybe you can try to use null_blk with memory_backed=Y modparam (may 
need to define cache_size modparam as well, never tried it with memory
backing...)? That would be more efficient.
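For reference, the two alternatives above could be set up roughly as
follows. This is a hedged sketch, not tested against this exact config:
the null_blk modparams are documented upstream (cache_size is in MiB),
but the sizes, mount point, and file name here are illustrative, and the
subsystem/namespace path reuses the testnqn/1 names from the config
dump above:

```shell
# Option 1: memory-backed null_blk device as the nvmet backend
modprobe null_blk memory_backed=1 cache_size=2048
# creates /dev/nullb0, which can then back the nvmet namespace

# Option 2: a file on tmpfs, using nvmet's file backend
mount -t tmpfs -o size=2G tmpfs /mnt/tmpfs
truncate -s 1G /mnt/tmpfs/ns1.img

# repoint the existing namespace (must be disabled while changing it)
NS=/sys/kernel/config/nvmet/subsystems/testnqn/namespaces/1
echo 0 > $NS/enable
echo /mnt/tmpfs/ns1.img > $NS/device_path
echo 1 > $NS/enable
```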


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: nvme-tcp request timeouts
  2022-10-12 17:30             ` Sagi Grimberg
@ 2022-10-13  4:57               ` Seth Forshee
  0 siblings, 0 replies; 10+ messages in thread
From: Seth Forshee @ 2022-10-13  4:57 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Chaitanya Kulkarni, linux-nvme@lists.infradead.org,
	Christoph Hellwig

On Wed, Oct 12, 2022 at 08:30:18PM +0300, Sagi Grimberg wrote:
> > o- / ......................................................................................................................... [...]
> >    o- hosts ................................................................................................................... [...]
> >    | o- hostnqn ............................................................................................................... [...]
> >    o- ports ................................................................................................................... [...]
> >    | o- 2 ................................................... [trtype=tcp, traddr=..., trsvcid=4420, inline_data_size=16384]
> >    |   o- ana_groups .......................................................................................................... [...]
> >    |   | o- 1 ..................................................................................................... [state=optimized]
> >    |   o- referrals ........................................................................................................... [...]
> >    |   o- subsystems .......................................................................................................... [...]
> >    |     o- testnqn ........................................................................................................... [...]
> >    o- subsystems .............................................................................................................. [...]
> >      o- testnqn ............................................................. [version=1.3, allow_any=1, serial=2c2e39e2a551f7febf33]
> >        o- allowed_hosts ....................................................................................................... [...]
> >        o- namespaces .......................................................................................................... [...]
> >          o- 1  [path=/dev/loop0, uuid=8a1561fb-82c3-4e9d-96b9-11c7b590d047, nguid=ef90689c-6c46-d44c-89c1-4067801309a8, grpid=1, enabled]
> 
> Ohh, I'd say that would be the culprit...
> the loop driver uses only a single queue to access the disk. This means that
> all your 100+ nvme-tcp queues are serializing access on the single loop
> disk queue. This heavy back-pressure bubbles all the way
> back to the host and manifests in IO timeouts when large bursts hit...
> 
> I can say that loop is not the best way to benchmark performance, and
> I'd expect to see such phenomena when attempting to drive high loads
> to a loop device...

The goal wasn't to benchmark performance with this setup, just to start
getting familiar with nvme-tcp.

> Maybe you can possibly use a tmpfs file directly instead (nvmet supports
> file backends as well).
> 
> Or maybe you can try to use null_blk with memory_backed=Y modparam (may need
> to define cache_size modparam as well, never tried it with memory
> backing...)? That would be more efficient.

I've got this set up now with an nvme drive as the backend for the
target, and as you predicted the timeouts went away. So it does seem the
problem was with using a loop device. Thanks for the help!
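For anyone following along, swapping the backend boils down to the same
configfs dance as before. A hedged sketch, reusing the testnqn/1 names
from earlier in the thread; /dev/nvme0n1 is an assumed device name for
whatever local NVMe drive is used:

```shell
# repoint the nvmet namespace at a real NVMe device
NS=/sys/kernel/config/nvmet/subsystems/testnqn/namespaces/1
echo 0 > $NS/enable            # namespace must be disabled to change its backend
echo /dev/nvme0n1 > $NS/device_path
echo 1 > $NS/enable
```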

Seth


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2022-10-13  4:57 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-11 15:31 nvme-tcp request timeouts Seth Forshee
2022-10-11 19:30 ` Chaitanya Kulkarni
2022-10-11 20:14   ` Seth Forshee
2022-10-11 20:19     ` Chaitanya Kulkarni
2022-10-11 20:37       ` Seth Forshee
2022-10-12  6:33         ` Sagi Grimberg
2022-10-12 16:55           ` Seth Forshee
2022-10-12 17:30             ` Sagi Grimberg
2022-10-13  4:57               ` Seth Forshee
2022-10-12  7:51         ` Sagi Grimberg

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox