From: Potnuri Bharat Teja <bharat@chelsio.com>
To: Sagi Grimberg <sagi@grimberg.me>
Cc: Samuel Jones <sjones@kalrayinc.com>, "hch@lst.de" <hch@lst.de>,
	"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>
Subject: Re: Request timeout seen with NVMEoF TCP
Date: Mon, 14 Dec 2020 23:35:31 +0530
Message-ID: <X9epa3PPORdBKZ71@chelsio.com>
In-Reply-To: <dbb14893-da35-2f7e-0583-c33ce9f41243@grimberg.me>

On Monday, December 14, 2020 at 09:51:28 -0800, Sagi Grimberg wrote:
> 
> > Hi Sagi,
> > With the above patch I still see the issue, but less frequently. Without the
> > patch I could consistently reproduce the timeouts with 4 target devices; with
> > the patch, IO runs fine for 4 targets. I tried the same test with 8 target
> > devices and hit the timeout below, but only a single instance of it so far.
> > I'll let it run for some more time, or rerun, and update.
> 
> Hey Potnuri,
> 
> Have you observed this further?
> 
> I'd think that if the io_work reschedule itself when it races
> with the direct send path this should not happen, but we may be
> seeing a different race going on here, adding Samuel who saw
> a similar phenomenon.
> 

Yes, I have seen it since as well, but only very intermittently. Below is the
dmesg output I saved before my test was stopped for an unrelated reason. Judging
from the timestamps, the timeouts were roughly ~20 min, ~45 min and ~3 hours
apart.
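
For reference, my understanding of the race you are describing is roughly the
following. This is a simplified sketch of the direct-send path in
drivers/nvme/host/tcp.c (around v5.10), trimmed for illustration rather than
quoted verbatim:

---
/*
 * Sketch of nvme_tcp_queue_request(): if we are on the queue's io_cpu
 * and nothing else is pending, try to send inline; otherwise defer to
 * io_work. Simplified from drivers/nvme/host/tcp.c (~v5.10).
 */
static void nvme_tcp_queue_request(struct nvme_tcp_request *req,
		bool sync, bool last)
{
	struct nvme_tcp_queue *queue = req->queue;
	bool empty;

	empty = llist_add(&req->lentry, &queue->req_list) &&
		list_empty(&queue->send_list) && !queue->request;

	if (queue->io_cpu == smp_processor_id() &&
	    sync && empty && mutex_trylock(&queue->send_mutex)) {
		/* Direct send path: send inline on the submitting CPU. */
		nvme_tcp_try_send(queue);
		mutex_unlock(&queue->send_mutex);
	} else if (last) {
		/* Deferred path: let io_work drain req_list. */
		queue_work_on(queue->io_cpu, nvme_tcp_wq, &queue->io_work);
	}
}
---

If the direct-send path holds send_mutex at the moment io_work runs, io_work's
own trylock fails; if io_work then exits without rescheduling itself while
requests remain on req_list, nothing ever sends them and they sit there until
the timeout handler fires. That would match your patch making io_work re-arm
itself, and could also explain why the window shrank but did not close
completely.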

dmesg on initiator:
---
[ 1735.038634] EXT4-fs (nvme7n1): mounted filesystem with ordered data mode. Opts: (null)
[ 2111.990419] nvme nvme5: queue 7: timeout request 0x57 type 4
[ 2111.991835] nvme nvme5: starting error recovery
[ 2111.998796] block nvme5n1: no usable path - requeuing I/O
[ 2111.998816] nvme nvme5: Reconnecting in 10 seconds...
[ 2122.253431] block nvme5n1: no usable path - requeuing I/O
[ 2122.254732] nvme nvme5: creating 16 I/O queues.
[ 2122.301169] nvme nvme5: mapped 16/0/0 default/read/poll queues.
[ 2122.314229] nvme nvme5: Successfully reconnected (1 attempt)
[ 2522.795467] perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79000
[ 3043.544758] perf: interrupt took too long (3135 > 3128), lowering kernel.perf_event_max_sample_rate to 63000
[ 4197.260041] perf: interrupt took too long (3922 > 3918), lowering kernel.perf_event_max_sample_rate to 50000
[ 4848.175367] nvme nvme1: queue 6: timeout request 0x42 type 4
[ 4848.176086] nvme nvme1: starting error recovery
[ 4848.176864] nvme nvme1: queue 6: timeout request 0x43 type 4
[ 4848.178194] nvme nvme1: queue 6: timeout request 0x44 type 4
[ 4848.179873] block nvme1n1: no usable path - requeuing I/O
[ 4848.180665] block nvme1n1: no usable path - requeuing I/O
[ 4848.180775] nvme nvme1: Reconnecting in 10 seconds...
[ 4848.181372] block nvme1n1: no usable path - requeuing I/O
[ 4848.183022] block nvme1n1: no usable path - requeuing I/O
[ 4848.183782] block nvme1n1: no usable path - requeuing I/O
[ 4858.436889] block nvme1n1: no usable path - requeuing I/O
[ 4858.437558] block nvme1n1: no usable path - requeuing I/O
[ 4858.438168] block nvme1n1: no usable path - requeuing I/O
[ 4858.439547] nvme nvme1: creating 16 I/O queues.
[ 4858.489650] nvme nvme1: mapped 16/0/0 default/read/poll queues.
[ 4858.490357] debugfs: Directory 'hctx0' with parent '/' already present!
[ 4858.491256] debugfs: Directory 'hctx1' with parent '/' already present!
[ 4858.492790] debugfs: Directory 'hctx2' with parent '/' already present!
[ 4858.493601] debugfs: Directory 'hctx3' with parent '/' already present!
[ 4858.494362] debugfs: Directory 'hctx4' with parent '/' already present!
[ 4858.495310] debugfs: Directory 'hctx5' with parent '/' already present!
[ 4858.496261] debugfs: Directory 'hctx6' with parent '/' already present!
[ 4858.497098] debugfs: Directory 'hctx7' with parent '/' already present!
[ 4858.497972] debugfs: Directory 'hctx8' with parent '/' already present!
[ 4858.498806] debugfs: Directory 'hctx9' with parent '/' already present!
[ 4858.499574] debugfs: Directory 'hctx10' with parent '/' already present!
[ 4858.500268] debugfs: Directory 'hctx11' with parent '/' already present!
[ 4858.501030] debugfs: Directory 'hctx12' with parent '/' already present!
[ 4858.501807] debugfs: Directory 'hctx13' with parent '/' already present!
[ 4858.502459] debugfs: Directory 'hctx14' with parent '/' already present!
[ 4858.503196] debugfs: Directory 'hctx15' with parent '/' already present!
[ 4858.510075] nvme nvme1: Successfully reconnected (1 attempt)
[12905.117166] perf: interrupt took too long (4944 > 4902), lowering kernel.perf_event_max_sample_rate to 40000
[15436.031342] nvme nvme0: queue 16: timeout request 0x22 type 4
[15436.031920] nvme nvme0: starting error recovery
[15436.033746] block nvme0n1: no usable path - requeuing I/O
[15436.035221] nvme nvme0: Reconnecting in 10 seconds...
[15446.292544] block nvme0n1: no usable path - requeuing I/O
[15446.293541] nvme nvme0: creating 16 I/O queues.
[15446.339595] nvme nvme0: mapped 16/0/0 default/read/poll queues.
[15446.340846] debugfs: Directory 'hctx0' with parent '/' already present!
[15446.342664] debugfs: Directory 'hctx1' with parent '/' already present!
[15446.343413] debugfs: Directory 'hctx2' with parent '/' already present!
[15446.344228] debugfs: Directory 'hctx3' with parent '/' already present!
[15446.345155] debugfs: Directory 'hctx4' with parent '/' already present!
[15446.345948] debugfs: Directory 'hctx5' with parent '/' already present!
[15446.346767] debugfs: Directory 'hctx6' with parent '/' already present!
[15446.347368] debugfs: Directory 'hctx7' with parent '/' already present!
[15446.348129] debugfs: Directory 'hctx8' with parent '/' already present!
[15446.348948] debugfs: Directory 'hctx9' with parent '/' already present!
[15446.349538] debugfs: Directory 'hctx10' with parent '/' already present!
[15446.350185] debugfs: Directory 'hctx11' with parent '/' already present!
[15446.350757] debugfs: Directory 'hctx12' with parent '/' already present!
[15446.351539] debugfs: Directory 'hctx13' with parent '/' already present!
[15446.352200] debugfs: Directory 'hctx14' with parent '/' already present!
[15446.352722] debugfs: Directory 'hctx15' with parent '/' already present!
[15446.358750] nvme nvme0: Successfully reconnected (1 attempt)

---
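
To map those log lines back to the driver: "queue N: timeout request 0xNN
type 4" is printed by the nvme_tcp_timeout() handler (PDU type 4 being a
command capsule), which then kicks error recovery. A trimmed-down sketch of
that handler, again paraphrased from ~v5.10 rather than quoted verbatim:

---
static enum blk_eh_timer_return
nvme_tcp_timeout(struct request *rq, bool reserved)
{
	struct nvme_tcp_request *req = blk_mq_rq_to_pdu(rq);
	struct nvme_ctrl *ctrl = &req->queue->ctrl->ctrl;
	struct nvme_tcp_cmd_pdu *pdu = req->pdu;

	/* The "queue N: timeout request ... type ..." line above. */
	dev_warn(ctrl->device, "queue %d: timeout request %#x type %d\n",
		 nvme_tcp_queue_id(req->queue), rq->tag, pdu->hdr.type);

	if (ctrl->state != NVME_CTRL_LIVE) {
		/* Teardown already in progress: just fail the request. */
		nvme_tcp_complete_timed_out(rq);
		return BLK_EH_DONE;
	}

	/* "starting error recovery": drop the association and let the
	 * reconnect machinery re-establish it. */
	nvme_tcp_error_recovery(ctrl);
	return BLK_EH_RESET_TIMER;
}
---

The "no usable path - requeuing I/O" lines are native multipath requeueing
while the association is torn down, and "Reconnecting in 10 seconds" is the
default fabrics reconnect_delay, so everything after the first timeout line
looks like expected recovery behavior. The debugfs "already present" warnings
appear to be a cosmetic side effect of the hctx directories being re-registered
across the reconnect.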


> > 
> > Target dmesg:
> > ---
> > [ 1704.132366] nvmet: creating controller 1 for subsystem nvme-ram0 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
> > [ 1704.185987] nvmet: creating controller 2 for subsystem nvme-ram1 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
> > [ 1704.230065] nvmet: creating controller 3 for subsystem nvme-ram2 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
> > [ 1704.277712] nvmet: creating controller 4 for subsystem nvme-ram3 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
> > [ 1704.314457] nvmet: creating controller 5 for subsystem nvme-ram4 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
> > [ 1704.370124] nvmet: creating controller 6 for subsystem nvme-ram5 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
> > [ 1704.435581] nvmet: creating controller 7 for subsystem nvme-ram6 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
> > [ 1704.501813] nvmet: creating controller 8 for subsystem nvme-ram7 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
> > [ 2103.965017] nvmet: creating controller 6 for subsystem nvme-ram5 for NQN nqn.2014-08.org.nvmexpress:uuid:77f6ffad-1c4a-4c0e-9f11-23cd4daf0216.
> > ^^^^^^^^^^^^^^^
> > ---
> > 
> > Initiator dmesg:
> > ---
> > [ 1735.038634] EXT4-fs (nvme7n1): mounted filesystem with ordered data mode. Opts: (null)
> > [ 2111.990419] nvme nvme5: queue 7: timeout request 0x57 type 4
> > [ 2111.991835] nvme nvme5: starting error recovery
> > [ 2111.998796] block nvme5n1: no usable path - requeuing I/O
> > [ 2111.998816] nvme nvme5: Reconnecting in 10 seconds...
> > [ 2122.253431] block nvme5n1: no usable path - requeuing I/O
> > [ 2122.254732] nvme nvme5: creating 16 I/O queues.
> > [ 2122.301169] nvme nvme5: mapped 16/0/0 default/read/poll queues.
> > [ 2122.314229] nvme nvme5: Successfully reconnected (1 attempt)
> > ---
> > 
> > 
> > Thanks.
> > 

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme


Thread overview: 15+ messages
2020-12-09 17:19 Request timeout seen with NVMEoF TCP Potnuri Bharat Teja
2020-12-09 18:02 ` Sagi Grimberg
2020-12-10 17:06   ` Potnuri Bharat Teja
2020-12-10 21:47     ` Sagi Grimberg
2020-12-11  7:26       ` Potnuri Bharat Teja
2020-12-14 17:51         ` Sagi Grimberg
2020-12-14 18:05           ` Potnuri Bharat Teja [this message]
2020-12-14 19:27             ` Samuel Jones
2020-12-14 20:13               ` Sagi Grimberg
2020-12-15  1:53           ` Sagi Grimberg
2020-12-15 18:30             ` Wunderlich, Mark
2020-12-19  1:04               ` Sagi Grimberg
2020-12-16  5:51             ` Potnuri Bharat Teja
2020-12-19  1:35               ` Sagi Grimberg
2020-12-21  7:50                 ` Potnuri Bharat Teja
