Linux-NVME Archive on lore.kernel.org
From: hare@suse.de (Hannes Reinecke)
Subject: [PATCH] nvme: avoid hang on inaccessible paths
Date: Wed, 30 May 2018 14:30:20 +0200
Message-ID: <20180530143020.50a299b8@pentland.suse.de>
In-Reply-To: <20180530121241.GA1850@lst.de>

On Wed, 30 May 2018 14:12:41 +0200
Christoph Hellwig <hch@lst.de> wrote:

> > -	/* XXX: try an inaccessible path as last resort per 8.18.3.3 */
> > +	if (!ns)
> > +		ns = __nvme_find_path(head, NVME_ANA_INACCESSIBLE);
> 
> This at least needs to keep a comment on why we are aiming for an
> inaccessible path.
> 
Ok.

> > @@ -165,10 +166,14 @@ static blk_qc_t nvme_ns_head_make_request(struct request_queue *q,
> >  	srcu_idx = srcu_read_lock(&head->srcu);
> >  	ns = nvme_find_path(head);
> > -	if (likely(ns)) {
> > +	if (likely(ns && nvme_ns_ana_state(ns) != NVME_ANA_INACCESSIBLE)) {
> >  		bio->bi_disk = ns->disk;
> >  		bio->bi_opf |= REQ_NVME_MPATH;
> >  		ret = direct_make_request(bio);
> 
> We should actually sometimes try to issue I/O on an inaccessible path.
> Please take a look at Section 8.18.3.3 in the ANA TP.  Similar
> handling would also apply to change states, which we'd also need to
> cover here and above.
> 
That specific handling hasn't been implemented with this patch, as it
would induce issues on the host side.
But see below.

> 
> > +	} else if (ns) {
> > +		bio->bi_status = BLK_STS_TRANSPORT;
> > +		bio_set_flag(bio, BIO_QUIET);
> > +		bio_endio(bio);
> >  	} else if (!list_empty_careful(&head->list)) {
> >  		dev_warn_ratelimited(dev, "no path available - requeuing I/O\n");
> 
> But a case where we only have inaccessible / change states isn't
> really different from the no path available, requeing case.
> 
It is for the initial connect.

> Now it might make sense to have a (configurable) timeout to give
> up in all those case, and if my vague memory serves me right you
> actually volunteered to implement that a while ago.
> 
The problem is that this particular code path is triggered for the
revalidate_disk() case; when opting for requeue (as the original code
did) the system will hang during revalidate_disk(), with this call trace:

[ 1463.219509]  schedule+0x4f/0xd0
[ 1463.220712]  io_schedule+0x1c/0x50
[ 1463.221415]  do_read_cache_page+0x603/0x8a0
[ 1463.232007]  read_dev_sector+0x5b/0x160
[ 1463.232858]  read_lba+0x1b5/0x340
[ 1463.237400]  efi_partition+0x234/0xce0
[ 1463.254255]  check_partition+0x1ab/0x310
[ 1463.255090]  rescan_partitions+0x136/0x4b0
[ 1463.258420]  __blkdev_get+0x535/0x840
[ 1463.261793]  blkdev_get+0x1ff/0x4c0
[ 1463.267360]  __device_add_disk+0x73c/0x7a0
[ 1463.272179]  nvme_mpath_add_disk+0x9a/0xe0
[ 1463.273042]  nvme_alloc_ns+0x6e3/0xdb0

As this blocks the nvmf_dev_mutex we don't have any chance to connect
the other, optimized path.
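
To make the difference between the two behaviours concrete, here is a
minimal userspace sketch of the branch order in the patched
nvme_ns_head_make_request(). Everything in it (the enum names, struct path,
dispatch_bio()) is a hypothetical stand-in and not kernel code; the final
"I/O error" branch is not part of the quoted hunk and is assumed here only
to complete the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
enum ana_state { ANA_OPTIMIZED, ANA_NONOPTIMIZED, ANA_INACCESSIBLE };

enum dispatch {
	DISPATCH_SUBMIT,	/* direct_make_request() in the patch */
	DISPATCH_FAIL_FAST,	/* BLK_STS_TRANSPORT + bio_endio() in the patch */
	DISPATCH_REQUEUE,	/* "no path available - requeuing I/O" */
	DISPATCH_IO_ERROR,	/* assumed fallback, not shown in the quoted hunk */
};

struct path {
	enum ana_state state;
};

/*
 * ns is the path the lookup returned (NULL if none was found);
 * head_has_paths models !list_empty_careful(&head->list).
 */
static enum dispatch dispatch_bio(const struct path *ns, bool head_has_paths)
{
	/* Sanity check on the model's own input. */
	assert(!ns || ns->state <= ANA_INACCESSIBLE);

	if (ns && ns->state != ANA_INACCESSIBLE)
		return DISPATCH_SUBMIT;
	if (ns)
		return DISPATCH_FAIL_FAST;	/* only an inaccessible path exists */
	if (head_has_paths)
		return DISPATCH_REQUEUE;
	return DISPATCH_IO_ERROR;
}
```

With only an inaccessible path present the bio completes immediately with an
error instead of being requeued, so the partition scan in the trace above
returns rather than sleeping in io_schedule().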

I had been thinking of implementing that particular handling from the
ANA spec, but that would mean we're adding an ANA TT delay for each
inaccessible path, which with the current defaults would mean booting
is delayed by 10 seconds per inaccessible path.
That simply doesn't scale, so I opted for aborting the I/O straight away.
(Which, incidentally, is also how we handle things on the dm-multipath
side).
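
As a rough illustration of the scaling argument, the sketch below just
multiplies out the figure quoted above. ANATT_SECS and boot_delay_secs() are
hypothetical names, and the 10-second value is taken from this mail, not
read from a controller; it also assumes the per-path waits are serialized.

```c
#include <assert.h>
#include <limits.h>

/* Assumption: 10 s wait per inaccessible path, the figure quoted in the mail. */
#define ANATT_SECS 10u

/* Extra boot time if each inaccessible path is waited on in turn. */
static unsigned int boot_delay_secs(unsigned int inaccessible_paths)
{
	/* Guard the multiplication against unsigned wrap-around. */
	assert(inaccessible_paths <= UINT_MAX / ANATT_SECS);
	return inaccessible_paths * ANATT_SECS;
}
```

Under these assumptions boot_delay_secs(4) is 40, so four inaccessible
paths would already add 40 seconds to boot.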

One possible alternative would be to make revalidate_disk non-blocking
by adding another GENHD flag to avoid reading partitions. As this would
require more plumbing, I haven't attempted it so far.

Cheers,

Hannes
