From: "hch@lst.de" <hch@lst.de>
To: Bart Van Assche <Bart.VanAssche@wdc.com>
Cc: "hch@lst.de" <hch@lst.de>, "axboe@kernel.dk" <axboe@kernel.dk>,
"keith.busch@intel.com" <keith.busch@intel.com>,
"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>,
"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
"sagi@grimberg.me" <sagi@grimberg.me>
Subject: Re: [PATCH 10/10] nvme: implement multipath access to nvme subsystems
Date: Thu, 24 Aug 2017 10:59:54 +0200 [thread overview]
Message-ID: <20170824085954.GD19504@lst.de> (raw)
In-Reply-To: <1503512512.2484.16.camel@wdc.com>
On Wed, Aug 23, 2017 at 06:21:55PM +0000, Bart Van Assche wrote:
> On Wed, 2017-08-23 at 19:58 +0200, Christoph Hellwig wrote:
> > +static blk_qc_t nvme_make_request(struct request_queue *q, struct bio *bio)
> > +{
> > + struct nvme_ns_head *head = q->queuedata;
> > + struct nvme_ns *ns;
> > + blk_qc_t ret = BLK_QC_T_NONE;
> > + int srcu_idx;
> > +
> > + srcu_idx = srcu_read_lock(&head->srcu);
> > + ns = srcu_dereference(head->current_path, &head->srcu);
> > + if (unlikely(!ns || ns->ctrl->state != NVME_CTRL_LIVE))
> > + ns = nvme_find_path(head);
> > + if (likely(ns)) {
> > + bio->bi_disk = ns->disk;
> > + bio->bi_opf |= REQ_FAILFAST_TRANSPORT;
> > + ret = generic_make_request_fast(bio);
> > + } else if (!list_empty_careful(&head->list)) {
> > + printk_ratelimited("no path available - requeing I/O\n");
> > +
> > + spin_lock_irq(&head->requeue_lock);
> > + bio_list_add(&head->requeue_list, bio);
> > + spin_unlock_irq(&head->requeue_lock);
> > + } else {
> > + printk_ratelimited("no path - failing I/O\n");
> > +
> > + bio->bi_status = BLK_STS_IOERR;
> > + bio_endio(bio);
> > + }
> > +
> > + srcu_read_unlock(&head->srcu, srcu_idx);
> > + return ret;
> > +}
>
> Hello Christoph,
>
> Since generic_make_request_fast() returns BLK_STS_AGAIN for a dying path:
> can the same kind of soft lockups occur with the NVMe multipathing code as
> with the current upstream device mapper multipathing code? See e.g.
> "[PATCH 3/7] dm-mpath: Do not lock up a CPU with requeuing activity"
> (https://www.redhat.com/archives/dm-devel/2017-August/msg00124.html).
I suspect the code is not going to hit it because we check the controller
state before trying to queue I/O on the lower queue. But if you point
me to a good reproducer test case I'd like to check.
Also does the "single queue" case in your mail refer to the old
request code? nvme only uses blk-mq so it would not hit that.
But either way I think get_request should be fixed to return
BLK_STS_IOERR if the queue is dying instead of BLK_STS_AGAIN.
> Another question about this code is what will happen if
> generic_make_request_fast() returns BLK_STS_AGAIN and the submit_bio() or
> generic_make_request() caller ignores the return value of the called
> function? A quick grep revealed that there is plenty of code that ignores
> the return value of these last two functions.
generic_make_request and generic_make_request_fast only return
the polling cookie (blk_qc_t), not a block status. Note that we do
not use blk_get_request / blk_mq_alloc_request for the request allocation
of the request on the lower device, so unless the caller passed REQ_NOWAIT
and is able to handle BLK_STS_AGAIN we won't ever return it.
WARNING: multiple messages have this Message-ID (diff)
From: hch@lst.de (hch@lst.de)
Subject: [PATCH 10/10] nvme: implement multipath access to nvme subsystems
Date: Thu, 24 Aug 2017 10:59:54 +0200 [thread overview]
Message-ID: <20170824085954.GD19504@lst.de> (raw)
In-Reply-To: <1503512512.2484.16.camel@wdc.com>
On Wed, Aug 23, 2017@06:21:55PM +0000, Bart Van Assche wrote:
> On Wed, 2017-08-23@19:58 +0200, Christoph Hellwig wrote:
> > +static blk_qc_t nvme_make_request(struct request_queue *q, struct bio *bio)
> > +{
> > + struct nvme_ns_head *head = q->queuedata;
> > + struct nvme_ns *ns;
> > + blk_qc_t ret = BLK_QC_T_NONE;
> > + int srcu_idx;
> > +
> > + srcu_idx = srcu_read_lock(&head->srcu);
> > + ns = srcu_dereference(head->current_path, &head->srcu);
> > + if (unlikely(!ns || ns->ctrl->state != NVME_CTRL_LIVE))
> > + ns = nvme_find_path(head);
> > + if (likely(ns)) {
> > + bio->bi_disk = ns->disk;
> > + bio->bi_opf |= REQ_FAILFAST_TRANSPORT;
> > + ret = generic_make_request_fast(bio);
> > + } else if (!list_empty_careful(&head->list)) {
> > + printk_ratelimited("no path available - requeing I/O\n");
> > +
> > + spin_lock_irq(&head->requeue_lock);
> > + bio_list_add(&head->requeue_list, bio);
> > + spin_unlock_irq(&head->requeue_lock);
> > + } else {
> > + printk_ratelimited("no path - failing I/O\n");
> > +
> > + bio->bi_status = BLK_STS_IOERR;
> > + bio_endio(bio);
> > + }
> > +
> > + srcu_read_unlock(&head->srcu, srcu_idx);
> > + return ret;
> > +}
>
> Hello Christoph,
>
> Since generic_make_request_fast() returns BLK_STS_AGAIN for a dying path:
> can the same kind of soft lockups occur with the NVMe multipathing code as
> with the current upstream device mapper multipathing code? See e.g.
> "[PATCH 3/7] dm-mpath: Do not lock up a CPU with requeuing activity"
> (https://www.redhat.com/archives/dm-devel/2017-August/msg00124.html).
I suspect the code is not going to hit it because we check the controller
state before trying to queue I/O on the lower queue. But if you point
me to a good reproducer test case I'd like to check.
Also does the "single queue" case in your mail refer to the old
request code? nvme only uses blk-mq so it would not hit that.
But either way I think get_request should be fixed to return
BLK_STS_IOERR if the queue is dying instead of BLK_STS_AGAIN.
> Another question about this code is what will happen if
> generic_make_request_fast() returns BLK_STS_AGAIN and the submit_bio() or
> generic_make_request() caller ignores the return value of the called
> function? A quick grep revealed that there is plenty of code that ignores
> the return value of these last two functions.
generic_make_request and generic_make_request_fast only return
the polling cookie (blk_qc_t), not a block status. Note that we do
not use blk_get_request / blk_mq_alloc_request for the request allocation
of the request on the lower device, so unless the caller passed REQ_NOWAIT
and is able to handle BLK_STS_AGAIN we won't ever return it.
next prev parent reply other threads:[~2017-08-24 8:59 UTC|newest]
Thread overview: 122+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-08-23 17:58 RFC: nvme multipath support Christoph Hellwig
2017-08-23 17:58 ` Christoph Hellwig
2017-08-23 17:58 ` [PATCH 01/10] nvme: report more detailed status codes to the block layer Christoph Hellwig
2017-08-23 17:58 ` Christoph Hellwig
2017-08-28 6:06 ` Sagi Grimberg
2017-08-28 6:06 ` Sagi Grimberg
2017-08-28 18:50 ` Keith Busch
2017-08-28 18:50 ` Keith Busch
2017-08-23 17:58 ` [PATCH 02/10] nvme: allow calling nvme_change_ctrl_state from irq context Christoph Hellwig
2017-08-23 17:58 ` Christoph Hellwig
2017-08-28 6:06 ` Sagi Grimberg
2017-08-28 6:06 ` Sagi Grimberg
2017-08-28 18:50 ` Keith Busch
2017-08-28 18:50 ` Keith Busch
2017-08-23 17:58 ` [PATCH 03/10] nvme: remove unused struct nvme_ns fields Christoph Hellwig
2017-08-23 17:58 ` Christoph Hellwig
2017-08-28 6:07 ` Sagi Grimberg
2017-08-28 6:07 ` Sagi Grimberg
2017-08-28 19:13 ` Keith Busch
2017-08-28 19:13 ` Keith Busch
2017-08-23 17:58 ` [PATCH 04/10] nvme: remove nvme_revalidate_ns Christoph Hellwig
2017-08-23 17:58 ` Christoph Hellwig
2017-08-28 6:12 ` Sagi Grimberg
2017-08-28 6:12 ` Sagi Grimberg
2017-08-28 19:14 ` Keith Busch
2017-08-28 19:14 ` Keith Busch
2017-08-23 17:58 ` [PATCH 05/10] nvme: don't blindly overwrite identifiers on disk revalidate Christoph Hellwig
2017-08-23 17:58 ` Christoph Hellwig
2017-08-28 6:17 ` Sagi Grimberg
2017-08-28 6:17 ` Sagi Grimberg
2017-08-28 6:23 ` Christoph Hellwig
2017-08-28 6:23 ` Christoph Hellwig
2017-08-28 6:32 ` Sagi Grimberg
2017-08-28 6:32 ` Sagi Grimberg
2017-08-28 19:15 ` Keith Busch
2017-08-28 19:15 ` Keith Busch
2017-08-23 17:58 ` [PATCH 06/10] nvme: track subsystems Christoph Hellwig
2017-08-23 17:58 ` Christoph Hellwig
2017-08-23 22:04 ` Keith Busch
2017-08-23 22:04 ` Keith Busch
2017-08-24 8:52 ` Christoph Hellwig
2017-08-24 8:52 ` Christoph Hellwig
2017-08-28 6:22 ` Sagi Grimberg
2017-08-28 6:22 ` Sagi Grimberg
2017-08-23 17:58 ` [PATCH 07/10] nvme: track shared namespaces Christoph Hellwig
2017-08-23 17:58 ` Christoph Hellwig
2017-08-28 6:51 ` Sagi Grimberg
2017-08-28 6:51 ` Sagi Grimberg
2017-08-28 8:50 ` Christoph Hellwig
2017-08-28 8:50 ` Christoph Hellwig
2017-08-28 20:21 ` J Freyensee
2017-08-28 20:21 ` J Freyensee
2017-08-29 8:25 ` Christoph Hellwig
2017-08-29 8:25 ` Christoph Hellwig
2017-08-29 6:54 ` Guan Junxiong
2017-08-29 6:54 ` Guan Junxiong
2017-08-28 12:04 ` javigon
2017-08-28 12:04 ` javigon
2017-08-28 12:41 ` Guan Junxiong
2017-08-28 12:41 ` Guan Junxiong
2017-08-28 14:30 ` Christoph Hellwig
2017-08-28 14:30 ` Christoph Hellwig
2017-08-29 2:42 ` Guan Junxiong
2017-08-29 2:42 ` Guan Junxiong
2017-08-29 8:30 ` Christoph Hellwig
2017-08-29 8:30 ` Christoph Hellwig
2017-08-29 8:29 ` Christoph Hellwig
2017-08-29 8:29 ` Christoph Hellwig
2017-08-28 19:18 ` Keith Busch
2017-08-28 19:18 ` Keith Busch
2017-08-23 17:58 ` [PATCH 08/10] block: provide a generic_make_request_fast helper Christoph Hellwig
2017-08-23 17:58 ` Christoph Hellwig
2017-08-28 7:00 ` Sagi Grimberg
2017-08-28 7:00 ` Sagi Grimberg
2017-08-28 8:54 ` Christoph Hellwig
2017-08-28 8:54 ` Christoph Hellwig
2017-08-28 11:01 ` Sagi Grimberg
2017-08-28 11:01 ` Sagi Grimberg
2017-08-28 11:54 ` Christoph Hellwig
2017-08-28 11:54 ` Christoph Hellwig
2017-08-28 12:38 ` Sagi Grimberg
2017-08-28 12:38 ` Sagi Grimberg
2017-08-23 17:58 ` [PATCH 09/10] blk-mq: add a blk_steal_bios helper Christoph Hellwig
2017-08-23 17:58 ` Christoph Hellwig
2017-08-28 7:04 ` Sagi Grimberg
2017-08-28 7:04 ` Sagi Grimberg
2017-08-23 17:58 ` [PATCH 10/10] nvme: implement multipath access to nvme subsystems Christoph Hellwig
2017-08-23 17:58 ` Christoph Hellwig
2017-08-23 18:21 ` Bart Van Assche
2017-08-23 18:21 ` Bart Van Assche
2017-08-24 8:59 ` hch [this message]
2017-08-24 8:59 ` hch
2017-08-24 20:17 ` Bart Van Assche
2017-08-24 20:17 ` Bart Van Assche
2017-09-05 11:53 ` Christoph Hellwig
2017-09-05 11:53 ` Christoph Hellwig
2017-09-11 6:34 ` Tony Yang
2017-08-23 22:53 ` Keith Busch
2017-08-23 22:53 ` Keith Busch
2017-08-24 8:52 ` Christoph Hellwig
2017-08-24 8:52 ` Christoph Hellwig
2017-08-28 7:23 ` Sagi Grimberg
2017-08-28 7:23 ` Sagi Grimberg
2017-08-28 9:06 ` Christoph Hellwig
2017-08-28 9:06 ` Christoph Hellwig
2017-08-28 13:40 ` Sagi Grimberg
2017-08-28 13:40 ` Sagi Grimberg
2017-08-28 14:24 ` Christoph Hellwig
2017-08-28 14:24 ` Christoph Hellwig
2017-09-07 15:17 ` Tony Yang
2017-08-29 10:22 ` Guan Junxiong
2017-08-29 10:22 ` Guan Junxiong
2017-08-29 14:51 ` Christoph Hellwig
2017-08-29 14:51 ` Christoph Hellwig
2017-08-29 14:54 ` Keith Busch
2017-08-29 14:54 ` Keith Busch
2017-08-29 14:55 ` Christoph Hellwig
2017-08-29 14:55 ` Christoph Hellwig
2017-08-29 15:41 ` Keith Busch
2017-08-29 15:41 ` Keith Busch
2017-09-18 0:17 ` Christoph Hellwig
2017-09-18 0:17 ` Christoph Hellwig
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170824085954.GD19504@lst.de \
--to=hch@lst.de \
--cc=Bart.VanAssche@wdc.com \
--cc=axboe@kernel.dk \
--cc=keith.busch@intel.com \
--cc=linux-block@vger.kernel.org \
--cc=linux-nvme@lists.infradead.org \
--cc=sagi@grimberg.me \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.