[PATCH RFC 2/2] NVMe: rfc blk-mq support

All of lore.kernel.org
 help / color / mirror / Atom feed

From: m@bjorling.me (Matias Bjørling)
Subject: [PATCH RFC 2/2] NVMe: rfc blk-mq support
Date: Wed, 09 Oct 2013 09:12:47 +0200	[thread overview]
Message-ID: <525501EF.8030301@bjorling.me> (raw)
In-Reply-To: <alpine.LRH.2.03.1310081254150.4763@AMR>

Thanks for the feedback. I'll make a v2 and report back measurements of 
gain/loss for the machines I have available.

On 10/08/2013 10:59 PM, Keith Busch wrote:
> On Tue, 8 Oct 2013, Matias Bj?rling wrote:
>> Convert the driver to blk mq.
>>
>> The patch consists of:
>>
>> * Initializion of mq data structures.
>> * Convert function calls from bio to request data structures.
>> * IO queues are split into an admin queue and io queues.
>> * bio splits are removed as it should be handled by block layer.
>>
>> Signed-off-by: Matias Bj?rling <m at bjorling.me>
>
> I have no opinion right now if this is a good idea or not. I'll just
> comment on a couple issues on this implementation. Otherwise I think
> it's pretty neat and gave me a reason to explore multiqueues!
>
> First a couple minor suggestions:
>
> You might want to use "REQ_END" from the rq->cmd_flags to know if you
> should write the queue doorbell or not. Aggregating these would help
> most devices but we didn't have a good way of knowing before if this
> was the last request or not.
>
> Maybe change nvme_submit_bio_queue to return a BLK_MQ_RQ_QUEUE_ status
> so you don't need that switch statement after calling it.
>
> Must do something about requests that don't align to PRPs. I think you
> mentioned this in the outstanding work in [0/2].
>
>> struct nvme_queue *get_nvmeq(struct nvme_dev *dev)
>> {
>> -    return dev->queues[get_cpu() + 1];
>> +    get_cpu();
>> +    return dev->admin_queue;
>> }
>
> get_nvmeq now returns only the admin queue when it used to return only
> IO queues. This breaks NVME_IO and SG_IO ioctl handling.
>
>> +static int nvme_init_hctx(struct blk_mq_hw_ctx *hctx, void *data,
>> +              unsigned int i)
>> +{
>> +    struct nvme_ns *ns = data;
>> +    struct nvme_dev *dev = ns->dev;
>> +    struct nvme_queue *nq;
>> +
>> +    nq = nvme_create_queue(dev, i + 1, hctx->queue_depth, i);
>> +    if (IS_ERR(nq))
>> +        return PTR_ERR(nq);
>> +
>> +    hctx->driver_data = nq;
>> +
>> +    return 0;
>> +}
>
> This right here is the biggest problem with the implemenation. It is
> going to fail for every namespace but the first one since each namespace
> registers a multiqueue and each mulitqueue requires a hw context to
> work. The number of queues is for the device, not namespace, so only
> the first namespace is going to successfully return from nvme_init_hctx;
> the rest will be unable to create an NVMe IO queue for trying to create
> one with already allocated QID.
>
> You should instead create the IO queues on the device like how it was
> done before then just set the hctx->driver_data to dev->queues[i + 1]
> or something like.
>
>> +static enum blk_eh_timer_return nvme_timeout(struct request *rq)
>> +{
>> +    /* Currently the driver handle timeouts by itself */
>> +    return BLK_EH_NOT_HANDLED;
>> +}
>
> Need do something with the command timeouts here or somewhere. You've
> changed the driver to poll only on the admin queue for timed out commands,
> and left the multiqueue timeout a no-op.

WARNING: multiple messages have this Message-ID (diff)

From: "Matias Bjørling" <m@bjorling.me>
To: Keith Busch <keith.busch@intel.com>
Cc: axboe@kernel.dk, willy@linux.intel.com,
	linux-kernel@vger.kernel.org, linux-nvme@lists.infradead.org
Subject: Re: [PATCH RFC 2/2] NVMe: rfc blk-mq support
Date: Wed, 09 Oct 2013 09:12:47 +0200	[thread overview]
Message-ID: <525501EF.8030301@bjorling.me> (raw)
In-Reply-To: <alpine.LRH.2.03.1310081254150.4763@AMR>

Thanks for the feedback. I'll make a v2 and report back measurements of 
gain/loss for the machines I have available.

On 10/08/2013 10:59 PM, Keith Busch wrote:
> On Tue, 8 Oct 2013, Matias Bjørling wrote:
>> Convert the driver to blk mq.
>>
>> The patch consists of:
>>
>> * Initializion of mq data structures.
>> * Convert function calls from bio to request data structures.
>> * IO queues are split into an admin queue and io queues.
>> * bio splits are removed as it should be handled by block layer.
>>
>> Signed-off-by: Matias Bjørling <m@bjorling.me>
>
> I have no opinion right now if this is a good idea or not. I'll just
> comment on a couple issues on this implementation. Otherwise I think
> it's pretty neat and gave me a reason to explore multiqueues!
>
> First a couple minor suggestions:
>
> You might want to use "REQ_END" from the rq->cmd_flags to know if you
> should write the queue doorbell or not. Aggregating these would help
> most devices but we didn't have a good way of knowing before if this
> was the last request or not.
>
> Maybe change nvme_submit_bio_queue to return a BLK_MQ_RQ_QUEUE_ status
> so you don't need that switch statement after calling it.
>
> Must do something about requests that don't align to PRPs. I think you
> mentioned this in the outstanding work in [0/2].
>
>> struct nvme_queue *get_nvmeq(struct nvme_dev *dev)
>> {
>> -    return dev->queues[get_cpu() + 1];
>> +    get_cpu();
>> +    return dev->admin_queue;
>> }
>
> get_nvmeq now returns only the admin queue when it used to return only
> IO queues. This breaks NVME_IO and SG_IO ioctl handling.
>
>> +static int nvme_init_hctx(struct blk_mq_hw_ctx *hctx, void *data,
>> +              unsigned int i)
>> +{
>> +    struct nvme_ns *ns = data;
>> +    struct nvme_dev *dev = ns->dev;
>> +    struct nvme_queue *nq;
>> +
>> +    nq = nvme_create_queue(dev, i + 1, hctx->queue_depth, i);
>> +    if (IS_ERR(nq))
>> +        return PTR_ERR(nq);
>> +
>> +    hctx->driver_data = nq;
>> +
>> +    return 0;
>> +}
>
> This right here is the biggest problem with the implemenation. It is
> going to fail for every namespace but the first one since each namespace
> registers a multiqueue and each mulitqueue requires a hw context to
> work. The number of queues is for the device, not namespace, so only
> the first namespace is going to successfully return from nvme_init_hctx;
> the rest will be unable to create an NVMe IO queue for trying to create
> one with already allocated QID.
>
> You should instead create the IO queues on the device like how it was
> done before then just set the hctx->driver_data to dev->queues[i + 1]
> or something like.
>
>> +static enum blk_eh_timer_return nvme_timeout(struct request *rq)
>> +{
>> +    /* Currently the driver handle timeouts by itself */
>> +    return BLK_EH_NOT_HANDLED;
>> +}
>
> Need do something with the command timeouts here or somewhere. You've
> changed the driver to poll only on the admin queue for timed out commands,
> and left the multiqueue timeout a no-op.

next prev parent reply	other threads:[~2013-10-09  7:12 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-10-08  9:34 [PATCH RFC 0/2] Convert from bio-based to blk-mq Matias Bjørling
2013-10-08  9:34 ` Matias Bjørling
2013-10-08  9:34 ` [PATCH RFC 1/2] blk-mq: call exit_hctx on hw queue teardown Matias Bjørling
2013-10-08  9:34   ` Matias Bjørling
2013-10-08  9:34 ` [PATCH RFC 2/2] NVMe: rfc blk-mq support Matias Bjørling
2013-10-08  9:34   ` Matias Bjørling
2013-10-08 20:59   ` Keith Busch
2013-10-08 20:59     ` Keith Busch
2013-10-09  7:12     ` Matias Bjørling [this message]
2013-10-09  7:12       ` Matias Bjørling
2013-10-08 13:10 ` [PATCH RFC 0/2] Convert from bio-based to blk-mq Matthew Wilcox
2013-10-08 13:10   ` Matthew Wilcox
2013-10-08 13:19   ` Matias Bjørling
2013-10-08 13:19     ` Matias Bjørling
2013-10-08 18:39   ` Jens Axboe
2013-10-08 18:39     ` Jens Axboe
2013-10-09 15:48     ` Keith Busch
2013-10-09 15:48       ` Keith Busch

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=525501EF.8030301@bjorling.me \
    --to=m@bjorling.me \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.