public inbox for linux-kernel@vger.kernel.org
From: "Matias Bjørling" <m@bjorling.me>
To: Keith Busch <keith.busch@intel.com>
Cc: axboe@kernel.dk, willy@linux.intel.com,
	linux-kernel@vger.kernel.org, linux-nvme@lists.infradead.org
Subject: Re: [PATCH 3/3] NVMe: Convert to blk-mq
Date: Fri, 18 Oct 2013 21:06:22 +0200	[thread overview]
Message-ID: <526186AE.3080006@bjorling.me> (raw)
In-Reply-To: <alpine.LRH.2.03.1310180837140.4763@AMR>

On 10/18/2013 05:13 PM, Keith Busch wrote:
> On Fri, 18 Oct 2013, Matias Bjorling wrote:
>> The nvme driver implements itself as a bio-based driver. This is
>> primarily because of high lock contention in high-performance nvm
>> devices. To remove the contention within the traditional block layer, a
>> multi-queue block layer is being implemented.
>>
>> This patch converts the current bio-based approach to work with the
>> request-based approach found in the multi-queue block layer. This means
>> that bio responsibility is moved from the driver, into the block layer.
>> In return, the block layer packs request structures and submits them to
>> the nvme driver according to the features/limits of the nvme hardware.
>>
>> The patch consists of:
>> * Initialization of multi-queue data structures
>> * Conversion of bio function calls into request function calls.
>> * Separate cmdid pools for admin and normal queues.
>> * Bio splits according to NOT_VIRT_MERGEABLE are assumed to be handled
>>   by blk-mq.
>> * Uses the timeout framework of blk-mq where possible.
>>
>> Signed-off-by: Matias Bjorling <m@bjorling.me>
>> ---
>> drivers/block/nvme-core.c | 765 +++++++++++++++++++++++-----------------------
>> drivers/block/nvme-scsi.c |  39 +--
>> include/linux/nvme.h      |   7 +-
>> 3 files changed, 385 insertions(+), 426 deletions(-)
>>
>> diff --git a/drivers/block/nvme-core.c b/drivers/block/nvme-core.c
>> index e99a30a..36bf45c 100644
>> --- a/drivers/block/nvme-core.c
>> +++ b/drivers/block/nvme-core.c
>
> [snip]
>
>> -static void nvme_start_io_acct(struct bio *bio)
>> +static void nvme_start_io_acct(struct request *rq)
>> {
>> -    struct gendisk *disk = bio->bi_bdev->bd_disk;
>> -    const int rw = bio_data_dir(bio);
>> +    struct gendisk *disk = rq->rq_disk;
>> +    const int rw = rq_data_dir(rq);
>>     int cpu = part_stat_lock();
>>     part_round_stats(cpu, &disk->part0);
>>     part_stat_inc(cpu, &disk->part0, ios[rw]);
>> -    part_stat_add(cpu, &disk->part0, sectors[rw], bio_sectors(bio));
>> +    part_stat_add(cpu, &disk->part0, sectors[rw], blk_rq_sectors(rq));
>>     part_inc_in_flight(&disk->part0, rw);
>>     part_stat_unlock();
>> }
>>
>> -static void nvme_end_io_acct(struct bio *bio, unsigned long start_time)
>> +static void nvme_end_io_acct(struct request *rq, unsigned long start_time)
>> {
>> -    struct gendisk *disk = bio->bi_bdev->bd_disk;
>> -    const int rw = bio_data_dir(bio);
>> +    struct gendisk *disk = rq->rq_disk;
>> +    const int rw = rq_data_dir(rq);
>>     unsigned long duration = jiffies - start_time;
>>     int cpu = part_stat_lock();
>>     part_stat_add(cpu, &disk->part0, ticks[rw], duration);
>> @@ -342,23 +370,26 @@ static void nvme_end_io_acct(struct bio *bio, unsigned long start_time)
>>     part_stat_unlock();
>> }
>
> I think you can remove the io accounting, right? These were added here
> because the diskstats are not updated in the block layer for bio-based
> block drivers.

Yeap, I'll make a patch for the next version that removes them.

>
>> @@ -715,32 +606,47 @@ static int nvme_submit_bio_queue(struct nvme_queue *nvmeq, struct nvme_ns *ns,
>>         dma_dir = DMA_FROM_DEVICE;
>>     }
>>
>> -    result = nvme_map_bio(nvmeq, iod, bio, dma_dir, psegs);
>> -    if (result <= 0)
>> +    if (nvme_map_rq(nvmeq, iod, rq, dma_dir))
>>         goto free_cmdid;
>> -    length = result;
>>
>> -    cmnd->rw.command_id = cmdid;
>> +    length = blk_rq_bytes(rq);
>> +
>> +    cmnd->rw.command_id = rq->tag;
>
> The command ids have to be unique on a submission queue. Since each
> namespace's blk-mq has its own 'tags' used as command ids here, but the
> namespaces share submission queues, what's stopping the tags for
> commands sent to namespace 1 from clashing with tags for namespace 2?
>
> I think this would work better if one blk-mq was created per device
> rather than namespace. It would fix the tag problem above and save a
> lot of memory potentially wasted on millions of requests allocated that
> can't be used.

You're right. I didn't see the connection. In v3 I'll push struct 
request_queue to nvme_dev and map the queues appropriately. It will also 
fix the command id issues.
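To make the clash concrete, here is a userspace sketch (plain C, not
kernel code; every name in it is hypothetical, chosen only for the
illustration): two per-namespace tag allocators that start from zero
can both hand out tag 0 for commands heading to the same submission
queue, while a single per-device allocator keeps in-flight tags unique.

#include <assert.h>
#include <stdio.h>

#define QUEUE_DEPTH 64

/* Per-namespace allocator: each namespace hands out tags from its own
 * counter, so two namespaces can both return the same value. */
struct ns_tags { int next; };

static int ns_alloc_tag(struct ns_tags *t)
{
	return t->next++ % QUEUE_DEPTH;
}

/* Per-device allocator: one shared counter for the whole device, so
 * every in-flight command gets a distinct tag. */
struct dev_tags { int next; };

static int dev_alloc_tag(struct dev_tags *t)
{
	return t->next++ % QUEUE_DEPTH;
}

int main(void)
{
	struct ns_tags ns1 = {0}, ns2 = {0};
	int a = ns_alloc_tag(&ns1);
	int b = ns_alloc_tag(&ns2);
	printf("per-namespace: ns1=%d ns2=%d %s\n", a, b,
	       a == b ? "(clash on shared SQ!)" : "");
	assert(a == b);		/* independent tag spaces collide */

	struct dev_tags dev = {0};
	int c = dev_alloc_tag(&dev);
	int d = dev_alloc_tag(&dev);
	printf("per-device: cmd1=%d cmd2=%d %s\n", c, d,
	       c == d ? "(clash!)" : "(unique)");
	assert(c != d);		/* shared tag space stays unique */
	return 0;
}

Moving struct request_queue to nvme_dev, as suggested above, is the
per-device-allocator shape: blk-mq then owns one tag space per device
and a request's tag can safely double as the command id.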

>
> Do you know how/if this is planned to work with scsi? Will there be one
> blk-mq per LUN or per host controller?

Christoph Hellwig and Nicholas Bellinger are working on scsi-mq.

https://git.kernel.org/cgit/linux/kernel/git/nab/target-pending.git/log/?h=scsi-mq

I think they will map it per controller. That would be the most natural fit.


  reply	other threads:[~2013-10-18 19:06 UTC|newest]

Thread overview: 12+ messages
2013-10-18 13:14 [PATCH 0/3] Convert from bio-based to blk-mq v2 Matias Bjorling
2013-10-18 13:14 ` [PATCH 1/3] blk-mq: call exit_hctx on hw queue teardown Matias Bjorling
2013-10-18 13:14 ` [PATCH 2/3] NVMe: Extract admin queue size Matias Bjorling
2013-10-18 13:14 ` [PATCH 3/3] NVMe: Convert to blk-mq Matias Bjorling
2013-10-18 15:13   ` Keith Busch
2013-10-18 19:06     ` Matias Bjørling [this message]
2013-10-22 16:55       ` Keith Busch
2013-10-22 18:55         ` Matias Bjorling
2013-10-22 19:52           ` Keith Busch
2013-10-18 15:48 ` [PATCH 0/3] Convert from bio-based to blk-mq v2 Matthew Wilcox
2013-10-18 19:10   ` Matias Bjørling
2013-10-18 19:21   ` Matias Bjorling
