public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Jens Axboe <axboe@kernel.dk>
To: "Keith Busch" <keith.busch@intel.com>, "Matias Bjørling" <m@bjorling.me>
Cc: willy@linux.intel.com, sbradshaw@micron.com,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH V3] NVMe: basic conversion to blk-mq
Date: Thu, 29 May 2014 08:25:56 -0600	[thread overview]
Message-ID: <53874374.2020302@kernel.dk> (raw)
In-Reply-To: <alpine.LRH.2.03.1405281929460.4995@AMR>

On 2014-05-28 21:07, Keith Busch wrote:
> On Wed, 28 May 2014, Matias Bjørling wrote:
>> This converts the current NVMe driver to utilize the blk-mq layer.
>
> I am concerned about device hot removal since the h/w queues can be
> freed at any time. I *think* blk-mq helps with this in that the driver
> will not see a new request after calling blk_cleanup_queue. If you can
> confirm that's true and that blk-mq waits for all requests active in the
> driver to return to the block layer, then we're probably okay in this
> path. That wasn't true as a bio based driver which is why we are cautious
> with all the locking and barriers. But what about the IOCTL paths?

Barring any bugs in the code, then yes, this should work. On the scsi-mq 
side, extensive error injection and pulling has been done, and it seems 
to hold up fine now. The ioctl path would need to be audited.

>
> It also doesn't look like we're handling the case where the SGL can't
> map to a PRP.
>
>> +static void req_completion(struct nvme_queue *nvmeq, void *ctx,
>>                         struct nvme_completion *cqe)
>> {
>>     struct nvme_iod *iod = ctx;
>> -    struct bio *bio = iod->private;
>> +    struct request *req = iod->private;
>> +
>>     u16 status = le16_to_cpup(&cqe->status) >> 1;
>>
>> -    if (unlikely(status)) {
>> -        if (!(status & NVME_SC_DNR ||
>> -                bio->bi_rw & REQ_FAILFAST_MASK) &&
>> -                (jiffies - iod->start_time) < IOD_TIMEOUT) {
>> -            if (!waitqueue_active(&nvmeq->sq_full))
>> -                add_wait_queue(&nvmeq->sq_full,
>> -                            &nvmeq->sq_cong_wait);
>> -            list_add_tail(&iod->node, &nvmeq->iod_bio);
>> -            wake_up(&nvmeq->sq_full);
>> -            return;
>> -        }
>> -    }
>>     if (iod->nents) {
>> -        dma_unmap_sg(nvmeq->q_dmadev, iod->sg, iod->nents,
>> -            bio_data_dir(bio) ? DMA_TO_DEVICE : DMA_FROM_DEVICE);
>> -        nvme_end_io_acct(bio, iod->start_time);
>> +        dma_unmap_sg(&nvmeq->dev->pci_dev->dev, iod->sg, iod->nents,
>> +            rq_data_dir(req) ? DMA_TO_DEVICE : DMA_FROM_DEVICE);
>>     }
>>     nvme_free_iod(nvmeq->dev, iod);
>> -    if (status)
>> -        bio_endio(bio, -EIO);
>> +
>> +    if (unlikely(status))
>> +        req->errors = -EIO;
>>     else
>> -        bio_endio(bio, 0);
>> +        req->errors = 0;
>> +
>> +    blk_mq_complete_request(req);
>> }
>
> Is blk-mq going to retry intermittently failed commands for me? It
> doesn't look like it will.

Not sure what kind of behavior you are looking for here. If you can 
expand on the above a bit, I'll gladly help sort it out. Only the driver 
really knows if a particular request should be failed hard or retried. 
So you'd probably have to track retry counts in the request and 
reinsert/end as appropriate.

>
>> +static int nvme_submit_flush_sync(struct nvme_queue *nvmeq, struct
>> nvme_ns *ns)
>> +{
>> +    struct request *req;
>> +    struct nvme_command cmnd;
>> +
>> +    req = blk_mq_alloc_request(ns->queue, WRITE, GFP_KERNEL, false);
>> +    if (!req)
>> +        return -ENOMEM;
>> +
>> +    nvme_setup_flush(&cmnd, ns, req->tag);
>> +    nvme_submit_sync_cmd(req, &cmnd, NULL, NVME_IO_TIMEOUT);
>>
>>     return 0;
>> }
>
> It looks like this function above is being called from an interrupt
> context where we are already holding a spinlock. The sync command will
> try to take that same lock.

Yes, that code still looks very buggy. The initial alloc for 
flush_cmd_info should also retry, not fail hard, if that alloc fails. 
For the reinsert part, Matias, you want to look at the flush code in 
blk-mq and how that handles it.

>> +    if ((req->cmd_flags & REQ_FLUSH) && psegs) {
>> +        struct flush_cmd_info *flush_cmd = kmalloc(
>> +                sizeof(struct flush_cmd_info), GFP_KERNEL);
>
> The comment above says "may not sleep", but using GFP_KERNEL here. I
> actually think it is safe to sleep, though since you're not taking a
> lock until later, so maybe you can change all the allocs to GFP_KERNEL?

It might be safe, and it might not be. It depends on the CPU mapping to 
the queue. preemption _could_ be off here (if needed), so GFP_KERNEL 
cannot be used. Additionally, see above comment on what to do on alloc 
failure.

-- 
Jens Axboe


  reply	other threads:[~2014-05-29 14:26 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-05-28 22:59 [PATCH V3] basic conversion to blk-mq Matias Bjørling
2014-05-28 22:59 ` [PATCH V3] NVMe: " Matias Bjørling
2014-05-29  3:07   ` Keith Busch
2014-05-29 14:25     ` Jens Axboe [this message]
2014-05-29 19:32       ` Jens Axboe
2014-05-29 19:33         ` Jens Axboe
2014-05-29 22:34       ` Keith Busch
2014-05-29 23:06         ` Jens Axboe
2014-05-29 23:12           ` Jens Axboe
2014-05-30 17:20             ` Matias Bjorling

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=53874374.2020302@kernel.dk \
    --to=axboe@kernel.dk \
    --cc=keith.busch@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=m@bjorling.me \
    --cc=sbradshaw@micron.com \
    --cc=willy@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox