From: Jens Axboe <axboe@kernel.dk>
To: "Keith Busch" <keith.busch@intel.com>, "Matias Bjørling" <m@bjorling.me>
Cc: willy@linux.intel.com, sbradshaw@micron.com,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH V3] NVMe: basic conversion to blk-mq
Date: Thu, 29 May 2014 08:25:56 -0600 [thread overview]
Message-ID: <53874374.2020302@kernel.dk> (raw)
In-Reply-To: <alpine.LRH.2.03.1405281929460.4995@AMR>
On 2014-05-28 21:07, Keith Busch wrote:
> On Wed, 28 May 2014, Matias Bjørling wrote:
>> This converts the current NVMe driver to utilize the blk-mq layer.
>
> I am concerned about device hot removal since the h/w queues can be
> freed at any time. I *think* blk-mq helps with this in that the driver
> will not see a new request after calling blk_cleanup_queue. If you can
> confirm that's true and that blk-mq waits for all requests active in the
> driver to return to the block layer, then we're probably okay in this
> path. That wasn't true as a bio based driver which is why we are cautious
> with all the locking and barriers. But what about the IOCTL paths?
Barring any bugs in the code, then yes, this should work. On the scsi-mq
side, extensive error injection and pulling has been done, and it seems
to hold up fine now. The ioctl path would need to be audited.
>
> It also doesn't look like we're handling the case where the SGL can't
> map to a PRP.
>
>> +static void req_completion(struct nvme_queue *nvmeq, void *ctx,
>> struct nvme_completion *cqe)
>> {
>> struct nvme_iod *iod = ctx;
>> - struct bio *bio = iod->private;
>> + struct request *req = iod->private;
>> +
>> u16 status = le16_to_cpup(&cqe->status) >> 1;
>>
>> - if (unlikely(status)) {
>> - if (!(status & NVME_SC_DNR ||
>> - bio->bi_rw & REQ_FAILFAST_MASK) &&
>> - (jiffies - iod->start_time) < IOD_TIMEOUT) {
>> - if (!waitqueue_active(&nvmeq->sq_full))
>> - add_wait_queue(&nvmeq->sq_full,
>> - &nvmeq->sq_cong_wait);
>> - list_add_tail(&iod->node, &nvmeq->iod_bio);
>> - wake_up(&nvmeq->sq_full);
>> - return;
>> - }
>> - }
>> if (iod->nents) {
>> - dma_unmap_sg(nvmeq->q_dmadev, iod->sg, iod->nents,
>> - bio_data_dir(bio) ? DMA_TO_DEVICE : DMA_FROM_DEVICE);
>> - nvme_end_io_acct(bio, iod->start_time);
>> + dma_unmap_sg(&nvmeq->dev->pci_dev->dev, iod->sg, iod->nents,
>> + rq_data_dir(req) ? DMA_TO_DEVICE : DMA_FROM_DEVICE);
>> }
>> nvme_free_iod(nvmeq->dev, iod);
>> - if (status)
>> - bio_endio(bio, -EIO);
>> +
>> + if (unlikely(status))
>> + req->errors = -EIO;
>> else
>> - bio_endio(bio, 0);
>> + req->errors = 0;
>> +
>> + blk_mq_complete_request(req);
>> }
>
> Is blk-mq going to retry intermittently failed commands for me? It
> doesn't look like it will.
Not sure what kind of behavior you are looking for here. If you can
expand on the above a bit, I'll gladly help sort it out. Only the driver
really knows if a particular request should be failed hard or retried.
So you'd probably have to track retry counts in the request and
reinsert/end as appropriate.
>
>> +static int nvme_submit_flush_sync(struct nvme_queue *nvmeq, struct
>> nvme_ns *ns)
>> +{
>> + struct request *req;
>> + struct nvme_command cmnd;
>> +
>> + req = blk_mq_alloc_request(ns->queue, WRITE, GFP_KERNEL, false);
>> + if (!req)
>> + return -ENOMEM;
>> +
>> + nvme_setup_flush(&cmnd, ns, req->tag);
>> + nvme_submit_sync_cmd(req, &cmnd, NULL, NVME_IO_TIMEOUT);
>>
>> return 0;
>> }
>
> It looks like this function above is being called from an interrupt
> context where we are already holding a spinlock. The sync command will
> try to take that same lock.
Yes, that code still looks very buggy. The initial alloc for
flush_cmd_info should also retry, not fail hard, if that alloc fails.
For the reinsert part, Matias, you want to look at the flush code in
blk-mq and how that handles it.
>> + if ((req->cmd_flags & REQ_FLUSH) && psegs) {
>> + struct flush_cmd_info *flush_cmd = kmalloc(
>> + sizeof(struct flush_cmd_info), GFP_KERNEL);
>
> The comment above says "may not sleep", but using GFP_KERNEL here. I
> actually think it is safe to sleep, though since you're not taking a
> lock until later, so maybe you can change all the allocs to GFP_KERNEL?
It might be safe, and it might not be. It depends on the CPU mapping to
the queue. preemption _could_ be off here (if needed), so GFP_KERNEL
cannot be used. Additionally, see above comment on what to do on alloc
failure.
--
Jens Axboe
next prev parent reply other threads:[~2014-05-29 14:26 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-05-28 22:59 [PATCH V3] basic conversion to blk-mq Matias Bjørling
2014-05-28 22:59 ` [PATCH V3] NVMe: " Matias Bjørling
2014-05-29 3:07 ` Keith Busch
2014-05-29 14:25 ` Jens Axboe [this message]
2014-05-29 19:32 ` Jens Axboe
2014-05-29 19:33 ` Jens Axboe
2014-05-29 22:34 ` Keith Busch
2014-05-29 23:06 ` Jens Axboe
2014-05-29 23:12 ` Jens Axboe
2014-05-30 17:20 ` Matias Bjorling
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=53874374.2020302@kernel.dk \
--to=axboe@kernel.dk \
--cc=keith.busch@intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=m@bjorling.me \
--cc=sbradshaw@micron.com \
--cc=willy@linux.intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.