All of lore.kernel.org
 help / color / mirror / Atom feed
From: paul.grabinar@ranbarg.com (Paul Grabinar)
Subject: CQ Doorbells can be touched after queue deleted
Date: Tue, 30 Sep 2014 20:55:03 +0100	[thread overview]
Message-ID: <542B0A97.90309@ranbarg.com> (raw)
In-Reply-To: <alpine.LRH.2.03.1409301309230.4688@AMR>

On 30/09/14 20:41, Keith Busch wrote:
> On Tue, 30 Sep 2014, Paul Grabinar wrote:
>> I've encountered an interesting issue with the driver as in v3.17-rc7.
>> The NVMe specification defines writing to CQ doorbells for non-existent
>> queues as "undefined", so it is probably not a good idea to do this.
>> I'm aware of at least one drive that gets very upset if you try.
>>
>> The case I hit was where there is I/O running to the drive, but the
>> drive is being reset in the kthread due to not responding to abort
>> requests.
>> When an I/O request came in, nvme_process_cq was called from
>> nvme_make_request, but the queue no longer exists as it has been torn
>> down by the reset.
>> During nvme_process_cq, the doorbell is updated, which upsets the drive.
>>
>> This is a bit of a corner case, but it has happened.
>> We probably need to skip the doorbell update if the queue has been
>> deleted.
>
> Thanks for the info. I agree the results are undefined in this case. Your
> situation is interesting, though. The driver won't update the doorbell
> if no completions were posted, so it sounds like your device is posting
> completions long after the queues were deleted if the problem occured
> in the make_request path.
>
> Anyway, here's a patch.
>
>> From a141c8116d1d8f86fffb58bb312ec4d75cdc82ef Mon Sep 17 00:00:00 2001
> From: Keith Busch <keith.busch at intel.com>
> Date: Tue, 30 Sep 2014 12:51:06 -0600
> Subject: [PATCH] NVMe: Skip updating cq doorbell on inactive queues
>
> The driver processes the completion queue after taking queues offline to
> check for new completions that may have occured when the device deleted
> the queue. We don't want to write the completion queue doorbell in this
> case as it has undefined results.
>
> Signed-off-by: Keith Busch <keith.busch at intel.com>
> Reported-by: Paul Grabinar <paul.grabinar at ranbarg.com>
> ---
>  drivers/block/nvme-core.c |    3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/block/nvme-core.c b/drivers/block/nvme-core.c
> index 28aec2d..6c077cd 100644
> --- a/drivers/block/nvme-core.c
> +++ b/drivers/block/nvme-core.c
> @@ -828,7 +828,8 @@ static int nvme_process_cq(struct nvme_queue *nvmeq)
>      if (head == nvmeq->cq_head && phase == nvmeq->cq_phase)
>          return 0;
>
> -    writel(head, nvmeq->q_db + nvmeq->dev->db_stride);
> +    if (!nvmeq->q_suspended)
> +        writel(head, nvmeq->q_db + nvmeq->dev->db_stride);
>      nvmeq->cq_head = head;
>      nvmeq->cq_phase = phase;
>

Hi,
Yes, good point. I guess there are still commands that are being
processed and we are getting late completions.

Thanks for the patch.

      reply	other threads:[~2014-09-30 19:55 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-09-30 18:32 CQ Doorbells can be touched after queue deleted Paul Grabinar
2014-09-30 19:41 ` Keith Busch
2014-09-30 19:55   ` Paul Grabinar [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=542B0A97.90309@ranbarg.com \
    --to=paul.grabinar@ranbarg.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.