public inbox for linux-block@vger.kernel.org
From: Keith Busch <keith.busch@intel.com>
To: Jens Axboe <axboe@kernel.dk>
Cc: "linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	Christoph Hellwig <hch@infradead.org>
Subject: Re: [RFC PATCH] nvme: always return IRQ_HANDLED
Date: Thu, 17 Aug 2017 16:15:27 -0400
Message-ID: <20170817201526.GA21397@localhost.localdomain>
In-Reply-To: <06b85884-efb4-e419-4b31-3aa4751bb338@kernel.dk>

On Thu, Aug 17, 2017 at 01:32:20PM -0600, Jens Axboe wrote:
> We currently have an issue with nvme when polling is used. Just
> ran some testing on 4.13-rc5, and it's trivial to trigger an IRQ
> disable ala:
> 
> [   52.412851] irq 77: nobody cared (try booting with the "irqpoll" option)
> [   52.415310] irq 70: nobody cared (try booting with the "irqpoll" option)
> 
> when running a few processes polling. The reason is pretty obvious - if
> we're effective at polling, the triggered IRQ will never find any
> events. If this happens enough times in a row, the kernel disables our
> IRQ since we keep returning IRQ_NONE.

If you're seeing IRQ_NONE returned, the NVMe driver didn't poll any
completions since the last time nvme_irq was called. The cqe_seen set by
polled completions is sticky until the IRQ handler runs, at which point
the handler returns IRQ_HANDLED even when no completions were handled
during that interrupt.

The only way it should be able to return IRQ_NONE is if no completions
were observed (polled or otherwise) since the last time the IRQ handler
was called.
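
To make the sticky behavior concrete, here is a minimal userspace sketch
of the logic described above. It is a toy model, not the driver code:
struct toy_queue, toy_process_cq(), toy_poll() and toy_irq() are
hypothetical names, and the enum only stands in for the kernel's
irqreturn_t values.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's irqreturn_t values. */
enum irq_result { IRQ_NONE = 0, IRQ_HANDLED = 1 };

/* Toy model of the relevant queue state; not the real struct nvme_queue. */
struct toy_queue {
	unsigned int pending;	/* completions posted but not yet reaped */
	bool cqe_seen;		/* sticky: cleared only by the IRQ handler */
};

/* Models nvme_process_cq(): reap everything pending, note if any were seen. */
static unsigned int toy_process_cq(struct toy_queue *q)
{
	unsigned int consumed = q->pending;

	q->pending = 0;
	if (consumed)
		q->cqe_seen = true;
	return consumed;
}

/* Models the polling path: reaps completions but leaves cqe_seen set. */
static unsigned int toy_poll(struct toy_queue *q)
{
	return toy_process_cq(q);
}

/* Models nvme_irq(): report HANDLED if anything was seen since the last IRQ. */
static enum irq_result toy_irq(struct toy_queue *q)
{
	enum irq_result result;

	toy_process_cq(q);
	result = q->cqe_seen ? IRQ_HANDLED : IRQ_NONE;
	q->cqe_seen = false;
	return result;
}
```

In this model, if toy_poll() reaps two completions and toy_irq() then
runs with nothing pending, the handler still returns IRQ_HANDLED because
cqe_seen was left set by the poll. Only a second toy_irq() with no
completions in between returns IRQ_NONE.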

> Question is, what's the best way to solve this. Ideally we should not be
> triggering an IRQ at all, but that's still not in mainline. Can we
> safely just return IRQ_HANDLED always? That should work except for the
> case where we happen to run into an IRQ flood where we DO want to turn off
> the nvme irq. For now, I think that's a small price to pay, since the
> current issue is much worse and leaves the device in a weird non-working
> state where some queue interrupts are turned off.

My recommended way to get this handled is to enable interrupt coalescing
and have controllers behave as the specification describes to suppress
interrupts when polling is active. From section 5.21.1.8:

  Specifically, if the Completion Queue Head Doorbell register is being
  updated that is associated with a particular interrupt vector, then
  the controller has a positive indication that completion queue entries
  are already being processed. In this case, the aggregation time and/or
  the aggregation threshold may be reset/restarted upon the associated
  register write. This may result in interrupts being delayed indefinitely
  in certain workloads where the aggregation time or aggregation threshold
  is non-zero.
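
For reference, a rough sketch of how the coalescing parameters are
encoded for the Set Features command, assuming the Interrupt Coalescing
layout from the spec (Feature Identifier 08h, threshold in bits 7:0 of
Dword 11, time in bits 15:8 in 100 microsecond units). The helper name
irq_coalesce_dword11() is made up for illustration and is not a function
in the driver.

```c
#include <stdint.h>

/* Feature Identifier for Interrupt Coalescing (08h in the NVMe spec). */
#define FEAT_IRQ_COALESCE 0x08u

/*
 * Pack Command Dword 11 for Set Features / Interrupt Coalescing:
 *   bits  7:0  aggregation threshold (THR), a 0's based entry count
 *   bits 15:8  aggregation time (TIME), in 100 microsecond units
 * Hypothetical helper for illustration only.
 */
static uint32_t irq_coalesce_dword11(uint8_t thr, uint8_t time_100us)
{
	return ((uint32_t)time_100us << 8) | thr;
}
```

For example, a threshold field of 7 with a time field of 2 (200
microseconds) packs to 0x0207. Either field being non-zero is what
allows the controller to keep restarting aggregation while the CQ head
doorbell is being written by a poller.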

> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 74a124a06264..21a35faff86f 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -160,7 +160,6 @@ struct nvme_queue {
>  	u16 cq_head;
>  	u16 qid;
>  	u8 cq_phase;
> -	u8 cqe_seen;
>  	u32 *dbbuf_sq_db;
>  	u32 *dbbuf_cq_db;
>  	u32 *dbbuf_sq_ei;
> @@ -830,22 +829,19 @@ static void nvme_process_cq(struct nvme_queue *nvmeq)
>  		consumed++;
>  	}
>  
> -	if (consumed) {
> +	if (consumed)
>  		nvme_ring_cq_doorbell(nvmeq);
> -		nvmeq->cqe_seen = 1;
> -	}
>  }
>  
>  static irqreturn_t nvme_irq(int irq, void *data)
>  {
> -	irqreturn_t result;
>  	struct nvme_queue *nvmeq = data;
> +
>  	spin_lock(&nvmeq->q_lock);
>  	nvme_process_cq(nvmeq);
> -	result = nvmeq->cqe_seen ? IRQ_HANDLED : IRQ_NONE;
> -	nvmeq->cqe_seen = 0;
>  	spin_unlock(&nvmeq->q_lock);
> -	return result;
> +
> +	return IRQ_HANDLED;
>  }
>  
>  static irqreturn_t nvme_irq_check(int irq, void *data)
> 
> -- 
> Jens Axboe

Thread overview:
2017-08-17 19:32 [RFC PATCH] nvme: always return IRQ_HANDLED Jens Axboe
2017-08-17 20:15 ` Keith Busch [this message]
2017-08-17 20:17   ` Jens Axboe
2017-08-17 20:29     ` Keith Busch
2017-08-17 20:25       ` Jens Axboe
2017-08-18  7:14         ` Christoph Hellwig
