* [RFC PATCH] nvme: always return IRQ_HANDLED
@ 2017-08-17 19:32 Jens Axboe
  2017-08-17 20:15 ` Keith Busch
  0 siblings, 1 reply; 6+ messages in thread
From: Jens Axboe @ 2017-08-17 19:32 UTC (permalink / raw)
  To: linux-nvme@lists.infradead.org
  Cc: linux-block@vger.kernel.org, Keith Busch, Christoph Hellwig

We currently have an issue with nvme when polling is used. Just
ran some testing on 4.13-rc5, and it's trivial to trigger an IRQ
disable ala:

[   52.412851] irq 77: nobody cared (try booting with the "irqpoll" option)
[   52.415310] irq 70: nobody cared (try booting with the "irqpoll" option)

when running a few processes polling. The reason is pretty obvious - if
we're effective at polling, the triggered IRQ will never find any
events. If this happens enough times in a row, the kernel disables our
IRQ since we keep returning IRQ_NONE.
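
For reference, the splat above comes from the genirq core's spurious-interrupt
detection: every IRQ_NONE return is counted, and once nearly all recent
interrupts on a line went unhandled, the line gets shut off. A minimal sketch
of the idea (loosely based on note_interrupt() in kernel/irq/spurious.c; the
exact counters and threshold may differ from what's in mainline):

/*
 * Rough sketch of the genirq "nobody cared" logic, not the real code.
 * Called after each hard interrupt with the handler's return value.
 */
static void note_interrupt_sketch(struct irq_desc *desc, irqreturn_t action_ret)
{
	if (action_ret == IRQ_NONE)
		desc->irqs_unhandled++;

	/* Only evaluate every ~100k interrupts on this line. */
	if (++desc->irq_count < 100000)
		return;

	/* Nearly everything went unhandled: assume a stuck line. */
	if (desc->irqs_unhandled > 99900) {
		__report_bad_irq(desc, action_ret);	/* "irq N: nobody cared" */
		irq_disable(desc);			/* this queue's IRQ stops firing */
	}

	desc->irq_count = 0;
	desc->irqs_unhandled = 0;
}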

Question is, what's the best way to solve this. Ideally we should not be
triggering an IRQ at all, but that's still not in mainline. Can we
safely just return IRQ_HANDLED always? That should work except for the
case where we happen to run into an IRQ flood where we DO want to turn off
the nvme irq. For now, I think that's a small price to pay, since the
current issue is much worse and leaves the device in a weird non-working
state where some queue interrupts are turned off.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 74a124a06264..21a35faff86f 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -160,7 +160,6 @@ struct nvme_queue {
 	u16 cq_head;
 	u16 qid;
 	u8 cq_phase;
-	u8 cqe_seen;
 	u32 *dbbuf_sq_db;
 	u32 *dbbuf_cq_db;
 	u32 *dbbuf_sq_ei;
@@ -830,22 +829,19 @@ static void nvme_process_cq(struct nvme_queue *nvmeq)
 		consumed++;
 	}
 
-	if (consumed) {
+	if (consumed)
 		nvme_ring_cq_doorbell(nvmeq);
-		nvmeq->cqe_seen = 1;
-	}
 }
 
 static irqreturn_t nvme_irq(int irq, void *data)
 {
-	irqreturn_t result;
 	struct nvme_queue *nvmeq = data;
+
 	spin_lock(&nvmeq->q_lock);
 	nvme_process_cq(nvmeq);
-	result = nvmeq->cqe_seen ? IRQ_HANDLED : IRQ_NONE;
-	nvmeq->cqe_seen = 0;
 	spin_unlock(&nvmeq->q_lock);
-	return result;
+
+	return IRQ_HANDLED;
 }
 
 static irqreturn_t nvme_irq_check(int irq, void *data)

-- 
Jens Axboe


* Re: [RFC PATCH] nvme: always return IRQ_HANDLED
  2017-08-17 19:32 [RFC PATCH] nvme: always return IRQ_HANDLED Jens Axboe
@ 2017-08-17 20:15 ` Keith Busch
  2017-08-17 20:17   ` Jens Axboe
  0 siblings, 1 reply; 6+ messages in thread
From: Keith Busch @ 2017-08-17 20:15 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-nvme@lists.infradead.org, linux-block@vger.kernel.org,
	Christoph Hellwig

On Thu, Aug 17, 2017 at 01:32:20PM -0600, Jens Axboe wrote:
> We currently have an issue with nvme when polling is used. Just
> ran some testing on 4.13-rc5, and it's trivial to trigger an IRQ
> disable ala:
> 
> [   52.412851] irq 77: nobody cared (try booting with the "irqpoll" option)
> [   52.415310] irq 70: nobody cared (try booting with the "irqpoll" option)
> 
> when running a few processes polling. The reason is pretty obvious - if
> we're effective at polling, the triggered IRQ will never find any
> events. If this happens enough times in a row, the kernel disables our
> IRQ since we keep returning IRQ_NONE.

If you're seeing IRQ_NONE returned, the NVMe driver didn't poll any
completions since the last time nvme_irq was called. The cqe_seen on
polled completions is sticky until the IRQ handler is run, in which
case it returns IRQ_HANDLED even when no completions were handled during
that interrupt.

The only way it should be able to return IRQ_NONE is if no completions
were observed (polled or otherwise) since the last time the IRQ handler
was called.

> Question is, what's the best way to solve this. Ideally we should not be
> triggering an IRQ at all, but that's still not in mainline. Can we
> safely just return IRQ_HANDLED always? That should work except for the
> case where we happen to run into an IRQ flood where we DO want to turn off
> the nvme irq. For now, I think that's a small price to pay, since the
> current issue is much worse and leaves the device in a weird non-working
> state where some queue interrupts are turned off.

My recommended way to get this handled is to enable interrupt coalescing
and have controllers behave as the specification describes to suppress
interrupts when polling is active. From section 5.21.1.8:

  Specifically, if the Completion Queue Head Doorbell register is being
  updated that is associated with a particular interrupt vector, then
  the controller has a positive indication that completion queue entries
  are already being processed. In this case, the aggregation time and/or
  the aggregation threshold may be reset/restarted upon the associated
  register write. This may result in interrupts being delayed indefinitely
  in certain workloads where the aggregation time or aggregation threshold
  is non-zero.
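
For illustration only, turning coalescing on from user space is a Set
Features command for feature 08h. A minimal sketch, assuming the standard
admin passthrough ioctl and made-up example values (the device node and
numbers are placeholders):

/*
 * Sketch: enable NVMe interrupt coalescing via Set Features (feature 08h).
 * Example values only; the controller may round or ignore them.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/nvme_ioctl.h>

int main(void)
{
	struct nvme_admin_cmd cmd;
	int fd = open("/dev/nvme0", O_RDWR);	/* placeholder device node */

	if (fd < 0)
		return 1;

	memset(&cmd, 0, sizeof(cmd));
	cmd.opcode = 0x09;		/* Set Features */
	cmd.cdw10  = 0x08;		/* Feature Identifier: Interrupt Coalescing */
	/*
	 * cdw11 bits  7:0 = aggregation threshold (CQ entries, 0's based),
	 *       bits 15:8 = aggregation time in 100us increments.
	 */
	cmd.cdw11  = (10 << 8) | 7;	/* ~1ms or 8 entries, whichever first */

	if (ioctl(fd, NVME_IOCTL_ADMIN_CMD, &cmd) < 0)
		perror("NVME_IOCTL_ADMIN_CMD");

	close(fd);
	return 0;
}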

> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 74a124a06264..21a35faff86f 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -160,7 +160,6 @@ struct nvme_queue {
>  	u16 cq_head;
>  	u16 qid;
>  	u8 cq_phase;
> -	u8 cqe_seen;
>  	u32 *dbbuf_sq_db;
>  	u32 *dbbuf_cq_db;
>  	u32 *dbbuf_sq_ei;
> @@ -830,22 +829,19 @@ static void nvme_process_cq(struct nvme_queue *nvmeq)
>  		consumed++;
>  	}
>  
> -	if (consumed) {
> +	if (consumed)
>  		nvme_ring_cq_doorbell(nvmeq);
> -		nvmeq->cqe_seen = 1;
> -	}
>  }
>  
>  static irqreturn_t nvme_irq(int irq, void *data)
>  {
> -	irqreturn_t result;
>  	struct nvme_queue *nvmeq = data;
> +
>  	spin_lock(&nvmeq->q_lock);
>  	nvme_process_cq(nvmeq);
> -	result = nvmeq->cqe_seen ? IRQ_HANDLED : IRQ_NONE;
> -	nvmeq->cqe_seen = 0;
>  	spin_unlock(&nvmeq->q_lock);
> -	return result;
> +
> +	return IRQ_HANDLED;
>  }
>  
>  static irqreturn_t nvme_irq_check(int irq, void *data)
> 
> -- 
> Jens Axboe


* Re: [RFC PATCH] nvme: always return IRQ_HANDLED
  2017-08-17 20:15 ` Keith Busch
@ 2017-08-17 20:17   ` Jens Axboe
  2017-08-17 20:29     ` Keith Busch
  0 siblings, 1 reply; 6+ messages in thread
From: Jens Axboe @ 2017-08-17 20:17 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-nvme@lists.infradead.org, linux-block@vger.kernel.org,
	Christoph Hellwig

On 08/17/2017 02:15 PM, Keith Busch wrote:
> On Thu, Aug 17, 2017 at 01:32:20PM -0600, Jens Axboe wrote:
>> We currently have an issue with nvme when polling is used. Just
>> ran some testing on 4.13-rc5, and it's trivial to trigger an IRQ
>> disable ala:
>>
>> [   52.412851] irq 77: nobody cared (try booting with the "irqpoll" option)
>> [   52.415310] irq 70: nobody cared (try booting with the "irqpoll" option)
>>
>> when running a few processes polling. The reason is pretty obvious - if
>> we're effective at polling, the triggered IRQ will never find any
>> events. If this happens enough times in a row, the kernel disables our
>> IRQ since we keep returning IRQ_NONE.
> 
> If you're seeing IRQ_NONE returned, the NVMe driver didn't poll any
> completions since the last time nvme_irq was called. The cqe_seen on
> polled completions is sticky until the IRQ handler is run, in which
> case it returns IRQ_HANDLED even when no completions were handled during
> that interrupt.
> 
> The only way it should be able to return IRQ_NONE is if no completions
> were observed (polled or otherwise) since the last time the IRQ handler
> was called.

The polling path does not update cqe_seen. So it's possible that every time
the IRQ handler does trigger, there are no entries found. Maybe a better
or simpler fix would be to have the polling set cqe_seen to true, and
leave the clearing to the interrupt handler as is done now.
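
Roughly what I mean, as a trimmed-down sketch of the polled path as it
stands (simplified from memory, not the literal 4.13-rc5 code):

static int __nvme_poll_sketch(struct nvme_queue *nvmeq, unsigned int tag)
{
	struct nvme_completion cqe;
	int found = 0, consumed = 0;

	if (!nvme_cqe_valid(nvmeq, nvmeq->cq_head, nvmeq->cq_phase))
		return 0;

	spin_lock_irq(&nvmeq->q_lock);
	while (nvme_read_cqe(nvmeq, &cqe)) {
		nvme_handle_cqe(nvmeq, &cqe);
		consumed++;
		if (tag == cqe.command_id) {
			found = 1;
			break;
		}
	}

	if (consumed)
		nvme_ring_cq_doorbell(nvmeq);
	/* Unlike nvme_process_cq(), nvmeq->cqe_seen is never set here. */
	spin_unlock_irq(&nvmeq->q_lock);

	return found;
}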

>> Question is, what's the best way to solve this. Ideally we should not be
>> triggering an IRQ at all, but that's still not in mainline. Can we
>> safely just return IRQ_HANDLED always? That should work except for the
>> case where we happen to run into an IRQ flood where we DO want to turn off
>> the nvme irq. For now, I think that's a small price to pay, since the
>> current issue is much worse and leaves the device in a weird non-working
>> state where some queue interrupts are turned off.
> 
> My recommended way to get this handled is to enable interrupt coalescing
> and have controllers behave as the specification describes to suppress
> interrupts when polling is active. From section 5.21.1.8:
> 
>   Specifically, if the Completion Queue Head Doorbell register is being
>   updated that is associated with a particular interrupt vector, then
>   the controller has a positive indication that completion queue entries
>   are already being processed. In this case, the aggregation time and/or
>   the aggregation threshold may be reset/restarted upon the associated
>   register write. This may result in interrupts being delayed indefinitely
>   in certain workloads where the aggregation time or aggregation threshold
>   is non-zero.

That would indeed be much better, and not require us to fiddle with
IRQ-less completion queues for more efficient polling. Of the test devices
that I use, not all of them support coalescing, and of the ones that do,
some of them implement it in a pretty poor fashion.

-- 
Jens Axboe


* Re: [RFC PATCH] nvme: always return IRQ_HANDLED
  2017-08-17 20:29     ` Keith Busch
@ 2017-08-17 20:25       ` Jens Axboe
  2017-08-18  7:14         ` Christoph Hellwig
  0 siblings, 1 reply; 6+ messages in thread
From: Jens Axboe @ 2017-08-17 20:25 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-nvme@lists.infradead.org, linux-block@vger.kernel.org,
	Christoph Hellwig

On 08/17/2017 02:29 PM, Keith Busch wrote:
> On Thu, Aug 17, 2017 at 02:17:08PM -0600, Jens Axboe wrote:
>> On 08/17/2017 02:15 PM, Keith Busch wrote:
>>> On Thu, Aug 17, 2017 at 01:32:20PM -0600, Jens Axboe wrote:
>>>> We currently have an issue with nvme when polling is used. Just
>>>> ran some testing on 4.13-rc5, and it's trivial to trigger an IRQ
>>>> disable ala:
>>>>
>>>> [   52.412851] irq 77: nobody cared (try booting with the "irqpoll" option)
>>>> [   52.415310] irq 70: nobody cared (try booting with the "irqpoll" option)
>>>>
>>>> when running a few processes polling. The reason is pretty obvious - if
>>>> we're effective at polling, the triggered IRQ will never find any
>>>> events. If this happens enough times in a row, the kernel disables our
>>>> IRQ since we keep returning IRQ_NONE.
>>>
>>> If you're seeing IRQ_NONE returned, the NVMe driver didn't poll any
>>> completions since the last time nvme_irq was called. The cqe_seen on
>>> polled completions is sticky until the IRQ handler is run, in which
>>> case it returns IRQ_HANDLED even when no completions were handled during
>>> that interrupt.
>>>
>>> The only way it should be able to return IRQ_NONE is if no completions
>>> were observed (polled or otherwise) since the last time the IRQ handler
>>> was called.
>>
>> The polling path does not update cqe_seen. So it's possible that every time
>> the IRQ handler does trigger, there are no entries found. Maybe a better
>> or simpler fix would be to have the polling set cqe_seen to true, and
>> leave the clearing to the interrupt handler as is done now.
> 
> Oops, that looks like a mistake. I don't think we want to suppress
> spurious interrupt detection, though. How about this patch to set cq_seen
> on polling like we used to have?

Yes, that will work. We need to get this into 4.13.

-- 
Jens Axboe


* Re: [RFC PATCH] nvme: always return IRQ_HANDLED
  2017-08-17 20:17   ` Jens Axboe
@ 2017-08-17 20:29     ` Keith Busch
  2017-08-17 20:25       ` Jens Axboe
  0 siblings, 1 reply; 6+ messages in thread
From: Keith Busch @ 2017-08-17 20:29 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-nvme@lists.infradead.org, linux-block@vger.kernel.org,
	Christoph Hellwig

On Thu, Aug 17, 2017 at 02:17:08PM -0600, Jens Axboe wrote:
> On 08/17/2017 02:15 PM, Keith Busch wrote:
> > On Thu, Aug 17, 2017 at 01:32:20PM -0600, Jens Axboe wrote:
> >> We currently have an issue with nvme when polling is used. Just
> >> ran some testing on 4.13-rc5, and it's trivial to trigger an IRQ
> >> disable ala:
> >>
> >> [   52.412851] irq 77: nobody cared (try booting with the "irqpoll" option)
> >> [   52.415310] irq 70: nobody cared (try booting with the "irqpoll" option)
> >>
> >> when running a few processes polling. The reason is pretty obvious - if
> >> we're effective at polling, the triggered IRQ will never find any
> >> events. If this happens enough times in a row, the kernel disables our
> >> IRQ since we keep returning IRQ_NONE.
> > 
> > If you're seeing IRQ_NONE returned, the NVMe driver didn't poll any
> > completions since the last time nvme_irq was called. The cqe_seen on
> > polled completions is sticky until the IRQ handler is run, in which
> > case it returns IRQ_HANDLED even when no completions were handled during
> > that interrupt.
> > 
> > The only way it should be able to return IRQ_NONE is if no completions
> > were observed (polled or otherwise) since the last time the IRQ handler
> > was called.
> 
> The polling path does not update cqe_seen. So it's possible that every time
> the IRQ handler does trigger, there are no entries found. Maybe a better
> or simpler fix would be to have the polling set cqe_seen to true, and
> leave the clearing to the interrupt handler as is done now.

Oops, that looks like a mistake. I don't think we want to suppress
spurious interrupt detection, though. How about this patch to set cqe_seen
on polling like we used to have?

Fixes: 920d13a884 ("nvme-pci: factor out the cqe reading mechanics from __nvme_process_cq")

---
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index ea5eb4c..e9886e1 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -795,6 +795,7 @@ static inline void nvme_handle_cqe(struct nvme_queue *nvmeq,
 		return;
 	}
 
+	nvmeq->cqe_seen = 1;
 	req = blk_mq_tag_to_rq(*nvmeq->tags, cqe->command_id);
 	nvme_end_request(req, cqe->status, cqe->result);
 }
@@ -824,10 +825,8 @@ static void nvme_process_cq(struct nvme_queue *nvmeq)
 		consumed++;
 	}
 
-	if (consumed) {
+	if (consumed)
 		nvme_ring_cq_doorbell(nvmeq);
-		nvmeq->cqe_seen = 1;
-	}
 }
 
 static irqreturn_t nvme_irq(int irq, void *data)
--


* Re: [RFC PATCH] nvme: always return IRQ_HANDLED
  2017-08-17 20:25       ` Jens Axboe
@ 2017-08-18  7:14         ` Christoph Hellwig
  0 siblings, 0 replies; 6+ messages in thread
From: Christoph Hellwig @ 2017-08-18  7:14 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Keith Busch, linux-block@vger.kernel.org, Christoph Hellwig,
	linux-nvme@lists.infradead.org

On Thu, Aug 17, 2017 at 02:25:27PM -0600, Jens Axboe wrote:
> Yes, that will work. We need to get this into 4.13.

I'll pick it up for the pull request I'm going to send you today.

