Questions on Interruption handling

* Questions on Interruption handling
@ 2014-10-22 17:43 Angelo Brito
  2014-10-22 18:10 ` Keith Busch
  2014-10-23 16:53 ` Matthew Wilcox
  0 siblings, 2 replies; 11+ messages in thread
From: Angelo Brito @ 2014-10-22 17:43 UTC


Hello all!

I had some issues with interrupt handling. The scenario is as follows:
we have an NVMe device with a single MSI vector enabled, and some of its
transfers took about 1000 jiffies (1 s with HZ=1000) to complete. We saw
this while benchmarking an NVMe controller with IOMeter: about 1 in 10
commands took much longer than expected. Tracing through the kernel code,
we tracked the issue down to the nvme_irq function. In most cases it is
triggered by an interrupt and all CQEs in the queue are processed
correctly. In some cases, though, a new CQE arrived while nvme_irq was
still processing earlier entries, or just after the CQ head doorbell had
been written. These entries were overlooked by the driver and only picked
up later by nvme_kthread, which re-runs nvme_process_cq once every second.

We read section 7.5.1.1 of the NVMe 1.1 specification and noticed that
it defines a procedure for not losing interrupts on legacy (INTx) or MSI
systems: the host software interrupt handler should mask the vector with
the INTMS register and unmask it with the INTMC register. We propose the
change to nvme_irq shown below. It is probably not needed on MSI-X
devices, but it should be harmless to leave it in for them as well.
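
A minimal, hypothetical sketch of the mask / drain / unmask order described
above, with the INTMS and INTMC registers simulated by a plain struct. The
names fake_intm_regs, write_intms, write_intmc and handle_irq_masked are
illustrative only, not driver or spec names; on real hardware the two
writes would be writel() calls into BAR0 and the drain would be
nvme_process_cq().

```c
#include <stdint.h>

/* Hypothetical stand-in for the INTMS/INTMC register pair. Each "write"
 * just updates a mask word and bumps a counter. */
struct fake_intm_regs {
    uint32_t mask;         /* 1 bits = vector currently masked */
    uint32_t mask_seen;    /* mask value observed while the CQ was drained */
    unsigned mmio_writes;  /* register writes issued, for illustration */
};

static void write_intms(struct fake_intm_regs *r, uint32_t bits)
{
    r->mask |= bits;       /* INTMS: writing 1 sets the mask bit */
    r->mmio_writes++;
}

static void write_intmc(struct fake_intm_regs *r, uint32_t bits)
{
    r->mask &= ~bits;      /* INTMC: writing 1 clears the mask bit */
    r->mmio_writes++;
}

/* Order of operations from the proposal: mask, drain the CQ, unmask.
 * 'pending' stands in for unread CQ entries; 'vector' must be < 32. */
static unsigned handle_irq_masked(struct fake_intm_regs *r, unsigned vector,
                                  unsigned *pending)
{
    uint32_t bit = UINT32_C(1) << vector;
    unsigned reaped;

    write_intms(r, bit);          /* 1. mask this vector */
    r->mask_seen = r->mask;
    reaped = *pending;            /* 2. consume every visible CQE */
    *pending = 0;
    write_intmc(r, bit);          /* 3. unmask; controller may re-signal */
    return reaped;
}
```

With a single MSI vector (vector 0), one call issues exactly two register
writes and leaves the vector unmasked again afterwards.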

So we modified nvme_irq, tested it on our controller, and the problem
was fixed. The modified code is attached below, since we don't know how
to submit a patch. Sorry about that.

static irqreturn_t nvme_irq(int irq, void *data)
{
        irqreturn_t result;
        struct nvme_queue *nvmeq = data;
+        u32 maskvec;
+
+        /* We calculate which mask vector we use. */
+        maskvec = 1 << nvmeq->cq_vector;
+
        spin_lock(&nvmeq->q_lock);
+
+       /* We mask interrupts to this vector. */
+        writel(maskvec, &nvmeq->dev->bar->intms);
+
        nvme_process_cq(nvmeq);
        result = nvmeq->cqe_seen ? IRQ_HANDLED : IRQ_NONE;
        nvmeq->cqe_seen = 0;
+
+       /* And we unmask interrupt to this vector. */
+        writel(maskvec, &nvmeq->dev->bar->intmc);
+
        spin_unlock(&nvmeq->q_lock);
        return result;
}


Regards,
Angelo Silva Brito.
Digital System Engineer from Silicon Reef
http://about.me/angelobrito
_________________________________________________

* Questions on Interruption handling
  2014-10-22 17:43 Questions on Interruption handling Angelo Brito
@ 2014-10-22 18:10 ` Keith Busch
  2014-10-23 13:30   ` Angelo Brito
  2014-10-23 14:26   ` Angelo Brito
  2014-10-23 16:53 ` Matthew Wilcox
  1 sibling, 2 replies; 11+ messages in thread
From: Keith Busch @ 2014-10-22 18:10 UTC


On Wed, 22 Oct 2014, Angelo Brito wrote:
> Hello all!
>
> I had some issues with the Interruption handling. The scenario is as follows:
> We have a NVMe Device with single MSI enabled and some of its
> transfers took about 1000 jiffies (ms) to execute. We saw this when we
> used IOMeter to benchmark a NVMe controller and we noticed that about
> 1 in 10 commands took much longer than expected. When we traced
> through the kernel code we tracked the issue to come from the nvme_irq
> function. In most cases, it is triggered by the interrupts and all
> CQEs in the queue are processed correctly. In some cases, though, we
> found out that a new CQE arrived while the nvme_irq function was
> processing previous entries or just after the CQ doorbell has been
> sent. These entries were overlooked by the driver and picked up later
> by the nvme_kthread function, which reexecutes the nvme_process_cq
> function once every second.
>
> We read the NVMe specification 1.1 section 7.5.1.1 and noticed that it
> defined a procedure to not lose any interrupts in a Legacy or MSI
> system. The Host Software Interrupt Handling should mask and clear the
> interruptions by using the INTMS and INTMC registers. We are proposing
> a change to the nvme_irq function as described below. This is probably
> not needed on MSI-X enabled devices but it is harmless to leave it in
> for them as well.

I don't think you can really assume it's harmless. It adds two
unnecessary MMIO writes on every interrupt, and the spec says results
are undefined if you touch those registers when configured for MSI-X,
so please change this to affect only MSI and INTx.
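
A small sketch of the restriction requested here, assuming a hypothetical
enum that records how the controller's interrupts are configured (in the
Linux driver the equivalent test would be on pci_dev->msix_enabled). It
only encodes the rule that INTMS/INTMC may be touched for INTx and MSI,
and the per-interrupt MMIO cost that follows from it.

```c
#include <stdbool.h>

/* Hypothetical record of the interrupt mode the controller was set up with. */
enum irq_mode { IRQ_INTX, IRQ_MSI, IRQ_MSIX };

/* INTMS/INTMC masking applies only to pin-based and MSI interrupts;
 * accessing those registers while configured for MSI-X is undefined. */
static bool uses_intms_intmc(enum irq_mode mode)
{
    return mode == IRQ_INTX || mode == IRQ_MSI;
}

/* Extra MMIO writes the mask/unmask pair adds to each interrupt. */
static unsigned extra_mmio_writes_per_irq(enum irq_mode mode)
{
    return uses_intms_intmc(mode) ? 2u : 0u;
}
```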

* Questions on Interruption handling
  2014-10-22 18:10 ` Keith Busch
@ 2014-10-23 13:30   ` Angelo Brito
  2014-10-23 14:26   ` Angelo Brito
  1 sibling, 0 replies; 11+ messages in thread
From: Angelo Brito @ 2014-10-23 13:30 UTC


Ok Keith,
I presume you meant this:
static irqreturn_t nvme_irq(int irq, void *data)
{
       irqreturn_t result;
       struct nvme_queue *nvmeq = data;
+        u32 maskvec;
+
+        /* We calculate which mask vector we use. */
+        maskvec = 1 << nvmeq->cq_vector;
+
       spin_lock(&nvmeq->q_lock);
+
+       /* We mask interrupts to this vector but not for MSI-X */
+       if(!nvmeq->dev->pci_dev->msix_enabled)
+           writel(maskvec, &nvmeq->dev->bar->intms);
+
       nvme_process_cq(nvmeq);
       result = nvmeq->cqe_seen ? IRQ_HANDLED : IRQ_NONE;
       nvmeq->cqe_seen = 0;
+
+       /* And we unmask interrupt to this vector but not for MSI-X. */
+       if(!nvmeq->dev->pci_dev->msix_enabled)
+           writel(maskvec, &nvmeq->dev->bar->intmc);
+
       spin_unlock(&nvmeq->q_lock);
       return result;
}

P.S.: I don't have an MSI-X device to check that this doesn't affect it.
Could you check?


Regards,
Angelo Silva Brito.
Graduate in Computer Engineering - UFPE
http://about.me/angelobrito
_________________________________________________


* Questions on Interruption handling
  2014-10-22 18:10 ` Keith Busch
  2014-10-23 13:30   ` Angelo Brito
@ 2014-10-23 14:26   ` Angelo Brito
  1 sibling, 0 replies; 11+ messages in thread
From: Angelo Brito @ 2014-10-23 14:26 UTC


Ok Keith,
I presume you meant this:
static irqreturn_t nvme_irq(int irq, void *data)
{
       irqreturn_t result;
       struct nvme_queue *nvmeq = data;
+        u32 maskvec;
+
+        /* We calculate which mask vector we use. */
+        maskvec = 1 << nvmeq->cq_vector;
+
       spin_lock(&nvmeq->q_lock);
+
+       /* We mask interrupts to this vector but not for MSI-X */
+       if(!nvmeq->dev->pci_dev->msix_enabled)
+           writel(maskvec, &nvmeq->dev->bar->intms);
+
       nvme_process_cq(nvmeq);
       result = nvmeq->cqe_seen ? IRQ_HANDLED : IRQ_NONE;
       nvmeq->cqe_seen = 0;
+
+       /* And we unmask interrupt to this vector but not for MSI-X. */
+       if(!nvmeq->dev->pci_dev->msix_enabled)
+           writel(maskvec, &nvmeq->dev->bar->intmc);
+
       spin_unlock(&nvmeq->q_lock);
       return result;
}

P.S.: I don't have an MSI-X device to check that this doesn't affect it.
Could you check?


Regards,
Angelo Silva Brito.
Graduate in Computer Engineering - UFPE
http://about.me/angelobrito
_________________________________________________


* Questions on Interruption handling
  2014-10-22 17:43 Questions on Interruption handling Angelo Brito
  2014-10-22 18:10 ` Keith Busch
@ 2014-10-23 16:53 ` Matthew Wilcox
  2014-10-24 16:51   ` Angelo Brito
  1 sibling, 1 reply; 11+ messages in thread
From: Matthew Wilcox @ 2014-10-23 16:53 UTC

On Wed, Oct 22, 2014 at 02:43:44PM -0300, Angelo Brito wrote:
> I had some issues with the Interruption handling. The scenario is as follows:
> We have a NVMe Device with single MSI enabled and some of its
> transfers took about 1000 jiffies (ms) to execute. We saw this when we
> used IOMeter to benchmark a NVMe controller and we noticed that about
> 1 in 10 commands took much longer than expected. When we traced
> through the kernel code we tracked the issue to come from the nvme_irq
> function. In most cases, it is triggered by the interrupts and all
> CQEs in the queue are processed correctly. In some cases, though, we
> found out that a new CQE arrived while the nvme_irq function was
> processing previous entries or just after the CQ doorbell has been
> sent. These entries were overlooked by the driver and picked up later
> by the nvme_kthread function, which reexecutes the nvme_process_cq
> function once every second.

This ought not be possible.  This is how things are supposed to work:

A. Device writes to CQ
B. Device sends MSI

1. Host receives interrupt
2. Host checks CQ
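
For step 2, the host has to tell new completion entries from stale ones.
NVMe does this with the phase tag (bit 16 of completion DW3, which the
Linux driver reads as bit 0 of its 16-bit status field); the controller
inverts it each time it wraps around the queue. Below is a simplified,
single-threaded model of that check. The struct layouts, cq_init() and
reap_cq() are hypothetical, and a real driver also needs memory barriers
and the CQ head doorbell write.

```c
#include <stdint.h>

#define CQ_DEPTH 4u

/* Minimal stand-in for a completion queue entry. */
struct cqe {
    uint16_t command_id;
    uint16_t status;     /* bit 0 holds the phase tag in this model */
};

struct cq {
    struct cqe entries[CQ_DEPTH];
    uint16_t head;       /* next slot the host will examine */
    uint8_t phase;       /* phase value that marks a new entry */
};

static void cq_init(struct cq *q)
{
    for (unsigned i = 0; i < CQ_DEPTH; i++)
        q->entries[i] = (struct cqe){ 0, 0 };
    q->head = 0;
    q->phase = 1;        /* zeroed memory never matches the first pass */
}

/* Consume every entry whose phase tag matches, flipping the expected phase
 * on wrap-around. Returns the number of entries consumed; the caller would
 * then write q->head to the CQ head doorbell. */
static unsigned reap_cq(struct cq *q)
{
    unsigned n = 0;

    while ((q->entries[q->head].status & 1u) == (unsigned)q->phase) {
        n++;
        if (++q->head == CQ_DEPTH) {
            q->head = 0;
            q->phase ^= 1u;
        }
    }
    return n;
}
```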

Now, I'm assuming that you have a flood of interrupts coming in because
you have a high IOPS workload and haven't configured interrupt mitigation.
The soft interrupt mitigation in handle_edge_irq() should be kicking in
and preventing the driver from being overwhelmed:

        if (unlikely(irqd_irq_disabled(&desc->irq_data) ||
                     irqd_irq_inprogress(&desc->irq_data) || !desc->action)) {
                if (!irq_check_poll(desc)) {
                        desc->istate |= IRQS_PENDING;
                        mask_ack_irq(desc);
                        goto out_unlock;
                }
        }
...
        do {
...
                if (unlikely(desc->istate & IRQS_PENDING)) {
                        if (!irqd_irq_disabled(&desc->irq_data) &&
                            irqd_irq_masked(&desc->irq_data))
                                unmask_irq(desc);
                }

                handle_irq_event(desc);

        } while ((desc->istate & IRQS_PENDING) &&
                 !irqd_irq_disabled(&desc->irq_data));

handle_irq_event() ends up calling the nvme_irq() handler.

Notice that we never tell the *device* to stop sending interrupts.
We'll mask this interrupt on the CPU, but we'll always unmask it before
calling the interrupt handler again.  That guarantees that if an interrupt
arrives during handling of the previous interrupt, we'll call the handler
at least once more.
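
The pending-flag behaviour can be modelled in a few lines. This is a
single-threaded toy loosely patterned on handle_edge_irq(), not the kernel
code itself: struct edge_irq, edge_irq_fire(), nvme_irq_standin() and the
arrivals_during_handler test hook are all hypothetical. An edge that
arrives while the handler runs only sets a pending flag, and the loop then
calls the handler once more.

```c
#include <stdbool.h>

struct edge_irq {
    bool in_progress;                  /* handler currently running */
    bool pending;                      /* edge seen while in progress */
    unsigned handler_calls;            /* how often the handler ran */
    unsigned arrivals_during_handler;  /* test hook: edges to inject */
};

static void edge_irq_fire(struct edge_irq *d);

/* Stand-in for nvme_irq(): counts calls and, for illustration, injects
 * edges that "arrive" while it is running (on real hardware these would
 * come from the device, not from a nested call). */
static void nvme_irq_standin(struct edge_irq *d)
{
    d->handler_calls++;
    while (d->arrivals_during_handler > 0) {
        d->arrivals_during_handler--;
        edge_irq_fire(d);
    }
}

static void edge_irq_fire(struct edge_irq *d)
{
    if (d->in_progress) {
        d->pending = true;    /* roughly: set IRQS_PENDING, mask_ack_irq() */
        return;
    }
    d->in_progress = true;
    do {
        d->pending = false;
        nvme_irq_standin(d);
    } while (d->pending);     /* re-run if an edge arrived meanwhile */
    d->in_progress = false;
}
```

With three injected edges the handler still runs exactly twice: the nested
edges collapse into a single pending flag, which is why the guarantee is
"at least once more" rather than once per interrupt.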

So, absolutely, a CQE can arrive *just* after nvme_process_cq() has
checked the last CQE. But if it does, there should be an interrupt
shortly afterwards that triggers nvme_irq() to be called again. Are you
sure your device sends an interrupt after it writes the CQE whose
processing is being delayed?

* Questions on Interruption handling
  2014-10-23 16:53 ` Matthew Wilcox
@ 2014-10-24 16:51   ` Angelo Brito
  2014-10-24 17:49     ` Matthew Wilcox
  0 siblings, 1 reply; 11+ messages in thread
From: Angelo Brito @ 2014-10-24 16:51 UTC


We can look more carefully at the functions you mentioned, but perhaps
we are reading the spec slightly differently. We do not send an MSI for
every single CQ entry, because section 7.5.1 of the spec specifies
different behavior. It defines an internal IS (interrupt status) vector
in which a bit is high when there are unacknowledged CQ entries and the
vector is not masked. The table there then states that an MSI is sent
only when a bit in the IS vector rises, meaning either entries were
pending and the vector was unmasked, or there were no entries and a new
entry came in. I presume this is to reduce traffic on a heavily loaded
system. This applies only to MSI and legacy interrupts, of course; MSI-X
uses a different mechanism.

Now, there is a window that we noticed. After the interrupt is
triggered, the host starts reading the CQs. It takes a few hundred
nanoseconds from the time the CQs have been read until the doorbell
write arrives at the controller, and the controller needs time to
process it as well, probably up to a few microseconds. If the controller
writes a new entry to a CQ in this window, the corresponding bit in the
IS vector is already high, so no new MSI is sent. The host, however, has
already checked the CQs, so it will not see that new entries came in.

We believe that is why section 7.5.1.1 states that the host should mask
interrupts and then unmask them. This forces the bits in the
controller's IS vector to go low and then high again (see section
7.5.1). If the host has not consumed every CQ entry, a new MSI will be
issued when the INTMC register is written.
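
A toy model of one MSI/INTx vector may make the window concrete. It
encodes the reading of section 7.5.1 given in this message: IS is high
while posted entries exceed those released through the CQ head doorbell
and the vector is unmasked, and an MSI is sent only when IS goes from 0
to 1. All names are hypothetical and the model is single-threaded; it
illustrates this interpretation, not any particular controller.

```c
#include <stdbool.h>

struct vec_model {
    unsigned posted;     /* CQEs written by the controller */
    unsigned acked;      /* CQEs released via the CQ head doorbell */
    bool masked;         /* INTMS/INTMC state for this vector */
    bool is_bit;         /* current IS (interrupt status) level */
    unsigned msis_sent;  /* MSI messages emitted so far */
};

/* Recompute IS; an MSI is only sent on a rising edge. */
static void update_is(struct vec_model *v)
{
    bool level = (v->posted > v->acked) && !v->masked;

    if (level && !v->is_bit)
        v->msis_sent++;
    v->is_bit = level;
}

static void ctrl_post_cqe(struct vec_model *v) { v->posted++; update_is(v); }

/* Host releases n consumed entries by writing the CQ head doorbell. */
static void host_ring_cq_doorbell(struct vec_model *v, unsigned n)
{
    v->acked += n;
    update_is(v);
}

static void host_mask(struct vec_model *v)   { v->masked = true;  update_is(v); }  /* INTMS */
static void host_unmask(struct vec_model *v) { v->masked = false; update_is(v); }  /* INTMC */
```

Replaying the race without masking (post, host reads one entry, post,
doorbell) leaves msis_sent at 1 with one entry stranded. Adding
host_mask() before the read and host_unmask() after the doorbell drives
IS low and back high, so a second MSI is emitted.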

Regards,
Angelo Brito

* Questions on Interruption handling
  2014-10-24 16:51   ` Angelo Brito
@ 2014-10-24 17:49     ` Matthew Wilcox
  2014-10-24 17:59       ` Angelo Brito
  0 siblings, 1 reply; 11+ messages in thread
From: Matthew Wilcox @ 2014-10-24 17:49 UTC


On Fri, Oct 24, 2014 at 01:51:59PM -0300, Angelo Brito wrote:
> We can look more carefully at those functions you stated, but perhaps
> there is a small difference on how we are reading the spec. We do not
> send a MSI for every single CQ because the spec states a different
> functionality in section 7.5.1. It defines that the internal IS vector
> should have a bit high when there are unanswered CQ entries and the
> vector is not masked. The table then states that the MSI should be
> sent only when a bit in the IS vector rises, meaning it either had
> entries and was unmasked or it did not have entries and an entry came
> in. I presume that was to reduce traffic in a very overloaded system.
> This is for MSI and legacy only, of course, MSI-X uses a different
> mechanism.
> 
> Now, there is a window that we noticed. After the interrupt was
> triggered it starts reading the CQs. It takes a few hundred
> nanoseconds from the time the CQs have been read to the time the
> doorbell arrives at the controller, and the controller will take time
> to process it as well, probably up to a few microseconds. If the
> controller decides to write a new entry in a CQ in this time the
> corresponding bit in the IS vector will already be high, therefore
> there should be no new MSI. The host though already checked the CQs so
> it will not see that new entries came in.
> 
> We believe that is why section 7.5.1.1 states that the host should
> mask interrupts and then release them. This way the host forces the
> bits in the IS vector in the controller to go low and high again (see
> section 7.5.1). If the host did not answer every single CQ entry, then
> when the INTMC register is written a new MSI will be issued.

Argh, the spec is buggy. It should say that if the CQ doorbell write is
less than the controller's notion of the CQ head, the controller should
send another interrupt. I've sent a request to the NVMe workgroup to do
an ECN to fix this.
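
One possible reading of the proposed rule, sketched from the controller's
side: after each CQ head doorbell write, if the host's new head has not
caught up with the last entry the controller posted, signal another
interrupt instead of waiting for an IS edge that will never come. The
struct, its fields and ctrl_on_cq_doorbell() are hypothetical, and
treating the controller's write position (its tail) as "the controller's
notion of the CQ head" is an interpretation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical controller-side view of one completion queue. */
struct ctrl_cq {
    uint16_t depth;      /* number of slots in the ring */
    uint16_t tail;       /* next slot the controller will write */
    uint16_t head;       /* last head value the host wrote to the doorbell */
    unsigned irqs_sent;  /* extra interrupts raised by this rule */
};

/* Entries posted but not yet released by the host, with wrap-around. */
static uint16_t cq_outstanding(const struct ctrl_cq *q)
{
    return (uint16_t)((q->tail + q->depth - q->head) % q->depth);
}

/* If the doorbell write leaves unconsumed entries, interrupt again. */
static bool ctrl_on_cq_doorbell(struct ctrl_cq *q, uint16_t new_head)
{
    q->head = new_head;
    if (cq_outstanding(q) != 0) {
        q->irqs_sent++;
        return true;
    }
    return false;
}
```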

* Questions on Interruption handling
  2014-10-24 17:49     ` Matthew Wilcox
@ 2014-10-24 17:59       ` Angelo Brito
  2015-01-20 19:13         ` Angelo Brito
  0 siblings, 1 reply; 11+ messages in thread
From: Angelo Brito @ 2014-10-24 17:59 UTC


Thanks for looking into it.
Simply masking and unmasking the interrupts fixed our problems, but
perhaps it creates other issues.
So please keep us posted. We will watch out for the ECN.

Regards,
Angelo Brito


* Questions on Interruption handling
  2014-10-24 17:59       ` Angelo Brito
@ 2015-01-20 19:13         ` Angelo Brito
  2015-01-20 22:35           ` Keith Busch
  0 siblings, 1 reply; 11+ messages in thread
From: Angelo Brito @ 2015-01-20 19:13 UTC


Hi all,

I just checked: version 1.2 of the specification was released in
November, and no changes were made for this issue.

Do you have any news?


Regards,
Angelo Silva Brito.
Digital Systems Engineer at Silicon Reef
http://about.me/angelobrito
_________________________________________________


* Questions on Interruption handling
  2015-01-20 19:13         ` Angelo Brito
@ 2015-01-20 22:35           ` Keith Busch
  2015-01-21  0:08             ` Angelo Brito
  0 siblings, 1 reply; 11+ messages in thread
From: Keith Busch @ 2015-01-20 22:35 UTC


The ECN request was sent on 24 October, after the 1.2 revision was
already under review for ratification.

* Questions on Interruption handling
  2015-01-20 22:35           ` Keith Busch
@ 2015-01-21  0:08             ` Angelo Brito
  0 siblings, 0 replies; 11+ messages in thread
From: Angelo Brito @ 2015-01-21  0:08 UTC


Thanks Keith!



Regards,
Angelo Silva Brito.
Graduate in Computer Engineering - UFPE
http://about.me/angelobrito
_________________________________________________

