* A problem about SPI Interrupt Configuration @ 2016-01-20 2:38 Yang Yingliang 2016-01-22 7:57 ` Marc Zyngier 0 siblings, 1 reply; 4+ messages in thread From: Yang Yingliang @ 2016-01-20 2:38 UTC (permalink / raw) To: linux-arm-kernel Hi, Marc I got some error messages "RWP timeout, gone fishing". The case is : CPU0 CPU1 acquire desc->lock in __setup_irq() enable irq in __setup_irq() read iar in gic_handle_irq() waiting for desc->lock in handle_fasteoi_irq() call gic_set_affinity() from setup_affinity() waiting for the irq deactive in gic_do_wait_for_rwp() The hardware will not clear GICD_CTLR.RWP until the interrupt is not active. The interrupt is keeping active while it's waiting for desc->lock on cpu0. But the lock is hold by cpu1 while it's waiting for the interrupt is not active. It causes a deadlock here in 1s. And the GICv3 SPEC says: 4.5.5 SPI Interrupt Configuration To configure an SPI interrupt, to ensure that interrupts are never distributed using partially updated configuration information, software must: o Ensure the interrupt is not active o Ensure that the interrupt is disabled o This might be done either by writing to GICD_CTLR to clear the enables for a group, or o By writing to GICD_ICENABLERn to clear the Enable bit of the interrupt (see section 5.3.11). o In both cases, software must poll GICD_CTLR.RWP to ensure the effects are visible (see section 5.3.20). o Program the routing (if appropriate), priority and group o Enable the interrupt (if required) Because it says "Ensure the interrupt is not active", so I can not tell it is a hardware or software problem. Can you please give some advice? Thanks, Yang ^ permalink raw reply [flat|nested] 4+ messages in thread
* A problem about SPI Interrupt Configuration 2016-01-20 2:38 A problem about SPI Interrupt Configuration Yang Yingliang @ 2016-01-22 7:57 ` Marc Zyngier 2016-01-28 2:32 ` Yang Yingliang 0 siblings, 1 reply; 4+ messages in thread From: Marc Zyngier @ 2016-01-22 7:57 UTC (permalink / raw) To: linux-arm-kernel On Wed, 20 Jan 2016 10:38:13 +0800 Yang Yingliang <yangyingliang@huawei.com> wrote: Hi Yang, > Hi, Marc > > I got some error messages "RWP timeout, gone fishing". > > The case is : > > CPU0 CPU1 > acquire desc->lock in __setup_irq() > enable irq in __setup_irq() > read iar in gic_handle_irq() > waiting for desc->lock in handle_fasteoi_irq() > call gic_set_affinity() from setup_affinity() > waiting for the irq deactive in gic_do_wait_for_rwp() > > > The hardware will not clear GICD_CTLR.RWP until the interrupt is not > active. The interrupt is keeping active while it's waiting for > desc->lock on cpu0. But the lock is hold by cpu1 while it's waiting for > the interrupt is not active. It causes a deadlock here in 1s. > > > And the GICv3 SPEC says: > > 4.5.5 SPI Interrupt Configuration > To configure an SPI interrupt, to ensure that interrupts are never > distributed using partially updated configuration > information, software must: > o Ensure the interrupt is not active > o Ensure that the interrupt is disabled > o This might be done either by writing to GICD_CTLR to clear the enables > for a group, or > o By writing to GICD_ICENABLERn to clear the Enable bit of the interrupt > (see section 5.3.11). > o In both cases, software must poll GICD_CTLR.RWP to ensure the effects > are visible (see section > 5.3.20). > o Program the routing (if appropriate), priority and group > o Enable the interrupt (if required) > > Because it says "Ensure the interrupt is not active", so I can not tell > it is a hardware or software problem. > > Can you please give some advice? Thanks for the accurate description of the problem. This looks to be a core issue, or at least a problem between core code and the way the GIC behaves, unfortunately. The architecture expects the interrupt to be fully configured before it is enabled and made active, while the core code does this the other way around. Can you please have a go at the patch below and let me know if it improve things? This is just a test, and definitely not the complete solution, but I'd like to find out if I'm on the right track. Thanks, M. diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c index 0eebaee..e5802fb 100644 --- a/kernel/irq/manage.c +++ b/kernel/irq/manage.c @@ -1303,12 +1303,6 @@ __setup_irq(unsigned int irq, struct irq_desc *desc, struct irqaction *new) if (new->flags & IRQF_ONESHOT) desc->istate |= IRQS_ONESHOT; - if (irq_settings_can_autoenable(desc)) - irq_startup(desc, true); - else - /* Undo nested disables: */ - desc->depth = 1; - /* Exclude IRQ from balancing if requested */ if (new->flags & IRQF_NOBALANCING) { irq_settings_set_no_balancing(desc); @@ -1318,6 +1312,12 @@ __setup_irq(unsigned int irq, struct irq_desc *desc, struct irqaction *new) /* Set default affinity mask once everything is setup */ setup_affinity(desc, mask); + if (irq_settings_can_autoenable(desc)) + irq_startup(desc, true); + else + /* Undo nested disables: */ + desc->depth = 1; + } else if (new->flags & IRQF_TRIGGER_MASK) { unsigned int nmsk = new->flags & IRQF_TRIGGER_MASK; unsigned int omsk = irq_settings_get_trigger_mask(desc); -- Without deviation from the norm, progress is not possible. ^ permalink raw reply related [flat|nested] 4+ messages in thread
* A problem about SPI Interrupt Configuration 2016-01-22 7:57 ` Marc Zyngier @ 2016-01-28 2:32 ` Yang Yingliang 2016-01-28 8:50 ` Marc Zyngier 0 siblings, 1 reply; 4+ messages in thread From: Yang Yingliang @ 2016-01-28 2:32 UTC (permalink / raw) To: linux-arm-kernel On 2016/1/22 15:57, Marc Zyngier wrote: > On Wed, 20 Jan 2016 10:38:13 +0800 > Yang Yingliang <yangyingliang@huawei.com> wrote: > > Hi Yang, > >> Hi, Marc >> >> I got some error messages "RWP timeout, gone fishing". >> >> The case is : >> >> CPU0 CPU1 >> acquire desc->lock in __setup_irq() >> enable irq in __setup_irq() >> read iar in gic_handle_irq() >> waiting for desc->lock in handle_fasteoi_irq() >> call gic_set_affinity() from setup_affinity() >> waiting for the irq deactive in gic_do_wait_for_rwp() >> >> >> The hardware will not clear GICD_CTLR.RWP until the interrupt is not >> active. The interrupt is keeping active while it's waiting for >> desc->lock on cpu0. But the lock is hold by cpu1 while it's waiting for >> the interrupt is not active. It causes a deadlock here in 1s. >> >> >> And the GICv3 SPEC says: >> >> 4.5.5 SPI Interrupt Configuration >> To configure an SPI interrupt, to ensure that interrupts are never >> distributed using partially updated configuration >> information, software must: >> o Ensure the interrupt is not active >> o Ensure that the interrupt is disabled >> o This might be done either by writing to GICD_CTLR to clear the enables >> for a group, or >> o By writing to GICD_ICENABLERn to clear the Enable bit of the interrupt >> (see section 5.3.11). >> o In both cases, software must poll GICD_CTLR.RWP to ensure the effects >> are visible (see section >> 5.3.20). >> o Program the routing (if appropriate), priority and group >> o Enable the interrupt (if required) >> >> Because it says "Ensure the interrupt is not active", so I can not tell >> it is a hardware or software problem. >> >> Can you please give some advice? > > Thanks for the accurate description of the problem. This looks to be a > core issue, or at least a problem between core code and the way the GIC > behaves, unfortunately. The architecture expects the interrupt to be > fully configured before it is enabled and made active, while the core > code does this the other way around. > > Can you please have a go at the patch below and let me know if it > improve things? This is just a test, and definitely not the complete > solution, but I'd like to find out if I'm on the right track. > > Thanks, > > M. > > diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c > index 0eebaee..e5802fb 100644 > --- a/kernel/irq/manage.c > +++ b/kernel/irq/manage.c > @@ -1303,12 +1303,6 @@ __setup_irq(unsigned int irq, struct irq_desc *desc, struct irqaction *new) > if (new->flags & IRQF_ONESHOT) > desc->istate |= IRQS_ONESHOT; > > - if (irq_settings_can_autoenable(desc)) > - irq_startup(desc, true); > - else > - /* Undo nested disables: */ > - desc->depth = 1; > - > /* Exclude IRQ from balancing if requested */ > if (new->flags & IRQF_NOBALANCING) { > irq_settings_set_no_balancing(desc); > @@ -1318,6 +1312,12 @@ __setup_irq(unsigned int irq, struct irq_desc *desc, struct irqaction *new) > /* Set default affinity mask once everything is setup */ > setup_affinity(desc, mask); > > + if (irq_settings_can_autoenable(desc)) > + irq_startup(desc, true); > + else > + /* Undo nested disables: */ > + desc->depth = 1; > + > } else if (new->flags & IRQF_TRIGGER_MASK) { > unsigned int nmsk = new->flags & IRQF_TRIGGER_MASK; > unsigned int omsk = irq_settings_get_trigger_mask(desc); > This patch can avoid the case I said in this mail. But in other case like set affinity by the /proc/irq/x/smp_affinity, it's will have the same problem _if_ the hardware is waiting for deactive. Thanks, Yang ^ permalink raw reply [flat|nested] 4+ messages in thread
* A problem about SPI Interrupt Configuration 2016-01-28 2:32 ` Yang Yingliang @ 2016-01-28 8:50 ` Marc Zyngier 0 siblings, 0 replies; 4+ messages in thread From: Marc Zyngier @ 2016-01-28 8:50 UTC (permalink / raw) To: linux-arm-kernel On 28/01/16 02:32, Yang Yingliang wrote: > > > On 2016/1/22 15:57, Marc Zyngier wrote: >> On Wed, 20 Jan 2016 10:38:13 +0800 >> Yang Yingliang <yangyingliang@huawei.com> wrote: >> >> Hi Yang, >> >>> Hi, Marc >>> >>> I got some error messages "RWP timeout, gone fishing". >>> >>> The case is : >>> >>> CPU0 CPU1 >>> acquire desc->lock in __setup_irq() >>> enable irq in __setup_irq() >>> read iar in gic_handle_irq() >>> waiting for desc->lock in handle_fasteoi_irq() >>> call gic_set_affinity() from setup_affinity() >>> waiting for the irq deactive in gic_do_wait_for_rwp() >>> >>> >>> The hardware will not clear GICD_CTLR.RWP until the interrupt is not >>> active. The interrupt is keeping active while it's waiting for >>> desc->lock on cpu0. But the lock is hold by cpu1 while it's waiting for >>> the interrupt is not active. It causes a deadlock here in 1s. >>> >>> >>> And the GICv3 SPEC says: >>> >>> 4.5.5 SPI Interrupt Configuration >>> To configure an SPI interrupt, to ensure that interrupts are never >>> distributed using partially updated configuration >>> information, software must: >>> o Ensure the interrupt is not active >>> o Ensure that the interrupt is disabled >>> o This might be done either by writing to GICD_CTLR to clear the enables >>> for a group, or >>> o By writing to GICD_ICENABLERn to clear the Enable bit of the interrupt >>> (see section 5.3.11). >>> o In both cases, software must poll GICD_CTLR.RWP to ensure the effects >>> are visible (see section >>> 5.3.20). >>> o Program the routing (if appropriate), priority and group >>> o Enable the interrupt (if required) >>> >>> Because it says "Ensure the interrupt is not active", so I can not tell >>> it is a hardware or software problem. >>> >>> Can you please give some advice? >> >> Thanks for the accurate description of the problem. This looks to be a >> core issue, or at least a problem between core code and the way the GIC >> behaves, unfortunately. The architecture expects the interrupt to be >> fully configured before it is enabled and made active, while the core >> code does this the other way around. >> >> Can you please have a go at the patch below and let me know if it >> improve things? This is just a test, and definitely not the complete >> solution, but I'd like to find out if I'm on the right track. >> >> Thanks, >> >> M. >> >> diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c >> index 0eebaee..e5802fb 100644 >> --- a/kernel/irq/manage.c >> +++ b/kernel/irq/manage.c >> @@ -1303,12 +1303,6 @@ __setup_irq(unsigned int irq, struct irq_desc *desc, struct irqaction *new) >> if (new->flags & IRQF_ONESHOT) >> desc->istate |= IRQS_ONESHOT; >> >> - if (irq_settings_can_autoenable(desc)) >> - irq_startup(desc, true); >> - else >> - /* Undo nested disables: */ >> - desc->depth = 1; >> - >> /* Exclude IRQ from balancing if requested */ >> if (new->flags & IRQF_NOBALANCING) { >> irq_settings_set_no_balancing(desc); >> @@ -1318,6 +1312,12 @@ __setup_irq(unsigned int irq, struct irq_desc *desc, struct irqaction *new) >> /* Set default affinity mask once everything is setup */ >> setup_affinity(desc, mask); >> >> + if (irq_settings_can_autoenable(desc)) >> + irq_startup(desc, true); >> + else >> + /* Undo nested disables: */ >> + desc->depth = 1; >> + >> } else if (new->flags & IRQF_TRIGGER_MASK) { >> unsigned int nmsk = new->flags & IRQF_TRIGGER_MASK; >> unsigned int omsk = irq_settings_get_trigger_mask(desc); >> > > This patch can avoid the case I said in this mail. > But in other case like set affinity by the /proc/irq/x/smp_affinity, > it's will have the same problem _if_ the hardware is waiting for deactive. Indeed, and the more I think of it, the more I'm convinced that what you are looking at is a hardware bug. If RWP is gated by an interrupt being active, or the disable is gated by the interrupt being active, then you will end-up in all kind of horrible problems that software *cannot* solve. The text you quoted earlier is not something that can be enforced from a software perspective (it contains a race condition that cannot be avoided). Furthermore, take the example of a VM that has an active interrupt linked to a physical interrupt (HW bit set in the List Register) while you are changing the affinity on the host. Nothing guarantees that the deactivate will *ever* happen (the guest could decide it doesn't need to do it, or takes a very long time to do so). In that case, you will deadlock the same way, and there is nothing you can do from a SW PoV to solve it. I suggest you get back to your HW folks, and explain that what they have here is not a viable design. Thanks, M. -- Jazz is not dead. It just smells funny... ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2016-01-28 8:50 UTC | newest] Thread overview: 4+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2016-01-20 2:38 A problem about SPI Interrupt Configuration Yang Yingliang 2016-01-22 7:57 ` Marc Zyngier 2016-01-28 2:32 ` Yang Yingliang 2016-01-28 8:50 ` Marc Zyngier
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).