* IRQ affinity enforced only after first interrupt.
@ 2012-03-26 9:06 Yevgeny Petrilin
2012-03-26 14:28 ` Bjorn Helgaas
0 siblings, 1 reply; 9+ messages in thread
From: Yevgeny Petrilin @ 2012-03-26 9:06 UTC
To: linux-pci
Hello,
I'm working on an issue where a change to an IRQ's affinity only takes effect after the next interrupt, which is still delivered to the original core.
I understand that the decision is made in this code:
    if (irq_can_move_pcntxt(data)) {
        ret = chip->irq_set_affinity(data, mask, false);
        switch (ret) {
        case IRQ_SET_MASK_OK:
            cpumask_copy(data->affinity, mask);
        case IRQ_SET_MASK_OK_NOCOPY:
            irq_set_thread_affinity(desc);
            ret = 0;
        }
    } else {
        irqd_set_move_pending(data);
        irq_copy_pending(desc, mask);
    }
This means that the "IRQD_MOVE_PCNTXT" flag is not set in irq_data->state_use_accessors.
I was able to set this flag using irq_modify_status(), which is probably not the right way to go.
That option also doesn't exist in older kernels (2.6.32).
So the question is: when the irq_desc is created, what determines whether the "IRQD_MOVE_PCNTXT" flag is set?
Thanks,
Yevgeny
* Re: IRQ affinity enforced only after first interrupt.
2012-03-26 9:06 IRQ affinity enforced only after first interrupt Yevgeny Petrilin
@ 2012-03-26 14:28 ` Bjorn Helgaas
2012-03-26 14:33 ` Jiang Liu
0 siblings, 1 reply; 9+ messages in thread
From: Bjorn Helgaas @ 2012-03-26 14:28 UTC
To: Yevgeny Petrilin; +Cc: linux-pci, Thomas Gleixner, linux-kernel

[This is not really a PCI question, so +cc Thomas, LKML.]

On Mon, Mar 26, 2012 at 3:06 AM, Yevgeny Petrilin <yevgenyp@mellanox.co.il> wrote:
> [...]
* Re: IRQ affinity enforced only after first interrupt.
2012-03-26 14:28 ` Bjorn Helgaas
@ 2012-03-26 14:33 ` Jiang Liu
2012-03-26 15:24 ` Yevgeny Petrilin
0 siblings, 1 reply; 9+ messages in thread
From: Jiang Liu @ 2012-03-26 14:33 UTC
To: Bjorn Helgaas; +Cc: Yevgeny Petrilin, linux-pci, Thomas Gleixner, linux-kernel

The architecture-specific code determines whether an IRQ can be migrated in process context. For example, the IRQ_MOVE_PCNTXT flag is set on x86 systems if interrupt remapping is enabled.

On 03/26/2012 10:28 PM, Bjorn Helgaas wrote:
> [This is not really a PCI question, so +cc Thomas, LKML.]
>
> On Mon, Mar 26, 2012 at 3:06 AM, Yevgeny Petrilin
> <yevgenyp@mellanox.co.il> wrote:
>> [...]
* RE: IRQ affinity enforced only after first interrupt.
2012-03-26 14:33 ` Jiang Liu
@ 2012-03-26 15:24 ` Yevgeny Petrilin
2012-03-26 19:04 ` Thomas Gleixner
0 siblings, 1 reply; 9+ messages in thread
From: Yevgeny Petrilin @ 2012-03-26 15:24 UTC
To: Jiang Liu, Bjorn Helgaas
Cc: linux-pci@vger.kernel.org, Thomas Gleixner, linux-kernel@vger.kernel.org, Yael Shenhav

> The architecture specific code will determine whether the IRQ could be migrated
> in process context. For example, the IRQ_MOVE_PCNTXT flag will be set on x86
> systems if interrupt remapping is enabled.

Actually, I am encountering this issue on x86, and I see different behavior with different HW devices (NICs). On the same machine, one device responds immediately to affinity changes, while the other changes its affinity only after the first interrupt.
* RE: IRQ affinity enforced only after first interrupt.
2012-03-26 15:24 ` Yevgeny Petrilin
@ 2012-03-26 19:04 ` Thomas Gleixner
2012-03-27 9:39 ` Yevgeny Petrilin
2012-04-04 10:01 ` Alexander Gordeev
0 siblings, 2 replies; 9+ messages in thread
From: Thomas Gleixner @ 2012-03-26 19:04 UTC
To: Yevgeny Petrilin
Cc: Jiang Liu, Bjorn Helgaas, linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org, Yael Shenhav

On Mon, 26 Mar 2012, Yevgeny Petrilin wrote:
> > The architecture specific code will determine whether the IRQ could be migrated
> > in process context. For example, the IRQ_MOVE_PCNTXT flag will be set on x86
> > systems if interrupt remapping is enabled.
>
> Actually I am encountering this issue with x86, and see different
> behavior with different HW devices (NICs). On same machine I have
> one device that responds immediately to affinity changes while the
> other one changes the affinity only after first interrupt.

That simply depends on the underlying hardware. On certain hardware we can change the affinity only in hard interrupt context, that is, right when an interrupt of that device is delivered.

On other devices we can change it right away, and the corresponding interrupt chips set IRQ_MOVE_PCNTXT to indicate that.

There is nothing we can do about this. It's dictated by hardware.

Thanks,

tglx
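The two behaviors described above can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not kernel code: `struct model_irq`, `model_set_affinity()` and `model_handle_irq()` are hypothetical names invented for illustration, and a 64-bit integer stands in for a cpumask. The model mirrors the two branches of the snippet quoted at the start of the thread: either the new mask is applied at once, or it is recorded as pending and applied when the next interrupt is handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_irq {
	bool     move_pcntxt;   /* chip can be retargeted from process context */
	bool     move_pending;  /* a deferred affinity change is waiting */
	uint64_t affinity;      /* CPU mask currently programmed */
	uint64_t pending_mask;  /* CPU mask to apply on the next interrupt */
};

/* Request a new affinity, following the two branches quoted in the thread. */
static void model_set_affinity(struct model_irq *irq, uint64_t mask)
{
	assert(irq != NULL);

	if (irq->move_pcntxt) {
		/* Immediate path: the new mask is programmed right away. */
		irq->affinity = mask;
	} else {
		/* Deferred path: only remember the request for later. */
		irq->move_pending = true;
		irq->pending_mask = mask;
	}
}

/*
 * Deliver one interrupt. Returns the mask that was in effect when the
 * interrupt arrived, then applies any pending move so that later
 * interrupts use the new mask.
 */
static uint64_t model_handle_irq(struct model_irq *irq)
{
	uint64_t delivered_on;

	assert(irq != NULL);

	delivered_on = irq->affinity;
	if (irq->move_pending) {
		irq->affinity = irq->pending_mask;
		irq->move_pending = false;
	}
	return delivered_on;
}
```

In this model, an IRQ with `move_pcntxt` cleared still reports the old mask for the first `model_handle_irq()` call after `model_set_affinity()`, which matches the symptom reported in the thread. If no interrupt is ever delivered, `move_pending` simply stays set.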
* RE: IRQ affinity enforced only after first interrupt.
2012-03-26 19:04 ` Thomas Gleixner
@ 2012-03-27 9:39 ` Yevgeny Petrilin
2012-03-27 12:52 ` Thomas Gleixner
2012-04-04 10:01 ` Alexander Gordeev
1 sibling, 1 reply; 9+ messages in thread
From: Yevgeny Petrilin @ 2012-03-27 9:39 UTC
To: Thomas Gleixner
Cc: Jiang Liu, Bjorn Helgaas, linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org, Yael Shenhav

> That simply depends on the underlying hardware. On certain hardware we
> can change the affinity only in hard interrupt context, that means
> right when a interrupt of that device is delivered.
>
> On the other devices we can change it right away and the corresponding
> interrupt chips set IRQ_MOVE_PCNTXT to indicate that.
>
> There is nothing we can do about this. It's dictated by hardware.

Thanks for the explanation.
Which capabilities of the HW show whether IRQ_MOVE_PCNTXT can be set or not?
Is it done by reading configuration from PCI?

Thanks,
Yevgeny
* RE: IRQ affinity enforced only after first interrupt.
2012-03-27 9:39 ` Yevgeny Petrilin
@ 2012-03-27 12:52 ` Thomas Gleixner
0 siblings, 0 replies; 9+ messages in thread
From: Thomas Gleixner @ 2012-03-27 12:52 UTC
To: Yevgeny Petrilin
Cc: Jiang Liu, Bjorn Helgaas, linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org, Yael Shenhav

On Tue, 27 Mar 2012, Yevgeny Petrilin wrote:
> Thanks for the explanation,
> Which capabilities of the HW show whether IRQ_MOVE_PCNTXT can be set or not?
> Is it done by reading configuration from PCI?

It's done by reading the specs of the interrupt controllers. This is not at PCI (device) level. It's a property of the interrupt controller (PIC, APIC, IOAPIC) and additional features like interrupt remapping. The device merely uses an interrupt, but it does not know at all which underlying interrupt controller is handling it.

The only choice a device driver has is between pin based interrupts and Message Signaled Interrupts, when the hardware supports it. This information is retrieved from the PCI config space.

Thanks,

tglx
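As an illustration of the last point only, the following sketch walks the capability list of a PCI configuration space image to find the MSI or MSI-X capability. It is a minimal userspace example operating on a plain byte buffer, not the kernel's implementation; `find_capability()` is a hypothetical helper, and the constants are defined locally with the register offsets and capability IDs from the PCI specification. It says nothing about IRQ_MOVE_PCNTXT, which, as explained above, is a property of the interrupt controller and not of the device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_CFG_SIZE        256   /* conventional configuration space */
#define PCI_STATUS          0x06  /* 16-bit status register */
#define PCI_STATUS_CAP_LIST 0x10  /* status bit: capability list present */
#define PCI_CAPABILITY_LIST 0x34  /* offset of the first capability pointer */
#define PCI_CAP_ID_MSI      0x05  /* Message Signaled Interrupts */
#define PCI_CAP_ID_MSIX     0x11  /* MSI-X */

/*
 * Hypothetical helper: return the offset of capability `cap_id` within
 * the config space image `cfg`, or 0 if the device does not advertise it.
 */
static uint8_t find_capability(const uint8_t cfg[PCI_CFG_SIZE], uint8_t cap_id)
{
	uint16_t status;
	uint8_t pos;
	int ttl = 48; /* bound the walk in case the list is malformed */

	assert(cfg != NULL);

	/* The status register is little-endian. */
	status = (uint16_t)(cfg[PCI_STATUS] | (cfg[PCI_STATUS + 1] << 8));
	if (!(status & PCI_STATUS_CAP_LIST))
		return 0;

	/* Capability pointers are dword aligned; the low two bits are reserved. */
	pos = cfg[PCI_CAPABILITY_LIST] & 0xFC;
	while (pos >= 0x40 && ttl-- > 0) {
		if (cfg[pos] == cap_id)
			return pos;
		pos = cfg[pos + 1] & 0xFC; /* next-capability pointer */
	}
	return 0;
}
```

A caller would pass in the first 256 bytes of a device's configuration space and check for `PCI_CAP_ID_MSI` or `PCI_CAP_ID_MSIX`; a return value of 0 means the device only offers pin based interrupts as far as this walk can tell.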
* Re: IRQ affinity enforced only after first interrupt.
2012-03-26 19:04 ` Thomas Gleixner
2012-03-27 9:39 ` Yevgeny Petrilin
@ 2012-04-04 10:01 ` Alexander Gordeev
2012-04-05 8:47 ` Thomas Gleixner
1 sibling, 1 reply; 9+ messages in thread
From: Alexander Gordeev @ 2012-04-04 10:01 UTC
To: Thomas Gleixner
Cc: Yevgeny Petrilin, Jiang Liu, Bjorn Helgaas, linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org, Yael Shenhav

On Mon, Mar 26, 2012 at 09:04:20PM +0200, Thomas Gleixner wrote:
> That simply depends on the underlying hardware. On certain hardware we
> can change the affinity only in hard interrupt context, that means
> right when a interrupt of that device is delivered.
>
> On the other devices we can change it right away and the corresponding
> interrupt chips set IRQ_MOVE_PCNTXT to indicate that.

Actually, even with IRQ_MOVE_PCNTXT-capable chips, a hardware handler might still be called on a core that belongs to the old affinity after the new affinity has been written successfully. Threaded handlers are also racy with IRQ affinity updates.

Is that an inconsistency, a bug, or by design?

> There is nothing we can do about this. It's dictated by hardware.

Maybe we could wait for desc->pending_mask to be cleared before returning from irq_set_affinity()?

> Thanks,
>
> tglx

--
Regards,
Alexander Gordeev
agordeev@redhat.com
* Re: IRQ affinity enforced only after first interrupt.
2012-04-04 10:01 ` Alexander Gordeev
@ 2012-04-05 8:47 ` Thomas Gleixner
0 siblings, 0 replies; 9+ messages in thread
From: Thomas Gleixner @ 2012-04-05 8:47 UTC
To: Alexander Gordeev
Cc: Yevgeny Petrilin, Jiang Liu, Bjorn Helgaas, linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org, Yael Shenhav

On Wed, 4 Apr 2012, Alexander Gordeev wrote:
> Actually, even with IRQ_MOVE_PCNTXT capable chips, a hardware handler still
> might be called on a core that belongs to old affinity, after the successful
> write of new affinity. Threaded handlers are also racy with irq affinity
> updates.
>
> If that is inconsistency, bug or design?

Well, irq affinity updates are not designed to be immediate. There is no point in doing so.

> > There is nothing we can do about this. It's dictated by hardware.
>
> May be we could wait for desc->pending_mask to be cleared before returning from
> irq_set_affinity()?

If that device does not issue an interrupt for a long time, e.g. because the interface is down, then you are stuck there forever. What's the point of this? One interrupt on the wrong core is nothing we need to worry about.

Thanks,

tglx
Thread overview: 9+ messages
2012-03-26 9:06 IRQ affinity enforced only after first interrupt Yevgeny Petrilin
2012-03-26 14:28 ` Bjorn Helgaas
2012-03-26 14:33 ` Jiang Liu
2012-03-26 15:24 ` Yevgeny Petrilin
2012-03-26 19:04 ` Thomas Gleixner
2012-03-27 9:39 ` Yevgeny Petrilin
2012-03-27 12:52 ` Thomas Gleixner
2012-04-04 10:01 ` Alexander Gordeev
2012-04-05 8:47 ` Thomas Gleixner