* [PATCH -tip/apic 0/2] PCI/MSI: Allocate as many multiple-MSIs as requested @ 2013-04-29 4:31 Alexander Gordeev 2013-04-29 4:32 ` [PATCH -tip/apic 1/2] " Alexander Gordeev 2013-04-29 4:33 ` [PATCH -tip/apic 2/2] x86/MSI: " Alexander Gordeev 0 siblings, 2 replies; 7+ messages in thread From: Alexander Gordeev @ 2013-04-29 4:31 UTC (permalink / raw) To: linux-kernel Cc: x86, linux-pci, Suresh Siddha, Yinghai Lu, Joerg Roedel, Jan Beulich, Ingo Molnar, Bjorn Helgaas Hi Gentleman, This update will allow to conserve interrupt resources when PCI device uses multiple-MSI mode and sends lesser power-of-two MSIs. I have held this off for some time, since there were no such usages (at least known to me). But recently PLX Technology confirmed they do have, i.e. their new PEX8796 chip needs 18 MSIs. The series is against the tip/apic. Alexander Gordeev (2): PCI/MSI: Allocate as many multiple-MSIs as requested x86/MSI: Allocate as many multiple-MSIs as requested drivers/iommu/irq_remapping.c | 7 ++++--- drivers/pci/msi.c | 10 ++++++++-- include/linux/msi.h | 1 + 3 files changed, 13 insertions(+), 5 deletions(-) -- 1.7.7.6 -- Regards, Alexander Gordeev agordeev@redhat.com ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH -tip/apic 1/2] PCI/MSI: Allocate as many multiple-MSIs as requested 2013-04-29 4:31 [PATCH -tip/apic 0/2] PCI/MSI: Allocate as many multiple-MSIs as requested Alexander Gordeev @ 2013-04-29 4:32 ` Alexander Gordeev 2013-04-29 4:33 ` [PATCH -tip/apic 2/2] x86/MSI: " Alexander Gordeev 1 sibling, 0 replies; 7+ messages in thread From: Alexander Gordeev @ 2013-04-29 4:32 UTC (permalink / raw) To: linux-kernel Cc: x86, linux-pci, Suresh Siddha, Yinghai Lu, Joerg Roedel, Jan Beulich, Ingo Molnar, Bjorn Helgaas When multiple MSIs are enabled with pci_enable_msi_block(), the requested number of interrupts 'nvec' is rounded up to the nearest power-of-two value. The result is then used for setting up the number of MSI messages in the PCI device and allocation of interrupt resources in the operating system (i.e. vector numbers). Thus, in cases when a device driver requests some number of MSIs and this number is not a power-of-two value, the extra operating system resources (allocated as the result of rounding) are wasted. This fix introduces 'msi_desc::nvec' field to address the above issue. When non-zero, it will report the actual number of MSIs the device will send, as requested by the device driver. This value should be used by architectures to properly set up and tear down associated interrupt resources. Note, although the existing 'msi_desc::multiple' field might seem redundant, in fact in does not. In general case the number of MSIs a PCI device is initialized with is not necessarily the closest power- of-two value of the number of MSIs the device will send. Thus, in theory it would not be always possible to derive the former from the latter and we need to keep them both, to stress this corner case. Besides, since 'msi_desc::multiple' is a bitfield, throwing it out would not save us any space. Signed-off-by: Alexander Gordeev <agordeev@redhat.com> --- drivers/pci/msi.c | 10 ++++++++-- include/linux/msi.h | 1 + 2 files changed, 9 insertions(+), 2 deletions(-) diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c index 00cc78c..014b9d5 100644 --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -79,7 +79,10 @@ void default_teardown_msi_irqs(struct pci_dev *dev) int i, nvec; if (entry->irq == 0) continue; - nvec = 1 << entry->msi_attrib.multiple; + if (entry->nvec) + nvec = entry->nvec; + else + nvec = 1 << entry->msi_attrib.multiple; for (i = 0; i < nvec; i++) arch_teardown_msi_irq(entry->irq + i); } @@ -340,7 +343,10 @@ static void free_msi_irqs(struct pci_dev *dev) int i, nvec; if (!entry->irq) continue; - nvec = 1 << entry->msi_attrib.multiple; + if (entry->nvec) + nvec = entry->nvec; + else + nvec = 1 << entry->msi_attrib.multiple; #ifdef CONFIG_GENERIC_HARDIRQS for (i = 0; i < nvec; i++) BUG_ON(irq_has_action(entry->irq + i)); diff --git a/include/linux/msi.h b/include/linux/msi.h index ce93a34..0e20dfc 100644 --- a/include/linux/msi.h +++ b/include/linux/msi.h @@ -35,6 +35,7 @@ struct msi_desc { u32 masked; /* mask bits */ unsigned int irq; + unsigned int nvec; /* number of messages */ struct list_head list; union { -- 1.7.7.6 -- Regards, Alexander Gordeev agordeev@redhat.com ^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH -tip/apic 2/2] x86/MSI: Allocate as many multiple-MSIs as requested 2013-04-29 4:31 [PATCH -tip/apic 0/2] PCI/MSI: Allocate as many multiple-MSIs as requested Alexander Gordeev 2013-04-29 4:32 ` [PATCH -tip/apic 1/2] " Alexander Gordeev @ 2013-04-29 4:33 ` Alexander Gordeev 2013-04-29 7:22 ` Jan Beulich 1 sibling, 1 reply; 7+ messages in thread From: Alexander Gordeev @ 2013-04-29 4:33 UTC (permalink / raw) To: linux-kernel Cc: x86, linux-pci, Suresh Siddha, Yinghai Lu, Joerg Roedel, Jan Beulich, Ingo Molnar, Bjorn Helgaas Signed-off-by: Alexander Gordeev <agordeev@redhat.com> --- drivers/iommu/irq_remapping.c | 7 ++++--- 1 files changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c index d56f8c1..315c563 100644 --- a/drivers/iommu/irq_remapping.c +++ b/drivers/iommu/irq_remapping.c @@ -55,19 +55,19 @@ static int do_setup_msi_irqs(struct pci_dev *dev, int nvec) unsigned int irq; struct msi_desc *msidesc; - nvec = __roundup_pow_of_two(nvec); - WARN_ON(!list_is_singular(&dev->msi_list)); msidesc = list_entry(dev->msi_list.next, struct msi_desc, list); WARN_ON(msidesc->irq); WARN_ON(msidesc->msi_attrib.multiple); + WARN_ON(msidesc->nvec); node = dev_to_node(&dev->dev); irq = __create_irqs(get_nr_irqs_gsi(), nvec, node); if (irq == 0) return -ENOSPC; - msidesc->msi_attrib.multiple = ilog2(nvec); + msidesc->nvec = nvec; + msidesc->msi_attrib.multiple = ilog2(__roundup_pow_of_two(nvec)); for (sub_handle = 0; sub_handle < nvec; sub_handle++) { if (!sub_handle) { index = msi_alloc_remapped_irq(dev, irq, nvec); @@ -95,6 +95,7 @@ error: * IRQs from tearing down again in default_teardown_msi_irqs() */ msidesc->irq = 0; + msidesc->nvec = 0; msidesc->msi_attrib.multiple = 0; return ret; -- 1.7.7.6 -- Regards, Alexander Gordeev agordeev@redhat.com ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH -tip/apic 2/2] x86/MSI: Allocate as many multiple-MSIs as requested 2013-04-29 4:33 ` [PATCH -tip/apic 2/2] x86/MSI: " Alexander Gordeev @ 2013-04-29 7:22 ` Jan Beulich 2013-04-30 11:11 ` [PATCH v2 " Alexander Gordeev 0 siblings, 1 reply; 7+ messages in thread From: Jan Beulich @ 2013-04-29 7:22 UTC (permalink / raw) To: Alexander Gordeev Cc: Joerg Roedel, Bjorn Helgaas, Suresh Siddha, x86, Yinghai Lu, Ingo Molnar, linux-kernel, linux-pci >>> On 29.04.13 at 06:33, Alexander Gordeev <agordeev@redhat.com> wrote: > --- a/drivers/iommu/irq_remapping.c > +++ b/drivers/iommu/irq_remapping.c > @@ -55,19 +55,19 @@ static int do_setup_msi_irqs(struct pci_dev *dev, int nvec) > unsigned int irq; > struct msi_desc *msidesc; > > - nvec = __roundup_pow_of_two(nvec); > - > WARN_ON(!list_is_singular(&dev->msi_list)); > msidesc = list_entry(dev->msi_list.next, struct msi_desc, list); > WARN_ON(msidesc->irq); > WARN_ON(msidesc->msi_attrib.multiple); > + WARN_ON(msidesc->nvec); > > node = dev_to_node(&dev->dev); > irq = __create_irqs(get_nr_irqs_gsi(), nvec, node); > if (irq == 0) > return -ENOSPC; > > - msidesc->msi_attrib.multiple = ilog2(nvec); > + msidesc->nvec = nvec; > + msidesc->msi_attrib.multiple = ilog2(__roundup_pow_of_two(nvec)); > for (sub_handle = 0; sub_handle < nvec; sub_handle++) { > if (!sub_handle) { > index = msi_alloc_remapped_irq(dev, irq, nvec); This breaks the interface to IOMMU-specific code: While Intel's implementation does bump the number of allocated IRTEs to a power of 2, AMD's doesn't, and hence the tail entries in the block that don't get allocated here can get used for another device, thus creating a security hole when both devices aren't owned by the same guest (with the host being considered a special kind of guest for this purpose). IOW, while you can conserve on the number of vectors allocated, you can't on the IRTEs, and I think this should be taken care of in the generic IOMMU code, not in the individual vendor implementations. Jan ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH v2 -tip/apic 2/2] x86/MSI: Allocate as many multiple-MSIs as requested 2013-04-29 7:22 ` Jan Beulich @ 2013-04-30 11:11 ` Alexander Gordeev 2013-05-02 12:36 ` Joerg Roedel 0 siblings, 1 reply; 7+ messages in thread From: Alexander Gordeev @ 2013-04-30 11:11 UTC (permalink / raw) To: linux-kernel Cc: x86, linux-pci, Suresh Siddha, Yinghai Lu, Joerg Roedel, Jan Beulich, Ingo Molnar, Bjorn Helgaas Signed-off-by: Alexander Gordeev <agordeev@redhat.com> --- drivers/iommu/irq_remapping.c | 12 +++++++----- 1 files changed, 7 insertions(+), 5 deletions(-) diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c index d56f8c1..9eeb6cf 100644 --- a/drivers/iommu/irq_remapping.c +++ b/drivers/iommu/irq_remapping.c @@ -51,26 +51,27 @@ static void irq_remapping_disable_io_apic(void) static int do_setup_msi_irqs(struct pci_dev *dev, int nvec) { - int node, ret, sub_handle, index = 0; + int node, ret, sub_handle, nvec_pow2, index = 0; unsigned int irq; struct msi_desc *msidesc; - nvec = __roundup_pow_of_two(nvec); - WARN_ON(!list_is_singular(&dev->msi_list)); msidesc = list_entry(dev->msi_list.next, struct msi_desc, list); WARN_ON(msidesc->irq); WARN_ON(msidesc->msi_attrib.multiple); + WARN_ON(msidesc->nvec); node = dev_to_node(&dev->dev); irq = __create_irqs(get_nr_irqs_gsi(), nvec, node); if (irq == 0) return -ENOSPC; - msidesc->msi_attrib.multiple = ilog2(nvec); + nvec_pow2 = __roundup_pow_of_two(nvec); + msidesc->nvec = nvec; + msidesc->msi_attrib.multiple = ilog2(nvec_pow2); for (sub_handle = 0; sub_handle < nvec; sub_handle++) { if (!sub_handle) { - index = msi_alloc_remapped_irq(dev, irq, nvec); + index = msi_alloc_remapped_irq(dev, irq, nvec_pow2); if (index < 0) { ret = index; goto error; @@ -95,6 +96,7 @@ error: * IRQs from tearing down again in default_teardown_msi_irqs() */ msidesc->irq = 0; + msidesc->nvec = 0; msidesc->msi_attrib.multiple = 0; return ret; -- 1.7.7.6 -- Regards, Alexander Gordeev agordeev@redhat.com ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH v2 -tip/apic 2/2] x86/MSI: Allocate as many multiple-MSIs as requested 2013-04-30 11:11 ` [PATCH v2 " Alexander Gordeev @ 2013-05-02 12:36 ` Joerg Roedel 2013-05-03 7:12 ` [PATCH v3 -tip/apic 2/2] x86/MSI: Conserve interrupt resources when using multiple-MSIs Alexander Gordeev 0 siblings, 1 reply; 7+ messages in thread From: Joerg Roedel @ 2013-05-02 12:36 UTC (permalink / raw) To: Alexander Gordeev Cc: linux-kernel, x86, linux-pci, Suresh Siddha, Yinghai Lu, Jan Beulich, Ingo Molnar, Bjorn Helgaas On Tue, Apr 30, 2013 at 01:11:11PM +0200, Alexander Gordeev wrote: > Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Please add a proper commit message explaining why this change is necessary or what problem it fixes. Joerg ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH v3 -tip/apic 2/2] x86/MSI: Conserve interrupt resources when using multiple-MSIs 2013-05-02 12:36 ` Joerg Roedel @ 2013-05-03 7:12 ` Alexander Gordeev 0 siblings, 0 replies; 7+ messages in thread From: Alexander Gordeev @ 2013-05-03 7:12 UTC (permalink / raw) To: Joerg Roedel Cc: linux-kernel, x86, linux-pci, Suresh Siddha, Yinghai Lu, Jan Beulich, Ingo Molnar, Bjorn Helgaas Current multiple-MSI implementation does not take into account actual number of requested MSIs and always rounds that number to a closest power-of-two value. Yet, a number of MSIs a PCI device could send (and therefore a number of messages a device driver could request) may be a lesser power-of-two. As result, resources allocated for extra MSIs just wasted. This update takes advantage of 'msi_desc::nvec' field introduced with generic MSI code to track number of requested and used MSIs. As result, resources associated with interrupts are conserved. Of those resources most noticeable are x86 interrupt vectors. The initial version of this fix also consumed on IRTEs, but Jan noticed that a malfunctioning PCI device might send a message number it did not claim and thus refer an IRTE it does not own. To avoid this security hole the old-way approach preserved and as many IRTEs are reserved as the device could possibly send. Signed-off-by: Alexander Gordeev <agordeev@redhat.com> --- drivers/iommu/irq_remapping.c | 12 +++++++----- 1 files changed, 7 insertions(+), 5 deletions(-) diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c index d56f8c1..9eeb6cf 100644 --- a/drivers/iommu/irq_remapping.c +++ b/drivers/iommu/irq_remapping.c @@ -51,26 +51,27 @@ static void irq_remapping_disable_io_apic(void) static int do_setup_msi_irqs(struct pci_dev *dev, int nvec) { - int node, ret, sub_handle, index = 0; + int node, ret, sub_handle, nvec_pow2, index = 0; unsigned int irq; struct msi_desc *msidesc; - nvec = __roundup_pow_of_two(nvec); - WARN_ON(!list_is_singular(&dev->msi_list)); msidesc = list_entry(dev->msi_list.next, struct msi_desc, list); WARN_ON(msidesc->irq); WARN_ON(msidesc->msi_attrib.multiple); + WARN_ON(msidesc->nvec); node = dev_to_node(&dev->dev); irq = __create_irqs(get_nr_irqs_gsi(), nvec, node); if (irq == 0) return -ENOSPC; - msidesc->msi_attrib.multiple = ilog2(nvec); + nvec_pow2 = __roundup_pow_of_two(nvec); + msidesc->nvec = nvec; + msidesc->msi_attrib.multiple = ilog2(nvec_pow2); for (sub_handle = 0; sub_handle < nvec; sub_handle++) { if (!sub_handle) { - index = msi_alloc_remapped_irq(dev, irq, nvec); + index = msi_alloc_remapped_irq(dev, irq, nvec_pow2); if (index < 0) { ret = index; goto error; @@ -95,6 +96,7 @@ error: * IRQs from tearing down again in default_teardown_msi_irqs() */ msidesc->irq = 0; + msidesc->nvec = 0; msidesc->msi_attrib.multiple = 0; return ret; -- 1.7.7.6 -- Regards, Alexander Gordeev agordeev@redhat.com ^ permalink raw reply related [flat|nested] 7+ messages in thread
end of thread, other threads:[~2013-05-03 7:11 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2013-04-29 4:31 [PATCH -tip/apic 0/2] PCI/MSI: Allocate as many multiple-MSIs as requested Alexander Gordeev 2013-04-29 4:32 ` [PATCH -tip/apic 1/2] " Alexander Gordeev 2013-04-29 4:33 ` [PATCH -tip/apic 2/2] x86/MSI: " Alexander Gordeev 2013-04-29 7:22 ` Jan Beulich 2013-04-30 11:11 ` [PATCH v2 " Alexander Gordeev 2013-05-02 12:36 ` Joerg Roedel 2013-05-03 7:12 ` [PATCH v3 -tip/apic 2/2] x86/MSI: Conserve interrupt resources when using multiple-MSIs Alexander Gordeev
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).