* [PATCH -tip/apic 0/2] PCI/MSI: Allocate as many multiple-MSIs as requested
@ 2013-04-29 4:31 Alexander Gordeev
2013-04-29 4:32 ` [PATCH -tip/apic 1/2] " Alexander Gordeev
2013-04-29 4:33 ` [PATCH -tip/apic 2/2] x86/MSI: " Alexander Gordeev
0 siblings, 2 replies; 7+ messages in thread
From: Alexander Gordeev @ 2013-04-29 4:31 UTC (permalink / raw)
To: linux-kernel
Cc: x86, linux-pci, Suresh Siddha, Yinghai Lu, Joerg Roedel,
Jan Beulich, Ingo Molnar, Bjorn Helgaas
Hi Gentleman,
This update will allow to conserve interrupt resources when PCI
device uses multiple-MSI mode and sends lesser power-of-two MSIs.
I have held this off for some time, since there were no such usages
(at least known to me). But recently PLX Technology confirmed they
do have, i.e. their new PEX8796 chip needs 18 MSIs.
The series is against the tip/apic.
Alexander Gordeev (2):
PCI/MSI: Allocate as many multiple-MSIs as requested
x86/MSI: Allocate as many multiple-MSIs as requested
drivers/iommu/irq_remapping.c | 7 ++++---
drivers/pci/msi.c | 10 ++++++++--
include/linux/msi.h | 1 +
3 files changed, 13 insertions(+), 5 deletions(-)
--
1.7.7.6
--
Regards,
Alexander Gordeev
agordeev@redhat.com
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH -tip/apic 1/2] PCI/MSI: Allocate as many multiple-MSIs as requested
2013-04-29 4:31 [PATCH -tip/apic 0/2] PCI/MSI: Allocate as many multiple-MSIs as requested Alexander Gordeev
@ 2013-04-29 4:32 ` Alexander Gordeev
2013-04-29 4:33 ` [PATCH -tip/apic 2/2] x86/MSI: " Alexander Gordeev
1 sibling, 0 replies; 7+ messages in thread
From: Alexander Gordeev @ 2013-04-29 4:32 UTC (permalink / raw)
To: linux-kernel
Cc: x86, linux-pci, Suresh Siddha, Yinghai Lu, Joerg Roedel,
Jan Beulich, Ingo Molnar, Bjorn Helgaas
When multiple MSIs are enabled with pci_enable_msi_block(), the
requested number of interrupts 'nvec' is rounded up to the nearest
power-of-two value. The result is then used for setting up the
number of MSI messages in the PCI device and allocation of
interrupt resources in the operating system (i.e. vector numbers).
Thus, in cases when a device driver requests some number of MSIs
and this number is not a power-of-two value, the extra operating
system resources (allocated as the result of rounding) are wasted.
This fix introduces 'msi_desc::nvec' field to address the above
issue. When non-zero, it will report the actual number of MSIs the
device will send, as requested by the device driver. This value
should be used by architectures to properly set up and tear down
associated interrupt resources.
Note, although the existing 'msi_desc::multiple' field might seem
redundant, in fact in does not. In general case the number of MSIs a
PCI device is initialized with is not necessarily the closest power-
of-two value of the number of MSIs the device will send. Thus, in
theory it would not be always possible to derive the former from the
latter and we need to keep them both, to stress this corner case.
Besides, since 'msi_desc::multiple' is a bitfield, throwing it out
would not save us any space.
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
---
drivers/pci/msi.c | 10 ++++++++--
include/linux/msi.h | 1 +
2 files changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 00cc78c..014b9d5 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -79,7 +79,10 @@ void default_teardown_msi_irqs(struct pci_dev *dev)
int i, nvec;
if (entry->irq == 0)
continue;
- nvec = 1 << entry->msi_attrib.multiple;
+ if (entry->nvec)
+ nvec = entry->nvec;
+ else
+ nvec = 1 << entry->msi_attrib.multiple;
for (i = 0; i < nvec; i++)
arch_teardown_msi_irq(entry->irq + i);
}
@@ -340,7 +343,10 @@ static void free_msi_irqs(struct pci_dev *dev)
int i, nvec;
if (!entry->irq)
continue;
- nvec = 1 << entry->msi_attrib.multiple;
+ if (entry->nvec)
+ nvec = entry->nvec;
+ else
+ nvec = 1 << entry->msi_attrib.multiple;
#ifdef CONFIG_GENERIC_HARDIRQS
for (i = 0; i < nvec; i++)
BUG_ON(irq_has_action(entry->irq + i));
diff --git a/include/linux/msi.h b/include/linux/msi.h
index ce93a34..0e20dfc 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -35,6 +35,7 @@ struct msi_desc {
u32 masked; /* mask bits */
unsigned int irq;
+ unsigned int nvec; /* number of messages */
struct list_head list;
union {
--
1.7.7.6
--
Regards,
Alexander Gordeev
agordeev@redhat.com
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH -tip/apic 2/2] x86/MSI: Allocate as many multiple-MSIs as requested
2013-04-29 4:31 [PATCH -tip/apic 0/2] PCI/MSI: Allocate as many multiple-MSIs as requested Alexander Gordeev
2013-04-29 4:32 ` [PATCH -tip/apic 1/2] " Alexander Gordeev
@ 2013-04-29 4:33 ` Alexander Gordeev
2013-04-29 7:22 ` Jan Beulich
1 sibling, 1 reply; 7+ messages in thread
From: Alexander Gordeev @ 2013-04-29 4:33 UTC (permalink / raw)
To: linux-kernel
Cc: x86, linux-pci, Suresh Siddha, Yinghai Lu, Joerg Roedel,
Jan Beulich, Ingo Molnar, Bjorn Helgaas
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
---
drivers/iommu/irq_remapping.c | 7 ++++---
1 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
index d56f8c1..315c563 100644
--- a/drivers/iommu/irq_remapping.c
+++ b/drivers/iommu/irq_remapping.c
@@ -55,19 +55,19 @@ static int do_setup_msi_irqs(struct pci_dev *dev, int nvec)
unsigned int irq;
struct msi_desc *msidesc;
- nvec = __roundup_pow_of_two(nvec);
-
WARN_ON(!list_is_singular(&dev->msi_list));
msidesc = list_entry(dev->msi_list.next, struct msi_desc, list);
WARN_ON(msidesc->irq);
WARN_ON(msidesc->msi_attrib.multiple);
+ WARN_ON(msidesc->nvec);
node = dev_to_node(&dev->dev);
irq = __create_irqs(get_nr_irqs_gsi(), nvec, node);
if (irq == 0)
return -ENOSPC;
- msidesc->msi_attrib.multiple = ilog2(nvec);
+ msidesc->nvec = nvec;
+ msidesc->msi_attrib.multiple = ilog2(__roundup_pow_of_two(nvec));
for (sub_handle = 0; sub_handle < nvec; sub_handle++) {
if (!sub_handle) {
index = msi_alloc_remapped_irq(dev, irq, nvec);
@@ -95,6 +95,7 @@ error:
* IRQs from tearing down again in default_teardown_msi_irqs()
*/
msidesc->irq = 0;
+ msidesc->nvec = 0;
msidesc->msi_attrib.multiple = 0;
return ret;
--
1.7.7.6
--
Regards,
Alexander Gordeev
agordeev@redhat.com
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH -tip/apic 2/2] x86/MSI: Allocate as many multiple-MSIs as requested
2013-04-29 4:33 ` [PATCH -tip/apic 2/2] x86/MSI: " Alexander Gordeev
@ 2013-04-29 7:22 ` Jan Beulich
2013-04-30 11:11 ` [PATCH v2 " Alexander Gordeev
0 siblings, 1 reply; 7+ messages in thread
From: Jan Beulich @ 2013-04-29 7:22 UTC (permalink / raw)
To: Alexander Gordeev
Cc: Joerg Roedel, Bjorn Helgaas, Suresh Siddha, x86, Yinghai Lu,
Ingo Molnar, linux-kernel, linux-pci
>>> On 29.04.13 at 06:33, Alexander Gordeev <agordeev@redhat.com> wrote:
> --- a/drivers/iommu/irq_remapping.c
> +++ b/drivers/iommu/irq_remapping.c
> @@ -55,19 +55,19 @@ static int do_setup_msi_irqs(struct pci_dev *dev, int nvec)
> unsigned int irq;
> struct msi_desc *msidesc;
>
> - nvec = __roundup_pow_of_two(nvec);
> -
> WARN_ON(!list_is_singular(&dev->msi_list));
> msidesc = list_entry(dev->msi_list.next, struct msi_desc, list);
> WARN_ON(msidesc->irq);
> WARN_ON(msidesc->msi_attrib.multiple);
> + WARN_ON(msidesc->nvec);
>
> node = dev_to_node(&dev->dev);
> irq = __create_irqs(get_nr_irqs_gsi(), nvec, node);
> if (irq == 0)
> return -ENOSPC;
>
> - msidesc->msi_attrib.multiple = ilog2(nvec);
> + msidesc->nvec = nvec;
> + msidesc->msi_attrib.multiple = ilog2(__roundup_pow_of_two(nvec));
> for (sub_handle = 0; sub_handle < nvec; sub_handle++) {
> if (!sub_handle) {
> index = msi_alloc_remapped_irq(dev, irq, nvec);
This breaks the interface to IOMMU-specific code: While Intel's
implementation does bump the number of allocated IRTEs to a
power of 2, AMD's doesn't, and hence the tail entries in the block
that don't get allocated here can get used for another device,
thus creating a security hole when both devices aren't owned by
the same guest (with the host being considered a special kind of
guest for this purpose).
IOW, while you can conserve on the number of vectors allocated,
you can't on the IRTEs, and I think this should be taken care of in
the generic IOMMU code, not in the individual vendor
implementations.
Jan
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH v2 -tip/apic 2/2] x86/MSI: Allocate as many multiple-MSIs as requested
2013-04-29 7:22 ` Jan Beulich
@ 2013-04-30 11:11 ` Alexander Gordeev
2013-05-02 12:36 ` Joerg Roedel
0 siblings, 1 reply; 7+ messages in thread
From: Alexander Gordeev @ 2013-04-30 11:11 UTC (permalink / raw)
To: linux-kernel
Cc: x86, linux-pci, Suresh Siddha, Yinghai Lu, Joerg Roedel,
Jan Beulich, Ingo Molnar, Bjorn Helgaas
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
---
drivers/iommu/irq_remapping.c | 12 +++++++-----
1 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
index d56f8c1..9eeb6cf 100644
--- a/drivers/iommu/irq_remapping.c
+++ b/drivers/iommu/irq_remapping.c
@@ -51,26 +51,27 @@ static void irq_remapping_disable_io_apic(void)
static int do_setup_msi_irqs(struct pci_dev *dev, int nvec)
{
- int node, ret, sub_handle, index = 0;
+ int node, ret, sub_handle, nvec_pow2, index = 0;
unsigned int irq;
struct msi_desc *msidesc;
- nvec = __roundup_pow_of_two(nvec);
-
WARN_ON(!list_is_singular(&dev->msi_list));
msidesc = list_entry(dev->msi_list.next, struct msi_desc, list);
WARN_ON(msidesc->irq);
WARN_ON(msidesc->msi_attrib.multiple);
+ WARN_ON(msidesc->nvec);
node = dev_to_node(&dev->dev);
irq = __create_irqs(get_nr_irqs_gsi(), nvec, node);
if (irq == 0)
return -ENOSPC;
- msidesc->msi_attrib.multiple = ilog2(nvec);
+ nvec_pow2 = __roundup_pow_of_two(nvec);
+ msidesc->nvec = nvec;
+ msidesc->msi_attrib.multiple = ilog2(nvec_pow2);
for (sub_handle = 0; sub_handle < nvec; sub_handle++) {
if (!sub_handle) {
- index = msi_alloc_remapped_irq(dev, irq, nvec);
+ index = msi_alloc_remapped_irq(dev, irq, nvec_pow2);
if (index < 0) {
ret = index;
goto error;
@@ -95,6 +96,7 @@ error:
* IRQs from tearing down again in default_teardown_msi_irqs()
*/
msidesc->irq = 0;
+ msidesc->nvec = 0;
msidesc->msi_attrib.multiple = 0;
return ret;
--
1.7.7.6
--
Regards,
Alexander Gordeev
agordeev@redhat.com
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH v2 -tip/apic 2/2] x86/MSI: Allocate as many multiple-MSIs as requested
2013-04-30 11:11 ` [PATCH v2 " Alexander Gordeev
@ 2013-05-02 12:36 ` Joerg Roedel
2013-05-03 7:12 ` [PATCH v3 -tip/apic 2/2] x86/MSI: Conserve interrupt resources when using multiple-MSIs Alexander Gordeev
0 siblings, 1 reply; 7+ messages in thread
From: Joerg Roedel @ 2013-05-02 12:36 UTC (permalink / raw)
To: Alexander Gordeev
Cc: linux-kernel, x86, linux-pci, Suresh Siddha, Yinghai Lu,
Jan Beulich, Ingo Molnar, Bjorn Helgaas
On Tue, Apr 30, 2013 at 01:11:11PM +0200, Alexander Gordeev wrote:
> Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Please add a proper commit message explaining why this change is
necessary or what problem it fixes.
Joerg
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH v3 -tip/apic 2/2] x86/MSI: Conserve interrupt resources when using multiple-MSIs
2013-05-02 12:36 ` Joerg Roedel
@ 2013-05-03 7:12 ` Alexander Gordeev
0 siblings, 0 replies; 7+ messages in thread
From: Alexander Gordeev @ 2013-05-03 7:12 UTC (permalink / raw)
To: Joerg Roedel
Cc: linux-kernel, x86, linux-pci, Suresh Siddha, Yinghai Lu,
Jan Beulich, Ingo Molnar, Bjorn Helgaas
Current multiple-MSI implementation does not take into account
actual number of requested MSIs and always rounds that number
to a closest power-of-two value. Yet, a number of MSIs a PCI
device could send (and therefore a number of messages a device
driver could request) may be a lesser power-of-two. As result,
resources allocated for extra MSIs just wasted.
This update takes advantage of 'msi_desc::nvec' field introduced
with generic MSI code to track number of requested and used MSIs.
As result, resources associated with interrupts are conserved.
Of those resources most noticeable are x86 interrupt vectors.
The initial version of this fix also consumed on IRTEs, but Jan
noticed that a malfunctioning PCI device might send a message
number it did not claim and thus refer an IRTE it does not own.
To avoid this security hole the old-way approach preserved and
as many IRTEs are reserved as the device could possibly send.
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
---
drivers/iommu/irq_remapping.c | 12 +++++++-----
1 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
index d56f8c1..9eeb6cf 100644
--- a/drivers/iommu/irq_remapping.c
+++ b/drivers/iommu/irq_remapping.c
@@ -51,26 +51,27 @@ static void irq_remapping_disable_io_apic(void)
static int do_setup_msi_irqs(struct pci_dev *dev, int nvec)
{
- int node, ret, sub_handle, index = 0;
+ int node, ret, sub_handle, nvec_pow2, index = 0;
unsigned int irq;
struct msi_desc *msidesc;
- nvec = __roundup_pow_of_two(nvec);
-
WARN_ON(!list_is_singular(&dev->msi_list));
msidesc = list_entry(dev->msi_list.next, struct msi_desc, list);
WARN_ON(msidesc->irq);
WARN_ON(msidesc->msi_attrib.multiple);
+ WARN_ON(msidesc->nvec);
node = dev_to_node(&dev->dev);
irq = __create_irqs(get_nr_irqs_gsi(), nvec, node);
if (irq == 0)
return -ENOSPC;
- msidesc->msi_attrib.multiple = ilog2(nvec);
+ nvec_pow2 = __roundup_pow_of_two(nvec);
+ msidesc->nvec = nvec;
+ msidesc->msi_attrib.multiple = ilog2(nvec_pow2);
for (sub_handle = 0; sub_handle < nvec; sub_handle++) {
if (!sub_handle) {
- index = msi_alloc_remapped_irq(dev, irq, nvec);
+ index = msi_alloc_remapped_irq(dev, irq, nvec_pow2);
if (index < 0) {
ret = index;
goto error;
@@ -95,6 +96,7 @@ error:
* IRQs from tearing down again in default_teardown_msi_irqs()
*/
msidesc->irq = 0;
+ msidesc->nvec = 0;
msidesc->msi_attrib.multiple = 0;
return ret;
--
1.7.7.6
--
Regards,
Alexander Gordeev
agordeev@redhat.com
^ permalink raw reply related [flat|nested] 7+ messages in thread
end of thread, other threads:[~2013-05-03 7:11 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-04-29 4:31 [PATCH -tip/apic 0/2] PCI/MSI: Allocate as many multiple-MSIs as requested Alexander Gordeev
2013-04-29 4:32 ` [PATCH -tip/apic 1/2] " Alexander Gordeev
2013-04-29 4:33 ` [PATCH -tip/apic 2/2] x86/MSI: " Alexander Gordeev
2013-04-29 7:22 ` Jan Beulich
2013-04-30 11:11 ` [PATCH v2 " Alexander Gordeev
2013-05-02 12:36 ` Joerg Roedel
2013-05-03 7:12 ` [PATCH v3 -tip/apic 2/2] x86/MSI: Conserve interrupt resources when using multiple-MSIs Alexander Gordeev
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).