qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 1/2] pci-assign: Fix a bug when map MSI-X table memory failed
@ 2014-04-03  5:18 arei.gonglei
  2014-04-03  5:18 ` [Qemu-devel] [PATCH 2/2] pci-assign: Fix memory out of bound when MSI-X table not fit in a single page arei.gonglei
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: arei.gonglei @ 2014-04-03  5:18 UTC (permalink / raw)
  To: qemu-devel; +Cc: weidong.huangwei, pbonzini, alex.williamson, Gonglei, mst

From: Gonglei <arei.gonglei@huawei.com>

when map MSI-X table memory failed, the dev->msix_table not be
set to NULL, the assigned_dev_unregister_msix_mmio() will case
a segfault when munmap the failed dev->msix_table.

Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
 hw/i386/kvm/pci-assign.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/i386/kvm/pci-assign.c b/hw/i386/kvm/pci-assign.c
index a825871..570333f 100644
--- a/hw/i386/kvm/pci-assign.c
+++ b/hw/i386/kvm/pci-assign.c
@@ -1608,6 +1608,7 @@ static int assigned_dev_register_msix_mmio(AssignedDevice *dev)
                            MAP_ANONYMOUS|MAP_PRIVATE, 0, 0);
     if (dev->msix_table == MAP_FAILED) {
         error_report("fail allocate msix_table! %s", strerror(errno));
+        dev->msix_table = NULL;
         return -EFAULT;
     }
 
-- 
1.7.12.4

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [Qemu-devel] [PATCH 2/2] pci-assign: Fix memory out of bound when MSI-X table not fit in a single page
  2014-04-03  5:18 [Qemu-devel] [PATCH 1/2] pci-assign: Fix a bug when map MSI-X table memory failed arei.gonglei
@ 2014-04-03  5:18 ` arei.gonglei
  2014-04-08 15:32   ` Michael S. Tsirkin
  2014-04-09 14:12   ` Laszlo Ersek
  2014-04-08 14:02 ` [Qemu-devel] [PATCH 1/2] pci-assign: Fix a bug when map MSI-X table memory failed Gonglei (Arei)
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 11+ messages in thread
From: arei.gonglei @ 2014-04-03  5:18 UTC (permalink / raw)
  To: qemu-devel; +Cc: weidong.huangwei, pbonzini, alex.williamson, Gonglei, mst

From: Gonglei <arei.gonglei@huawei.com>

QEMU only mmap MSIX_PAGE_SIZE memory for all pci devices in
assigned_dev_register_msix_mmio(), meanwhile the set the one
page memmory to zero, so the rest memory will be random value
(maybe etnry.data is not 0). In the assigned_dev_update_msix_mmio()
maybe occur the issue of entry_nr > 256, and the kmod reports
the EINVAL error.

Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
 hw/i386/kvm/pci-assign.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/hw/i386/kvm/pci-assign.c b/hw/i386/kvm/pci-assign.c
index 570333f..d25a19e 100644
--- a/hw/i386/kvm/pci-assign.c
+++ b/hw/i386/kvm/pci-assign.c
@@ -37,6 +37,7 @@
 #include "hw/pci/pci.h"
 #include "hw/pci/msi.h"
 #include "kvm_i386.h"
+#include "qemu/osdep.h"
 
 #define MSIX_PAGE_SIZE 0x1000
 
@@ -59,6 +60,9 @@
 #define DEBUG(fmt, ...)
 #endif
 
+/* the msix-table size readed from pci device config */
+static int msix_table_size;
+
 typedef struct PCIRegion {
     int type;           /* Memory or port I/O */
     int valid;
@@ -1604,7 +1608,12 @@ static void assigned_dev_msix_reset(AssignedDevice *dev)
 
 static int assigned_dev_register_msix_mmio(AssignedDevice *dev)
 {
-    dev->msix_table = mmap(NULL, MSIX_PAGE_SIZE, PROT_READ|PROT_WRITE,
+    msix_table_size = ROUND_UP(dev->msix_max * sizeof(struct MSIXTableEntry),
+                                MSIX_PAGE_SIZE);
+
+    DEBUG("msix_table_size: 0x%x\n", msix_table_size);
+
+    dev->msix_table = mmap(NULL, msix_table_size, PROT_READ|PROT_WRITE,
                            MAP_ANONYMOUS|MAP_PRIVATE, 0, 0);
     if (dev->msix_table == MAP_FAILED) {
         error_report("fail allocate msix_table! %s", strerror(errno));
@@ -1615,7 +1624,7 @@ static int assigned_dev_register_msix_mmio(AssignedDevice *dev)
     assigned_dev_msix_reset(dev);
 
     memory_region_init_io(&dev->mmio, OBJECT(dev), &assigned_dev_msix_mmio_ops,
-                          dev, "assigned-dev-msix", MSIX_PAGE_SIZE);
+                          dev, "assigned-dev-msix", msix_table_size);
     return 0;
 }
 
@@ -1627,7 +1636,7 @@ static void assigned_dev_unregister_msix_mmio(AssignedDevice *dev)
 
     memory_region_destroy(&dev->mmio);
 
-    if (munmap(dev->msix_table, MSIX_PAGE_SIZE) == -1) {
+    if (munmap(dev->msix_table, msix_table_size) == -1) {
         error_report("error unmapping msix_table! %s", strerror(errno));
     }
     dev->msix_table = NULL;
-- 
1.7.12.4

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] pci-assign: Fix a bug when map MSI-X table memory failed
  2014-04-03  5:18 [Qemu-devel] [PATCH 1/2] pci-assign: Fix a bug when map MSI-X table memory failed arei.gonglei
  2014-04-03  5:18 ` [Qemu-devel] [PATCH 2/2] pci-assign: Fix memory out of bound when MSI-X table not fit in a single page arei.gonglei
@ 2014-04-08 14:02 ` Gonglei (Arei)
  2014-04-08 15:32 ` Michael S. Tsirkin
  2014-04-09 14:21 ` Michael S. Tsirkin
  3 siblings, 0 replies; 11+ messages in thread
From: Gonglei (Arei) @ 2014-04-08 14:02 UTC (permalink / raw)
  To: qemu-devel@nongnu.org
  Cc: weidong.huangwei@huawei.com, pbonzini@redhat.com,
	alex.williamson@redhat.com, Gonglei (Arei), mst@redhat.com

Hi, mst and alex.

Ping... These two bug fix can be accepted for KVM pci-assign ? Thanks.

BTW, I have finished the testing work of the Emulex Corporation 
OneConnect NIC (Lancer) Nic by vfio-pci, and the pass-troughed VF works well. 

My environment of testing as follows:

Host: 3.12.16-0.6.6-default
Guest: Suse11sp1, linux-2.6.32.59-0.7
VF: 04:01.5 Ethernet controller: Emulex Corporation Device e228 (rev 10)
Qemu command: 
/usr/bin/qemu-kvm -name suse -cpu kvm64,+x2apic -enable-kvm -m 4096 -smp 1,sockets=1,cores=1,threads=1  \
-drive file=/opt/suse11sp1.img,if=none,id=drive-virtio-disk0,format=raw,cache=none,aio=native -device virtio-blk-pci,scsi=off,bus=pci.0, \
addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive if=none,id=drive-ide0-1-1,readonly=on,format=raw,cache=none, \
aio=native -device ide-cd,bus=ide.1,unit=1,drive=drive-ide0-1-1,id=ide0-1-1 -vnc 0.0.0.0:0 -vga cirrus -device vfio-pci,host=04:01.5

> Subject: [PATCH 1/2] pci-assign: Fix a bug when map MSI-X table memory failed
> 
> From: Gonglei <arei.gonglei@huawei.com>
> 
> when map MSI-X table memory failed, the dev->msix_table not be
> set to NULL, the assigned_dev_unregister_msix_mmio() will case
> a segfault when munmap the failed dev->msix_table.
> 
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> ---
>  hw/i386/kvm/pci-assign.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/hw/i386/kvm/pci-assign.c b/hw/i386/kvm/pci-assign.c
> index a825871..570333f 100644
> --- a/hw/i386/kvm/pci-assign.c
> +++ b/hw/i386/kvm/pci-assign.c
> @@ -1608,6 +1608,7 @@ static int
> assigned_dev_register_msix_mmio(AssignedDevice *dev)
>                             MAP_ANONYMOUS|MAP_PRIVATE, 0, 0);
>      if (dev->msix_table == MAP_FAILED) {
>          error_report("fail allocate msix_table! %s", strerror(errno));
> +        dev->msix_table = NULL;
>          return -EFAULT;
>      }
> 
> --
> 1.7.12.4
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PATCH 2/2] pci-assign: Fix memory out of bound when MSI-X table not fit in a single page
  2014-04-03  5:18 ` [Qemu-devel] [PATCH 2/2] pci-assign: Fix memory out of bound when MSI-X table not fit in a single page arei.gonglei
@ 2014-04-08 15:32   ` Michael S. Tsirkin
  2014-04-09 10:56     ` Gonglei (Arei)
  2014-04-09 14:12   ` Laszlo Ersek
  1 sibling, 1 reply; 11+ messages in thread
From: Michael S. Tsirkin @ 2014-04-08 15:32 UTC (permalink / raw)
  To: arei.gonglei; +Cc: weidong.huangwei, pbonzini, alex.williamson, qemu-devel

On Thu, Apr 03, 2014 at 01:18:24PM +0800, arei.gonglei@huawei.com wrote:
> From: Gonglei <arei.gonglei@huawei.com>
> 
> QEMU only mmap MSIX_PAGE_SIZE memory for all pci devices in
> assigned_dev_register_msix_mmio(), meanwhile the set the one
> page memmory to zero, so the rest memory will be random value
> (maybe etnry.data is not 0). In the assigned_dev_update_msix_mmio()
> maybe occur the issue of entry_nr > 256, and the kmod reports
> the EINVAL error.
> 
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>

Okay so this kind of works because guest does not try
to use so many vectors.

But I think it's better not to give guest more entries
than we can actually support.

How about tweaking MSIX capability exposed to guest to limit table size?

> ---
>  hw/i386/kvm/pci-assign.c | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/i386/kvm/pci-assign.c b/hw/i386/kvm/pci-assign.c
> index 570333f..d25a19e 100644
> --- a/hw/i386/kvm/pci-assign.c
> +++ b/hw/i386/kvm/pci-assign.c
> @@ -37,6 +37,7 @@
>  #include "hw/pci/pci.h"
>  #include "hw/pci/msi.h"
>  #include "kvm_i386.h"
> +#include "qemu/osdep.h"
>  
>  #define MSIX_PAGE_SIZE 0x1000
>  
> @@ -59,6 +60,9 @@
>  #define DEBUG(fmt, ...)
>  #endif
>  
> +/* the msix-table size readed from pci device config */
> +static int msix_table_size;
> +
>  typedef struct PCIRegion {
>      int type;           /* Memory or port I/O */
>      int valid;
> @@ -1604,7 +1608,12 @@ static void assigned_dev_msix_reset(AssignedDevice *dev)
>  
>  static int assigned_dev_register_msix_mmio(AssignedDevice *dev)
>  {
> -    dev->msix_table = mmap(NULL, MSIX_PAGE_SIZE, PROT_READ|PROT_WRITE,
> +    msix_table_size = ROUND_UP(dev->msix_max * sizeof(struct MSIXTableEntry),
> +                                MSIX_PAGE_SIZE);
> +
> +    DEBUG("msix_table_size: 0x%x\n", msix_table_size);
> +
> +    dev->msix_table = mmap(NULL, msix_table_size, PROT_READ|PROT_WRITE,
>                             MAP_ANONYMOUS|MAP_PRIVATE, 0, 0);
>      if (dev->msix_table == MAP_FAILED) {
>          error_report("fail allocate msix_table! %s", strerror(errno));
> @@ -1615,7 +1624,7 @@ static int assigned_dev_register_msix_mmio(AssignedDevice *dev)
>      assigned_dev_msix_reset(dev);
>  
>      memory_region_init_io(&dev->mmio, OBJECT(dev), &assigned_dev_msix_mmio_ops,
> -                          dev, "assigned-dev-msix", MSIX_PAGE_SIZE);
> +                          dev, "assigned-dev-msix", msix_table_size);
>      return 0;
>  }
>  
> @@ -1627,7 +1636,7 @@ static void assigned_dev_unregister_msix_mmio(AssignedDevice *dev)
>  
>      memory_region_destroy(&dev->mmio);
>  
> -    if (munmap(dev->msix_table, MSIX_PAGE_SIZE) == -1) {
> +    if (munmap(dev->msix_table, msix_table_size) == -1) {
>          error_report("error unmapping msix_table! %s", strerror(errno));
>      }
>      dev->msix_table = NULL;
> -- 
> 1.7.12.4
> 
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] pci-assign: Fix a bug when map MSI-X table memory failed
  2014-04-03  5:18 [Qemu-devel] [PATCH 1/2] pci-assign: Fix a bug when map MSI-X table memory failed arei.gonglei
  2014-04-03  5:18 ` [Qemu-devel] [PATCH 2/2] pci-assign: Fix memory out of bound when MSI-X table not fit in a single page arei.gonglei
  2014-04-08 14:02 ` [Qemu-devel] [PATCH 1/2] pci-assign: Fix a bug when map MSI-X table memory failed Gonglei (Arei)
@ 2014-04-08 15:32 ` Michael S. Tsirkin
  2014-04-09 14:21 ` Michael S. Tsirkin
  3 siblings, 0 replies; 11+ messages in thread
From: Michael S. Tsirkin @ 2014-04-08 15:32 UTC (permalink / raw)
  To: arei.gonglei; +Cc: weidong.huangwei, pbonzini, alex.williamson, qemu-devel

On Thu, Apr 03, 2014 at 01:18:23PM +0800, arei.gonglei@huawei.com wrote:
> From: Gonglei <arei.gonglei@huawei.com>
> 
> when map MSI-X table memory failed, the dev->msix_table not be
> set to NULL, the assigned_dev_unregister_msix_mmio() will case
> a segfault when munmap the failed dev->msix_table.
> 
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>

Reviewed-by: Michael S. Tsirkin <mst@redhat.com>

> ---
>  hw/i386/kvm/pci-assign.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/hw/i386/kvm/pci-assign.c b/hw/i386/kvm/pci-assign.c
> index a825871..570333f 100644
> --- a/hw/i386/kvm/pci-assign.c
> +++ b/hw/i386/kvm/pci-assign.c
> @@ -1608,6 +1608,7 @@ static int assigned_dev_register_msix_mmio(AssignedDevice *dev)
>                             MAP_ANONYMOUS|MAP_PRIVATE, 0, 0);
>      if (dev->msix_table == MAP_FAILED) {
>          error_report("fail allocate msix_table! %s", strerror(errno));
> +        dev->msix_table = NULL;
>          return -EFAULT;
>      }
>  
> -- 
> 1.7.12.4
> 
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PATCH 2/2] pci-assign: Fix memory out of bound when MSI-X table not fit in a single page
  2014-04-08 15:32   ` Michael S. Tsirkin
@ 2014-04-09 10:56     ` Gonglei (Arei)
  2014-04-09 13:52       ` Michael S. Tsirkin
  0 siblings, 1 reply; 11+ messages in thread
From: Gonglei (Arei) @ 2014-04-09 10:56 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: pbonzini@redhat.com, alex.williamson@redhat.com, Huangweidong (C),
	qemu-devel@nongnu.org

Hi,

> > QEMU only mmap MSIX_PAGE_SIZE memory for all pci devices in
> > assigned_dev_register_msix_mmio(), meanwhile the set the one
> > page memmory to zero, so the rest memory will be random value
> > (maybe etnry.data is not 0). In the assigned_dev_update_msix_mmio()
> > maybe occur the issue of entry_nr > 256, and the kmod reports
> > the EINVAL error.
> >
> > Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> 
> Okay so this kind of works because guest does not try
> to use so many vectors.
> 
Yes. It's true.

> But I think it's better not to give guest more entries
> than we can actually support.
> 
> How about tweaking MSIX capability exposed to guest to limit table size?
> 
IMHO, the MSIX table entry size got form the physic NIC hardware. The software
layer shouldn't tweak the capability of real hardware, neither QEMU nor KVM. 


Best regards,
-Gonglei

> > ---
> >  hw/i386/kvm/pci-assign.c | 15 ++++++++++++---
> >  1 file changed, 12 insertions(+), 3 deletions(-)
> >
> > diff --git a/hw/i386/kvm/pci-assign.c b/hw/i386/kvm/pci-assign.c
> > index 570333f..d25a19e 100644
> > --- a/hw/i386/kvm/pci-assign.c
> > +++ b/hw/i386/kvm/pci-assign.c
> > @@ -37,6 +37,7 @@
> >  #include "hw/pci/pci.h"
> >  #include "hw/pci/msi.h"
> >  #include "kvm_i386.h"
> > +#include "qemu/osdep.h"
> >
> >  #define MSIX_PAGE_SIZE 0x1000
> >
> > @@ -59,6 +60,9 @@
> >  #define DEBUG(fmt, ...)
> >  #endif
> >
> > +/* the msix-table size readed from pci device config */
> > +static int msix_table_size;
> > +
> >  typedef struct PCIRegion {
> >      int type;           /* Memory or port I/O */
> >      int valid;
> > @@ -1604,7 +1608,12 @@ static void
> assigned_dev_msix_reset(AssignedDevice *dev)
> >
> >  static int assigned_dev_register_msix_mmio(AssignedDevice *dev)
> >  {
> > -    dev->msix_table = mmap(NULL, MSIX_PAGE_SIZE,
> PROT_READ|PROT_WRITE,
> > +    msix_table_size = ROUND_UP(dev->msix_max * sizeof(struct
> MSIXTableEntry),
> > +                                MSIX_PAGE_SIZE);
> > +
> > +    DEBUG("msix_table_size: 0x%x\n", msix_table_size);
> > +
> > +    dev->msix_table = mmap(NULL, msix_table_size,
> PROT_READ|PROT_WRITE,
> >                             MAP_ANONYMOUS|MAP_PRIVATE, 0, 0);
> >      if (dev->msix_table == MAP_FAILED) {
> >          error_report("fail allocate msix_table! %s", strerror(errno));
> > @@ -1615,7 +1624,7 @@ static int
> assigned_dev_register_msix_mmio(AssignedDevice *dev)
> >      assigned_dev_msix_reset(dev);
> >
> >      memory_region_init_io(&dev->mmio, OBJECT(dev),
> &assigned_dev_msix_mmio_ops,
> > -                          dev, "assigned-dev-msix",
> MSIX_PAGE_SIZE);
> > +                          dev, "assigned-dev-msix", msix_table_size);
> >      return 0;
> >  }
> >
> > @@ -1627,7 +1636,7 @@ static void
> assigned_dev_unregister_msix_mmio(AssignedDevice *dev)
> >
> >      memory_region_destroy(&dev->mmio);
> >
> > -    if (munmap(dev->msix_table, MSIX_PAGE_SIZE) == -1) {
> > +    if (munmap(dev->msix_table, msix_table_size) == -1) {
> >          error_report("error unmapping msix_table! %s", strerror(errno));
> >      }
> >      dev->msix_table = NULL;
> > --
> > 1.7.12.4
> >
> >

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PATCH 2/2] pci-assign: Fix memory out of bound when MSI-X table not fit in a single page
  2014-04-09 10:56     ` Gonglei (Arei)
@ 2014-04-09 13:52       ` Michael S. Tsirkin
  2014-04-10  2:34         ` Gonglei (Arei)
  0 siblings, 1 reply; 11+ messages in thread
From: Michael S. Tsirkin @ 2014-04-09 13:52 UTC (permalink / raw)
  To: Gonglei (Arei)
  Cc: pbonzini@redhat.com, alex.williamson@redhat.com, Huangweidong (C),
	qemu-devel@nongnu.org

On Wed, Apr 09, 2014 at 10:56:57AM +0000, Gonglei (Arei) wrote:
> Hi,
> 
> > > QEMU only mmap MSIX_PAGE_SIZE memory for all pci devices in
> > > assigned_dev_register_msix_mmio(), meanwhile the set the one
> > > page memmory to zero, so the rest memory will be random value
> > > (maybe etnry.data is not 0). In the assigned_dev_update_msix_mmio()
> > > maybe occur the issue of entry_nr > 256, and the kmod reports
> > > the EINVAL error.
> > >
> > > Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> > 
> > Okay so this kind of works because guest does not try
> > to use so many vectors.
> > 
> Yes. It's true.
> 
> > But I think it's better not to give guest more entries
> > than we can actually support.
> > 
> > How about tweaking MSIX capability exposed to guest to limit table size?
> > 
> IMHO, the MSIX table entry size got form the physic NIC hardware. The software
> layer shouldn't tweak the capability of real hardware, neither QEMU nor KVM. 
> 
> 
> Best regards,
> -Gonglei

Should be easy to fix though. Does the following help?

-->

pci-assign: limit # of msix vectors


Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


diff --git a/hw/i386/kvm/pci-assign.c b/hw/i386/kvm/pci-assign.c
index a825871..76aa86e 100644
--- a/hw/i386/kvm/pci-assign.c
+++ b/hw/i386/kvm/pci-assign.c
@@ -1258,6 +1258,7 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
     if (pos != 0 && kvm_device_msix_supported(kvm_state)) {
         int bar_nr;
         uint32_t msix_table_entry;
+        uint16_t msix_max;
 
         if (!check_irqchip_in_kernel()) {
             return -ENOTSUP;
@@ -1269,9 +1270,10 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
         }
         pci_dev->msix_cap = pos;
 
-        pci_set_word(pci_dev->config + pos + PCI_MSIX_FLAGS,
-                     pci_get_word(pci_dev->config + pos + PCI_MSIX_FLAGS) &
-                     PCI_MSIX_FLAGS_QSIZE);
+        msix_max = (pci_get_word(pci_dev->config + pos + PCI_MSIX_FLAGS) &
+                    PCI_MSIX_FLAGS_QSIZE) + 1;
+        msix_max = MIN(msix_max, KVM_MAX_MSIX_PER_DEV);
+        pci_set_word(pci_dev->config + pos + PCI_MSIX_FLAGS, msix_max - 1);
 
         /* Only enable and function mask bits are writable */
         pci_set_word(pci_dev->wmask + pos + PCI_MSIX_FLAGS,
@@ -1281,9 +1283,7 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
         bar_nr = msix_table_entry & PCI_MSIX_FLAGS_BIRMASK;
         msix_table_entry &= ~PCI_MSIX_FLAGS_BIRMASK;
         dev->msix_table_addr = pci_region[bar_nr].base_addr + msix_table_entry;
-        dev->msix_max = pci_get_word(pci_dev->config + pos + PCI_MSIX_FLAGS);
-        dev->msix_max &= PCI_MSIX_FLAGS_QSIZE;
-        dev->msix_max += 1;
+        dev->msix_max = msix_max;
     }
 
     /* Minimal PM support, nothing writable, device appears to NAK changes */

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PATCH 2/2] pci-assign: Fix memory out of bound when MSI-X table not fit in a single page
  2014-04-03  5:18 ` [Qemu-devel] [PATCH 2/2] pci-assign: Fix memory out of bound when MSI-X table not fit in a single page arei.gonglei
  2014-04-08 15:32   ` Michael S. Tsirkin
@ 2014-04-09 14:12   ` Laszlo Ersek
  2014-04-10  2:05     ` Gonglei (Arei)
  1 sibling, 1 reply; 11+ messages in thread
From: Laszlo Ersek @ 2014-04-09 14:12 UTC (permalink / raw)
  To: arei.gonglei, qemu-devel; +Cc: weidong.huangwei, pbonzini, alex.williamson, mst

On 04/03/14 07:18, arei.gonglei@huawei.com wrote:
> From: Gonglei <arei.gonglei@huawei.com>
> 
> QEMU only mmap MSIX_PAGE_SIZE memory for all pci devices in
> assigned_dev_register_msix_mmio(), meanwhile the set the one
> page memmory to zero, so the rest memory will be random value
> (maybe etnry.data is not 0). In the assigned_dev_update_msix_mmio()
> maybe occur the issue of entry_nr > 256, and the kmod reports
> the EINVAL error.
> 
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> ---
>  hw/i386/kvm/pci-assign.c | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/i386/kvm/pci-assign.c b/hw/i386/kvm/pci-assign.c
> index 570333f..d25a19e 100644
> --- a/hw/i386/kvm/pci-assign.c
> +++ b/hw/i386/kvm/pci-assign.c
> @@ -37,6 +37,7 @@
>  #include "hw/pci/pci.h"
>  #include "hw/pci/msi.h"
>  #include "kvm_i386.h"
> +#include "qemu/osdep.h"
>  
>  #define MSIX_PAGE_SIZE 0x1000
>  
> @@ -59,6 +60,9 @@
>  #define DEBUG(fmt, ...)
>  #endif
>  
> +/* the msix-table size readed from pci device config */
> +static int msix_table_size;
> +
>  typedef struct PCIRegion {
>      int type;           /* Memory or port I/O */
>      int valid;
> @@ -1604,7 +1608,12 @@ static void assigned_dev_msix_reset(AssignedDevice *dev)
>  
>  static int assigned_dev_register_msix_mmio(AssignedDevice *dev)
>  {
> -    dev->msix_table = mmap(NULL, MSIX_PAGE_SIZE, PROT_READ|PROT_WRITE,
> +    msix_table_size = ROUND_UP(dev->msix_max * sizeof(struct MSIXTableEntry),
> +                                MSIX_PAGE_SIZE);
> +
> +    DEBUG("msix_table_size: 0x%x\n", msix_table_size);
> +
> +    dev->msix_table = mmap(NULL, msix_table_size, PROT_READ|PROT_WRITE,
>                             MAP_ANONYMOUS|MAP_PRIVATE, 0, 0);
>      if (dev->msix_table == MAP_FAILED) {
>          error_report("fail allocate msix_table! %s", strerror(errno));
> @@ -1615,7 +1624,7 @@ static int assigned_dev_register_msix_mmio(AssignedDevice *dev)
>      assigned_dev_msix_reset(dev);
>  
>      memory_region_init_io(&dev->mmio, OBJECT(dev), &assigned_dev_msix_mmio_ops,
> -                          dev, "assigned-dev-msix", MSIX_PAGE_SIZE);
> +                          dev, "assigned-dev-msix", msix_table_size);
>      return 0;
>  }
>  
> @@ -1627,7 +1636,7 @@ static void assigned_dev_unregister_msix_mmio(AssignedDevice *dev)
>  
>      memory_region_destroy(&dev->mmio);
>  
> -    if (munmap(dev->msix_table, MSIX_PAGE_SIZE) == -1) {
> +    if (munmap(dev->msix_table, msix_table_size) == -1) {
>          error_report("error unmapping msix_table! %s", strerror(errno));
>      }
>      dev->msix_table = NULL;
> 

My only interest in this patch is RHBZ 616415. Namely, I have to improve
the propagation of human-readable errors in PCI device assignment
(through QMP as well). This patchset has hunks that look capable of
conflicting with my work (not started yet), so I think maybe I should
delay my work until this patchset is hashed out and applied.

Anyway, I just don't want to wait, so here are some review comments (I'm
not a PCI assignment expert by any means!):

(1) The patch introduces "msix_table_size" as an object with static
storage duration (ie. as a "global variable"). This is wrong; different
passthru devices will have different MSI-X table sizes. The patch causes
breakage as soon as two devices with different (rounded up) table sizes
are assigned, and then at least one of them is unassigned (because
assigned_dev_unregister_msix_mmio() will unmap either too much or too
little).

There are two ways to fix this:
- Make "msix_table_size" a *sibling* field of "msix_table" and
"msix_max", in struct AssignedDevice. Or,
- Recompute the value of "msix_table_size" (to be introduced as a local
variable in all relevant functions) from "dev->msix_max" each time you
need it.

(2) The other problem is that not all uses of MSIX_PAGE_SIZE are
updated. Namely, assigned_dev_msix_reset() clears the first page, then
overwrites the first dev->msix_table entries (masking them). However, if
we've allocated at least two pages due to *inexact* division (ie.
rounding up), then the trailing portion of the last page continues to
contain garbage.

Now, this
(a) either matters to KVM (because it wants to have sensible values in
*all* entries that fit into the pages that it sees), or
(b) it doesn't (because KVM complies with "entries_nr" in
assigned_dev_update_msix_mmio(), which is bounded by "adev->msix_max").

I don't know which one of (a) or (b) is the case. But, the code seems
incorrect (or at least misleading) in both cases:

In case (a), the memset() in assigned_dev_msix_reset() should be updated
so that it covers the entire allocation (all full pages, including the
last one).

In case (b), the memset() should be dropped from
assigned_dev_msix_reset(), because the loop just beneath it will
populate the entire set that KVM will look at.

Thanks
Laszlo

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] pci-assign: Fix a bug when map MSI-X table memory failed
  2014-04-03  5:18 [Qemu-devel] [PATCH 1/2] pci-assign: Fix a bug when map MSI-X table memory failed arei.gonglei
                   ` (2 preceding siblings ...)
  2014-04-08 15:32 ` Michael S. Tsirkin
@ 2014-04-09 14:21 ` Michael S. Tsirkin
  3 siblings, 0 replies; 11+ messages in thread
From: Michael S. Tsirkin @ 2014-04-09 14:21 UTC (permalink / raw)
  To: arei.gonglei; +Cc: weidong.huangwei, pbonzini, alex.williamson, qemu-devel

On Thu, Apr 03, 2014 at 01:18:23PM +0800, arei.gonglei@huawei.com wrote:
> From: Gonglei <arei.gonglei@huawei.com>
> 
> when map MSI-X table memory failed, the dev->msix_table not be
> set to NULL, the assigned_dev_unregister_msix_mmio() will case
> a segfault when munmap the failed dev->msix_table.
> 
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>


Reviewed-by: Michael S. Tsirkin <mst@redhat.com>

> ---
>  hw/i386/kvm/pci-assign.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/hw/i386/kvm/pci-assign.c b/hw/i386/kvm/pci-assign.c
> index a825871..570333f 100644
> --- a/hw/i386/kvm/pci-assign.c
> +++ b/hw/i386/kvm/pci-assign.c
> @@ -1608,6 +1608,7 @@ static int assigned_dev_register_msix_mmio(AssignedDevice *dev)
>                             MAP_ANONYMOUS|MAP_PRIVATE, 0, 0);
>      if (dev->msix_table == MAP_FAILED) {
>          error_report("fail allocate msix_table! %s", strerror(errno));
> +        dev->msix_table = NULL;
>          return -EFAULT;
>      }
>  
> -- 
> 1.7.12.4
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PATCH 2/2] pci-assign: Fix memory out of bound when MSI-X table not fit in a single page
  2014-04-09 14:12   ` Laszlo Ersek
@ 2014-04-10  2:05     ` Gonglei (Arei)
  0 siblings, 0 replies; 11+ messages in thread
From: Gonglei (Arei) @ 2014-04-10  2:05 UTC (permalink / raw)
  To: Laszlo Ersek, qemu-devel@nongnu.org
  Cc: pbonzini@redhat.com, alex.williamson@redhat.com, Huangweidong (C),
	mst@redhat.com

> On 04/03/14 07:18, arei.gonglei@huawei.com wrote:
> > From: Gonglei <arei.gonglei@huawei.com>
> >
> > QEMU only mmap MSIX_PAGE_SIZE memory for all pci devices in
> > assigned_dev_register_msix_mmio(), meanwhile the set the one
> > page memmory to zero, so the rest memory will be random value
> > (maybe etnry.data is not 0). In the assigned_dev_update_msix_mmio()
> > maybe occur the issue of entry_nr > 256, and the kmod reports
> > the EINVAL error.
> >
> > Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> > ---
> >  hw/i386/kvm/pci-assign.c | 15 ++++++++++++---
> >  1 file changed, 12 insertions(+), 3 deletions(-)
> >
> > diff --git a/hw/i386/kvm/pci-assign.c b/hw/i386/kvm/pci-assign.c
> > index 570333f..d25a19e 100644
> > --- a/hw/i386/kvm/pci-assign.c
> > +++ b/hw/i386/kvm/pci-assign.c
> > @@ -37,6 +37,7 @@
> >  #include "hw/pci/pci.h"
> >  #include "hw/pci/msi.h"
> >  #include "kvm_i386.h"
> > +#include "qemu/osdep.h"
> >
> >  #define MSIX_PAGE_SIZE 0x1000
> >
> > @@ -59,6 +60,9 @@
> >  #define DEBUG(fmt, ...)
> >  #endif
> >
> > +/* the msix-table size readed from pci device config */
> > +static int msix_table_size;
> > +
> >  typedef struct PCIRegion {
> >      int type;           /* Memory or port I/O */
> >      int valid;
> > @@ -1604,7 +1608,12 @@ static void
> assigned_dev_msix_reset(AssignedDevice *dev)
> >
> >  static int assigned_dev_register_msix_mmio(AssignedDevice *dev)
> >  {
> > -    dev->msix_table = mmap(NULL, MSIX_PAGE_SIZE,
> PROT_READ|PROT_WRITE,
> > +    msix_table_size = ROUND_UP(dev->msix_max * sizeof(struct
> MSIXTableEntry),
> > +                                MSIX_PAGE_SIZE);
> > +
> > +    DEBUG("msix_table_size: 0x%x\n", msix_table_size);
> > +
> > +    dev->msix_table = mmap(NULL, msix_table_size,
> PROT_READ|PROT_WRITE,
> >                             MAP_ANONYMOUS|MAP_PRIVATE, 0, 0);
> >      if (dev->msix_table == MAP_FAILED) {
> >          error_report("fail allocate msix_table! %s", strerror(errno));
> > @@ -1615,7 +1624,7 @@ static int
> assigned_dev_register_msix_mmio(AssignedDevice *dev)
> >      assigned_dev_msix_reset(dev);
> >
> >      memory_region_init_io(&dev->mmio, OBJECT(dev),
> &assigned_dev_msix_mmio_ops,
> > -                          dev, "assigned-dev-msix",
> MSIX_PAGE_SIZE);
> > +                          dev, "assigned-dev-msix", msix_table_size);
> >      return 0;
> >  }
> >
> > @@ -1627,7 +1636,7 @@ static void
> assigned_dev_unregister_msix_mmio(AssignedDevice *dev)
> >
> >      memory_region_destroy(&dev->mmio);
> >
> > -    if (munmap(dev->msix_table, MSIX_PAGE_SIZE) == -1) {
> > +    if (munmap(dev->msix_table, msix_table_size) == -1) {
> >          error_report("error unmapping msix_table! %s", strerror(errno));
> >      }
> >      dev->msix_table = NULL;
> >
> 
> My only interest in this patch is RHBZ 616415. Namely, I have to improve
> the propagation of human-readable errors in PCI device assignment
> (through QMP as well). This patchset has hunks that look capable of
> conflicting with my work (not started yet), so I think maybe I should
> delay my work until this patchset is hashed out and applied.
> 
> Anyway, I just don't want to wait, so here are some review comments (I'm
> not a PCI assignment expert by any means!):
> 
> (1) The patch introduces "msix_table_size" as an object with static
> storage duration (ie. as a "global variable"). This is wrong; different
> passthru devices will have different MSI-X table sizes. The patch causes
> breakage as soon as two devices with different (rounded up) table sizes
> are assigned, and then at least one of them is unassigned (because
> assigned_dev_unregister_msix_mmio() will unmap either too much or too
> little).
> 
Good catch. Thanks!

> There are two ways to fix this:
> - Make "msix_table_size" a *sibling* field of "msix_table" and
> "msix_max", in struct AssignedDevice. Or,
> - Recompute the value of "msix_table_size" (to be introduced as a local
> variable in all relevant functions) from "dev->msix_max" each time you
> need it.
> 
IMHO, the second way is better.

> (2) The other problem is that not all uses of MSIX_PAGE_SIZE are
> updated. Namely, assigned_dev_msix_reset() clears the first page, then
> overwrites the first dev->msix_table entries (masking them). However, if
> we've allocated at least two pages due to *inexact* division (ie.
> rounding up), then the trailing portion of the last page continues to
> contain garbage.
> 
Yep, this is my fault.

> Now, this
> (a) either matters to KVM (because it wants to have sensible values in
> *all* entries that fit into the pages that it sees), or
> (b) it doesn't (because KVM complies with "entries_nr" in
> assigned_dev_update_msix_mmio(), which is bounded by "adev->msix_max").
> 
(b) is not correct when the msix table size > MSIX_PAGE_SIZE. The "entries_nr"
maybe greater than 256 for the last garbage page.

> I don't know which one of (a) or (b) is the case. But, the code seems
> incorrect (or at least misleading) in both cases:
> 
> In case (a), the memset() in assigned_dev_msix_reset() should be updated
> so that it covers the entire allocation (all full pages, including the
> last one).
> 
Agreed.

> In case (b), the memset() should be dropped from
> assigned_dev_msix_reset(), because the loop just beneath it will
> populate the entire set that KVM will look at.
> 
> Thanks
> Laszlo

Best regards,
-Gonglei

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PATCH 2/2] pci-assign: Fix memory out of bound when MSI-X table not fit in a single page
  2014-04-09 13:52       ` Michael S. Tsirkin
@ 2014-04-10  2:34         ` Gonglei (Arei)
  0 siblings, 0 replies; 11+ messages in thread
From: Gonglei (Arei) @ 2014-04-10  2:34 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: pbonzini@redhat.com, alex.williamson@redhat.com, Huangweidong (C),
	qemu-devel@nongnu.org

> On Wed, Apr 09, 2014 at 10:56:57AM +0000, Gonglei (Arei) wrote:
> > Hi,
> >
> > > > QEMU only mmap MSIX_PAGE_SIZE memory for all pci devices in
> > > > assigned_dev_register_msix_mmio(), meanwhile the set the one
> > > > page memmory to zero, so the rest memory will be random value
> > > > (maybe etnry.data is not 0). In the assigned_dev_update_msix_mmio()
> > > > maybe occur the issue of entry_nr > 256, and the kmod reports
> > > > the EINVAL error.
> > > >
> > > > Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> > >
> > > Okay so this kind of works because guest does not try
> > > to use so many vectors.
> > >
> > Yes. It's true.
> >
> > > But I think it's better not to give guest more entries
> > > than we can actually support.
> > >
> > > How about tweaking MSIX capability exposed to guest to limit table size?
> > >
> > IMHO, the MSIX table entry size got form the physic NIC hardware. The
> software
> > layer shouldn't tweak the capability of real hardware, neither QEMU nor
> KVM.
> >
> >
> > Best regards,
> > -Gonglei
> 
> Should be easy to fix though. Does the following help?
> 
Good, the patch is perfect.

> -->
> 
> pci-assign: limit # of msix vectors
> 
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> 
> 
> diff --git a/hw/i386/kvm/pci-assign.c b/hw/i386/kvm/pci-assign.c
> index a825871..76aa86e 100644
> --- a/hw/i386/kvm/pci-assign.c
> +++ b/hw/i386/kvm/pci-assign.c
> @@ -1258,6 +1258,7 @@ static int assigned_device_pci_cap_init(PCIDevice
> *pci_dev)
>      if (pos != 0 && kvm_device_msix_supported(kvm_state)) {
>          int bar_nr;
>          uint32_t msix_table_entry;
> +        uint16_t msix_max;
> 
>          if (!check_irqchip_in_kernel()) {
>              return -ENOTSUP;
> @@ -1269,9 +1270,10 @@ static int assigned_device_pci_cap_init(PCIDevice
> *pci_dev)
>          }
>          pci_dev->msix_cap = pos;
> 
> -        pci_set_word(pci_dev->config + pos + PCI_MSIX_FLAGS,
> -                     pci_get_word(pci_dev->config + pos +
> PCI_MSIX_FLAGS) &
> -                     PCI_MSIX_FLAGS_QSIZE);
> +        msix_max = (pci_get_word(pci_dev->config + pos +
> PCI_MSIX_FLAGS) &
> +                    PCI_MSIX_FLAGS_QSIZE) + 1;
> +        msix_max = MIN(msix_max, KVM_MAX_MSIX_PER_DEV);
> +        pci_set_word(pci_dev->config + pos + PCI_MSIX_FLAGS, msix_max -
> 1);
> 
>          /* Only enable and function mask bits are writable */
>          pci_set_word(pci_dev->wmask + pos + PCI_MSIX_FLAGS,
> @@ -1281,9 +1283,7 @@ static int assigned_device_pci_cap_init(PCIDevice
> *pci_dev)
>          bar_nr = msix_table_entry & PCI_MSIX_FLAGS_BIRMASK;
>          msix_table_entry &= ~PCI_MSIX_FLAGS_BIRMASK;
>          dev->msix_table_addr = pci_region[bar_nr].base_addr +
> msix_table_entry;
> -        dev->msix_max = pci_get_word(pci_dev->config + pos +
> PCI_MSIX_FLAGS);
> -        dev->msix_max &= PCI_MSIX_FLAGS_QSIZE;
> -        dev->msix_max += 1;
> +        dev->msix_max = msix_max;
>      }
> 
>      /* Minimal PM support, nothing writable, device appears to NAK changes
> */

In the Guest OS, the pci config of VF as bellow:

linux:/sys/bus/pci/devices/0000:00:08.0 # hexdump config 
0000000 10df e228 0507 0010 0010 0200 0000 0000
0000010 000c fe00 0000 0000 0000 0000 0000 0000
0000020 0000 0000 0000 0000 0000 0000 10df e264
0000030 0000 0000 0054 0000 0000 0000 0000 0000
0000040 0000 0000 0008 0000 0000 0000 0000 0000
0000050 0000 0000 8409 0008 2b41 c002 0000 0000
0000060 0005 000a 0000 0000 0000 0000 0000 0000
0000070 0000 0000 0000 0000 6011 80ff 4000 0000
0000080 3400 0000 9403 0000 0000 0000 0000 0000
0000090 0000 0000 7810 0002 8724 1000 0000 0000
00000a0 dc83 0041 0000 0000 0000 0000 0000 0000
00000b0 0000 0000 0000 0000 001f 0010 0000 0000
00000c0 000e 0000 0000 0000 0000 0000 0000 0000
00000d0 0000 0000 0000 0000 0000 0000 0000 0000
*
0000100

# lspci
00:08.0 Ethernet controller: Emulex Corporation Device e228 (rev 10)

The msix table size is *0ff*. Expect a formal patch. Thanks.

Tested-by: Gonglei <arei.gonglei@huawei.com>


Best regards,
-Gonglei

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2014-04-10  2:35 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-04-03  5:18 [Qemu-devel] [PATCH 1/2] pci-assign: Fix a bug when map MSI-X table memory failed arei.gonglei
2014-04-03  5:18 ` [Qemu-devel] [PATCH 2/2] pci-assign: Fix memory out of bound when MSI-X table not fit in a single page arei.gonglei
2014-04-08 15:32   ` Michael S. Tsirkin
2014-04-09 10:56     ` Gonglei (Arei)
2014-04-09 13:52       ` Michael S. Tsirkin
2014-04-10  2:34         ` Gonglei (Arei)
2014-04-09 14:12   ` Laszlo Ersek
2014-04-10  2:05     ` Gonglei (Arei)
2014-04-08 14:02 ` [Qemu-devel] [PATCH 1/2] pci-assign: Fix a bug when map MSI-X table memory failed Gonglei (Arei)
2014-04-08 15:32 ` Michael S. Tsirkin
2014-04-09 14:21 ` Michael S. Tsirkin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).