qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 1/5] Generate mptable unconditionally.
@ 2009-10-11 18:59 Gleb Natapov
  2009-10-11 18:59 ` [Qemu-devel] [PATCH 2/5] Enable power button event generation Gleb Natapov
                   ` (3 more replies)
  0 siblings, 4 replies; 34+ messages in thread
From: Gleb Natapov @ 2009-10-11 18:59 UTC (permalink / raw)
  To: kevin; +Cc: qemu-devel

VMware ESX requires an mptable even for uniprocessor guests.

Qemu pcbios commit 9869338791ca6f44e628b88e6335ffd66e1ed86b

Signed-off-by: Gleb Natapov <gleb@redhat.com>
---
 src/mptable.c |    4 ----
 1 files changed, 0 insertions(+), 4 deletions(-)

diff --git a/src/mptable.c b/src/mptable.c
index 793968c..3945d2e 100644
--- a/src/mptable.c
+++ b/src/mptable.c
@@ -18,10 +18,6 @@ mptable_init(void)
 
     dprintf(3, "init MPTable\n");
 
-    if (MaxCountCPUs <= 1)
-        // Building an mptable on uniprocessor machines confuses some OSes.
-        return;
-
     int length = (sizeof(struct mptable_config_s)
                   + sizeof(struct mpt_cpu) * MaxCountCPUs
                   + sizeof(struct mpt_bus)
-- 
1.6.3.3

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Qemu-devel] [PATCH 2/5] Enable power button event generation.
  2009-10-11 18:59 [Qemu-devel] [PATCH 1/5] Generate mptable unconditionally Gleb Natapov
@ 2009-10-11 18:59 ` Gleb Natapov
  2009-10-11 18:59 ` [Qemu-devel] [PATCH 3/5] Use the correct mask to size the PCI option ROM BAR Gleb Natapov
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 34+ messages in thread
From: Gleb Natapov @ 2009-10-11 18:59 UTC (permalink / raw)
  To: kevin; +Cc: qemu-devel

Qemu pcbios commit 4f5a9a84ff32e2445bf1c854e33b0b32f4459428

Signed-off-by: Gleb Natapov <gleb@redhat.com>
---
 src/acpi.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/acpi.c b/src/acpi.c
index 1ff2290..b7af66e 100644
--- a/src/acpi.c
+++ b/src/acpi.c
@@ -336,8 +336,8 @@ build_fadt(int bdf)
     fadt->plvl3_lat = cpu_to_le16(0xfff); // C3 state not supported
     fadt->gpe0_blk = cpu_to_le32(0xafe0);
     fadt->gpe0_blk_len = 4;
-    /* WBINVD + PROC_C1 + PWR_BUTTON + SLP_BUTTON + FIX_RTC */
-    fadt->flags = cpu_to_le32((1 << 0) | (1 << 2) | (1 << 4) | (1 << 5) | (1 << 6));
+    /* WBINVD + PROC_C1 + SLP_BUTTON + FIX_RTC */
+    fadt->flags = cpu_to_le32((1 << 0) | (1 << 2) | (1 << 5) | (1 << 6));
 
     build_header((void*)fadt, FACP_SIGNATURE, sizeof(*fadt), 1);
 
-- 
1.6.3.3

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Qemu-devel] [PATCH 3/5] Use the correct mask to size the PCI option ROM BAR.
  2009-10-11 18:59 [Qemu-devel] [PATCH 1/5] Generate mptable unconditionally Gleb Natapov
  2009-10-11 18:59 ` [Qemu-devel] [PATCH 2/5] Enable power button event generation Gleb Natapov
@ 2009-10-11 18:59 ` Gleb Natapov
  2009-10-11 21:53   ` [Qemu-devel] " Michael S. Tsirkin
  2009-10-11 18:59 ` [Qemu-devel] [PATCH 4/5] Make MMIO address page aligned in guest Gleb Natapov
  2009-10-11 18:59 ` [Qemu-devel] [PATCH 5/5] Set the PCI base address to 0xf0000000 Gleb Natapov
  3 siblings, 1 reply; 34+ messages in thread
From: Gleb Natapov @ 2009-10-11 18:59 UTC (permalink / raw)
  To: kevin; +Cc: qemu-devel

Bit 0 is the enable bit, which we not only don't want to set, but
it will stick and make us think it's an I/O port resource.

Qemu pcbios commit 6ddb9f5c742b2b82b1755d7ec2a127f6e20e3806

Signed-off-by: Gleb Natapov <gleb@redhat.com>
---
 src/pciinit.c |    8 +++++---
 1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/src/pciinit.c b/src/pciinit.c
index 1d0f784..29b3901 100644
--- a/src/pciinit.c
+++ b/src/pciinit.c
@@ -139,11 +139,13 @@ static void pci_bios_init_device(u16 bdf)
             int ofs;
             u32 val, size;
 
-            if (i == PCI_ROM_SLOT)
+            if (i == PCI_ROM_SLOT) {
                 ofs = PCI_ROM_ADDRESS;
-            else
+                pci_config_writel(bdf, ofs, 0xfffffffe);
+            } else {
                 ofs = PCI_BASE_ADDRESS_0 + i * 4;
-            pci_config_writel(bdf, ofs, 0xffffffff);
+                pci_config_writel(bdf, ofs, 0xffffffff);
+            }
             val = pci_config_readl(bdf, ofs);
             if (val != 0) {
                 size = (~(val & ~0xf)) + 1;
-- 
1.6.3.3

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Qemu-devel] [PATCH 4/5] Make MMIO address page aligned in guest.
  2009-10-11 18:59 [Qemu-devel] [PATCH 1/5] Generate mptable unconditionally Gleb Natapov
  2009-10-11 18:59 ` [Qemu-devel] [PATCH 2/5] Enable power button event generation Gleb Natapov
  2009-10-11 18:59 ` [Qemu-devel] [PATCH 3/5] Use the correct mask to size the PCI option ROM BAR Gleb Natapov
@ 2009-10-11 18:59 ` Gleb Natapov
  2009-10-11 21:48   ` [Qemu-devel] " Michael S. Tsirkin
  2009-10-12 14:27   ` Kevin O'Connor
  2009-10-11 18:59 ` [Qemu-devel] [PATCH 5/5] Set the PCI base address to 0xf0000000 Gleb Natapov
  3 siblings, 2 replies; 34+ messages in thread
From: Gleb Natapov @ 2009-10-11 18:59 UTC (permalink / raw)
  To: kevin; +Cc: qemu-devel

MMIO of some devices are not page aligned, such as some EHCI
controllers and virtual Realtek NIC in guest. Current guest
bios doesn't guarantee the start address of MMIO page aligned.
This may result in failure of device assignment, because KVM
only allow to register page aligned memory slots. For example,
it fails to assign EHCI controller (its MMIO size is 1KB) with
virtual Realtek NIC (its MMIO size is 256Bytes), because MMIO
of virtual Realtek NIC in guest starts from 0xf2001000, MMIO of
the EHCI controller will starts from 0xf2001400.

MMIO addresses in guest are allocated in guest bios. This patch
makes MMIO address page aligned in bios, then fixes the issue.

qemu-kvm commit ccc9b91a1fdfac4161461a519e10a233f5066d2f

Signed-off-by: Gleb Natapov <gleb@redhat.com>
---
 src/pciinit.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/src/pciinit.c b/src/pciinit.c
index 29b3901..53fbfcf 100644
--- a/src/pciinit.c
+++ b/src/pciinit.c
@@ -10,6 +10,7 @@
 #include "biosvar.h" // GET_EBDA
 #include "pci_ids.h" // PCI_VENDOR_ID_INTEL
 #include "pci_regs.h" // PCI_COMMAND
+#include "paravirt.h"
 
 #define PCI_ROM_SLOT 6
 #define PCI_NUM_REGIONS 7
@@ -158,6 +159,12 @@ static void pci_bios_init_device(u16 bdf)
                 *paddr = ALIGN(*paddr, size);
                 pci_set_io_region_addr(bdf, i, *paddr);
                 *paddr += size;
+                if (kvm_para_available()) {
+                    /* make memory address page aligned */
+                    /* needed for device assignment on kvm */
+                    if (!(val & PCI_BASE_ADDRESS_SPACE_IO))
+                        *paddr = (*paddr + 0xfff) & 0xfffff000;
+               }
             }
         }
         break;
-- 
1.6.3.3

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Qemu-devel] [PATCH 5/5] Set the PCI base address to 0xf0000000.
  2009-10-11 18:59 [Qemu-devel] [PATCH 1/5] Generate mptable unconditionally Gleb Natapov
                   ` (2 preceding siblings ...)
  2009-10-11 18:59 ` [Qemu-devel] [PATCH 4/5] Make MMIO address page aligned in guest Gleb Natapov
@ 2009-10-11 18:59 ` Gleb Natapov
  2009-10-12 14:24   ` [Qemu-devel] " Kevin O'Connor
  3 siblings, 1 reply; 34+ messages in thread
From: Gleb Natapov @ 2009-10-11 18:59 UTC (permalink / raw)
  To: kevin; +Cc: qemu-devel

Qemu starts PCI hole at 0xe0000000, but lets set it to the same value as
pcbios uses for now.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
---
 src/pciinit.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/src/pciinit.c b/src/pciinit.c
index 53fbfcf..67d6a26 100644
--- a/src/pciinit.c
+++ b/src/pciinit.c
@@ -202,7 +202,7 @@ pci_setup(void)
     dprintf(3, "pci setup\n");
 
     pci_bios_io_addr = 0xc000;
-    pci_bios_mem_addr = 0xc0000000;
+    pci_bios_mem_addr = 0xf0000000;
     pci_bios_bigmem_addr = RamSize;
     if (pci_bios_bigmem_addr < 0x90000000)
         pci_bios_bigmem_addr = 0x90000000;
-- 
1.6.3.3

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Qemu-devel] Re: [PATCH 4/5] Make MMIO address page aligned in guest.
  2009-10-11 18:59 ` [Qemu-devel] [PATCH 4/5] Make MMIO address page aligned in guest Gleb Natapov
@ 2009-10-11 21:48   ` Michael S. Tsirkin
  2009-10-12  6:44     ` Gleb Natapov
  2009-10-12 14:27   ` Kevin O'Connor
  1 sibling, 1 reply; 34+ messages in thread
From: Michael S. Tsirkin @ 2009-10-11 21:48 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: kevin, qemu-devel

On Sun, Oct 11, 2009 at 08:59:06PM +0200, Gleb Natapov wrote:
> MMIO of some devices are not page aligned, such as some EHCI
> controllers and virtual Realtek NIC in guest. Current guest
> bios doesn't guarantee the start address of MMIO page aligned.
> This may result in failure of device assignment, because KVM
> only allow to register page aligned memory slots. For example,
> it fails to assign EHCI controller (its MMIO size is 1KB) with
> virtual Realtek NIC (its MMIO size is 256Bytes), because MMIO
> of virtual Realtek NIC in guest starts from 0xf2001000, MMIO of
> the EHCI controller will starts from 0xf2001400.
> 
> MMIO addresses in guest are allocated in guest bios. This patch
> makes MMIO address page aligned in bios, then fixes the issue.
> 
> qemu-kvm commit ccc9b91a1fdfac4161461a519e10a233f5066d2f
> 
> Signed-off-by: Gleb Natapov <gleb@redhat.com>

This wastes memory for non-assigned devices.  I think it's better, and
cleaner, to make qemu increase the BAR size up to 4K for assigned
devices if it wants page size alignment.


> ---
>  src/pciinit.c |    7 +++++++
>  1 files changed, 7 insertions(+), 0 deletions(-)
> 
> diff --git a/src/pciinit.c b/src/pciinit.c
> index 29b3901..53fbfcf 100644
> --- a/src/pciinit.c
> +++ b/src/pciinit.c
> @@ -10,6 +10,7 @@
>  #include "biosvar.h" // GET_EBDA
>  #include "pci_ids.h" // PCI_VENDOR_ID_INTEL
>  #include "pci_regs.h" // PCI_COMMAND
> +#include "paravirt.h"
>  
>  #define PCI_ROM_SLOT 6
>  #define PCI_NUM_REGIONS 7
> @@ -158,6 +159,12 @@ static void pci_bios_init_device(u16 bdf)
>                  *paddr = ALIGN(*paddr, size);
>                  pci_set_io_region_addr(bdf, i, *paddr);
>                  *paddr += size;
> +                if (kvm_para_available()) {
> +                    /* make memory address page aligned */
> +                    /* needed for device assignment on kvm */
> +                    if (!(val & PCI_BASE_ADDRESS_SPACE_IO))
> +                        *paddr = (*paddr + 0xfff) & 0xfffff000;
> +               }
>              }
>          }
>          break;
> -- 
> 1.6.3.3
> 
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Qemu-devel] Re: [PATCH 3/5] Use the correct mask to size the PCI option ROM BAR.
  2009-10-11 18:59 ` [Qemu-devel] [PATCH 3/5] Use the correct mask to size the PCI option ROM BAR Gleb Natapov
@ 2009-10-11 21:53   ` Michael S. Tsirkin
  2009-10-12  6:50     ` Gleb Natapov
  0 siblings, 1 reply; 34+ messages in thread
From: Michael S. Tsirkin @ 2009-10-11 21:53 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: kevin, qemu-devel

On Sun, Oct 11, 2009 at 08:59:05PM +0200, Gleb Natapov wrote:
> Bit 0 is the enable bit, which we not only don't want to set, but
> it will stick and make us think it's an I/O port resource.
> 
> Qemu pcbios commit 6ddb9f5c742b2b82b1755d7ec2a127f6e20e3806
> 
> Signed-off-by: Gleb Natapov <gleb@redhat.com>
> ---
>  src/pciinit.c |    8 +++++---
>  1 files changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/src/pciinit.c b/src/pciinit.c
> index 1d0f784..29b3901 100644
> --- a/src/pciinit.c
> +++ b/src/pciinit.c
> @@ -139,11 +139,13 @@ static void pci_bios_init_device(u16 bdf)
>              int ofs;
>              u32 val, size;
>  
> -            if (i == PCI_ROM_SLOT)
> +            if (i == PCI_ROM_SLOT) {
>                  ofs = PCI_ROM_ADDRESS;
> -            else
> +                pci_config_writel(bdf, ofs, 0xfffffffe);
> +            } else {
>                  ofs = PCI_BASE_ADDRESS_0 + i * 4;
> -            pci_config_writel(bdf, ofs, 0xffffffff);
> +                pci_config_writel(bdf, ofs, 0xffffffff);
> +            }
>              val = pci_config_readl(bdf, ofs);
>              if (val != 0) {
>                  size = (~(val & ~0xf)) + 1;

Hmm, eithe rinterpreet the spec strictly or loosely.

If you implement the spec loosely, you can just write 
0xfffffffe unconditionally: low bit is readonly for i/o and memory.

Strict interpretation of spec requires that you write 0
into reserved bits. These are bits 1 to 10 for ROM,
and bit 1 for I/O.


> -- 
> 1.6.3.3
> 
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Qemu-devel] Re: [PATCH 4/5] Make MMIO address page aligned in guest.
  2009-10-11 21:48   ` [Qemu-devel] " Michael S. Tsirkin
@ 2009-10-12  6:44     ` Gleb Natapov
  2009-10-12  7:10       ` Michael S. Tsirkin
  0 siblings, 1 reply; 34+ messages in thread
From: Gleb Natapov @ 2009-10-12  6:44 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: kevin, qemu-devel

On Sun, Oct 11, 2009 at 11:48:20PM +0200, Michael S. Tsirkin wrote:
> On Sun, Oct 11, 2009 at 08:59:06PM +0200, Gleb Natapov wrote:
> > MMIO of some devices are not page aligned, such as some EHCI
> > controllers and virtual Realtek NIC in guest. Current guest
> > bios doesn't guarantee the start address of MMIO page aligned.
> > This may result in failure of device assignment, because KVM
> > only allow to register page aligned memory slots. For example,
> > it fails to assign EHCI controller (its MMIO size is 1KB) with
> > virtual Realtek NIC (its MMIO size is 256Bytes), because MMIO
> > of virtual Realtek NIC in guest starts from 0xf2001000, MMIO of
> > the EHCI controller will starts from 0xf2001400.
> > 
> > MMIO addresses in guest are allocated in guest bios. This patch
> > makes MMIO address page aligned in bios, then fixes the issue.
> > 
> > qemu-kvm commit ccc9b91a1fdfac4161461a519e10a233f5066d2f
> > 
> > Signed-off-by: Gleb Natapov <gleb@redhat.com>
> 
> This wastes memory for non-assigned devices.  I think it's better, and
> cleaner, to make qemu increase the BAR size up to 4K for assigned
> devices if it wants page size alignment.
> 
We have three and a half devices in QEUM so I don't think memory is a
big concern. Regardless, if you think that fiddle with assigned devices
responses is better idea go ahead and send patches. As it stands this
patch is in kvm's bios and is required for assigned devices to work
for some devices, so moving to seabios without this patch will introduce
a regression.

> 
> > ---
> >  src/pciinit.c |    7 +++++++
> >  1 files changed, 7 insertions(+), 0 deletions(-)
> > 
> > diff --git a/src/pciinit.c b/src/pciinit.c
> > index 29b3901..53fbfcf 100644
> > --- a/src/pciinit.c
> > +++ b/src/pciinit.c
> > @@ -10,6 +10,7 @@
> >  #include "biosvar.h" // GET_EBDA
> >  #include "pci_ids.h" // PCI_VENDOR_ID_INTEL
> >  #include "pci_regs.h" // PCI_COMMAND
> > +#include "paravirt.h"
> >  
> >  #define PCI_ROM_SLOT 6
> >  #define PCI_NUM_REGIONS 7
> > @@ -158,6 +159,12 @@ static void pci_bios_init_device(u16 bdf)
> >                  *paddr = ALIGN(*paddr, size);
> >                  pci_set_io_region_addr(bdf, i, *paddr);
> >                  *paddr += size;
> > +                if (kvm_para_available()) {
> > +                    /* make memory address page aligned */
> > +                    /* needed for device assignment on kvm */
> > +                    if (!(val & PCI_BASE_ADDRESS_SPACE_IO))
> > +                        *paddr = (*paddr + 0xfff) & 0xfffff000;
> > +               }
> >              }
> >          }
> >          break;
> > -- 
> > 1.6.3.3
> > 
> > 

--
			Gleb.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Qemu-devel] Re: [PATCH 3/5] Use the correct mask to size the PCI option ROM BAR.
  2009-10-11 21:53   ` [Qemu-devel] " Michael S. Tsirkin
@ 2009-10-12  6:50     ` Gleb Natapov
  2009-10-12  9:52       ` Michael S. Tsirkin
  0 siblings, 1 reply; 34+ messages in thread
From: Gleb Natapov @ 2009-10-12  6:50 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: kevin, qemu-devel

On Sun, Oct 11, 2009 at 11:53:56PM +0200, Michael S. Tsirkin wrote:
> On Sun, Oct 11, 2009 at 08:59:05PM +0200, Gleb Natapov wrote:
> > Bit 0 is the enable bit, which we not only don't want to set, but
> > it will stick and make us think it's an I/O port resource.
> > 
> > Qemu pcbios commit 6ddb9f5c742b2b82b1755d7ec2a127f6e20e3806
> > 
> > Signed-off-by: Gleb Natapov <gleb@redhat.com>
> > ---
> >  src/pciinit.c |    8 +++++---
> >  1 files changed, 5 insertions(+), 3 deletions(-)
> > 
> > diff --git a/src/pciinit.c b/src/pciinit.c
> > index 1d0f784..29b3901 100644
> > --- a/src/pciinit.c
> > +++ b/src/pciinit.c
> > @@ -139,11 +139,13 @@ static void pci_bios_init_device(u16 bdf)
> >              int ofs;
> >              u32 val, size;
> >  
> > -            if (i == PCI_ROM_SLOT)
> > +            if (i == PCI_ROM_SLOT) {
> >                  ofs = PCI_ROM_ADDRESS;
> > -            else
> > +                pci_config_writel(bdf, ofs, 0xfffffffe);
> > +            } else {
> >                  ofs = PCI_BASE_ADDRESS_0 + i * 4;
> > -            pci_config_writel(bdf, ofs, 0xffffffff);
> > +                pci_config_writel(bdf, ofs, 0xffffffff);
> > +            }
> >              val = pci_config_readl(bdf, ofs);
> >              if (val != 0) {
> >                  size = (~(val & ~0xf)) + 1;
> 
> Hmm, eithe rinterpreet the spec strictly or loosely.
> 
> If you implement the spec loosely, you can just write 
> 0xfffffffe unconditionally: low bit is readonly for i/o and memory.
> 
> Strict interpretation of spec requires that you write 0
> into reserved bits. These are bits 1 to 10 for ROM,
> and bit 1 for I/O.
> 
> 
Send patch with your favorite interpretation to qemu pcbios/seabios.
The regression concern from my previous mail applicable here as well.

--
			Gleb.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Qemu-devel] Re: [PATCH 4/5] Make MMIO address page aligned in guest.
  2009-10-12  6:44     ` Gleb Natapov
@ 2009-10-12  7:10       ` Michael S. Tsirkin
  2009-10-12  7:22         ` Gleb Natapov
  0 siblings, 1 reply; 34+ messages in thread
From: Michael S. Tsirkin @ 2009-10-12  7:10 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: kevin, qemu-devel

On Mon, Oct 12, 2009 at 08:44:33AM +0200, Gleb Natapov wrote:
> On Sun, Oct 11, 2009 at 11:48:20PM +0200, Michael S. Tsirkin wrote:
> > On Sun, Oct 11, 2009 at 08:59:06PM +0200, Gleb Natapov wrote:
> > > MMIO of some devices are not page aligned, such as some EHCI
> > > controllers and virtual Realtek NIC in guest. Current guest
> > > bios doesn't guarantee the start address of MMIO page aligned.
> > > This may result in failure of device assignment, because KVM
> > > only allow to register page aligned memory slots. For example,
> > > it fails to assign EHCI controller (its MMIO size is 1KB) with
> > > virtual Realtek NIC (its MMIO size is 256Bytes), because MMIO
> > > of virtual Realtek NIC in guest starts from 0xf2001000, MMIO of
> > > the EHCI controller will starts from 0xf2001400.
> > > 
> > > MMIO addresses in guest are allocated in guest bios. This patch
> > > makes MMIO address page aligned in bios, then fixes the issue.
> > > 
> > > qemu-kvm commit ccc9b91a1fdfac4161461a519e10a233f5066d2f
> > > 
> > > Signed-off-by: Gleb Natapov <gleb@redhat.com>
> > 
> > This wastes memory for non-assigned devices.  I think it's better, and
> > cleaner, to make qemu increase the BAR size up to 4K for assigned
> > devices if it wants page size alignment.
> > 
> We have three and a half devices in QEUM so I don't think memory is a
> big concern. Regardless, if you think that fiddle with assigned devices
> responses is better idea go ahead and send patches.

Even if you fiddle with BIOS, guest is allowed to reassign BARs,
breaking your assumptions.

> As it stands this
> patch is in kvm's bios and is required for assigned devices to work
> for some devices, so moving to seabios without this patch will introduce
> a regression.

I have a question here: if kvm maps a full physical page
into guest memory, while device only uses part of the page,
won't that mean that guest is granted access outside the
device, which it should not have?
Maybe the solution is to disable bypass for sub-page BARs and to
handle them in qemu, where we don't have alignment restrictions?

> > 
> > > ---
> > >  src/pciinit.c |    7 +++++++
> > >  1 files changed, 7 insertions(+), 0 deletions(-)
> > > 
> > > diff --git a/src/pciinit.c b/src/pciinit.c
> > > index 29b3901..53fbfcf 100644
> > > --- a/src/pciinit.c
> > > +++ b/src/pciinit.c
> > > @@ -10,6 +10,7 @@
> > >  #include "biosvar.h" // GET_EBDA
> > >  #include "pci_ids.h" // PCI_VENDOR_ID_INTEL
> > >  #include "pci_regs.h" // PCI_COMMAND
> > > +#include "paravirt.h"
> > >  
> > >  #define PCI_ROM_SLOT 6
> > >  #define PCI_NUM_REGIONS 7
> > > @@ -158,6 +159,12 @@ static void pci_bios_init_device(u16 bdf)
> > >                  *paddr = ALIGN(*paddr, size);
> > >                  pci_set_io_region_addr(bdf, i, *paddr);
> > >                  *paddr += size;
> > > +                if (kvm_para_available()) {
> > > +                    /* make memory address page aligned */
> > > +                    /* needed for device assignment on kvm */
> > > +                    if (!(val & PCI_BASE_ADDRESS_SPACE_IO))
> > > +                        *paddr = (*paddr + 0xfff) & 0xfffff000;
> > > +               }
> > >              }
> > >          }
> > >          break;
> > > -- 
> > > 1.6.3.3
> > > 
> > > 
> 
> --
> 			Gleb.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Qemu-devel] Re: [PATCH 4/5] Make MMIO address page aligned in guest.
  2009-10-12  7:10       ` Michael S. Tsirkin
@ 2009-10-12  7:22         ` Gleb Natapov
  2009-10-12  8:13           ` Michael S. Tsirkin
  0 siblings, 1 reply; 34+ messages in thread
From: Gleb Natapov @ 2009-10-12  7:22 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: kevin, qemu-devel

On Mon, Oct 12, 2009 at 09:10:52AM +0200, Michael S. Tsirkin wrote:
> On Mon, Oct 12, 2009 at 08:44:33AM +0200, Gleb Natapov wrote:
> > On Sun, Oct 11, 2009 at 11:48:20PM +0200, Michael S. Tsirkin wrote:
> > > On Sun, Oct 11, 2009 at 08:59:06PM +0200, Gleb Natapov wrote:
> > > > MMIO of some devices are not page aligned, such as some EHCI
> > > > controllers and virtual Realtek NIC in guest. Current guest
> > > > bios doesn't guarantee the start address of MMIO page aligned.
> > > > This may result in failure of device assignment, because KVM
> > > > only allow to register page aligned memory slots. For example,
> > > > it fails to assign EHCI controller (its MMIO size is 1KB) with
> > > > virtual Realtek NIC (its MMIO size is 256Bytes), because MMIO
> > > > of virtual Realtek NIC in guest starts from 0xf2001000, MMIO of
> > > > the EHCI controller will starts from 0xf2001400.
> > > > 
> > > > MMIO addresses in guest are allocated in guest bios. This patch
> > > > makes MMIO address page aligned in bios, then fixes the issue.
> > > > 
> > > > qemu-kvm commit ccc9b91a1fdfac4161461a519e10a233f5066d2f
> > > > 
> > > > Signed-off-by: Gleb Natapov <gleb@redhat.com>
> > > 
> > > This wastes memory for non-assigned devices.  I think it's better, and
> > > cleaner, to make qemu increase the BAR size up to 4K for assigned
> > > devices if it wants page size alignment.
> > > 
> > We have three and a half devices in QEUM so I don't think memory is a
> > big concern. Regardless, if you think that fiddle with assigned devices
> > responses is better idea go ahead and send patches.
> 
> Even if you fiddle with BIOS, guest is allowed to reassign BARs,
> breaking your assumptions.
Good point. So the fact that this patched helped its creator shows that
linux doesn't do this.

> > As it stands this
> > patch is in kvm's bios and is required for assigned devices to work
> > for some devices, so moving to seabios without this patch will introduce
> > a regression.
> 
> I have a question here: if kvm maps a full physical page
> into guest memory, while device only uses part of the page,
> won't that mean that guest is granted access outside the
> device, which it should not have?
And how is real HW different? It maps a full physical page into OS
memory even if BAR is smaller then page and grants OS access to
unassigned mmio region. Access unassigned mmio region shouldn't cause
any trouble, doesn't it?

> Maybe the solution is to disable bypass for sub-page BARs and to
> handle them in qemu, where we don't have alignment restrictions?
> 
Making fast path go through qemu for assigned devices? May be remove 
this pass through crap from kvm to save us all from this misery then? 

> > > 
> > > > ---
> > > >  src/pciinit.c |    7 +++++++
> > > >  1 files changed, 7 insertions(+), 0 deletions(-)
> > > > 
> > > > diff --git a/src/pciinit.c b/src/pciinit.c
> > > > index 29b3901..53fbfcf 100644
> > > > --- a/src/pciinit.c
> > > > +++ b/src/pciinit.c
> > > > @@ -10,6 +10,7 @@
> > > >  #include "biosvar.h" // GET_EBDA
> > > >  #include "pci_ids.h" // PCI_VENDOR_ID_INTEL
> > > >  #include "pci_regs.h" // PCI_COMMAND
> > > > +#include "paravirt.h"
> > > >  
> > > >  #define PCI_ROM_SLOT 6
> > > >  #define PCI_NUM_REGIONS 7
> > > > @@ -158,6 +159,12 @@ static void pci_bios_init_device(u16 bdf)
> > > >                  *paddr = ALIGN(*paddr, size);
> > > >                  pci_set_io_region_addr(bdf, i, *paddr);
> > > >                  *paddr += size;
> > > > +                if (kvm_para_available()) {
> > > > +                    /* make memory address page aligned */
> > > > +                    /* needed for device assignment on kvm */
> > > > +                    if (!(val & PCI_BASE_ADDRESS_SPACE_IO))
> > > > +                        *paddr = (*paddr + 0xfff) & 0xfffff000;
> > > > +               }
> > > >              }
> > > >          }
> > > >          break;
> > > > -- 
> > > > 1.6.3.3
> > > > 
> > > > 
> > 
> > --
> > 			Gleb.

--
			Gleb.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Qemu-devel] Re: [PATCH 4/5] Make MMIO address page aligned in guest.
  2009-10-12  7:22         ` Gleb Natapov
@ 2009-10-12  8:13           ` Michael S. Tsirkin
  2009-10-12  8:48             ` Gleb Natapov
  0 siblings, 1 reply; 34+ messages in thread
From: Michael S. Tsirkin @ 2009-10-12  8:13 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: kevin, qemu-devel

On Mon, Oct 12, 2009 at 09:22:02AM +0200, Gleb Natapov wrote:
> On Mon, Oct 12, 2009 at 09:10:52AM +0200, Michael S. Tsirkin wrote:
> > On Mon, Oct 12, 2009 at 08:44:33AM +0200, Gleb Natapov wrote:
> > > On Sun, Oct 11, 2009 at 11:48:20PM +0200, Michael S. Tsirkin wrote:
> > > > On Sun, Oct 11, 2009 at 08:59:06PM +0200, Gleb Natapov wrote:
> > > > > MMIO of some devices are not page aligned, such as some EHCI
> > > > > controllers and virtual Realtek NIC in guest. Current guest
> > > > > bios doesn't guarantee the start address of MMIO page aligned.
> > > > > This may result in failure of device assignment, because KVM
> > > > > only allow to register page aligned memory slots. For example,
> > > > > it fails to assign EHCI controller (its MMIO size is 1KB) with
> > > > > virtual Realtek NIC (its MMIO size is 256Bytes), because MMIO
> > > > > of virtual Realtek NIC in guest starts from 0xf2001000, MMIO of
> > > > > the EHCI controller will starts from 0xf2001400.
> > > > > 
> > > > > MMIO addresses in guest are allocated in guest bios. This patch
> > > > > makes MMIO address page aligned in bios, then fixes the issue.
> > > > > 
> > > > > qemu-kvm commit ccc9b91a1fdfac4161461a519e10a233f5066d2f
> > > > > 
> > > > > Signed-off-by: Gleb Natapov <gleb@redhat.com>
> > > > 
> > > > This wastes memory for non-assigned devices.  I think it's better, and
> > > > cleaner, to make qemu increase the BAR size up to 4K for assigned
> > > > devices if it wants page size alignment.
> > > > 
> > > We have three and a half devices in QEUM so I don't think memory is a
> > > big concern. Regardless, if you think that fiddle with assigned devices
> > > responses is better idea go ahead and send patches.
> > 
> > Even if you fiddle with BIOS, guest is allowed to reassign BARs,
> > breaking your assumptions.
> Good point. So the fact that this patched helped its creator shows that
> linux doesn't do this.

Try hot-plugging the device instead of have it present on boot.
Patching BIOS won't help then, will it?  So my question is, if we need
to handle this in qemu, is it worth it to do it in kvm as well?

> > > As it stands this
> > > patch is in kvm's bios and is required for assigned devices to work
> > > for some devices, so moving to seabios without this patch will introduce
> > > a regression.
> > 
> > I have a question here: if kvm maps a full physical page
> > into guest memory, while device only uses part of the page,
> > won't that mean that guest is granted access outside the
> > device, which it should not have?
> And how is real HW different? It maps a full physical page into OS
> memory even if BAR is smaller then page and grants OS access to
> unassigned mmio region. Access unassigned mmio region shouldn't cause
> any trouble, doesn't it?

Unassigned - typically no, but there can be another device there, or a RAM
page.  It is different on real hardware where OS has access to all RAM and all
devices, anyway.

Here's an example from my laptop:

00:03.0 Communication controller: Intel Corporation Mobile 4 Series Chipset MEI Controller (rev 07)
        Subsystem: Lenovo Device 20e6
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx+
        Latency: 0
        Interrupt: pin A routed to IRQ 11
        Region 0: Memory at fc226800 (64-bit, non-prefetchable) [size=16]
        Capabilities: <access denied>

...

00:1f.2 SATA controller: Intel Corporation ICH9M/M-E SATA AHCI Controller (rev 03) (prog-if 01 [AHCI 1.0])
        Subsystem: Lenovo Device 20f8
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin B routed to IRQ 28
        Region 0: I/O ports at 1c48 [size=8]
        Region 1: I/O ports at 183c [size=4]
        Region 2: I/O ports at 1c40 [size=8]
        Region 3: I/O ports at 1838 [size=4]
        Region 4: I/O ports at 1c20 [size=32]
        Region 5: Memory at fc226000 (32-bit, non-prefetchable) [size=2K]
        Capabilities: <access denied>
        Kernel driver in use: ahci

In this setup, if you assign a page at address fc226000, for SATA,
I think that guest will be able to control Communication controller as well.

> > Maybe the solution is to disable bypass for sub-page BARs and to
> > handle them in qemu, where we don't have alignment restrictions?
> > 
> Making fast path go through qemu for assigned devices? May be remove 
> this pass through crap from kvm to save us all from this misery then? 

Another option is for KVM to check these scenarious and deny assignment if
there's such an overlap.

> > > > 
> > > > > ---
> > > > >  src/pciinit.c |    7 +++++++
> > > > >  1 files changed, 7 insertions(+), 0 deletions(-)
> > > > > 
> > > > > diff --git a/src/pciinit.c b/src/pciinit.c
> > > > > index 29b3901..53fbfcf 100644
> > > > > --- a/src/pciinit.c
> > > > > +++ b/src/pciinit.c
> > > > > @@ -10,6 +10,7 @@
> > > > >  #include "biosvar.h" // GET_EBDA
> > > > >  #include "pci_ids.h" // PCI_VENDOR_ID_INTEL
> > > > >  #include "pci_regs.h" // PCI_COMMAND
> > > > > +#include "paravirt.h"
> > > > >  
> > > > >  #define PCI_ROM_SLOT 6
> > > > >  #define PCI_NUM_REGIONS 7
> > > > > @@ -158,6 +159,12 @@ static void pci_bios_init_device(u16 bdf)
> > > > >                  *paddr = ALIGN(*paddr, size);
> > > > >                  pci_set_io_region_addr(bdf, i, *paddr);
> > > > >                  *paddr += size;
> > > > > +                if (kvm_para_available()) {
> > > > > +                    /* make memory address page aligned */
> > > > > +                    /* needed for device assignment on kvm */
> > > > > +                    if (!(val & PCI_BASE_ADDRESS_SPACE_IO))
> > > > > +                        *paddr = (*paddr + 0xfff) & 0xfffff000;
> > > > > +               }
> > > > >              }
> > > > >          }
> > > > >          break;
> > > > > -- 
> > > > > 1.6.3.3
> > > > > 
> > > > > 
> > > 
> > > --
> > > 			Gleb.
> 
> --
> 			Gleb.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Qemu-devel] Re: [PATCH 4/5] Make MMIO address page aligned in guest.
  2009-10-12  8:13           ` Michael S. Tsirkin
@ 2009-10-12  8:48             ` Gleb Natapov
  2009-10-12  9:43               ` Michael S. Tsirkin
  0 siblings, 1 reply; 34+ messages in thread
From: Gleb Natapov @ 2009-10-12  8:48 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: kevin, qemu-devel

On Mon, Oct 12, 2009 at 10:13:14AM +0200, Michael S. Tsirkin wrote:
> > > > > 
> > > > > This wastes memory for non-assigned devices.  I think it's better, and
> > > > > cleaner, to make qemu increase the BAR size up to 4K for assigned
> > > > > devices if it wants page size alignment.
> > > > > 
> > > > We have three and a half devices in QEUM so I don't think memory is a
> > > > big concern. Regardless, if you think that fiddle with assigned devices
> > > > responses is better idea go ahead and send patches.
> > > 
> > > Even if you fiddle with BIOS, guest is allowed to reassign BARs,
> > > breaking your assumptions.
> > Good point. So the fact that this patched helped its creator shows that
> > linux doesn't do this.
> 
> Try hot-plugging the device instead of have it present on boot.
> Patching BIOS won't help then, will it?  So my question is, if we need
> to handle this in qemu, is it worth it to do it in kvm as well?
> 
It depend how linux assign mmio address to hot pluggable devices. How
can you be sure a device driver continue working if you'll misrepresent
BAR size BTW?

> > > > As it stands this
> > > > patch is in kvm's bios and is required for assigned devices to work
> > > > for some devices, so moving to seabios without this patch will introduce
> > > > a regression.
> > > 
> > > I have a question here: if kvm maps a full physical page
> > > into guest memory, while device only uses part of the page,
> > > won't that mean that guest is granted access outside the
> > > device, which it should not have?
> > And how is real HW different? It maps a full physical page into OS
> > memory even if BAR is smaller then page and grants OS access to
> > unassigned mmio region. Access unassigned mmio region shouldn't cause
> > any trouble, doesn't it?
> 
> Unassigned - typically no, but there can be another device there, or a RAM
> page.  It is different on real hardware where OS has access to all RAM and all
> devices, anyway.
> 
> Here's an example from my laptop:
> 
> 00:03.0 Communication controller: Intel Corporation Mobile 4 Series Chipset MEI Controller (rev 07)
>         Subsystem: Lenovo Device 20e6
>         Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx+
>         Latency: 0
>         Interrupt: pin A routed to IRQ 11
>         Region 0: Memory at fc226800 (64-bit, non-prefetchable) [size=16]
>         Capabilities: <access denied>
> 
> ...
> 
> 00:1f.2 SATA controller: Intel Corporation ICH9M/M-E SATA AHCI Controller (rev 03) (prog-if 01 [AHCI 1.0])
>         Subsystem: Lenovo Device 20f8
>         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
>         Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Latency: 0
>         Interrupt: pin B routed to IRQ 28
>         Region 0: I/O ports at 1c48 [size=8]
>         Region 1: I/O ports at 183c [size=4]
>         Region 2: I/O ports at 1c40 [size=8]
>         Region 3: I/O ports at 1838 [size=4]
>         Region 4: I/O ports at 1c20 [size=32]
>         Region 5: Memory at fc226000 (32-bit, non-prefetchable) [size=2K]
>         Capabilities: <access denied>
>         Kernel driver in use: ahci
> 
> In this setup, if you assign a page at address fc226000, for SATA,
> I think that guest will be able to control Communication controller as well.
Who configures BARs for assigned device guest or host? If host you can't
safely passthrough one of those devices. But passthrough is not secure
anyway since guest can DMA all over host memory.

> 
> > > Maybe the solution is to disable bypass for sub-page BARs and to
> > > handle them in qemu, where we don't have alignment restrictions?
> > > 
> > Making fast path go through qemu for assigned devices? May be remove 
> > this pass through crap from kvm to save us all from this misery then? 
> 
> Another option is for KVM to check these scenarious and deny assignment if
> there's such an overlap.
One more constrain for device assignment. Simple real life scenarios
don't work for our users as it is. Adding more constrains will not help.

> 
> > > > > 
> > > > > > ---
> > > > > >  src/pciinit.c |    7 +++++++
> > > > > >  1 files changed, 7 insertions(+), 0 deletions(-)
> > > > > > 
> > > > > > diff --git a/src/pciinit.c b/src/pciinit.c
> > > > > > index 29b3901..53fbfcf 100644
> > > > > > --- a/src/pciinit.c
> > > > > > +++ b/src/pciinit.c
> > > > > > @@ -10,6 +10,7 @@
> > > > > >  #include "biosvar.h" // GET_EBDA
> > > > > >  #include "pci_ids.h" // PCI_VENDOR_ID_INTEL
> > > > > >  #include "pci_regs.h" // PCI_COMMAND
> > > > > > +#include "paravirt.h"
> > > > > >  
> > > > > >  #define PCI_ROM_SLOT 6
> > > > > >  #define PCI_NUM_REGIONS 7
> > > > > > @@ -158,6 +159,12 @@ static void pci_bios_init_device(u16 bdf)
> > > > > >                  *paddr = ALIGN(*paddr, size);
> > > > > >                  pci_set_io_region_addr(bdf, i, *paddr);
> > > > > >                  *paddr += size;
> > > > > > +                if (kvm_para_available()) {
> > > > > > +                    /* make memory address page aligned */
> > > > > > +                    /* needed for device assignment on kvm */
> > > > > > +                    if (!(val & PCI_BASE_ADDRESS_SPACE_IO))
> > > > > > +                        *paddr = (*paddr + 0xfff) & 0xfffff000;
> > > > > > +               }
> > > > > >              }
> > > > > >          }
> > > > > >          break;
> > > > > > -- 
> > > > > > 1.6.3.3
> > > > > > 
> > > > > > 
> > > > 
> > > > --
> > > > 			Gleb.
> > 
> > --
> > 			Gleb.

--
			Gleb.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Qemu-devel] Re: [PATCH 4/5] Make MMIO address page aligned in guest.
  2009-10-12  8:48             ` Gleb Natapov
@ 2009-10-12  9:43               ` Michael S. Tsirkin
  2009-10-12 10:06                 ` Gleb Natapov
  0 siblings, 1 reply; 34+ messages in thread
From: Michael S. Tsirkin @ 2009-10-12  9:43 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: kevin, qemu-devel

On Mon, Oct 12, 2009 at 10:48:58AM +0200, Gleb Natapov wrote:
> On Mon, Oct 12, 2009 at 10:13:14AM +0200, Michael S. Tsirkin wrote:
> > > > > > 
> > > > > > This wastes memory for non-assigned devices.  I think it's better, and
> > > > > > cleaner, to make qemu increase the BAR size up to 4K for assigned
> > > > > > devices if it wants page size alignment.
> > > > > > 
> > > > > We have three and a half devices in QEUM so I don't think memory is a
> > > > > big concern. Regardless, if you think that fiddle with assigned devices
> > > > > responses is better idea go ahead and send patches.
> > > > 
> > > > Even if you fiddle with BIOS, guest is allowed to reassign BARs,
> > > > breaking your assumptions.
> > > Good point. So the fact that this patched helped its creator shows that
> > > linux doesn't do this.
> > 
> > Try hot-plugging the device instead of have it present on boot.
> > Patching BIOS won't help then, will it?  So my question is, if we need
> > to handle this in qemu, is it worth it to do it in kvm as well?
> > 
> It depend how linux assign mmio address to hot pluggable devices. How
> can you be sure a device driver continue working if you'll misrepresent
> BAR size BTW?

Yes, this adds yet another way for device to discover it's running in a
VM, so this might break some drivers.  If we see many of these in
practice, we can try adding a PCI-to-PCI bridge with some dummy devices
behind it to the picture, to increase the chances to get a dedicated
memory page.

> > > > > As it stands this
> > > > > patch is in kvm's bios and is required for assigned devices to work
> > > > > for some devices, so moving to seabios without this patch will introduce
> > > > > a regression.
> > > > 
> > > > I have a question here: if kvm maps a full physical page
> > > > into guest memory, while device only uses part of the page,
> > > > won't that mean that guest is granted access outside the
> > > > device, which it should not have?
> > > And how is real HW different? It maps a full physical page into OS
> > > memory even if BAR is smaller then page and grants OS access to
> > > unassigned mmio region. Access unassigned mmio region shouldn't cause
> > > any trouble, doesn't it?
> > 
> > Unassigned - typically no, but there can be another device there, or a RAM
> > page.  It is different on real hardware where OS has access to all RAM and all
> > devices, anyway.
> > 
> > Here's an example from my laptop:
> > 
> > 00:03.0 Communication controller: Intel Corporation Mobile 4 Series Chipset MEI Controller (rev 07)
> >         Subsystem: Lenovo Device 20e6
> >         Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
> >         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx+
> >         Latency: 0
> >         Interrupt: pin A routed to IRQ 11
> >         Region 0: Memory at fc226800 (64-bit, non-prefetchable) [size=16]
> >         Capabilities: <access denied>
> > 
> > ...
> > 
> > 00:1f.2 SATA controller: Intel Corporation ICH9M/M-E SATA AHCI Controller (rev 03) (prog-if 01 [AHCI 1.0])
> >         Subsystem: Lenovo Device 20f8
> >         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
> >         Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> >         Latency: 0
> >         Interrupt: pin B routed to IRQ 28
> >         Region 0: I/O ports at 1c48 [size=8]
> >         Region 1: I/O ports at 183c [size=4]
> >         Region 2: I/O ports at 1c40 [size=8]
> >         Region 3: I/O ports at 1838 [size=4]
> >         Region 4: I/O ports at 1c20 [size=32]
> >         Region 5: Memory at fc226000 (32-bit, non-prefetchable) [size=2K]
> >         Capabilities: <access denied>
> >         Kernel driver in use: ahci
> > 
> > In this setup, if you assign a page at address fc226000, for SATA,
> > I think that guest will be able to control Communication controller as well.
> Who configures BARs for assigned device guest or host?

Host.

> If host you can't safely passthrough one of those devices.

Why not?

> But passthrough is not secure anyway since guest can DMA all over host
> memory.


That's why we only enable it with I/O mmu, right?

> > 
> > > > Maybe the solution is to disable bypass for sub-page BARs and to
> > > > handle them in qemu, where we don't have alignment restrictions?
> > > > 
> > > Making fast path go through qemu for assigned devices? May be remove 
> > > this pass through crap from kvm to save us all from this misery then? 
> > 
> > Another option is for KVM to check these scenarious and deny assignment if
> > there's such an overlap.
> One more constrain for device assignment. Simple real life scenarios
> don't work for our users as it is. Adding more constrains will not help.

For linux host, you can force resource alignment using a kernel
parameter. What do you suggest? Ignore this issue?

> > 
> > > > > > 
> > > > > > > ---
> > > > > > >  src/pciinit.c |    7 +++++++
> > > > > > >  1 files changed, 7 insertions(+), 0 deletions(-)
> > > > > > > 
> > > > > > > diff --git a/src/pciinit.c b/src/pciinit.c
> > > > > > > index 29b3901..53fbfcf 100644
> > > > > > > --- a/src/pciinit.c
> > > > > > > +++ b/src/pciinit.c
> > > > > > > @@ -10,6 +10,7 @@
> > > > > > >  #include "biosvar.h" // GET_EBDA
> > > > > > >  #include "pci_ids.h" // PCI_VENDOR_ID_INTEL
> > > > > > >  #include "pci_regs.h" // PCI_COMMAND
> > > > > > > +#include "paravirt.h"
> > > > > > >  
> > > > > > >  #define PCI_ROM_SLOT 6
> > > > > > >  #define PCI_NUM_REGIONS 7
> > > > > > > @@ -158,6 +159,12 @@ static void pci_bios_init_device(u16 bdf)
> > > > > > >                  *paddr = ALIGN(*paddr, size);
> > > > > > >                  pci_set_io_region_addr(bdf, i, *paddr);
> > > > > > >                  *paddr += size;
> > > > > > > +                if (kvm_para_available()) {
> > > > > > > +                    /* make memory address page aligned */
> > > > > > > +                    /* needed for device assignment on kvm */
> > > > > > > +                    if (!(val & PCI_BASE_ADDRESS_SPACE_IO))
> > > > > > > +                        *paddr = (*paddr + 0xfff) & 0xfffff000;
> > > > > > > +               }
> > > > > > >              }
> > > > > > >          }
> > > > > > >          break;
> > > > > > > -- 
> > > > > > > 1.6.3.3
> > > > > > > 
> > > > > > > 
> > > > > 
> > > > > --
> > > > > 			Gleb.
> > > 
> > > --
> > > 			Gleb.
> 
> --
> 			Gleb.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Qemu-devel] Re: [PATCH 3/5] Use the correct mask to size the PCI option ROM BAR.
  2009-10-12  6:50     ` Gleb Natapov
@ 2009-10-12  9:52       ` Michael S. Tsirkin
  2009-10-12 10:08         ` Gleb Natapov
  0 siblings, 1 reply; 34+ messages in thread
From: Michael S. Tsirkin @ 2009-10-12  9:52 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: kevin, qemu-devel

On Mon, Oct 12, 2009 at 08:50:24AM +0200, Gleb Natapov wrote:
> Send patch with your favorite interpretation to qemu pcbios/seabios.
> The regression concern from my previous mail applicable here as well.

Okay. Can you ack the following?

For ROM, bit 0 is the enable bit, which we not only don't want
to set, but it will stick and make us think it's an I/O port
resource. For I/O and memory BARs bit 0 is read-only, so
it does not matter what we write there.

See Qemu pcbios commit 6ddb9f5c742b2b82b1755d7ec2a127f6e20e3806

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 src/pciinit.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/src/pciinit.c b/src/pciinit.c
index eab082a..d315526 100644
--- a/src/pciinit.c
+++ b/src/pciinit.c
@@ -137,7 +137,7 @@ static void pci_bios_init_device(u16 bdf)
                 ofs = PCI_ROM_ADDRESS;
             else
                 ofs = PCI_BASE_ADDRESS_0 + i * 4;
-            pci_config_writel(bdf, ofs, 0xffffffff);
+            pci_config_writel(bdf, ofs, 0xfffffffe);
             val = pci_config_readl(bdf, ofs);
             if (val != 0) {
                 size = (~(val & ~0xf)) + 1;
-- 
1.6.5.rc2

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Qemu-devel] Re: [PATCH 4/5] Make MMIO address page aligned in guest.
  2009-10-12  9:43               ` Michael S. Tsirkin
@ 2009-10-12 10:06                 ` Gleb Natapov
  0 siblings, 0 replies; 34+ messages in thread
From: Gleb Natapov @ 2009-10-12 10:06 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: kevin, qemu-devel

On Mon, Oct 12, 2009 at 11:43:35AM +0200, Michael S. Tsirkin wrote:
> > > Try hot-plugging the device instead of have it present on boot.
> > > Patching BIOS won't help then, will it?  So my question is, if we need
> > > to handle this in qemu, is it worth it to do it in kvm as well?
> > > 
> > It depend how linux assign mmio address to hot pluggable devices. How
> > can you be sure a device driver continue working if you'll misrepresent
> > BAR size BTW?
> 
> Yes, this adds yet another way for device to discover it's running in a
> VM, so this might break some drivers.  If we see many of these in
> practice, we can try adding a PCI-to-PCI bridge with some dummy devices
> behind it to the picture, to increase the chances to get a dedicated
> memory page.
> 
Go ahead. But why all this churn instead of asking linux to align pci
resources and do the same in a bios.

> > > > > > As it stands this
> > > > > > patch is in kvm's bios and is required for assigned devices to work
> > > > > > for some devices, so moving to seabios without this patch will introduce
> > > > > > a regression.
> > > > > 
> > > > > I have a question here: if kvm maps a full physical page
> > > > > into guest memory, while device only uses part of the page,
> > > > > won't that mean that guest is granted access outside the
> > > > > device, which it should not have?
> > > > And how is real HW different? It maps a full physical page into OS
> > > > memory even if BAR is smaller then page and grants OS access to
> > > > unassigned mmio region. Access unassigned mmio region shouldn't cause
> > > > any trouble, doesn't it?
> > > 
> > > Unassigned - typically no, but there can be another device there, or a RAM
> > > page.  It is different on real hardware where OS has access to all RAM and all
> > > devices, anyway.
> > > 
> > > Here's an example from my laptop:
> > > 
> > > 00:03.0 Communication controller: Intel Corporation Mobile 4 Series Chipset MEI Controller (rev 07)
> > >         Subsystem: Lenovo Device 20e6
> > >         Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
> > >         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx+
> > >         Latency: 0
> > >         Interrupt: pin A routed to IRQ 11
> > >         Region 0: Memory at fc226800 (64-bit, non-prefetchable) [size=16]
> > >         Capabilities: <access denied>
> > > 
> > > ...
> > > 
> > > 00:1f.2 SATA controller: Intel Corporation ICH9M/M-E SATA AHCI Controller (rev 03) (prog-if 01 [AHCI 1.0])
> > >         Subsystem: Lenovo Device 20f8
> > >         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
> > >         Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> > >         Latency: 0
> > >         Interrupt: pin B routed to IRQ 28
> > >         Region 0: I/O ports at 1c48 [size=8]
> > >         Region 1: I/O ports at 183c [size=4]
> > >         Region 2: I/O ports at 1c40 [size=8]
> > >         Region 3: I/O ports at 1838 [size=4]
> > >         Region 4: I/O ports at 1c20 [size=32]
> > >         Region 5: Memory at fc226000 (32-bit, non-prefetchable) [size=2K]
> > >         Capabilities: <access denied>
> > >         Kernel driver in use: ahci
> > > 
> > > In this setup, if you assign a page at address fc226000, for SATA,
> > > I think that guest will be able to control Communication controller as well.
> > Who configures BARs for assigned device guest or host?
> 
> Host.
Nice, device assignment is broken in one more way.

> 
> > If host you can't safely passthrough one of those devices.
> 
> Why not?
> 
For the reason you stated. If you map fc226000-fc227000 to a guest it
can control device at fc226800.

> > But passthrough is not secure anyway since guest can DMA all over host
> > memory.
> 
> 
> That's why we only enable it with I/O mmu, right?
> 
We do? Not sure about it. It is absolutely required if you want
security of course, but not everyone care.

> > > 
> > > > > Maybe the solution is to disable bypass for sub-page BARs and to
> > > > > handle them in qemu, where we don't have alignment restrictions?
> > > > > 
> > > > Making fast path go through qemu for assigned devices? May be remove 
> > > > this pass through crap from kvm to save us all from this misery then? 
> > > 
> > > Another option is for KVM to check these scenarious and deny assignment if
> > > there's such an overlap.
> > One more constrain for device assignment. Simple real life scenarios
> > don't work for our users as it is. Adding more constrains will not help.
> 
> For linux host, you can force resource alignment using a kernel
> parameter. What do you suggest? Ignore this issue?
> 
I suggest forcing resource alignment in a host using a kernel parameter if you
care about security.

--
			Gleb.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Qemu-devel] Re: [PATCH 3/5] Use the correct mask to size the PCI option ROM BAR.
  2009-10-12  9:52       ` Michael S. Tsirkin
@ 2009-10-12 10:08         ` Gleb Natapov
  2009-10-12 11:03           ` Michael S. Tsirkin
  0 siblings, 1 reply; 34+ messages in thread
From: Gleb Natapov @ 2009-10-12 10:08 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: kevin, qemu-devel

On Mon, Oct 12, 2009 at 11:52:25AM +0200, Michael S. Tsirkin wrote:
> On Mon, Oct 12, 2009 at 08:50:24AM +0200, Gleb Natapov wrote:
> > Send patch with your favorite interpretation to qemu pcbios/seabios.
> > The regression concern from my previous mail applicable here as well.
> 
> Okay. Can you ack the following?
> 
I can if you'll add PCI spec reference for me to double check.
Also I prefer strict spec reading :)

> For ROM, bit 0 is the enable bit, which we not only don't want
> to set, but it will stick and make us think it's an I/O port
> resource. For I/O and memory BARs bit 0 is read-only, so
> it does not matter what we write there.
> 
> See Qemu pcbios commit 6ddb9f5c742b2b82b1755d7ec2a127f6e20e3806
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>  src/pciinit.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/src/pciinit.c b/src/pciinit.c
> index eab082a..d315526 100644
> --- a/src/pciinit.c
> +++ b/src/pciinit.c
> @@ -137,7 +137,7 @@ static void pci_bios_init_device(u16 bdf)
>                  ofs = PCI_ROM_ADDRESS;
>              else
>                  ofs = PCI_BASE_ADDRESS_0 + i * 4;
> -            pci_config_writel(bdf, ofs, 0xffffffff);
> +            pci_config_writel(bdf, ofs, 0xfffffffe);
>              val = pci_config_readl(bdf, ofs);
>              if (val != 0) {
>                  size = (~(val & ~0xf)) + 1;
> -- 
> 1.6.5.rc2

--
			Gleb.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Qemu-devel] Re: [PATCH 3/5] Use the correct mask to size the PCI option ROM BAR.
  2009-10-12 10:08         ` Gleb Natapov
@ 2009-10-12 11:03           ` Michael S. Tsirkin
  2009-10-12 11:45             ` Michael S. Tsirkin
  2009-10-12 11:48             ` Gleb Natapov
  0 siblings, 2 replies; 34+ messages in thread
From: Michael S. Tsirkin @ 2009-10-12 11:03 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: kevin, qemu-devel

On Mon, Oct 12, 2009 at 12:08:21PM +0200, Gleb Natapov wrote:
> On Mon, Oct 12, 2009 at 11:52:25AM +0200, Michael S. Tsirkin wrote:
> > On Mon, Oct 12, 2009 at 08:50:24AM +0200, Gleb Natapov wrote:
> > > Send patch with your favorite interpretation to qemu pcbios/seabios.
> > > The regression concern from my previous mail applicable here as well.
> > 
> > Okay. Can you ack the following?
> > 
> I can if you'll add PCI spec reference for me to double check.


> Also I prefer strict spec reading :)

OK, the issue is that reserved bits in BARs are not
defined as read-only.  So here's a strict one:
can you ack?

--->

seabios: fix ROM and I/O sizing

For ROM BARs, bit 0 is writeable (enable bit), which we not
only don't want to set, but it will stick and make us think
it's an I/O port resource.
Further, PCI spec defines the following bits as reserved:
- bit 1 in I/O BAR
- bits 10:1 in ROM BAR
and we should be careful and write 0 there.
For memory, bits 0-3 are reserved, so it's safe to handle it
in the same way as I/O.

See 6.2.5.1 for I/O and memory, and 6.2.5.2 for ROM;
pages 225 and 228 in PCI spec revision 3.0.

See also Qemu pcbios commit 6ddb9f5c742b2b82b1755d7ec2a127f6e20e3806

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 src/pciinit.c |    8 +++++---
 1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/src/pciinit.c b/src/pciinit.c
index 1d0f784..29b3901 100644
--- a/src/pciinit.c
+++ b/src/pciinit.c
@@ -139,11 +139,13 @@ static void pci_bios_init_device(u16 bdf)
             int ofs;
             u32 val, size;
 
-            if (i == PCI_ROM_SLOT)
+            if (i == PCI_ROM_SLOT) {
                 ofs = PCI_ROM_ADDRESS;
-            else
+                pci_config_writel(bdf, ofs, PCI_ROM_ADDRESS_MASK);
+            } else {
                 ofs = PCI_BASE_ADDRESS_0 + i * 4;
-            pci_config_writel(bdf, ofs, 0xffffffff);
+                pci_config_writel(bdf, ofs, PCI_BASE_ADDRESS_IO_MASK);
+            }
             val = pci_config_readl(bdf, ofs);
             if (val != 0) {
                 size = (~(val & ~0xf)) + 1;
-- 
1.6.3.3

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Qemu-devel] Re: [PATCH 3/5] Use the correct mask to size the PCI option ROM BAR.
  2009-10-12 11:03           ` Michael S. Tsirkin
@ 2009-10-12 11:45             ` Michael S. Tsirkin
  2009-10-12 11:48             ` Gleb Natapov
  1 sibling, 0 replies; 34+ messages in thread
From: Michael S. Tsirkin @ 2009-10-12 11:45 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: kevin, qemu-devel

On Mon, Oct 12, 2009 at 01:03:36PM +0200, Michael S. Tsirkin wrote:
> On Mon, Oct 12, 2009 at 12:08:21PM +0200, Gleb Natapov wrote:
> > On Mon, Oct 12, 2009 at 11:52:25AM +0200, Michael S. Tsirkin wrote:
> > > On Mon, Oct 12, 2009 at 08:50:24AM +0200, Gleb Natapov wrote:
> > > > Send patch with your favorite interpretation to qemu pcbios/seabios.
> > > > The regression concern from my previous mail applicable here as well.
> > > 
> > > Okay. Can you ack the following?
> > > 
> > I can if you'll add PCI spec reference for me to double check.
> 
> 
> > Also I prefer strict spec reading :)
> 
> OK, the issue is that reserved bits in BARs are not
> defined as read-only.  So here's a strict one:
> can you ack?

Actually, that's not right either.
It says on page 214:

	Software must take care to deal correctly with bit-encoded fields that have
	some bits reserved for future use. On reads, software must use appropriate
	masks to extract the defined bits, and may not rely on reserved bits being any
	particular value. On writes, software must ensure that the values of reserved
	bit positions are preserved; that is, the values of reserved bit positions must
	first be read, merged with the new values for other bit positions and the data
	then written back.

So let's do this. Patch forthcoming.


> --->
> 
> seabios: fix ROM and I/O sizing
> 
> For ROM BARs, bit 0 is writeable (enable bit), which we not
> only don't want to set, but it will stick and make us think
> it's an I/O port resource.
> Further, PCI spec defines the following bits as reserved:
> - bit 1 in I/O BAR
> - bits 10:1 in ROM BAR
> and we should be careful and write 0 there.
> For memory, bits 0-3 are reserved, so it's safe to handle it
> in the same way as I/O.
> 
> See 6.2.5.1 for I/O and memory, and 6.2.5.2 for ROM;
> pages 225 and 228 in PCI spec revision 3.0.
> 
> See also Qemu pcbios commit 6ddb9f5c742b2b82b1755d7ec2a127f6e20e3806
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>  src/pciinit.c |    8 +++++---
>  1 files changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/src/pciinit.c b/src/pciinit.c
> index 1d0f784..29b3901 100644
> --- a/src/pciinit.c
> +++ b/src/pciinit.c
> @@ -139,11 +139,13 @@ static void pci_bios_init_device(u16 bdf)
>              int ofs;
>              u32 val, size;
>  
> -            if (i == PCI_ROM_SLOT)
> +            if (i == PCI_ROM_SLOT) {
>                  ofs = PCI_ROM_ADDRESS;
> -            else
> +                pci_config_writel(bdf, ofs, PCI_ROM_ADDRESS_MASK);
> +            } else {
>                  ofs = PCI_BASE_ADDRESS_0 + i * 4;
> -            pci_config_writel(bdf, ofs, 0xffffffff);
> +                pci_config_writel(bdf, ofs, PCI_BASE_ADDRESS_IO_MASK);
> +            }
>              val = pci_config_readl(bdf, ofs);
>              if (val != 0) {
>                  size = (~(val & ~0xf)) + 1;
> -- 
> 1.6.3.3
> 
> 
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Qemu-devel] Re: [PATCH 3/5] Use the correct mask to size the PCI option ROM BAR.
  2009-10-12 11:03           ` Michael S. Tsirkin
  2009-10-12 11:45             ` Michael S. Tsirkin
@ 2009-10-12 11:48             ` Gleb Natapov
  2009-10-12 11:59               ` Michael S. Tsirkin
  1 sibling, 1 reply; 34+ messages in thread
From: Gleb Natapov @ 2009-10-12 11:48 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: kevin, qemu-devel

On Mon, Oct 12, 2009 at 01:03:36PM +0200, Michael S. Tsirkin wrote:
> On Mon, Oct 12, 2009 at 12:08:21PM +0200, Gleb Natapov wrote:
> > On Mon, Oct 12, 2009 at 11:52:25AM +0200, Michael S. Tsirkin wrote:
> > > On Mon, Oct 12, 2009 at 08:50:24AM +0200, Gleb Natapov wrote:
> > > > Send patch with your favorite interpretation to qemu pcbios/seabios.
> > > > The regression concern from my previous mail applicable here as well.
> > > 
> > > Okay. Can you ack the following?
> > > 
> > I can if you'll add PCI spec reference for me to double check.
> 
> 
> > Also I prefer strict spec reading :)
> 
> OK, the issue is that reserved bits in BARs are not
> defined as read-only.  So here's a strict one:
> can you ack?
> 
> --->
> 
> seabios: fix ROM and I/O sizing
> 
> For ROM BARs, bit 0 is writeable (enable bit), which we not
> only don't want to set, but it will stick and make us think
> it's an I/O port resource.
> Further, PCI spec defines the following bits as reserved:
> - bit 1 in I/O BAR
> - bits 10:1 in ROM BAR
> and we should be careful and write 0 there.
> For memory, bits 0-3 are reserved, so it's safe to handle it
> in the same way as I/O.
> 
> See 6.2.5.1 for I/O and memory, and 6.2.5.2 for ROM;
> pages 225 and 228 in PCI spec revision 3.0.
> 
Section 6.2.5.1 says:
Software saves the original value of the Base Address register, writes
0 FFFF FFFFh to the register, then reads it back.

Section 6.2.5.2 says:
Device independent configuration software can determine how much address
space the device requires by writing a value of all 1's to the address
portion of the register and then reading the value back. (address
portion are bits 31-11).

So we should write PCI_ROM_ADDRESS_MASK in case of ROM and 0xffffffff
in case of regular BAR.

> See also Qemu pcbios commit 6ddb9f5c742b2b82b1755d7ec2a127f6e20e3806
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>  src/pciinit.c |    8 +++++---
>  1 files changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/src/pciinit.c b/src/pciinit.c
> index 1d0f784..29b3901 100644
> --- a/src/pciinit.c
> +++ b/src/pciinit.c
> @@ -139,11 +139,13 @@ static void pci_bios_init_device(u16 bdf)
>              int ofs;
>              u32 val, size;
>  
> -            if (i == PCI_ROM_SLOT)
> +            if (i == PCI_ROM_SLOT) {
>                  ofs = PCI_ROM_ADDRESS;
> -            else
> +                pci_config_writel(bdf, ofs, PCI_ROM_ADDRESS_MASK);
> +            } else {
>                  ofs = PCI_BASE_ADDRESS_0 + i * 4;
> -            pci_config_writel(bdf, ofs, 0xffffffff);
> +                pci_config_writel(bdf, ofs, PCI_BASE_ADDRESS_IO_MASK);
> +            }
>              val = pci_config_readl(bdf, ofs);
>              if (val != 0) {
>                  size = (~(val & ~0xf)) + 1;
> -- 
> 1.6.3.3
> 
> 

--
			Gleb.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Qemu-devel] Re: [PATCH 3/5] Use the correct mask to size the PCI option ROM BAR.
  2009-10-12 11:48             ` Gleb Natapov
@ 2009-10-12 11:59               ` Michael S. Tsirkin
  2009-10-12 12:08                 ` Gleb Natapov
  0 siblings, 1 reply; 34+ messages in thread
From: Michael S. Tsirkin @ 2009-10-12 11:59 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: kevin, qemu-devel

On Mon, Oct 12, 2009 at 01:48:41PM +0200, Gleb Natapov wrote:
> On Mon, Oct 12, 2009 at 01:03:36PM +0200, Michael S. Tsirkin wrote:
> > On Mon, Oct 12, 2009 at 12:08:21PM +0200, Gleb Natapov wrote:
> > > On Mon, Oct 12, 2009 at 11:52:25AM +0200, Michael S. Tsirkin wrote:
> > > > On Mon, Oct 12, 2009 at 08:50:24AM +0200, Gleb Natapov wrote:
> > > > > Send patch with your favorite interpretation to qemu pcbios/seabios.
> > > > > The regression concern from my previous mail applicable here as well.
> > > > 
> > > > Okay. Can you ack the following?
> > > > 
> > > I can if you'll add PCI spec reference for me to double check.
> > 
> > 
> > > Also I prefer strict spec reading :)
> > 
> > OK, the issue is that reserved bits in BARs are not
> > defined as read-only.  So here's a strict one:
> > can you ack?
> > 
> > --->
> > 
> > seabios: fix ROM and I/O sizing
> > 
> > For ROM BARs, bit 0 is writeable (enable bit), which we not
> > only don't want to set, but it will stick and make us think
> > it's an I/O port resource.
> > Further, PCI spec defines the following bits as reserved:
> > - bit 1 in I/O BAR
> > - bits 10:1 in ROM BAR
> > and we should be careful and write 0 there.
> > For memory, bits 0-3 are reserved, so it's safe to handle it
> > in the same way as I/O.
> > 
> > See 6.2.5.1 for I/O and memory, and 6.2.5.2 for ROM;
> > pages 225 and 228 in PCI spec revision 3.0.
> > 
> Section 6.2.5.1 says:
> Software saves the original value of the Base Address register, writes
> 0 FFFF FFFFh to the register, then reads it back.

I think you miss something.  Here it is in full:

	Decode (I/O or memory) of a register is disabled via the command register before sizing a
	Base Address register. Software saves the original value of the Base Address register, writes
	0 FFFF FFFFh to the register, then reads it back. Size calculation can be done from the
	32-bit value read by first clearing encoding information bits (bit 0 for I/O, bits 0-3 for
	memory), inverting all 32 bits (logical NOT), then incrementing by 1. The resultant 32-bit
	value is the memory/I/O range size decoded by the register. Note that the upper 16 bits of
	the result is ignored if the Base Address register is for I/O and bits 16-31 returned zero
	upon read. The original value in the Base Address register is restored before re-enabling
	decode in the command register of the device.

Note the bit about restoring back the original value.
You can not assume that reserved bits are read-only.

> Section 6.2.5.2 says:
> Device independent configuration software can determine how much address
> space the device requires by writing a value of all 1's to the address
> portion of the register and then reading the value back. (address
> portion are bits 31-11).

We must also save and restore the lower bits.

> So we should write PCI_ROM_ADDRESS_MASK in case of ROM and 0xffffffff
> in case of regular BAR.
> 
> > See also Qemu pcbios commit 6ddb9f5c742b2b82b1755d7ec2a127f6e20e3806
> > 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> >  src/pciinit.c |    8 +++++---
> >  1 files changed, 5 insertions(+), 3 deletions(-)
> > 
> > diff --git a/src/pciinit.c b/src/pciinit.c
> > index 1d0f784..29b3901 100644
> > --- a/src/pciinit.c
> > +++ b/src/pciinit.c
> > @@ -139,11 +139,13 @@ static void pci_bios_init_device(u16 bdf)
> >              int ofs;
> >              u32 val, size;
> >  
> > -            if (i == PCI_ROM_SLOT)
> > +            if (i == PCI_ROM_SLOT) {
> >                  ofs = PCI_ROM_ADDRESS;
> > -            else
> > +                pci_config_writel(bdf, ofs, PCI_ROM_ADDRESS_MASK);
> > +            } else {
> >                  ofs = PCI_BASE_ADDRESS_0 + i * 4;
> > -            pci_config_writel(bdf, ofs, 0xffffffff);
> > +                pci_config_writel(bdf, ofs, PCI_BASE_ADDRESS_IO_MASK);
> > +            }
> >              val = pci_config_readl(bdf, ofs);
> >              if (val != 0) {
> >                  size = (~(val & ~0xf)) + 1;
> > -- 
> > 1.6.3.3
> > 
> > 
> 
> --
> 			Gleb.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Qemu-devel] Re: [PATCH 3/5] Use the correct mask to size the PCI option ROM BAR.
  2009-10-12 11:59               ` Michael S. Tsirkin
@ 2009-10-12 12:08                 ` Gleb Natapov
  2009-10-12 13:20                   ` Michael S. Tsirkin
  0 siblings, 1 reply; 34+ messages in thread
From: Gleb Natapov @ 2009-10-12 12:08 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: kevin, qemu-devel

On Mon, Oct 12, 2009 at 01:59:16PM +0200, Michael S. Tsirkin wrote:
> On Mon, Oct 12, 2009 at 01:48:41PM +0200, Gleb Natapov wrote:
> > On Mon, Oct 12, 2009 at 01:03:36PM +0200, Michael S. Tsirkin wrote:
> > > On Mon, Oct 12, 2009 at 12:08:21PM +0200, Gleb Natapov wrote:
> > > > On Mon, Oct 12, 2009 at 11:52:25AM +0200, Michael S. Tsirkin wrote:
> > > > > On Mon, Oct 12, 2009 at 08:50:24AM +0200, Gleb Natapov wrote:
> > > > > > Send patch with your favorite interpretation to qemu pcbios/seabios.
> > > > > > The regression concern from my previous mail applicable here as well.
> > > > > 
> > > > > Okay. Can you ack the following?
> > > > > 
> > > > I can if you'll add PCI spec reference for me to double check.
> > > 
> > > 
> > > > Also I prefer strict spec reading :)
> > > 
> > > OK, the issue is that reserved bits in BARs are not
> > > defined as read-only.  So here's a strict one:
> > > can you ack?
> > > 
> > > --->
> > > 
> > > seabios: fix ROM and I/O sizing
> > > 
> > > For ROM BARs, bit 0 is writeable (enable bit), which we not
> > > only don't want to set, but it will stick and make us think
> > > it's an I/O port resource.
> > > Further, PCI spec defines the following bits as reserved:
> > > - bit 1 in I/O BAR
> > > - bits 10:1 in ROM BAR
> > > and we should be careful and write 0 there.
> > > For memory, bits 0-3 are reserved, so it's safe to handle it
> > > in the same way as I/O.
> > > 
> > > See 6.2.5.1 for I/O and memory, and 6.2.5.2 for ROM;
> > > pages 225 and 228 in PCI spec revision 3.0.
> > > 
> > Section 6.2.5.1 says:
> > Software saves the original value of the Base Address register, writes
> > 0 FFFF FFFFh to the register, then reads it back.
> 
> I think you miss something.  Here it is in full:
> 
> 	Decode (I/O or memory) of a register is disabled via the command register before sizing a
> 	Base Address register. Software saves the original value of the Base Address register, writes
> 	0 FFFF FFFFh to the register, then reads it back. Size calculation can be done from the
> 	32-bit value read by first clearing encoding information bits (bit 0 for I/O, bits 0-3 for
> 	memory), inverting all 32 bits (logical NOT), then incrementing by 1. The resultant 32-bit
> 	value is the memory/I/O range size decoded by the register. Note that the upper 16 bits of
> 	the result is ignored if the Base Address register is for I/O and bits 16-31 returned zero
> 	upon read. The original value in the Base Address register is restored before re-enabling
> 	decode in the command register of the device.
> 
> Note the bit about restoring back the original value.
> You can not assume that reserved bits are read-only.
> 
I assume nothing. I am saying the code was correct in writing 0xffffffff
there. If coded does not restore original value that is another issue.

> > Section 6.2.5.2 says:
> > Device independent configuration software can determine how much address
> > space the device requires by writing a value of all 1's to the address
> > portion of the register and then reading the value back. (address
> > portion are bits 31-11).
> 
> We must also save and restore the lower bits.
> 
> > So we should write PCI_ROM_ADDRESS_MASK in case of ROM and 0xffffffff
> > in case of regular BAR.
> > 
> > > See also Qemu pcbios commit 6ddb9f5c742b2b82b1755d7ec2a127f6e20e3806
> > > 
> > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > ---
> > >  src/pciinit.c |    8 +++++---
> > >  1 files changed, 5 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/src/pciinit.c b/src/pciinit.c
> > > index 1d0f784..29b3901 100644
> > > --- a/src/pciinit.c
> > > +++ b/src/pciinit.c
> > > @@ -139,11 +139,13 @@ static void pci_bios_init_device(u16 bdf)
> > >              int ofs;
> > >              u32 val, size;
> > >  
> > > -            if (i == PCI_ROM_SLOT)
> > > +            if (i == PCI_ROM_SLOT) {
> > >                  ofs = PCI_ROM_ADDRESS;
> > > -            else
> > > +                pci_config_writel(bdf, ofs, PCI_ROM_ADDRESS_MASK);
> > > +            } else {
> > >                  ofs = PCI_BASE_ADDRESS_0 + i * 4;
> > > -            pci_config_writel(bdf, ofs, 0xffffffff);
> > > +                pci_config_writel(bdf, ofs, PCI_BASE_ADDRESS_IO_MASK);
> > > +            }
> > >              val = pci_config_readl(bdf, ofs);
> > >              if (val != 0) {
> > >                  size = (~(val & ~0xf)) + 1;
> > > -- 
> > > 1.6.3.3
> > > 
> > > 
> > 
> > --
> > 			Gleb.

--
			Gleb.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Qemu-devel] Re: [PATCH 3/5] Use the correct mask to size the PCI option ROM BAR.
  2009-10-12 12:08                 ` Gleb Natapov
@ 2009-10-12 13:20                   ` Michael S. Tsirkin
  2009-10-12 13:29                     ` Gleb Natapov
  2009-10-12 14:20                     ` [Qemu-devel] seabios: fix low bits in ROM and I/O sizing Michael S. Tsirkin
  0 siblings, 2 replies; 34+ messages in thread
From: Michael S. Tsirkin @ 2009-10-12 13:20 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: kevin, qemu-devel

On Mon, Oct 12, 2009 at 02:08:57PM +0200, Gleb Natapov wrote:
> On Mon, Oct 12, 2009 at 01:59:16PM +0200, Michael S. Tsirkin wrote:
> > On Mon, Oct 12, 2009 at 01:48:41PM +0200, Gleb Natapov wrote:
> > > On Mon, Oct 12, 2009 at 01:03:36PM +0200, Michael S. Tsirkin wrote:
> > > > On Mon, Oct 12, 2009 at 12:08:21PM +0200, Gleb Natapov wrote:
> > > > > On Mon, Oct 12, 2009 at 11:52:25AM +0200, Michael S. Tsirkin wrote:
> > > > > > On Mon, Oct 12, 2009 at 08:50:24AM +0200, Gleb Natapov wrote:
> > > > > > > Send patch with your favorite interpretation to qemu pcbios/seabios.
> > > > > > > The regression concern from my previous mail applicable here as well.
> > > > > > 
> > > > > > Okay. Can you ack the following?
> > > > > > 
> > > > > I can if you'll add PCI spec reference for me to double check.
> > > > 
> > > > 
> > > > > Also I prefer strict spec reading :)
> > > > 
> > > > OK, the issue is that reserved bits in BARs are not
> > > > defined as read-only.  So here's a strict one:
> > > > can you ack?
> > > > 
> > > > --->
> > > > 
> > > > seabios: fix ROM and I/O sizing
> > > > 
> > > > For ROM BARs, bit 0 is writeable (enable bit), which we not
> > > > only don't want to set, but it will stick and make us think
> > > > it's an I/O port resource.
> > > > Further, PCI spec defines the following bits as reserved:
> > > > - bit 1 in I/O BAR
> > > > - bits 10:1 in ROM BAR
> > > > and we should be careful and write 0 there.
> > > > For memory, bits 0-3 are reserved, so it's safe to handle it
> > > > in the same way as I/O.
> > > > 
> > > > See 6.2.5.1 for I/O and memory, and 6.2.5.2 for ROM;
> > > > pages 225 and 228 in PCI spec revision 3.0.
> > > > 
> > > Section 6.2.5.1 says:
> > > Software saves the original value of the Base Address register, writes
> > > 0 FFFF FFFFh to the register, then reads it back.
> > 
> > I think you miss something.  Here it is in full:
> > 
> > 	Decode (I/O or memory) of a register is disabled via the command register before sizing a
> > 	Base Address register. Software saves the original value of the Base Address register, writes
> > 	0 FFFF FFFFh to the register, then reads it back. Size calculation can be done from the
> > 	32-bit value read by first clearing encoding information bits (bit 0 for I/O, bits 0-3 for
> > 	memory), inverting all 32 bits (logical NOT), then incrementing by 1. The resultant 32-bit
> > 	value is the memory/I/O range size decoded by the register. Note that the upper 16 bits of
> > 	the result is ignored if the Base Address register is for I/O and bits 16-31 returned zero
> > 	upon read. The original value in the Base Address register is restored before re-enabling
> > 	decode in the command register of the device.
> > 
> > Note the bit about restoring back the original value.
> > You can not assume that reserved bits are read-only.
> > 
> I assume nothing. I am saying the code was correct in writing 0xffffffff
> there. If coded does not restore original value that is another issue.


Right. And there's another bug that I see and that is that size for I/O
BAR was calculated incorrectly: val & ~0xf is wrong for I/O as the size
could be 4 bytes.  So here's a patch that addresses all 3 issues:
Ack?

---->

Subject: [PATCH] seabios: fix low bits in ROM and I/O sizing

This cleans up handling of low bits during BAR sizing,
to match PCI spec requirements, and to use symbolic
constants from pci_regs.h

Issues fixed:
For ROM BARs, bit 0 is writeable (enable bit), which we not
only don't want to set, but it will stick and make us think
it's an I/O port resource.
Further, PCI spec defines the following bits as reserved:
- bit 1 in I/O BAR
- bits 10:1 in ROM BAR
and we should be careful and preserve any values there.
Bits 3:2 in I/O BAR might be writeable, so it
is wrong to mask them when calculating BAR size.

Spec references:
See 6.2.5.1 for I/O and memory, and 6.2.5.2 for ROM,
6.1 for reserved bit handling;
pages 225, 228 and 214 in PCI spec revision 3.0.

See also Qemu pcbios commit 6ddb9f5c742b2b82b1755d7ec2a127f6e20e3806

Original-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

diff --git a/src/pciinit.c b/src/pciinit.c
index 7d2ea00..f9ebd61 100644
--- a/src/pciinit.c
+++ b/src/pciinit.c
@@ -136,16 +136,23 @@ static void pci_bios_init_device(u16 bdf)
         /* default memory mappings */
         for (i = 0; i < PCI_NUM_REGIONS; i++) {
             int ofs;
-            u32 val, size;
-
+            u32 val, mask, size;
             if (i == PCI_ROM_SLOT)
                 ofs = PCI_ROM_ADDRESS;
             else
                 ofs = PCI_BASE_ADDRESS_0 + i * 4;
-            pci_config_writel(bdf, ofs, 0xffffffff);
+
+            val = pci_config_readl(bdf, ofs);
+            if (i == PCI_ROM_SLOT)
+		mask = PCI_ROM_ADDRESS_MASK;
+            else if (val & PCI_BASE_ADDRESS_SPACE_IO)
+                mask = PCI_BASE_ADDRESS_IO_MASK;
+            else
+                mask = PCI_BASE_ADDRESS_MEM_MASK;
+            pci_config_writel(bdf, ofs, val | mask);
             val = pci_config_readl(bdf, ofs);
             if (val != 0) {
-                size = (~(val & ~0xf)) + 1;
+                size = (~(val & mask)) + 1;
                 if (val & PCI_BASE_ADDRESS_SPACE_IO)
                     paddr = &pci_bios_io_addr;
                 else if (size >= 0x04000000)
-- 
1.6.5.rc2

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Qemu-devel] Re: [PATCH 3/5] Use the correct mask to size the PCI option ROM BAR.
  2009-10-12 13:20                   ` Michael S. Tsirkin
@ 2009-10-12 13:29                     ` Gleb Natapov
  2009-10-12 13:51                       ` Michael S. Tsirkin
  2009-10-12 14:20                     ` [Qemu-devel] seabios: fix low bits in ROM and I/O sizing Michael S. Tsirkin
  1 sibling, 1 reply; 34+ messages in thread
From: Gleb Natapov @ 2009-10-12 13:29 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: kevin, qemu-devel

On Mon, Oct 12, 2009 at 03:20:27PM +0200, Michael S. Tsirkin wrote:
> Right. And there's another bug that I see and that is that size for I/O
> BAR was calculated incorrectly: val & ~0xf is wrong for I/O as the size
> could be 4 bytes.  So here's a patch that addresses all 3 issues:
> Ack?
> 
Pleas do whatever spec recommends and use 0xffffffff to size the BAR.
Specs says this, linux does this, lets not be different.

> 
> ---->
> 
> Subject: [PATCH] seabios: fix low bits in ROM and I/O sizing
> 
> This cleans up handling of low bits during BAR sizing,
> to match PCI spec requirements, and to use symbolic
> constants from pci_regs.h
> 
> Issues fixed:
> For ROM BARs, bit 0 is writeable (enable bit), which we not
> only don't want to set, but it will stick and make us think
> it's an I/O port resource.
> Further, PCI spec defines the following bits as reserved:
> - bit 1 in I/O BAR
> - bits 10:1 in ROM BAR
> and we should be careful and preserve any values there.
> Bits 3:2 in I/O BAR might be writeable, so it
> is wrong to mask them when calculating BAR size.
> 
> Spec references:
> See 6.2.5.1 for I/O and memory, and 6.2.5.2 for ROM,
> 6.1 for reserved bit handling;
> pages 225, 228 and 214 in PCI spec revision 3.0.
> 
> See also Qemu pcbios commit 6ddb9f5c742b2b82b1755d7ec2a127f6e20e3806
> 
> Original-by: Gleb Natapov <gleb@redhat.com>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> 
> diff --git a/src/pciinit.c b/src/pciinit.c
> index 7d2ea00..f9ebd61 100644
> --- a/src/pciinit.c
> +++ b/src/pciinit.c
> @@ -136,16 +136,23 @@ static void pci_bios_init_device(u16 bdf)
>          /* default memory mappings */
>          for (i = 0; i < PCI_NUM_REGIONS; i++) {
>              int ofs;
> -            u32 val, size;
> -
> +            u32 val, mask, size;
>              if (i == PCI_ROM_SLOT)
>                  ofs = PCI_ROM_ADDRESS;
>              else
>                  ofs = PCI_BASE_ADDRESS_0 + i * 4;
> -            pci_config_writel(bdf, ofs, 0xffffffff);
> +
> +            val = pci_config_readl(bdf, ofs);
> +            if (i == PCI_ROM_SLOT)
> +		mask = PCI_ROM_ADDRESS_MASK;
> +            else if (val & PCI_BASE_ADDRESS_SPACE_IO)
> +                mask = PCI_BASE_ADDRESS_IO_MASK;
> +            else
> +                mask = PCI_BASE_ADDRESS_MEM_MASK;
> +            pci_config_writel(bdf, ofs, val | mask);
>              val = pci_config_readl(bdf, ofs);
>              if (val != 0) {
> -                size = (~(val & ~0xf)) + 1;
> +                size = (~(val & mask)) + 1;
>                  if (val & PCI_BASE_ADDRESS_SPACE_IO)
>                      paddr = &pci_bios_io_addr;
>                  else if (size >= 0x04000000)
> -- 
> 1.6.5.rc2
> 

--
			Gleb.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Qemu-devel] Re: [PATCH 3/5] Use the correct mask to size the PCI option ROM BAR.
  2009-10-12 13:29                     ` Gleb Natapov
@ 2009-10-12 13:51                       ` Michael S. Tsirkin
  2009-10-12 14:04                         ` Gleb Natapov
  0 siblings, 1 reply; 34+ messages in thread
From: Michael S. Tsirkin @ 2009-10-12 13:51 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: kevin, qemu-devel

On Mon, Oct 12, 2009 at 03:29:25PM +0200, Gleb Natapov wrote:
> On Mon, Oct 12, 2009 at 03:20:27PM +0200, Michael S. Tsirkin wrote:
> > Right. And there's another bug that I see and that is that size for I/O
> > BAR was calculated incorrectly: val & ~0xf is wrong for I/O as the size
> > could be 4 bytes.  So here's a patch that addresses all 3 issues:
> > Ack?
> > 
> Pleas do whatever spec recommends and use 0xffffffff to size the BAR.

What we do is equivalent: since next thing we override the BAR value,
there's no reason for the save/restore dance.  But if you like it this
way, OK. Send a patch. Note this applies to I/O and memory only,
ROM has to be sized the way I coded it.

> Specs says this, linux does this, lets not be different.

BTW, note that what linux does for ROM is incorrect:
the implementation note you cite only applies to I/O and memory BARs.
I'll send a patch to lkml when I have the time.

> > 
> > ---->
> > 
> > Subject: [PATCH] seabios: fix low bits in ROM and I/O sizing
> > 
> > This cleans up handling of low bits during BAR sizing,
> > to match PCI spec requirements, and to use symbolic
> > constants from pci_regs.h
> > 
> > Issues fixed:
> > For ROM BARs, bit 0 is writeable (enable bit), which we not
> > only don't want to set, but it will stick and make us think
> > it's an I/O port resource.
> > Further, PCI spec defines the following bits as reserved:
> > - bit 1 in I/O BAR
> > - bits 10:1 in ROM BAR
> > and we should be careful and preserve any values there.
> > Bits 3:2 in I/O BAR might be writeable, so it
> > is wrong to mask them when calculating BAR size.
> > 
> > Spec references:
> > See 6.2.5.1 for I/O and memory, and 6.2.5.2 for ROM,
> > 6.1 for reserved bit handling;
> > pages 225, 228 and 214 in PCI spec revision 3.0.
> > 
> > See also Qemu pcbios commit 6ddb9f5c742b2b82b1755d7ec2a127f6e20e3806
> > 
> > Original-by: Gleb Natapov <gleb@redhat.com>
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > 
> > diff --git a/src/pciinit.c b/src/pciinit.c
> > index 7d2ea00..f9ebd61 100644
> > --- a/src/pciinit.c
> > +++ b/src/pciinit.c
> > @@ -136,16 +136,23 @@ static void pci_bios_init_device(u16 bdf)
> >          /* default memory mappings */
> >          for (i = 0; i < PCI_NUM_REGIONS; i++) {
> >              int ofs;
> > -            u32 val, size;
> > -
> > +            u32 val, mask, size;
> >              if (i == PCI_ROM_SLOT)
> >                  ofs = PCI_ROM_ADDRESS;
> >              else
> >                  ofs = PCI_BASE_ADDRESS_0 + i * 4;
> > -            pci_config_writel(bdf, ofs, 0xffffffff);
> > +
> > +            val = pci_config_readl(bdf, ofs);
> > +            if (i == PCI_ROM_SLOT)
> > +		mask = PCI_ROM_ADDRESS_MASK;
> > +            else if (val & PCI_BASE_ADDRESS_SPACE_IO)
> > +                mask = PCI_BASE_ADDRESS_IO_MASK;
> > +            else
> > +                mask = PCI_BASE_ADDRESS_MEM_MASK;
> > +            pci_config_writel(bdf, ofs, val | mask);
> >              val = pci_config_readl(bdf, ofs);
> >              if (val != 0) {
> > -                size = (~(val & ~0xf)) + 1;
> > +                size = (~(val & mask)) + 1;
> >                  if (val & PCI_BASE_ADDRESS_SPACE_IO)
> >                      paddr = &pci_bios_io_addr;
> >                  else if (size >= 0x04000000)
> > -- 
> > 1.6.5.rc2
> > 
> 
> --
> 			Gleb.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Qemu-devel] Re: [PATCH 3/5] Use the correct mask to size the PCI option ROM BAR.
  2009-10-12 13:51                       ` Michael S. Tsirkin
@ 2009-10-12 14:04                         ` Gleb Natapov
  2009-10-12 14:11                           ` Michael S. Tsirkin
  0 siblings, 1 reply; 34+ messages in thread
From: Gleb Natapov @ 2009-10-12 14:04 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: kevin, qemu-devel

On Mon, Oct 12, 2009 at 03:51:12PM +0200, Michael S. Tsirkin wrote:
> On Mon, Oct 12, 2009 at 03:29:25PM +0200, Gleb Natapov wrote:
> > On Mon, Oct 12, 2009 at 03:20:27PM +0200, Michael S. Tsirkin wrote:
> > > Right. And there's another bug that I see and that is that size for I/O
> > > BAR was calculated incorrectly: val & ~0xf is wrong for I/O as the size
> > > could be 4 bytes.  So here's a patch that addresses all 3 issues:
> > > Ack?
> > > 
> > Pleas do whatever spec recommends and use 0xffffffff to size the BAR.
> 
> What we do is equivalent: since next thing we override the BAR value,
> there's no reason for the save/restore dance.  But if you like it this
That what spec says if you are not trying to interpret it too hard. And
simple is always better.

> way, OK. Send a patch. Note this applies to I/O and memory only,
> ROM has to be sized the way I coded it.
> 
> > Specs says this, linux does this, lets not be different.
> 
> BTW, note that what linux does for ROM is incorrect:
> the implementation note you cite only applies to I/O and memory BARs.
> I'll send a patch to lkml when I have the time.
> 
It does mask = type ? ~PCI_ROM_ADDRESS_ENABLE : ~0; so it definitely
treats ROM specially, You may argue that it shouldn't set bits 1-10
though.

The patch that currently in KVM bios (the one I've sent to seabios
initially) uses the same technique to size the BAR as Linux kernel. The only
thing missing is restoring of old value.

> > > 
> > > ---->
> > > 
> > > Subject: [PATCH] seabios: fix low bits in ROM and I/O sizing
> > > 
> > > This cleans up handling of low bits during BAR sizing,
> > > to match PCI spec requirements, and to use symbolic
> > > constants from pci_regs.h
> > > 
> > > Issues fixed:
> > > For ROM BARs, bit 0 is writeable (enable bit), which we not
> > > only don't want to set, but it will stick and make us think
> > > it's an I/O port resource.
> > > Further, PCI spec defines the following bits as reserved:
> > > - bit 1 in I/O BAR
> > > - bits 10:1 in ROM BAR
> > > and we should be careful and preserve any values there.
> > > Bits 3:2 in I/O BAR might be writeable, so it
> > > is wrong to mask them when calculating BAR size.
> > > 
> > > Spec references:
> > > See 6.2.5.1 for I/O and memory, and 6.2.5.2 for ROM,
> > > 6.1 for reserved bit handling;
> > > pages 225, 228 and 214 in PCI spec revision 3.0.
> > > 
> > > See also Qemu pcbios commit 6ddb9f5c742b2b82b1755d7ec2a127f6e20e3806
> > > 
> > > Original-by: Gleb Natapov <gleb@redhat.com>
> > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > 
> > > diff --git a/src/pciinit.c b/src/pciinit.c
> > > index 7d2ea00..f9ebd61 100644
> > > --- a/src/pciinit.c
> > > +++ b/src/pciinit.c
> > > @@ -136,16 +136,23 @@ static void pci_bios_init_device(u16 bdf)
> > >          /* default memory mappings */
> > >          for (i = 0; i < PCI_NUM_REGIONS; i++) {
> > >              int ofs;
> > > -            u32 val, size;
> > > -
> > > +            u32 val, mask, size;
> > >              if (i == PCI_ROM_SLOT)
> > >                  ofs = PCI_ROM_ADDRESS;
> > >              else
> > >                  ofs = PCI_BASE_ADDRESS_0 + i * 4;
> > > -            pci_config_writel(bdf, ofs, 0xffffffff);
> > > +
> > > +            val = pci_config_readl(bdf, ofs);
> > > +            if (i == PCI_ROM_SLOT)
> > > +		mask = PCI_ROM_ADDRESS_MASK;
> > > +            else if (val & PCI_BASE_ADDRESS_SPACE_IO)
> > > +                mask = PCI_BASE_ADDRESS_IO_MASK;
> > > +            else
> > > +                mask = PCI_BASE_ADDRESS_MEM_MASK;
> > > +            pci_config_writel(bdf, ofs, val | mask);
> > >              val = pci_config_readl(bdf, ofs);
> > >              if (val != 0) {
> > > -                size = (~(val & ~0xf)) + 1;
> > > +                size = (~(val & mask)) + 1;
> > >                  if (val & PCI_BASE_ADDRESS_SPACE_IO)
> > >                      paddr = &pci_bios_io_addr;
> > >                  else if (size >= 0x04000000)
> > > -- 
> > > 1.6.5.rc2
> > > 
> > 
> > --
> > 			Gleb.

--
			Gleb.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Qemu-devel] Re: [PATCH 3/5] Use the correct mask to size the PCI option ROM BAR.
  2009-10-12 14:04                         ` Gleb Natapov
@ 2009-10-12 14:11                           ` Michael S. Tsirkin
  2009-10-12 14:17                             ` Gleb Natapov
  0 siblings, 1 reply; 34+ messages in thread
From: Michael S. Tsirkin @ 2009-10-12 14:11 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: kevin, qemu-devel

On Mon, Oct 12, 2009 at 04:04:25PM +0200, Gleb Natapov wrote:
> On Mon, Oct 12, 2009 at 03:51:12PM +0200, Michael S. Tsirkin wrote:
> > On Mon, Oct 12, 2009 at 03:29:25PM +0200, Gleb Natapov wrote:
> > > On Mon, Oct 12, 2009 at 03:20:27PM +0200, Michael S. Tsirkin wrote:
> > > > Right. And there's another bug that I see and that is that size for I/O
> > > > BAR was calculated incorrectly: val & ~0xf is wrong for I/O as the size
> > > > could be 4 bytes.  So here's a patch that addresses all 3 issues:
> > > > Ack?
> > > > 
> > > Pleas do whatever spec recommends and use 0xffffffff to size the BAR.
> > 
> > What we do is equivalent: since next thing we override the BAR value,
> > there's no reason for the save/restore dance.  But if you like it this
> That what spec says if you are not trying to interpret it too hard.
> And simple is always better.

Implementation note is not a recommentation, just one way to do things.
We don't need to restore old BAR so no reason to do this, but
I won't lose sleep if we do this either. So go ahead.

> 
> > way, OK. Send a patch. Note this applies to I/O and memory only,
> > ROM has to be sized the way I coded it.
> > 
> > > Specs says this, linux does this, lets not be different.
> > 
> > BTW, note that what linux does for ROM is incorrect:
> > the implementation note you cite only applies to I/O and memory BARs.
> > I'll send a patch to lkml when I have the time.
> > 
> It does mask = type ? ~PCI_ROM_ADDRESS_ENABLE : ~0; so it definitely
> treats ROM specially, You may argue that it shouldn't set bits 1-10
> though.

Right.

> The patch that currently in KVM bios (the one I've sent to seabios
> initially) uses the same technique to size the BAR as Linux kernel. The only
> thing missing is restoring of old value.

That, too. Also let's use symbolic constants, and fix size math:
(val & ~0xf) is wrong for both ROM and I/O.

> > > > 
> > > > ---->
> > > > 
> > > > Subject: [PATCH] seabios: fix low bits in ROM and I/O sizing
> > > > 
> > > > This cleans up handling of low bits during BAR sizing,
> > > > to match PCI spec requirements, and to use symbolic
> > > > constants from pci_regs.h
> > > > 
> > > > Issues fixed:
> > > > For ROM BARs, bit 0 is writeable (enable bit), which we not
> > > > only don't want to set, but it will stick and make us think
> > > > it's an I/O port resource.
> > > > Further, PCI spec defines the following bits as reserved:
> > > > - bit 1 in I/O BAR
> > > > - bits 10:1 in ROM BAR
> > > > and we should be careful and preserve any values there.
> > > > Bits 3:2 in I/O BAR might be writeable, so it
> > > > is wrong to mask them when calculating BAR size.
> > > > 
> > > > Spec references:
> > > > See 6.2.5.1 for I/O and memory, and 6.2.5.2 for ROM,
> > > > 6.1 for reserved bit handling;
> > > > pages 225, 228 and 214 in PCI spec revision 3.0.
> > > > 
> > > > See also Qemu pcbios commit 6ddb9f5c742b2b82b1755d7ec2a127f6e20e3806
> > > > 
> > > > Original-by: Gleb Natapov <gleb@redhat.com>
> > > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > > 
> > > > diff --git a/src/pciinit.c b/src/pciinit.c
> > > > index 7d2ea00..f9ebd61 100644
> > > > --- a/src/pciinit.c
> > > > +++ b/src/pciinit.c
> > > > @@ -136,16 +136,23 @@ static void pci_bios_init_device(u16 bdf)
> > > >          /* default memory mappings */
> > > >          for (i = 0; i < PCI_NUM_REGIONS; i++) {
> > > >              int ofs;
> > > > -            u32 val, size;
> > > > -
> > > > +            u32 val, mask, size;
> > > >              if (i == PCI_ROM_SLOT)
> > > >                  ofs = PCI_ROM_ADDRESS;
> > > >              else
> > > >                  ofs = PCI_BASE_ADDRESS_0 + i * 4;
> > > > -            pci_config_writel(bdf, ofs, 0xffffffff);
> > > > +
> > > > +            val = pci_config_readl(bdf, ofs);
> > > > +            if (i == PCI_ROM_SLOT)
> > > > +		mask = PCI_ROM_ADDRESS_MASK;
> > > > +            else if (val & PCI_BASE_ADDRESS_SPACE_IO)
> > > > +                mask = PCI_BASE_ADDRESS_IO_MASK;
> > > > +            else
> > > > +                mask = PCI_BASE_ADDRESS_MEM_MASK;
> > > > +            pci_config_writel(bdf, ofs, val | mask);
> > > >              val = pci_config_readl(bdf, ofs);
> > > >              if (val != 0) {
> > > > -                size = (~(val & ~0xf)) + 1;
> > > > +                size = (~(val & mask)) + 1;
> > > >                  if (val & PCI_BASE_ADDRESS_SPACE_IO)
> > > >                      paddr = &pci_bios_io_addr;
> > > >                  else if (size >= 0x04000000)
> > > > -- 
> > > > 1.6.5.rc2
> > > > 
> > > 
> > > --
> > > 			Gleb.
> 
> --
> 			Gleb.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Qemu-devel] Re: [PATCH 3/5] Use the correct mask to size the PCI option ROM BAR.
  2009-10-12 14:11                           ` Michael S. Tsirkin
@ 2009-10-12 14:17                             ` Gleb Natapov
  2009-10-12 14:24                               ` Michael S. Tsirkin
  0 siblings, 1 reply; 34+ messages in thread
From: Gleb Natapov @ 2009-10-12 14:17 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: kevin, qemu-devel

On Mon, Oct 12, 2009 at 04:11:29PM +0200, Michael S. Tsirkin wrote:
> On Mon, Oct 12, 2009 at 04:04:25PM +0200, Gleb Natapov wrote:
> > On Mon, Oct 12, 2009 at 03:51:12PM +0200, Michael S. Tsirkin wrote:
> > > On Mon, Oct 12, 2009 at 03:29:25PM +0200, Gleb Natapov wrote:
> > > > On Mon, Oct 12, 2009 at 03:20:27PM +0200, Michael S. Tsirkin wrote:
> > > > > Right. And there's another bug that I see and that is that size for I/O
> > > > > BAR was calculated incorrectly: val & ~0xf is wrong for I/O as the size
> > > > > could be 4 bytes.  So here's a patch that addresses all 3 issues:
> > > > > Ack?
> > > > > 
> > > > Pleas do whatever spec recommends and use 0xffffffff to size the BAR.
> > > 
> > > What we do is equivalent: since next thing we override the BAR value,
> > > there's no reason for the save/restore dance.  But if you like it this
> > That what spec says if you are not trying to interpret it too hard.
> > And simple is always better.
> 
> Implementation note is not a recommentation, just one way to do things.
Agree. But I prefer to use a way recommended by spec. It much less
likely to break.

> We don't need to restore old BAR so no reason to do this, but
We need to restore lower bits.

> I won't lose sleep if we do this either. So go ahead.
> 
> > 
> > > way, OK. Send a patch. Note this applies to I/O and memory only,
> > > ROM has to be sized the way I coded it.
> > > 
> > > > Specs says this, linux does this, lets not be different.
> > > 
> > > BTW, note that what linux does for ROM is incorrect:
> > > the implementation note you cite only applies to I/O and memory BARs.
> > > I'll send a patch to lkml when I have the time.
> > > 
> > It does mask = type ? ~PCI_ROM_ADDRESS_ENABLE : ~0; so it definitely
> > treats ROM specially, You may argue that it shouldn't set bits 1-10
> > though.
> 
> Right.
> 
> > The patch that currently in KVM bios (the one I've sent to seabios
> > initially) uses the same technique to size the BAR as Linux kernel. The only
> > thing missing is restoring of old value.
> 
> That, too. Also let's use symbolic constants, and fix size math:
> (val & ~0xf) is wrong for both ROM and I/O.
> 
0xf fix should be really in a separate patch.

> > > > > 
> > > > > ---->
> > > > > 
> > > > > Subject: [PATCH] seabios: fix low bits in ROM and I/O sizing
> > > > > 
> > > > > This cleans up handling of low bits during BAR sizing,
> > > > > to match PCI spec requirements, and to use symbolic
> > > > > constants from pci_regs.h
> > > > > 
> > > > > Issues fixed:
> > > > > For ROM BARs, bit 0 is writeable (enable bit), which we not
> > > > > only don't want to set, but it will stick and make us think
> > > > > it's an I/O port resource.
> > > > > Further, PCI spec defines the following bits as reserved:
> > > > > - bit 1 in I/O BAR
> > > > > - bits 10:1 in ROM BAR
> > > > > and we should be careful and preserve any values there.
> > > > > Bits 3:2 in I/O BAR might be writeable, so it
> > > > > is wrong to mask them when calculating BAR size.
> > > > > 
> > > > > Spec references:
> > > > > See 6.2.5.1 for I/O and memory, and 6.2.5.2 for ROM,
> > > > > 6.1 for reserved bit handling;
> > > > > pages 225, 228 and 214 in PCI spec revision 3.0.
> > > > > 
> > > > > See also Qemu pcbios commit 6ddb9f5c742b2b82b1755d7ec2a127f6e20e3806
> > > > > 
> > > > > Original-by: Gleb Natapov <gleb@redhat.com>
> > > > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > > > 
> > > > > diff --git a/src/pciinit.c b/src/pciinit.c
> > > > > index 7d2ea00..f9ebd61 100644
> > > > > --- a/src/pciinit.c
> > > > > +++ b/src/pciinit.c
> > > > > @@ -136,16 +136,23 @@ static void pci_bios_init_device(u16 bdf)
> > > > >          /* default memory mappings */
> > > > >          for (i = 0; i < PCI_NUM_REGIONS; i++) {
> > > > >              int ofs;
> > > > > -            u32 val, size;
> > > > > -
> > > > > +            u32 val, mask, size;
> > > > >              if (i == PCI_ROM_SLOT)
> > > > >                  ofs = PCI_ROM_ADDRESS;
> > > > >              else
> > > > >                  ofs = PCI_BASE_ADDRESS_0 + i * 4;
> > > > > -            pci_config_writel(bdf, ofs, 0xffffffff);
> > > > > +
> > > > > +            val = pci_config_readl(bdf, ofs);
> > > > > +            if (i == PCI_ROM_SLOT)
> > > > > +		mask = PCI_ROM_ADDRESS_MASK;
> > > > > +            else if (val & PCI_BASE_ADDRESS_SPACE_IO)
> > > > > +                mask = PCI_BASE_ADDRESS_IO_MASK;
> > > > > +            else
> > > > > +                mask = PCI_BASE_ADDRESS_MEM_MASK;
> > > > > +            pci_config_writel(bdf, ofs, val | mask);
> > > > >              val = pci_config_readl(bdf, ofs);
> > > > >              if (val != 0) {
> > > > > -                size = (~(val & ~0xf)) + 1;
> > > > > +                size = (~(val & mask)) + 1;
> > > > >                  if (val & PCI_BASE_ADDRESS_SPACE_IO)
> > > > >                      paddr = &pci_bios_io_addr;
> > > > >                  else if (size >= 0x04000000)
> > > > > -- 
> > > > > 1.6.5.rc2
> > > > > 
> > > > 
> > > > --
> > > > 			Gleb.
> > 
> > --
> > 			Gleb.

--
			Gleb.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Qemu-devel] seabios: fix low bits in ROM and I/O sizing
  2009-10-12 13:20                   ` Michael S. Tsirkin
  2009-10-12 13:29                     ` Gleb Natapov
@ 2009-10-12 14:20                     ` Michael S. Tsirkin
  2009-10-13 13:39                       ` [Qemu-devel] " Gleb Natapov
  1 sibling, 1 reply; 34+ messages in thread
From: Michael S. Tsirkin @ 2009-10-12 14:20 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: kevin, qemu-devel

This cleans up handling of low bits during BAR sizing,
to match PCI spec requirements, and to use symbolic
constants from pci_regs.h

Issues fixed:
For ROM BARs, bit 0 is writeable (enable bit), which we not
only don't want to set, but it will stick and make us think
it's an I/O port resource.
Further, PCI spec defines the following bits as reserved:
- bit 1 in I/O BAR
- bits 10:1 in ROM BAR
and we should be careful and preserve any values there,
and should ignore anything we read from these registers.
Bits 3:2 in I/O BAR might be writeable, so it
is wrong to mask them when calculating BAR size.

Spec references:
See 6.2.5.1 for I/O and memory, and 6.2.5.2 for ROM,
6.1 for reserved bit handling;
pages 225, 228 and 214 in PCI spec revision 3.0.

See also Qemu pcbios commit 6ddb9f5c742b2b82b1755d7ec2a127f6e20e3806

Original-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

---

So, here's my latest version, I fixed some TABs versus spaces here.
Gleb, if you want to rewrite it using 0xffffffff, or go back to your
original patch, go ahead. It's not really important, either way.
If you are OK as is, let me know and I will try to get this merged.

diff --git a/src/pciinit.c b/src/pciinit.c
index 7d2ea00..031d2f9 100644
--- a/src/pciinit.c
+++ b/src/pciinit.c
@@ -136,16 +136,23 @@ static void pci_bios_init_device(u16 bdf)
         /* default memory mappings */
         for (i = 0; i < PCI_NUM_REGIONS; i++) {
             int ofs;
-            u32 val, size;
-
+            u32 val, mask, size;
             if (i == PCI_ROM_SLOT)
                 ofs = PCI_ROM_ADDRESS;
             else
                 ofs = PCI_BASE_ADDRESS_0 + i * 4;
-            pci_config_writel(bdf, ofs, 0xffffffff);
+
+            val = pci_config_readl(bdf, ofs);
+            if (i == PCI_ROM_SLOT)
+                mask = PCI_ROM_ADDRESS_MASK;
+            else if (val & PCI_BASE_ADDRESS_SPACE_IO)
+                mask = PCI_BASE_ADDRESS_IO_MASK;
+            else
+                mask = PCI_BASE_ADDRESS_MEM_MASK;
+            pci_config_writel(bdf, ofs, val | mask);
             val = pci_config_readl(bdf, ofs);
             if (val != 0) {
-                size = (~(val & ~0xf)) + 1;
+                size = (~(val & mask)) + 1;
                 if (val & PCI_BASE_ADDRESS_SPACE_IO)
                     paddr = &pci_bios_io_addr;
                 else if (size >= 0x04000000)

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Qemu-devel] Re: [PATCH 3/5] Use the correct mask to size the PCI option ROM BAR.
  2009-10-12 14:17                             ` Gleb Natapov
@ 2009-10-12 14:24                               ` Michael S. Tsirkin
  0 siblings, 0 replies; 34+ messages in thread
From: Michael S. Tsirkin @ 2009-10-12 14:24 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: kevin, qemu-devel

On Mon, Oct 12, 2009 at 04:17:11PM +0200, Gleb Natapov wrote:
> On Mon, Oct 12, 2009 at 04:11:29PM +0200, Michael S. Tsirkin wrote:
> > On Mon, Oct 12, 2009 at 04:04:25PM +0200, Gleb Natapov wrote:
> > > On Mon, Oct 12, 2009 at 03:51:12PM +0200, Michael S. Tsirkin wrote:
> > > > On Mon, Oct 12, 2009 at 03:29:25PM +0200, Gleb Natapov wrote:
> > > > > On Mon, Oct 12, 2009 at 03:20:27PM +0200, Michael S. Tsirkin wrote:
> > > > > > Right. And there's another bug that I see and that is that size for I/O
> > > > > > BAR was calculated incorrectly: val & ~0xf is wrong for I/O as the size
> > > > > > could be 4 bytes.  So here's a patch that addresses all 3 issues:
> > > > > > Ack?
> > > > > > 
> > > > > Pleas do whatever spec recommends and use 0xffffffff to size the BAR.
> > > > 
> > > > What we do is equivalent: since next thing we override the BAR value,
> > > > there's no reason for the save/restore dance.  But if you like it this
> > > That what spec says if you are not trying to interpret it too hard.
> > > And simple is always better.
> > 
> > Implementation note is not a recommentation, just one way to do things.
> Agree. But I prefer to use a way recommended by spec. It much less
> likely to break.
> 
> > We don't need to restore old BAR so no reason to do this, but
> We need to restore lower bits.

The way I did it was always keeping original values for the lower bits.

> > I won't lose sleep if we do this either. So go ahead.
> > 
> > > 
> > > > way, OK. Send a patch. Note this applies to I/O and memory only,
> > > > ROM has to be sized the way I coded it.
> > > > 
> > > > > Specs says this, linux does this, lets not be different.
> > > > 
> > > > BTW, note that what linux does for ROM is incorrect:
> > > > the implementation note you cite only applies to I/O and memory BARs.
> > > > I'll send a patch to lkml when I have the time.
> > > > 
> > > It does mask = type ? ~PCI_ROM_ADDRESS_ENABLE : ~0; so it definitely
> > > treats ROM specially, You may argue that it shouldn't set bits 1-10
> > > though.
> > 
> > Right.
> > 
> > > The patch that currently in KVM bios (the one I've sent to seabios
> > > initially) uses the same technique to size the BAR as Linux kernel. The only
> > > thing missing is restoring of old value.
> > 
> > That, too. Also let's use symbolic constants, and fix size math:
> > (val & ~0xf) is wrong for both ROM and I/O.
> > 
> 0xf fix should be really in a separate patch.

It could be, but it won't be an independent patch: I'm using
the same mask value all over the code.

> > > > > > 
> > > > > > ---->
> > > > > > 
> > > > > > Subject: [PATCH] seabios: fix low bits in ROM and I/O sizing
> > > > > > 
> > > > > > This cleans up handling of low bits during BAR sizing,
> > > > > > to match PCI spec requirements, and to use symbolic
> > > > > > constants from pci_regs.h
> > > > > > 
> > > > > > Issues fixed:
> > > > > > For ROM BARs, bit 0 is writeable (enable bit), which we not
> > > > > > only don't want to set, but it will stick and make us think
> > > > > > it's an I/O port resource.
> > > > > > Further, PCI spec defines the following bits as reserved:
> > > > > > - bit 1 in I/O BAR
> > > > > > - bits 10:1 in ROM BAR
> > > > > > and we should be careful and preserve any values there.
> > > > > > Bits 3:2 in I/O BAR might be writeable, so it
> > > > > > is wrong to mask them when calculating BAR size.
> > > > > > 
> > > > > > Spec references:
> > > > > > See 6.2.5.1 for I/O and memory, and 6.2.5.2 for ROM,
> > > > > > 6.1 for reserved bit handling;
> > > > > > pages 225, 228 and 214 in PCI spec revision 3.0.
> > > > > > 
> > > > > > See also Qemu pcbios commit 6ddb9f5c742b2b82b1755d7ec2a127f6e20e3806
> > > > > > 
> > > > > > Original-by: Gleb Natapov <gleb@redhat.com>
> > > > > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > > > > 
> > > > > > diff --git a/src/pciinit.c b/src/pciinit.c
> > > > > > index 7d2ea00..f9ebd61 100644
> > > > > > --- a/src/pciinit.c
> > > > > > +++ b/src/pciinit.c
> > > > > > @@ -136,16 +136,23 @@ static void pci_bios_init_device(u16 bdf)
> > > > > >          /* default memory mappings */
> > > > > >          for (i = 0; i < PCI_NUM_REGIONS; i++) {
> > > > > >              int ofs;
> > > > > > -            u32 val, size;
> > > > > > -
> > > > > > +            u32 val, mask, size;
> > > > > >              if (i == PCI_ROM_SLOT)
> > > > > >                  ofs = PCI_ROM_ADDRESS;
> > > > > >              else
> > > > > >                  ofs = PCI_BASE_ADDRESS_0 + i * 4;
> > > > > > -            pci_config_writel(bdf, ofs, 0xffffffff);
> > > > > > +
> > > > > > +            val = pci_config_readl(bdf, ofs);
> > > > > > +            if (i == PCI_ROM_SLOT)
> > > > > > +		mask = PCI_ROM_ADDRESS_MASK;
> > > > > > +            else if (val & PCI_BASE_ADDRESS_SPACE_IO)
> > > > > > +                mask = PCI_BASE_ADDRESS_IO_MASK;
> > > > > > +            else
> > > > > > +                mask = PCI_BASE_ADDRESS_MEM_MASK;
> > > > > > +            pci_config_writel(bdf, ofs, val | mask);
> > > > > >              val = pci_config_readl(bdf, ofs);
> > > > > >              if (val != 0) {
> > > > > > -                size = (~(val & ~0xf)) + 1;
> > > > > > +                size = (~(val & mask)) + 1;
> > > > > >                  if (val & PCI_BASE_ADDRESS_SPACE_IO)
> > > > > >                      paddr = &pci_bios_io_addr;
> > > > > >                  else if (size >= 0x04000000)
> > > > > > -- 
> > > > > > 1.6.5.rc2
> > > > > > 
> > > > > 
> > > > > --
> > > > > 			Gleb.
> > > 
> > > --
> > > 			Gleb.
> 
> --
> 			Gleb.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Qemu-devel] Re: [PATCH 5/5] Set the PCI base address to 0xf0000000.
  2009-10-11 18:59 ` [Qemu-devel] [PATCH 5/5] Set the PCI base address to 0xf0000000 Gleb Natapov
@ 2009-10-12 14:24   ` Kevin O'Connor
  0 siblings, 0 replies; 34+ messages in thread
From: Kevin O'Connor @ 2009-10-12 14:24 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: qemu-devel

On Sun, Oct 11, 2009 at 08:59:07PM +0200, Gleb Natapov wrote:
> Qemu starts PCI hole at 0xe0000000, but lets set it to the same value as
> pcbios uses for now.
> 
> Signed-off-by: Gleb Natapov <gleb@redhat.com>

I committed patch 1,2,5.  I await consensus on patch 3,4.

-Kevin

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Qemu-devel] Re: [PATCH 4/5] Make MMIO address page aligned in guest.
  2009-10-11 18:59 ` [Qemu-devel] [PATCH 4/5] Make MMIO address page aligned in guest Gleb Natapov
  2009-10-11 21:48   ` [Qemu-devel] " Michael S. Tsirkin
@ 2009-10-12 14:27   ` Kevin O'Connor
  1 sibling, 0 replies; 34+ messages in thread
From: Kevin O'Connor @ 2009-10-12 14:27 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: qemu-devel

On Sun, Oct 11, 2009 at 08:59:06PM +0200, Gleb Natapov wrote:
> MMIO of some devices are not page aligned, such as some EHCI
[...]
> @@ -158,6 +159,12 @@ static void pci_bios_init_device(u16 bdf)
>                  *paddr = ALIGN(*paddr, size);
>                  pci_set_io_region_addr(bdf, i, *paddr);
>                  *paddr += size;
> +                if (kvm_para_available()) {
> +                    /* make memory address page aligned */
> +                    /* needed for device assignment on kvm */
> +                    if (!(val & PCI_BASE_ADDRESS_SPACE_IO))
> +                        *paddr = (*paddr + 0xfff) & 0xfffff000;
> +               }
>              }

I don't see an issue with doing this.  However, I can't see why it
would be just done for kvm - why not do it for all hosts?

Also, please use the ALIGN() macro - something like:
      *paddr = ALIGN(*paddr, PAGE_SIZE);

-Kevin

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Qemu-devel] Re: seabios: fix low bits in ROM and I/O sizing
  2009-10-12 14:20                     ` [Qemu-devel] seabios: fix low bits in ROM and I/O sizing Michael S. Tsirkin
@ 2009-10-13 13:39                       ` Gleb Natapov
  2009-10-14 23:29                         ` Kevin O'Connor
  0 siblings, 1 reply; 34+ messages in thread
From: Gleb Natapov @ 2009-10-13 13:39 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: kevin, qemu-devel

This cleans up handling of low bits during BAR sizing,
to match PCI spec requirements, and to use symbolic
constants from pci_regs.h

Issues fixed:
For ROM BARs, bit 0 is writeable (enable bit), which we not
only don't want to set, but it will stick and make us think
it's an I/O port resource.
Further, PCI spec defines the following bits as reserved:
- bit 1 in I/O BAR
- bits 10:1 in ROM BAR
and we should be careful and preserve any values there,
and should ignore anything we read from these registers.
Bits 3:2 in I/O BAR might be writeable, so it
is wrong to mask them when calculating BAR size.

Spec references:
See 6.2.5.1 for I/O and memory, and 6.2.5.2 for ROM,
6.1 for reserved bit handling;
pages 225, 228 and 214 in PCI spec revision 3.0.

See also Qemu pcbios commit 6ddb9f5c742b2b82b1755d7ec2a127f6e20e3806

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

---

Write 0xffffffff to non rom bar to check its size and restore old
value afterwards just like PCI spec recommends.

diff --git a/src/pciinit.c b/src/pciinit.c
index 71177bc..39c8f0f 100644
--- a/src/pciinit.c
+++ b/src/pciinit.c
@@ -126,16 +126,28 @@ static void pci_bios_init_device(u16 bdf)
         /* default memory mappings */
         for (i = 0; i < PCI_NUM_REGIONS; i++) {
             int ofs;
-            u32 val, size;
-
+            u32 old, val, mask, size;
             if (i == PCI_ROM_SLOT)
                 ofs = PCI_ROM_ADDRESS;
             else
                 ofs = PCI_BASE_ADDRESS_0 + i * 4;
-            pci_config_writel(bdf, ofs, 0xffffffff);
+
+            old = pci_config_readl(bdf, ofs);
+            if (i == PCI_ROM_SLOT) {
+                mask = PCI_ROM_ADDRESS_MASK;
+                pci_config_writel(bdf, ofs, mask);
+            } else {
+                if (val & PCI_BASE_ADDRESS_SPACE_IO)
+                    mask = PCI_BASE_ADDRESS_IO_MASK;
+                else
+                    mask = PCI_BASE_ADDRESS_MEM_MASK;
+                pci_config_writel(bdf, ofs, ~0);
+            }
             val = pci_config_readl(bdf, ofs);
+            pci_config_writel(bdf, ofs, old);
+
             if (val != 0) {
-                size = (~(val & ~0xf)) + 1;
+                size = (~(val & mask)) + 1;
                 if (val & PCI_BASE_ADDRESS_SPACE_IO)
                     paddr = &pci_bios_io_addr;
                 else if (size >= 0x04000000)
--
			Gleb.

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] Re: seabios: fix low bits in ROM and I/O sizing
  2009-10-13 13:39                       ` [Qemu-devel] " Gleb Natapov
@ 2009-10-14 23:29                         ` Kevin O'Connor
  0 siblings, 0 replies; 34+ messages in thread
From: Kevin O'Connor @ 2009-10-14 23:29 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: qemu-devel, Michael S. Tsirkin

On Tue, Oct 13, 2009 at 03:39:53PM +0200, Gleb Natapov wrote:
> This cleans up handling of low bits during BAR sizing,
> to match PCI spec requirements, and to use symbolic
> constants from pci_regs.h

Thanks - commit 2f442fd566d683253cf23560867a34243c185550

-Kevin

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2009-10-14 23:29 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-10-11 18:59 [Qemu-devel] [PATCH 1/5] Generate mptable unconditionally Gleb Natapov
2009-10-11 18:59 ` [Qemu-devel] [PATCH 2/5] Enable power button event generation Gleb Natapov
2009-10-11 18:59 ` [Qemu-devel] [PATCH 3/5] Use the correct mask to size the PCI option ROM BAR Gleb Natapov
2009-10-11 21:53   ` [Qemu-devel] " Michael S. Tsirkin
2009-10-12  6:50     ` Gleb Natapov
2009-10-12  9:52       ` Michael S. Tsirkin
2009-10-12 10:08         ` Gleb Natapov
2009-10-12 11:03           ` Michael S. Tsirkin
2009-10-12 11:45             ` Michael S. Tsirkin
2009-10-12 11:48             ` Gleb Natapov
2009-10-12 11:59               ` Michael S. Tsirkin
2009-10-12 12:08                 ` Gleb Natapov
2009-10-12 13:20                   ` Michael S. Tsirkin
2009-10-12 13:29                     ` Gleb Natapov
2009-10-12 13:51                       ` Michael S. Tsirkin
2009-10-12 14:04                         ` Gleb Natapov
2009-10-12 14:11                           ` Michael S. Tsirkin
2009-10-12 14:17                             ` Gleb Natapov
2009-10-12 14:24                               ` Michael S. Tsirkin
2009-10-12 14:20                     ` [Qemu-devel] seabios: fix low bits in ROM and I/O sizing Michael S. Tsirkin
2009-10-13 13:39                       ` [Qemu-devel] " Gleb Natapov
2009-10-14 23:29                         ` Kevin O'Connor
2009-10-11 18:59 ` [Qemu-devel] [PATCH 4/5] Make MMIO address page aligned in guest Gleb Natapov
2009-10-11 21:48   ` [Qemu-devel] " Michael S. Tsirkin
2009-10-12  6:44     ` Gleb Natapov
2009-10-12  7:10       ` Michael S. Tsirkin
2009-10-12  7:22         ` Gleb Natapov
2009-10-12  8:13           ` Michael S. Tsirkin
2009-10-12  8:48             ` Gleb Natapov
2009-10-12  9:43               ` Michael S. Tsirkin
2009-10-12 10:06                 ` Gleb Natapov
2009-10-12 14:27   ` Kevin O'Connor
2009-10-11 18:59 ` [Qemu-devel] [PATCH 5/5] Set the PCI base address to 0xf0000000 Gleb Natapov
2009-10-12 14:24   ` [Qemu-devel] " Kevin O'Connor

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).