* [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present
  From: Alexey Korolev @ 2012-01-25  5:46 UTC
  To: qemu-devel@nongnu.org, Michael S. Tsirkin, Kevin O'Connor; +Cc: sfd

Hi,
In this post
http://lists.gnu.org/archive/html/qemu-devel/2011-12/msg03171.html
I mentioned the issues that arise when a 64bit PCI BAR is present and a
32bit address range is selected for it. The issue affects all recent qemu
releases and all old and recent guest Linux kernel versions.

We've done some investigation. Let me explain what happens.
Assume we have a 64bit BAR of size 32MB mapped at [0xF0000000 - 0xF2000000].

When a Linux guest starts, it performs PCI bus enumeration. The OS sizes
64bit BARs using the following procedure:
1. Write all FF's to the lower half of the 64bit BAR
2. Write the address back to the lower half of the 64bit BAR
3. Write all FF's to the higher half of the 64bit BAR
4. Write the address back to the higher half of the 64bit BAR

The Linux code is here:
http://lxr.linux.no/#linux+v3.2.1/drivers/pci/probe.c#L149

What does this mean for qemu?

At step 1, qemu's pci_default_write_config() receives all FF's for the
lower part of the 64bit BAR. It applies the mask and converts the value to
"all FF's - size + 1" (0xFE000000 if the size is 32MB). Then
pci_bar_address() checks whether the BAR address is valid. Since it is a
64bit BAR, it reads 0x00000000FE000000 - this address is valid. So qemu
updates the topology and asks KVM to update the mappings with the new
range for the 64bit BAR, 0xFE000000 - 0xFFFFFFFF. This usually means a
kernel panic on boot if there is another mapping in the 0xFE000000 -
0xFFFFFFFF range, which is quite common.

The following patch fixes the issue. It affects 64bit PCI BARs only.
The idea of the patch: we introduce states for the low and high halves of
a 64bit BAR. Each can have three possible values: PCIBAR_VALID;
PCIBAR64_PARTIAL_SIZE_QUERY - someone has requested the size of one half
of the 64bit PCI BAR; and PCIBAR64_PARTIAL_ADDR_PROGRAM - someone has
sent a request to update the address of one half of the 64bit PCI BAR.
The state becomes PCIBAR_VALID when both halves are in the same state. We
ignore the BAR value until both states become PCIBAR_VALID.

Note: Please use the latest SeaBIOS version (commit
139d5ac037de828f89c36e39c6dd15610650cede and later), as older versions
didn't initialize the high part of a 64bit BAR.

The patch has been tested on Linux 2.6.18 - 3.1.0 and Windows Server 2008.

Signed-off-by: Alexey Korolev <alexey.korolev@endace.com>
---
 hw/pci.c |   45 +++++++++++++++++++++++++++++++++++++++++++++
 hw/pci.h |    7 +++++++
 2 files changed, 52 insertions(+), 0 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 57ec104..3a7deb2 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -1055,6 +1055,40 @@ static pcibus_t pci_bar_address(PCIDevice *d,
     return new_addr;
 }
 
+static void pci_update_region_state(PCIDevice *d, uint32_t addr, uint32_t val)
+{
+    PCIIORegion *r;
+    int barnum = (addr - PCI_BASE_ADDRESS_0) >> 2;
+    PCIBARState *state;
+
+    r = &d->io_regions[barnum];
+
+    if (d->io_regions[barnum].type & PCI_BASE_ADDRESS_MEM_TYPE_64) {
+        /* Programming low part of the 64bit BAR */
+        r = &d->io_regions[barnum];
+        state = &r->state_lo;
+    } else if (barnum > 0 &&
+               (d->io_regions[barnum - 1].type & PCI_BASE_ADDRESS_MEM_TYPE_64)) {
+        /* Programming high part of the 64bit BAR */
+        r = &d->io_regions[barnum - 1];
+        state = &r->state_hi;
+    } else {
+        /* Not a 64bit BAR */
+        d->io_regions[barnum].state_lo = PCIBAR_VALID;
+        return;
+    }
+
+    /* Request to read BAR size */
+    if (val == -1U)
+        *state = PCIBAR64_PARTIAL_SIZE_QUERY;
+    else
+        *state = PCIBAR64_PARTIAL_ADDR_PROGRAM;
+
+    if (r->state_lo == r->state_hi)
+        r->state_lo = r->state_hi = PCIBAR_VALID;
+}
+
 static void pci_update_mappings(PCIDevice *d)
 {
     PCIIORegion *r;
@@ -1068,6 +1102,13 @@ static void pci_update_mappings(PCIDevice *d)
         if (!r->size)
             continue;
 
+        /* this region state is invalid */
+        if (r->state_lo != PCIBAR_VALID)
+            continue;
+        if ((r->type & PCI_BASE_ADDRESS_MEM_TYPE_64) &&
+            (r->state_hi != PCIBAR_VALID))
+            continue;
+
         new_addr = pci_bar_address(d, i, r->type, r->size);
 
         /* This bar isn't changed */
@@ -1117,6 +1158,7 @@ uint32_t pci_default_read_config(PCIDevice *d,
 void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int l)
 {
     int i, was_irq_disabled = pci_irq_disabled(d);
+    uint32_t orig_val = val;
 
     for (i = 0; i < l; val >>= 8, ++i) {
         uint8_t wmask = d->wmask[addr + i];
@@ -1133,6 +1175,9 @@ void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int l)
         assigned_dev_update_irqs();
 #endif /* CONFIG_KVM_DEVICE_ASSIGNMENT */
 
+    if (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24))
+        pci_update_region_state(d, addr, orig_val);
+
     if (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) ||
         ranges_overlap(addr, l, PCI_ROM_ADDRESS, 4) ||
         ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4) ||
diff --git a/hw/pci.h b/hw/pci.h
index 4220151..5d1e529 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -86,12 +86,19 @@ typedef uint32_t PCIConfigReadFunc(PCIDevice *pci_dev,
 typedef void PCIMapIORegionFunc(PCIDevice *pci_dev, int region_num,
                                 pcibus_t addr, pcibus_t size, int type);
 typedef int PCIUnregisterFunc(PCIDevice *pci_dev);
+typedef enum PCIBARState {
+    PCIBAR_VALID = 0,
+    PCIBAR64_PARTIAL_SIZE_QUERY,
+    PCIBAR64_PARTIAL_ADDR_PROGRAM
+} PCIBARState;
 
 typedef struct PCIIORegion {
     pcibus_t addr; /* current PCI mapping address. -1 means not mapped */
 #define PCI_BAR_UNMAPPED (~(pcibus_t)0)
     pcibus_t size;
     uint8_t type;
+    PCIBARState state_lo;
+    PCIBARState state_hi;
     MemoryRegion *memory;
     MemoryRegion *address_space;
 } PCIIORegion;
-- 
1.7.5.4
* Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present
  From: Michael S. Tsirkin @ 2012-01-25 12:51 UTC
  To: Alexey Korolev; +Cc: sfd, Kevin O'Connor, qemu-devel@nongnu.org

On Wed, Jan 25, 2012 at 06:46:03PM +1300, Alexey Korolev wrote:
> Hi,
> In this post
> http://lists.gnu.org/archive/html/qemu-devel/2011-12/msg03171.html I've
> mentioned about the issues when 64Bit PCI BAR is present and 32bit
> address range is selected for it.
> [...]
> At step 1. qemu pci_default_write_config() recevies all FFs for lower
> part of the 64bit BAR. Then it applies the mask and converts the value
> to "All FF's - size + 1" (FE000000 if size is 32MB).
> Then pci_bar_address() checks if BAR address is valid. Since it is a
> 64bit bar it reads 0x00000000FE000000 - this address is valid. So qemu
> updates topology and sends request to update mappings in KVM with new
> range for the 64bit BAR FE000000 - 0xFFFFFFFF. This usually means kernel
> panic on boot, if there is another mapping in the FE000000 - 0xFFFFFFFF
> range, which is quite common.
> [...]
> Signed-off-by: Alexey Korolev <alexey.korolev@endace.com>

Interesting. However, looking at guest code, I note that memory and I/O
are disabled during BAR sizing unless "mmio always on" is set.
pci_bar_address() should return PCI_BAR_UNMAPPED in this case, and we
should never map this BAR until it's enabled. What's going on?
* Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present
  From: Alexey Korolev @ 2012-01-26  3:20 UTC
  To: Michael S. Tsirkin; +Cc: sfd, Kevin O'Connor, qemu-devel@nongnu.org

On 26/01/12 01:51, Michael S. Tsirkin wrote:
> [...]
> Interesting. However, looking at guest code,
> I note that memory and io are disabled
> during BAR sizing unless mmio always on is set.
> pci_bar_address should return PCI_BAR_UNMAPPED
> in this case, and we should never map this BAR
> until it's enabled. What's going on?

Oh. Good point. You are right here. Linux developers added protection for
the lower part of the PCI BAR starting with 2.6.36, so this issue affects
all guest kernels before 2.6.36. Sorry about the confusion.

The code without the protection is here:
http://lxr.linux.no/#linux+v2.6.35.9/drivers/pci/probe.c#L162

To solve this issue for older kernel versions, the submitted patch is
still relevant.
* Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present
  From: Michael S. Tsirkin @ 2012-01-25 15:38 UTC
  To: Alexey Korolev; +Cc: sfd, Kevin O'Connor, qemu-devel@nongnu.org

On Wed, Jan 25, 2012 at 06:46:03PM +1300, Alexey Korolev wrote:
> Hi,
> In this post
> http://lists.gnu.org/archive/html/qemu-devel/2011-12/msg03171.html I've
> mentioned about the issues when 64Bit PCI BAR is present and 32bit
> address range is selected for it.
> The issue affects all recent qemu releases and all
> old and recent guest Linux kernel versions.

For testing, I applied the following patch to qemu, converting the msix
BAR to 64 bit. The guest did not seem to crash. I booted a 32 bit Fedora
Live CD guest on a 32 bit host to runlevel 3 without a crash, and
verified that the BAR is a 64 bit one and that it got assigned an address
at fe000000.

The command line I used:

qemu-system-x86_64 -bios /scm/seabios/out/bios.bin -snapshot \
    -drive file=qemu-images/f15-test.qcow2,if=none,id=diskid,cache=unsafe \
    -device virtio-blk-pci,drive=diskid -net user -net nic,model=ne2k_pci \
    -cdrom Fedora-15-i686-Live-LXDE.iso

At the boot prompt, press Tab and add '3' to the kernel command line so
the guest boots into a fast text console instead of the graphical one,
which is very slow.

diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index 2ac87ea..5271394 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -711,7 +711,8 @@ void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev)
     memory_region_init(&proxy->msix_bar, "virtio-msix", 4096);
     if (vdev->nvectors && !msix_init(&proxy->pci_dev, vdev->nvectors,
                                      &proxy->msix_bar, 1, 0)) {
-        pci_register_bar(&proxy->pci_dev, 1, PCI_BASE_ADDRESS_SPACE_MEMORY,
+        pci_register_bar(&proxy->pci_dev, 1, PCI_BASE_ADDRESS_SPACE_MEMORY |
+                         PCI_BASE_ADDRESS_MEM_TYPE_64,
                          &proxy->msix_bar);
     } else
         vdev->nvectors = 0;
* Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present
  From: Alex Williamson @ 2012-01-25 18:59 UTC
  To: Michael S. Tsirkin; +Cc: Alexey Korolev, sfd, Kevin O'Connor, qemu-devel@nongnu.org

On Wed, 2012-01-25 at 17:38 +0200, Michael S. Tsirkin wrote:
> For testing, I applied the following patch to qemu,
> converting msix bar to 64 bit.
> Guest did not seem to crash.
> [...]

I was also able to add MEM64 BARs to device assignment pretty trivially
and it seems to work; the guest sees 64bit BARs for an 82576 VF, programs
it to an fexxxxxx address, and it works.

Alex
* Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present
  From: Alexey Korolev @ 2012-01-26  3:19 UTC
  To: Alex Williamson; +Cc: sfd, Kevin O'Connor, qemu-devel@nongnu.org, Michael S. Tsirkin

Hi Alex and Michael,

>> For testing, I applied the following patch to qemu,
>> converting msix bar to 64 bit.
>> Guest did not seem to crash.
>> [...]
> I was also able to add MEM64 BARs to device assignment pretty trivially
> and it seems to work, guest sees 64bit BARs for an 82576 VF, programs it
> to an fexxxxxx address and it works.
>
> Alex

I'd suggest using ivshmem with a 32MB buffer to reproduce the problem,
in a 2.6.18 guest for example. The msix case does not fail because:
1. The buffer size is just 4KB - it will reprogram the range
   0xFFFFE000 - 0xFFFFFFFF, which doesn't overlap critical resources, so
   there is no immediate panic.
2. The memory_region_init() function doesn't create a backing user memory
   region, so KVM does nothing about remapping in this case.

If you apply the following patch and add this to the qemu command line:
--device ivshmem,size=32,shm="shm"

---
diff --git a/hw/ivshmem.c b/hw/ivshmem.c
index 1aa9e3b..71f8c21 100644
--- a/hw/ivshmem.c
+++ b/hw/ivshmem.c
@@ -341,7 +341,7 @@ static void create_shared_memory_BAR(IVShmemState *s, int fd) {
     memory_region_add_subregion(&s->bar, 0, &s->ivshmem);
 
     /* region for shared memory */
-    pci_register_bar(&s->dev, 2, PCI_BASE_ADDRESS_SPACE_MEMORY, &s->bar);
+    pci_register_bar(&s->dev, 2, PCI_BASE_ADDRESS_SPACE_MEMORY|PCI_BASE_ADDRESS_MEM_TYPE_64, &s->bar);
 }
 
 static void close_guest_eventfds(IVShmemState *s, int posn)
---

you get the following bootup log:

Bootdata ok (command line is root=/dev/hda1 console=ttyS0,115200n8 console=tty0)
Linux version 2.6.18 (root@localhost.localdomain) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-48)) #3 SMP Tue Jan 17 16:37:33 NZDT 2012
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009f400 (usable)
 BIOS-e820: 000000000009f400 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000007fffd000 (usable)
 BIOS-e820: 000000007fffd000 - 0000000080000000 (reserved)
 BIOS-e820: 00000000feffc000 - 00000000ff000000 (reserved)
 BIOS-e820: 00000000fffc0000 - 0000000100000000 (reserved)
DMI 2.4 present.
No NUMA configuration found
Faking a node at 0000000000000000-000000007fffd000
Bootmem setup node 0 0000000000000000-000000007fffd000
ACPI: PM-Timer IO Port: 0xb008
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 6:2 APIC version 17
ACPI: IOAPIC (id[0x01] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 1, version 17, address 0xfec00000, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
Setting APIC routing to physical flat
ACPI: HPET id: 0x8086a201 base: 0xfed00000
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at 88000000 (gap: 80000000:7effc000)
SMP: Allowing 1 CPUs, 0 hotplug CPUs
Built 1 zonelists.  Total pages: 515393
Kernel command line: root=/dev/hda1 console=ttyS0,115200n8 console=tty0
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 32768 bytes)
time.c: Using 100.000000 MHz WALL HPET GTOD HPET/TSC timer.
time.c: Detected 2500.081 MHz processor.
Console: colour VGA+ 80x25
Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes)
Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes)
Checking aperture...
Memory: 2058096k/2097140k available (3256k kernel code, 38656k reserved, 2266k data, 204k init)
Calibrating delay using timer specific routine.. 5030.07 BogoMIPS (lpj=10060155)
Mount-cache hash table entries: 256
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 4096K
MCE: warning: using only 10 banks
SMP alternatives: switching to UP code
Freeing SMP alternatives: 36k freed
ACPI: Core revision 20060707
activating NMI Watchdog ... done.
Using local APIC timer interrupts.
result 62501506
Detected 62.501 MHz APIC timer.
Brought up 1 CPUs
testing NMI watchdog ... OK.
migration_cost=0
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: Using configuration type 1
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (0000:00)
ACPI: Assume root bridge [\_SB_.PCI0] bus is 0
PCI quirk: region b000-b03f claimed by PIIX4 ACPI
PCI quirk: region b100-b10f claimed by PIIX4 SMB
ACPI: PCI Interrupt Link [LNKA] (IRQs 5 *10 11)
ACPI: PCI Interrupt Link [LNKB] (IRQs 5 *10 11)
ACPI: PCI Interrupt Link [LNKC] (IRQs 5 10 *11)
ACPI: PCI Interrupt Link [LNKD] (IRQs 5 10 *11)
ACPI: PCI Interrupt Link [LNKS] (IRQs 9) *0, disabled.
SCSI subsystem initialized
usbcore: registered new driver usbfs
usbcore: registered new driver hub
PCI: Using ACPI for IRQ routing
PCI: If a device doesn't work, try "pci=routeirq".  If it helps, post a report
divide error: 0000 [1] SMP
CPU 0
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.18 #3
RIP: 0010:[<ffffffff80388299>]  [<ffffffff80388299>] hpet_alloc+0x12a/0x30c
RSP: 0000:ffff81007e3a1e20  EFLAGS: 00010246
RAX: 00038d7ea4c68000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff8057fc2b
RBP: ffff81007e2e28c0 R08: ffffffff8055b492 R09: ffff81007e39f510
R10: ffff81007e3a1e50 R11: 0000000000000098 R12: ffff81007e3a1e50
R13: 0000000000000000 R14: ffffffffff5fe000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffffffff807fc000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000000201000 CR4: 00000000000006e0
Process swapper (pid: 1, threadinfo ffff81007e3a0000, task ffff81007e39f510)
Stack:  0000000000000000 ffffffff80847470 0000000000000000 0000000000000000
 0000000000000000 ffffffff8081e187 00000000fed00000 ffffffffff5fe000
 0000000300010001 0000000800000002 0000000000000000 0000000000000000
Call Trace:
 [<ffffffff8081e187>] late_hpet_init+0xa7/0xb2
 [<ffffffff8020717f>] init+0x139/0x2fe
 [<ffffffff8020a5b4>] child_rip+0xa/0x12
DWARF2 unwinder stuck at child_rip+0xa/0x12
Leftover inexact backtrace:
 [<ffffffff803544b6>] acpi_ds_init_one_object+0x0/0x82
 [<ffffffff80207046>] init+0x0/0x2fe
 [<ffffffff8020a5aa>] child_rip+0x0/0x12

Code: 48 f7 f6 83 7d 30 01 8b 75 34 48 89 45 20 49 8b 4c 24 08 48
RIP  [<ffffffff80388299>] hpet_alloc+0x12a/0x30c
 RSP <ffff81007e3a1e20>
<0>Kernel panic - not syncing: Attempted to kill init!
NMI Watchdog detected LOCKUP on CPU 0
CPU 0
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.18 #3
RIP: 0010:[<ffffffff8033fa93>]  [<ffffffff8033fa93>] __delay+0x6/0x10
RSP: 0000:ffff81007e3a1b50  EFLAGS: 00000293
RAX: 00000000000480f3 RBX: 0000000000000000 RCX: 000000008dea8c6a
RDX: 0000000000000002 RSI: 0000000000000000 RDI: 0000000000265e28
RBP: 00000000000009b0 R08: 0000000000000000 R09: ffff8100010503d4
R10: 0000000000000001 R11: ffffffff8034e288 R12: 0000000000000000
R13: 000000000000000b R14: ffffffff8055bc9f R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffffffff807fc000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000000201000 CR4: 00000000000006e0
Process swapper (pid: 1, threadinfo ffff81007e3a0000, task ffff81007e39f510)
Stack:  ffffffff80230a09 0000003000000008 ffff81007e3a1c48 ffff81007e3a1b78
 0000000000000246 ffffffff8055bc9f 0000000000000246 ffff81007e39f510
 0000000000000000 0000000000000000 ffff8100010503d4 0000000000000000
Call Trace:
 [<ffffffff80230a09>] panic+0x12c/0x12f
 [<ffffffff802338c5>] do_exit+0x85/0x87b
 [<ffffffff8020b0df>] kernel_math_error+0x0/0x90

Code: 0f 31 29 c8 48 39 f8 72 f5 c3 65 8b 04 25 2c 00 00 00 48 98
console shuts up ...
<0>Kernel panic - not syncing: Attempted to kill init!

Please look at the HPET lines. The HPET is mapped at 0xfed00000, and the
size of the ivshmem BAR is 32MB. During PCI enumeration ivshmem corrupts
the range 0xfe000000 - 0xffffffff, which overlaps the HPET memory. When
Linux does its late HPET init, it finds garbage there, and this causes
the panic.
Thanks,
Alexey
* Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present 2012-01-26 3:19 ` Alexey Korolev @ 2012-01-26 13:51 ` Avi Kivity 2012-01-26 14:05 ` Michael S. Tsirkin 0 siblings, 1 reply; 21+ messages in thread From: Avi Kivity @ 2012-01-26 13:51 UTC (permalink / raw) To: Alexey Korolev Cc: sfd, Alex Williamson, Kevin O'Connor, qemu-devel@nongnu.org, Michael S. Tsirkin On 01/26/2012 05:19 AM, Alexey Korolev wrote: > If you apply the following patch and add to qemu command: --device ivshmem,size=32,shm="shm" > --- > diff --git a/hw/ivshmem.c b/hw/ivshmem.c > index 1aa9e3b..71f8c21 100644 > --- a/hw/ivshmem.c > +++ b/hw/ivshmem.c > @@ -341,7 +341,7 @@ static void create_shared_memory_BAR(IVShmemState *s, int fd) { > memory_region_add_subregion(&s->bar, 0, &s->ivshmem); > > /* region for shared memory */ > - pci_register_bar(&s->dev, 2, PCI_BASE_ADDRESS_SPACE_MEMORY, &s->bar); > + pci_register_bar(&s->dev, 2, PCI_BASE_ADDRESS_SPACE_MEMORY|PCI_BASE_ADDRESS_MEM_TYPE_64, &s->bar) > } > > static void close_guest_eventfds(IVShmemState *s, int posn) > --- > > You can get the following bootup log: > > > Bootdata ok (command line is root=/dev/hda1 console=ttyS0,115200n8 console=tty0) > Linux version 2.6.18 (root@localhost.localdomain) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-48)) #3 SMP Tue Jan 17 16:37:33 NZDT 2012 > BIOS-provided physical RAM map: > BIOS-e820: 0000000000000000 - 000000000009f400 (usable) > BIOS-e820: 000000000009f400 - 00000000000a0000 (reserved) > BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved) > BIOS-e820: 0000000000100000 - 000000007fffd000 (usable) > BIOS-e820: 000000007fffd000 - 0000000080000000 (reserved) > BIOS-e820: 00000000feffc000 - 00000000ff000000 (reserved) > BIOS-e820: 00000000fffc0000 - 0000000100000000 (reserved) > DMI 2.4 present. 
> No NUMA configuration found
> Faking a node at 0000000000000000-000000007fffd000
> Bootmem setup node 0 0000000000000000-000000007fffd000
> ACPI: PM-Timer IO Port: 0xb008
> ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
> Processor #0 6:2 APIC version 17
> ACPI: IOAPIC (id[0x01] address[0xfec00000] gsi_base[0])
> IOAPIC[0]: apic_id 1, version 17, address 0xfec00000, GSI 0-23
> ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
> ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
> ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
> ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
> ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
> Setting APIC routing to physical flat
> ACPI: HPET id: 0x8086a201 base: 0xfed00000
> Using ACPI (MADT) for SMP configuration information
> Allocating PCI resources starting at 88000000 (gap: 80000000:7effc000)
> SMP: Allowing 1 CPUs, 0 hotplug CPUs
> Built 1 zonelists. Total pages: 515393
> Kernel command line: root=/dev/hda1 console=ttyS0,115200n8 console=tty0
> Initializing CPU#0
> PID hash table entries: 4096 (order: 12, 32768 bytes)
> time.c: Using 100.000000 MHz WALL HPET GTOD HPET/TSC timer.
> time.c: Detected 2500.081 MHz processor.
> Console: colour VGA+ 80x25
> Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes)
> Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes)
> Checking aperture...
> Memory: 2058096k/2097140k available (3256k kernel code, 38656k reserved, 2266k data, 204k init)
> Calibrating delay using timer specific routine.. 5030.07 BogoMIPS (lpj=10060155)
> Mount-cache hash table entries: 256
> CPU: L1 I cache: 32K, L1 D cache: 32K
> CPU: L2 cache: 4096K
> MCE: warning: using only 10 banks
> SMP alternatives: switching to UP code
> Freeing SMP alternatives: 36k freed
> ACPI: Core revision 20060707
> activating NMI Watchdog ... done.
> Using local APIC timer interrupts.
> result 62501506
> Detected 62.501 MHz APIC timer.
> Brought up 1 CPUs
> testing NMI watchdog ... OK.
> migration_cost=0
> NET: Registered protocol family 16
> ACPI: bus type pci registered
> PCI: Using configuration type 1
> ACPI: Interpreter enabled
> ACPI: Using IOAPIC for interrupt routing
> ACPI: PCI Root Bridge [PCI0] (0000:00)
> ACPI: Assume root bridge [\_SB_.PCI0] bus is 0
> PCI quirk: region b000-b03f claimed by PIIX4 ACPI
> PCI quirk: region b100-b10f claimed by PIIX4 SMB
> ACPI: PCI Interrupt Link [LNKA] (IRQs 5 *10 11)
> ACPI: PCI Interrupt Link [LNKB] (IRQs 5 *10 11)
> ACPI: PCI Interrupt Link [LNKC] (IRQs 5 10 *11)
> ACPI: PCI Interrupt Link [LNKD] (IRQs 5 10 *11)
> ACPI: PCI Interrupt Link [LNKS] (IRQs 9) *0, disabled.
> SCSI subsystem initialized
> usbcore: registered new driver usbfs
> usbcore: registered new driver hub
> PCI: Using ACPI for IRQ routing
> PCI: If a device doesn't work, try "pci=routeirq". If it helps, post a report
> divide error: 0000 [1] SMP
> CPU 0
> Modules linked in:
> Pid: 1, comm: swapper Not tainted 2.6.18 #3
> RIP: 0010:[<ffffffff80388299>] [<ffffffff80388299>] hpet_alloc+0x12a/0x30c
> RSP: 0000:ffff81007e3a1e20 EFLAGS: 00010246
> RAX: 00038d7ea4c68000 RBX: 0000000000000000 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff8057fc2b
> RBP: ffff81007e2e28c0 R08: ffffffff8055b492 R09: ffff81007e39f510
> R10: ffff81007e3a1e50 R11: 0000000000000098 R12: ffff81007e3a1e50
> R13: 0000000000000000 R14: ffffffffff5fe000 R15: 0000000000000000
> FS: 0000000000000000(0000) GS:ffffffff807fc000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> CR2: 0000000000000000 CR3: 0000000000201000 CR4: 00000000000006e0
> Process swapper (pid: 1, threadinfo ffff81007e3a0000, task ffff81007e39f510)
> Stack: 0000000000000000 ffffffff80847470 0000000000000000 0000000000000000
> 0000000000000000 ffffffff8081e187 00000000fed00000 ffffffffff5fe000
> 0000000300010001 0000000800000002 0000000000000000 0000000000000000
> Call Trace:
> [<ffffffff8081e187>] late_hpet_init+0xa7/0xb2
> [<ffffffff8020717f>] init+0x139/0x2fe
> [<ffffffff8020a5b4>] child_rip+0xa/0x12
> DWARF2 unwinder stuck at child_rip+0xa/0x12
> Leftover inexact backtrace:
> [<ffffffff803544b6>] acpi_ds_init_one_object+0x0/0x82
> [<ffffffff80207046>] init+0x0/0x2fe
> [<ffffffff8020a5aa>] child_rip+0x0/0x12
>
> Code: 48 f7 f6 83 7d 30 01 8b 75 34 48 89 45 20 49 8b 4c 24 08 48
> RIP [<ffffffff80388299>] hpet_alloc+0x12a/0x30c
> RSP <ffff81007e3a1e20>
> <0>Kernel panic - not syncing: Attempted to kill init!
> NMI Watchdog detected LOCKUP on CPU 0
> CPU 0
> Modules linked in:
> Pid: 1, comm: swapper Not tainted 2.6.18 #3
> RIP: 0010:[<ffffffff8033fa93>] [<ffffffff8033fa93>] __delay+0x6/0x10
> RSP: 0000:ffff81007e3a1b50 EFLAGS: 00000293
> RAX: 00000000000480f3 RBX: 0000000000000000 RCX: 000000008dea8c6a
> RDX: 0000000000000002 RSI: 0000000000000000 RDI: 0000000000265e28
> RBP: 00000000000009b0 R08: 0000000000000000 R09: ffff8100010503d4
> R10: 0000000000000001 R11: ffffffff8034e288 R12: 0000000000000000
> R13: 000000000000000b R14: ffffffff8055bc9f R15: 0000000000000000
> FS: 0000000000000000(0000) GS:ffffffff807fc000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> CR2: 0000000000000000 CR3: 0000000000201000 CR4: 00000000000006e0
> Process swapper (pid: 1, threadinfo ffff81007e3a0000, task ffff81007e39f510)
> Stack: ffffffff80230a09 0000003000000008 ffff81007e3a1c48 ffff81007e3a1b78
> 0000000000000246 ffffffff8055bc9f 0000000000000246 ffff81007e39f510
> 0000000000000000 0000000000000000 ffff8100010503d4 0000000000000000
> Call Trace:
> [<ffffffff80230a09>] panic+0x12c/0x12f
> [<ffffffff802338c5>] do_exit+0x85/0x87b
> [<ffffffff8020b0df>] kernel_math_error+0x0/0x90
>
> Code: 0f 31 29 c8 48 39 f8 72 f5 c3 65 8b 04 25 2c 00 00 00 48 98
> console shuts up ...
> <0>Kernel panic - not syncing: Attempted to kill init!
>
> Please look at HPET lines. HPET is mapped to 0xfed00000.
> Size of ivshmem is 32MB.
> During pci enumeration ivshmem will corrupt the range from 0xfe000000 - 0xffffffff.
> It overlaps HPET memory. When Linux does late_hpet init, it finds garbage
> and this causes the panic.

Let me see if I get this right: during BAR sizing, the guest sets the BAR to ~1, which means 4GB-32MB -> 4GB, which overlaps the HPET. If so, that's expected behaviour. If the guest doesn't want this memory there, it should disable mmio.

--
error compiling committee.c: too many arguments to function
* Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present 2012-01-26 13:51 ` Avi Kivity @ 2012-01-26 14:05 ` Michael S. Tsirkin 2012-01-26 14:33 ` Avi Kivity 0 siblings, 1 reply; 21+ messages in thread From: Michael S. Tsirkin @ 2012-01-26 14:05 UTC (permalink / raw) To: Avi Kivity Cc: Alexey Korolev, sfd, Alex Williamson, Kevin O'Connor, qemu-devel@nongnu.org On Thu, Jan 26, 2012 at 03:51:06PM +0200, Avi Kivity wrote: > > Please look at HPET lines. HPET is mapped to 0xfed00000. > > Size of ivshmem is 32MB. During pci enumeration ivshmem will corrupt the range from 0xfe000000 - 0xffffffff. > > It overlaps HPET memory. When Linux does late_hpet init, it finds garbage and this is causing panic. > > > > Let me see if I get this right: during BAR sizing, the guest sets the > BAR to ~1, which means 4GB-32MB -> 4GB, which overlaps the HPET. If so, > that's expected behaviour. Yes BAR sizing temporarily sets the BAR to an invalid value then restores it. What I don't understand is how come something accesses the HPET range in between. > If the guest doesn't want this memory there, > it should disable mmio. Recent kernels do this for most devices, but not for platform devices. > -- > error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present
  2012-01-26 14:05 ` Michael S. Tsirkin
@ 2012-01-26 14:33   ` Avi Kivity
  0 siblings, 0 replies; 21+ messages in thread
From: Avi Kivity @ 2012-01-26 14:33 UTC
To: Michael S. Tsirkin
Cc: Alexey Korolev, sfd, Alex Williamson, Kevin O'Connor, qemu-devel@nongnu.org

On 01/26/2012 04:05 PM, Michael S. Tsirkin wrote:
> > Let me see if I get this right: during BAR sizing, the guest sets the
> > BAR to ~1, which means 4GB-32MB -> 4GB, which overlaps the HPET. If so,
> > that's expected behaviour.
>
> Yes BAR sizing temporarily sets the BAR to an invalid value then
> restores it. What I don't understand is how come something accesses the
> HPET range in between.

Interrupt -> read time.

> > If the guest doesn't want this memory there,
> > it should disable mmio.
>
> Recent kernels do this for most devices, but not for
> platform devices.

Then they are vulnerable to this issue. The i440fx spec states that the
entire top-of-memory range to 4GB is forwarded to PCI, so qemu appears to
be correct here.

--
error compiling committee.c: too many arguments to function
* Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present 2012-01-25 5:46 [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present Alexey Korolev 2012-01-25 12:51 ` Michael S. Tsirkin 2012-01-25 15:38 ` Michael S. Tsirkin @ 2012-01-26 9:14 ` Michael S. Tsirkin 2012-01-26 13:52 ` Avi Kivity 2 siblings, 1 reply; 21+ messages in thread From: Michael S. Tsirkin @ 2012-01-26 9:14 UTC (permalink / raw) To: Alexey Korolev; +Cc: sfd, Kevin O'Connor, qemu-devel@nongnu.org, avi On Wed, Jan 25, 2012 at 06:46:03PM +1300, Alexey Korolev wrote: > Hi, > In this post > http://lists.gnu.org/archive/html/qemu-devel/2011-12/msg03171.html I've > mentioned about the issues when 64Bit PCI BAR is present and 32bit > address range is selected for it. > The issue affects all recent qemu releases and all > old and recent guest Linux kernel versions. > > We've done some investigations. Let me explain what happens. > Assume we have 64bit BAR with size 32MB mapped at [0xF0000000 - > 0xF2000000] > > When Linux guest starts it does PCI bus enumeration. > The OS enumerates 64BIT bars using the following procedure. > 1. Write all FF's to lower half of 64bit BAR > 2. Write address back to lower half of 64bit BAR > 3. Write all FF's to higher half of 64bit BAR > 4. Write address back to higher half of 64bit BAR > > Linux code is here: > http://lxr.linux.no/#linux+v3.2.1/drivers/pci/probe.c#L149 > > What does it mean for qemu? > > At step 1. qemu pci_default_write_config() recevies all FFs for lower > part of the 64bit BAR. Then it applies the mask and converts the value > to "All FF's - size + 1" (FE000000 if size is 32MB). > Then pci_bar_address() checks if BAR address is valid. Since it is a > 64bit bar it reads 0x00000000FE000000 - this address is valid. So qemu > updates topology and sends request to update mappings in KVM with new > range for the 64bit BAR FE000000 - 0xFFFFFFFF. 
> This usually means kernel
> panic on boot, if there is another mapping in the FE000000 - 0xFFFFFFFF
> range, which is quite common.

Do you know why it panics? As far as I can see from the code at
http://lxr.linux.no/#linux+v2.6.35.9/drivers/pci/probe.c#L162

171 pci_read_config_dword(dev, pos, &l);
172 pci_write_config_dword(dev, pos, l | mask);
173 pci_read_config_dword(dev, pos, &sz);
174 pci_write_config_dword(dev, pos, l);

the BAR is restored: what triggers an access between lines 172 and 174?

Also, what you describe happens on a 32 bit BAR in the same way, no?

--
MST
* Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present 2012-01-26 9:14 ` Michael S. Tsirkin @ 2012-01-26 13:52 ` Avi Kivity 2012-01-26 14:36 ` Michael S. Tsirkin 0 siblings, 1 reply; 21+ messages in thread From: Avi Kivity @ 2012-01-26 13:52 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Alexey Korolev, sfd, Kevin O'Connor, qemu-devel@nongnu.org On 01/26/2012 11:14 AM, Michael S. Tsirkin wrote: > On Wed, Jan 25, 2012 at 06:46:03PM +1300, Alexey Korolev wrote: > > Hi, > > In this post > > http://lists.gnu.org/archive/html/qemu-devel/2011-12/msg03171.html I've > > mentioned about the issues when 64Bit PCI BAR is present and 32bit > > address range is selected for it. > > The issue affects all recent qemu releases and all > > old and recent guest Linux kernel versions. > > > > We've done some investigations. Let me explain what happens. > > Assume we have 64bit BAR with size 32MB mapped at [0xF0000000 - > > 0xF2000000] > > > > When Linux guest starts it does PCI bus enumeration. > > The OS enumerates 64BIT bars using the following procedure. > > 1. Write all FF's to lower half of 64bit BAR > > 2. Write address back to lower half of 64bit BAR > > 3. Write all FF's to higher half of 64bit BAR > > 4. Write address back to higher half of 64bit BAR > > > > Linux code is here: > > http://lxr.linux.no/#linux+v3.2.1/drivers/pci/probe.c#L149 > > > > What does it mean for qemu? > > > > At step 1. qemu pci_default_write_config() recevies all FFs for lower > > part of the 64bit BAR. Then it applies the mask and converts the value > > to "All FF's - size + 1" (FE000000 if size is 32MB). > > Then pci_bar_address() checks if BAR address is valid. Since it is a > > 64bit bar it reads 0x00000000FE000000 - this address is valid. So qemu > > updates topology and sends request to update mappings in KVM with new > > range for the 64bit BAR FE000000 - 0xFFFFFFFF. 
> > This usually means kernel
> > panic on boot, if there is another mapping in the FE000000 - 0xFFFFFFFF
> > range, which is quite common.
>
> Do you know why it panics? As far as I can see from the code at
> http://lxr.linux.no/#linux+v2.6.35.9/drivers/pci/probe.c#L162
>
> 171 pci_read_config_dword(dev, pos, &l);
> 172 pci_write_config_dword(dev, pos, l | mask);
> 173 pci_read_config_dword(dev, pos, &sz);
> 174 pci_write_config_dword(dev, pos, l);
>
> The BAR is restored: what triggers an access between lines 172 and 174?

Random interrupt reading the time, likely.

> Also, what you describe happens on a 32 bit BAR in the same way, no?

So it seems. Btw, is this procedure correct for sizing a BAR which is larger than 4GB?

--
error compiling committee.c: too many arguments to function
* Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present 2012-01-26 13:52 ` Avi Kivity @ 2012-01-26 14:36 ` Michael S. Tsirkin 2012-01-26 15:12 ` Avi Kivity 2012-01-27 4:40 ` Alexey Korolev 0 siblings, 2 replies; 21+ messages in thread From: Michael S. Tsirkin @ 2012-01-26 14:36 UTC (permalink / raw) To: Avi Kivity; +Cc: Alexey Korolev, sfd, Kevin O'Connor, qemu-devel@nongnu.org On Thu, Jan 26, 2012 at 03:52:27PM +0200, Avi Kivity wrote: > On 01/26/2012 11:14 AM, Michael S. Tsirkin wrote: > > On Wed, Jan 25, 2012 at 06:46:03PM +1300, Alexey Korolev wrote: > > > Hi, > > > In this post > > > http://lists.gnu.org/archive/html/qemu-devel/2011-12/msg03171.html I've > > > mentioned about the issues when 64Bit PCI BAR is present and 32bit > > > address range is selected for it. > > > The issue affects all recent qemu releases and all > > > old and recent guest Linux kernel versions. > > > > > > We've done some investigations. Let me explain what happens. > > > Assume we have 64bit BAR with size 32MB mapped at [0xF0000000 - > > > 0xF2000000] > > > > > > When Linux guest starts it does PCI bus enumeration. > > > The OS enumerates 64BIT bars using the following procedure. > > > 1. Write all FF's to lower half of 64bit BAR > > > 2. Write address back to lower half of 64bit BAR > > > 3. Write all FF's to higher half of 64bit BAR > > > 4. Write address back to higher half of 64bit BAR > > > > > > Linux code is here: > > > http://lxr.linux.no/#linux+v3.2.1/drivers/pci/probe.c#L149 > > > > > > What does it mean for qemu? > > > > > > At step 1. qemu pci_default_write_config() recevies all FFs for lower > > > part of the 64bit BAR. Then it applies the mask and converts the value > > > to "All FF's - size + 1" (FE000000 if size is 32MB). > > > Then pci_bar_address() checks if BAR address is valid. Since it is a > > > 64bit bar it reads 0x00000000FE000000 - this address is valid. 
So qemu > > > updates topology and sends request to update mappings in KVM with new > > > range for the 64bit BAR FE000000 - 0xFFFFFFFF. This usually means kernel > > > panic on boot, if there is another mapping in the FE000000 - 0xFFFFFFFF > > > range, which is quite common. > > > > Do you know why does it panic? As far as I can see > > from code at > > http://lxr.linux.no/#linux+v2.6.35.9/drivers/pci/probe.c#L162 > > > > 171 pci_read_config_dword(dev, pos, &l); > > 172 pci_write_config_dword(dev, pos, l | mask); > > 173 pci_read_config_dword(dev, pos, &sz); > > 174 pci_write_config_dword(dev, pos, l); > > > > BAR is restored: what triggers an access between lines 172 and 174? > > Random interrupt reading the time, likely. Weird, what the backtrace shows is init, unrelated to interrupts. > > Also, what you describe happens on a 32 bit BAR in the same way, no? > > So it seems. Btw, is this procedure correct for sizing a BAR which is > larger than 4GB? There's more code sizing 64 bit BARs, but generally software is allowed to write any junk into enabled BARs as long as there aren't any memory accesses. > -- > error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present 2012-01-26 14:36 ` Michael S. Tsirkin @ 2012-01-26 15:12 ` Avi Kivity 2012-01-27 4:42 ` Alexey Korolev 2012-01-27 4:40 ` Alexey Korolev 1 sibling, 1 reply; 21+ messages in thread From: Avi Kivity @ 2012-01-26 15:12 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Alexey Korolev, sfd, Kevin O'Connor, qemu-devel@nongnu.org On 01/26/2012 04:36 PM, Michael S. Tsirkin wrote: > On Thu, Jan 26, 2012 at 03:52:27PM +0200, Avi Kivity wrote: > > On 01/26/2012 11:14 AM, Michael S. Tsirkin wrote: > > > On Wed, Jan 25, 2012 at 06:46:03PM +1300, Alexey Korolev wrote: > > > > Hi, > > > > In this post > > > > http://lists.gnu.org/archive/html/qemu-devel/2011-12/msg03171.html I've > > > > mentioned about the issues when 64Bit PCI BAR is present and 32bit > > > > address range is selected for it. > > > > The issue affects all recent qemu releases and all > > > > old and recent guest Linux kernel versions. > > > > > > > > We've done some investigations. Let me explain what happens. > > > > Assume we have 64bit BAR with size 32MB mapped at [0xF0000000 - > > > > 0xF2000000] > > > > > > > > When Linux guest starts it does PCI bus enumeration. > > > > The OS enumerates 64BIT bars using the following procedure. > > > > 1. Write all FF's to lower half of 64bit BAR > > > > 2. Write address back to lower half of 64bit BAR > > > > 3. Write all FF's to higher half of 64bit BAR > > > > 4. Write address back to higher half of 64bit BAR > > > > > > > > Linux code is here: > > > > http://lxr.linux.no/#linux+v3.2.1/drivers/pci/probe.c#L149 > > > > > > > > What does it mean for qemu? > > > > > > > > At step 1. qemu pci_default_write_config() recevies all FFs for lower > > > > part of the 64bit BAR. Then it applies the mask and converts the value > > > > to "All FF's - size + 1" (FE000000 if size is 32MB). > > > > Then pci_bar_address() checks if BAR address is valid. 
Since it is a > > > > 64bit bar it reads 0x00000000FE000000 - this address is valid. So qemu > > > > updates topology and sends request to update mappings in KVM with new > > > > range for the 64bit BAR FE000000 - 0xFFFFFFFF. This usually means kernel > > > > panic on boot, if there is another mapping in the FE000000 - 0xFFFFFFFF > > > > range, which is quite common. > > > > > > Do you know why does it panic? As far as I can see > > > from code at > > > http://lxr.linux.no/#linux+v2.6.35.9/drivers/pci/probe.c#L162 > > > > > > 171 pci_read_config_dword(dev, pos, &l); > > > 172 pci_write_config_dword(dev, pos, l | mask); > > > 173 pci_read_config_dword(dev, pos, &sz); > > > 174 pci_write_config_dword(dev, pos, l); > > > > > > BAR is restored: what triggers an access between lines 172 and 174? > > > > Random interrupt reading the time, likely. > > Weird, what the backtrace shows is init, unrelated > to interrupts. > It's a bug then. qemu doesn't undo the mapping correctly. If you have clear instructions, I'll try to reproduce it. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present 2012-01-26 15:12 ` Avi Kivity @ 2012-01-27 4:42 ` Alexey Korolev 2012-01-31 9:40 ` Avi Kivity 2012-01-31 10:51 ` Avi Kivity 0 siblings, 2 replies; 21+ messages in thread From: Alexey Korolev @ 2012-01-27 4:42 UTC (permalink / raw) To: Avi Kivity Cc: sfd, Kevin O'Connor, qemu-devel@nongnu.org, Michael S. Tsirkin On 27/01/12 04:12, Avi Kivity wrote: > On 01/26/2012 04:36 PM, Michael S. Tsirkin wrote: >> On Thu, Jan 26, 2012 at 03:52:27PM +0200, Avi Kivity wrote: >>> On 01/26/2012 11:14 AM, Michael S. Tsirkin wrote: >>>> On Wed, Jan 25, 2012 at 06:46:03PM +1300, Alexey Korolev wrote: >>>>> Hi, >>>>> In this post >>>>> http://lists.gnu.org/archive/html/qemu-devel/2011-12/msg03171.html I've >>>>> mentioned about the issues when 64Bit PCI BAR is present and 32bit >>>>> address range is selected for it. >>>>> The issue affects all recent qemu releases and all >>>>> old and recent guest Linux kernel versions. >>>>> >>>>> We've done some investigations. Let me explain what happens. >>>>> Assume we have 64bit BAR with size 32MB mapped at [0xF0000000 - >>>>> 0xF2000000] >>>>> >>>>> When Linux guest starts it does PCI bus enumeration. >>>>> The OS enumerates 64BIT bars using the following procedure. >>>>> 1. Write all FF's to lower half of 64bit BAR >>>>> 2. Write address back to lower half of 64bit BAR >>>>> 3. Write all FF's to higher half of 64bit BAR >>>>> 4. Write address back to higher half of 64bit BAR >>>>> >>>>> Linux code is here: >>>>> http://lxr.linux.no/#linux+v3.2.1/drivers/pci/probe.c#L149 >>>>> >>>>> What does it mean for qemu? >>>>> >>>>> At step 1. qemu pci_default_write_config() recevies all FFs for lower >>>>> part of the 64bit BAR. Then it applies the mask and converts the value >>>>> to "All FF's - size + 1" (FE000000 if size is 32MB). >>>>> Then pci_bar_address() checks if BAR address is valid. Since it is a >>>>> 64bit bar it reads 0x00000000FE000000 - this address is valid. 
>>>>> So qemu updates topology and sends request to update mappings in KVM
>>>>> with new range for the 64bit BAR FE000000 - 0xFFFFFFFF. This usually
>>>>> means kernel panic on boot, if there is another mapping in the
>>>>> FE000000 - 0xFFFFFFFF range, which is quite common.
>>>> Do you know why it panics? As far as I can see from the code at
>>>> http://lxr.linux.no/#linux+v2.6.35.9/drivers/pci/probe.c#L162
>>>>
>>>> 171 pci_read_config_dword(dev, pos, &l);
>>>> 172 pci_write_config_dword(dev, pos, l | mask);
>>>> 173 pci_read_config_dword(dev, pos, &sz);
>>>> 174 pci_write_config_dword(dev, pos, l);
>>>>
>>>> BAR is restored: what triggers an access between lines 172 and 174?
>>> Random interrupt reading the time, likely.
>> Weird, what the backtrace shows is init, unrelated
>> to interrupts.
> It's a bug then. qemu doesn't undo the mapping correctly.
>
> If you have clear instructions, I'll try to reproduce it.

Well the easiest way to reproduce this is:

1. Get a kernel bzImage (version < 2.6.36)
2. Apply this patch to ivshmem.c:

---
diff --git a/hw/ivshmem.c b/hw/ivshmem.c
index 1aa9e3b..71f8c21 100644
--- a/hw/ivshmem.c
+++ b/hw/ivshmem.c
@@ -341,7 +341,7 @@ static void create_shared_memory_BAR(IVShmemState *s, int fd) {
     memory_region_add_subregion(&s->bar, 0, &s->ivshmem);
 
     /* region for shared memory */
-    pci_register_bar(&s->dev, 2, PCI_BASE_ADDRESS_SPACE_MEMORY, &s->bar);
+    pci_register_bar(&s->dev, 2, PCI_BASE_ADDRESS_SPACE_MEMORY|PCI_BASE_ADDRESS_MEM_TYPE_64, &s->bar);
 }
 
 static void close_guest_eventfds(IVShmemState *s, int posn)
---

3.
Launch qemu with a command like this:

/usr/bin/qemu-system-x86_64 -M pc-0.14 -enable-kvm -m 2048 -smp 1,socket=1,cores=1,threads=1 -name centos54 -uuid d37daefd-75bd-4387-cee1-7f0b153ee2af -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/centos54.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=readline -rtc base=utc -drive file=/dev/dock200-1/centos54,if=none,id=drive-ide0-0-0,format=raw -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -drive file=/data/CentOS-5.4-x86_64-bin-DVD.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -chardev file,id=charserial0,path=/home/alexey/cent54.log -device isa-serial,chardev=charserial0,id=serial0 -usb -vnc 127.0.0.1:0 -k en-us -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,multifunction=on,addr=0x4.0x0 --device ivshmem,size=32,shm="shm" -kernel bzImage -append "root=/dev/hda1 console=ttyS0,115200n8 console=tty0"

In other words, add: --device ivshmem,size=32,shm="shm"

That is all.

Note: it won't necessarily cause a panic message; on some kernels it just hangs or reboots.
* Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present 2012-01-27 4:42 ` Alexey Korolev @ 2012-01-31 9:40 ` Avi Kivity 2012-01-31 9:43 ` Avi Kivity 2012-01-31 10:51 ` Avi Kivity 1 sibling, 1 reply; 21+ messages in thread From: Avi Kivity @ 2012-01-31 9:40 UTC (permalink / raw) To: Alexey Korolev Cc: sfd, Kevin O'Connor, qemu-devel@nongnu.org, Michael S. Tsirkin On 01/27/2012 06:42 AM, Alexey Korolev wrote: > On 27/01/12 04:12, Avi Kivity wrote: > > On 01/26/2012 04:36 PM, Michael S. Tsirkin wrote: > >> On Thu, Jan 26, 2012 at 03:52:27PM +0200, Avi Kivity wrote: > >>> On 01/26/2012 11:14 AM, Michael S. Tsirkin wrote: > >>>> On Wed, Jan 25, 2012 at 06:46:03PM +1300, Alexey Korolev wrote: > >>>>> Hi, > >>>>> In this post > >>>>> http://lists.gnu.org/archive/html/qemu-devel/2011-12/msg03171.html I've > >>>>> mentioned about the issues when 64Bit PCI BAR is present and 32bit > >>>>> address range is selected for it. > >>>>> The issue affects all recent qemu releases and all > >>>>> old and recent guest Linux kernel versions. > >>>>> > >>>>> We've done some investigations. Let me explain what happens. > >>>>> Assume we have 64bit BAR with size 32MB mapped at [0xF0000000 - > >>>>> 0xF2000000] > >>>>> > >>>>> When Linux guest starts it does PCI bus enumeration. > >>>>> The OS enumerates 64BIT bars using the following procedure. > >>>>> 1. Write all FF's to lower half of 64bit BAR > >>>>> 2. Write address back to lower half of 64bit BAR > >>>>> 3. Write all FF's to higher half of 64bit BAR > >>>>> 4. Write address back to higher half of 64bit BAR > >>>>> > >>>>> Linux code is here: > >>>>> http://lxr.linux.no/#linux+v3.2.1/drivers/pci/probe.c#L149 > >>>>> > >>>>> What does it mean for qemu? > >>>>> > >>>>> At step 1. qemu pci_default_write_config() recevies all FFs for lower > >>>>> part of the 64bit BAR. Then it applies the mask and converts the value > >>>>> to "All FF's - size + 1" (FE000000 if size is 32MB). 
> >>>>> Then pci_bar_address() checks if BAR address is valid. Since it is a > >>>>> 64bit bar it reads 0x00000000FE000000 - this address is valid. So qemu > >>>>> updates topology and sends request to update mappings in KVM with new > >>>>> range for the 64bit BAR FE000000 - 0xFFFFFFFF. This usually means kernel > >>>>> panic on boot, if there is another mapping in the FE000000 - 0xFFFFFFFF > >>>>> range, which is quite common. > >>>> Do you know why does it panic? As far as I can see > >>>> from code at > >>>> http://lxr.linux.no/#linux+v2.6.35.9/drivers/pci/probe.c#L162 > >>>> > >>>> 171 pci_read_config_dword(dev, pos, &l); > >>>> 172 pci_write_config_dword(dev, pos, l | mask); > >>>> 173 pci_read_config_dword(dev, pos, &sz); > >>>> 174 pci_write_config_dword(dev, pos, l); > >>>> > >>>> BAR is restored: what triggers an access between lines 172 and 174? > >>> Random interrupt reading the time, likely. > >> Weird, what the backtrace shows is init, unrelated > >> to interrupts. > >> > > It's a bug then. qemu doesn't undo the mapping correctly. > > > > If you have clear instructions, I'll try to reproduce it. > > > Well the easiest way to reproduce this is: > > > 1. Get kernel bzImage (version < 2.6.36) > 2. Apply patch to ivshmem.c > > --- > diff --git a/hw/ivshmem.c b/hw/ivshmem.c > index 1aa9e3b..71f8c21 100644 > --- a/hw/ivshmem.c > +++ b/hw/ivshmem.c > @@ -341,7 +341,7 @@ static void create_shared_memory_BAR(IVShmemState *s, int fd) { > memory_region_add_subregion(&s->bar, 0, &s->ivshmem); > > /* region for shared memory */ > - pci_register_bar(&s->dev, 2, PCI_BASE_ADDRESS_SPACE_MEMORY, &s->bar); > + pci_register_bar(&s->dev, 2, PCI_BASE_ADDRESS_SPACE_MEMORY|PCI_BASE_ADDRESS_MEM_TYPE_64, &s->bar) > } > > static void close_guest_eventfds(IVShmemState *s, int posn) > --- > > 3. 
Launch qemu with a command like that > > /usr/bin/qemu-system-x86_64 -M pc-0.14 -enable-kvm -m 2048 -smp 1,socket=1,cores=1,threads=1 -name centos54 -uuid > d37daefd-75bd-4387-cee1-7f0b153ee2af -nodefconfig -nodefaults -chardev > socket,id=charmonitor,path=/var/lib/libvirt/qemu/centos54.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=readline -rtc > base=utc -drive file=/dev/dock200-1/centos54,if=none,id=drive-ide0-0-0,format=raw -device > ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -drive > file=/data/CentOS-5.4-x86_64-bin-DVD.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device > ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -chardev file,id=charserial0,path=/home/alexey/cent54.log -device > isa-serial,chardev=charserial0,id=serial0 -usb -vnc 127.0.0.1:0 -k en-us -vga cirrus -device > virtio-balloon-pci,id=balloon0,bus=pci.0,multifunction=on,addr=0x4.0x0 --device ivshmem,size=32,shm="shm" -kernel bzImage -append > "root=/dev/hda1 console=ttyS0,115200n8 console=tty0" > > in other words add: --device ivshmem,size=32,shm="shm" > > That is all. > > Note: it won't necessary cause panic message on some kernels it just hangs or reboots. > In fact qemu segfaults for me, since registering a ram region not on a page boundary is broken. This happens when the ivshmem bar is split by the hpet region, which is less than page long. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present 2012-01-31 9:40 ` Avi Kivity @ 2012-01-31 9:43 ` Avi Kivity 2012-02-01 5:44 ` Alexey Korolev 0 siblings, 1 reply; 21+ messages in thread From: Avi Kivity @ 2012-01-31 9:43 UTC (permalink / raw) To: Alexey Korolev Cc: sfd, Kevin O'Connor, qemu-devel@nongnu.org, Michael S. Tsirkin On 01/31/2012 11:40 AM, Avi Kivity wrote: > On 01/27/2012 06:42 AM, Alexey Korolev wrote: > > On 27/01/12 04:12, Avi Kivity wrote: > > > On 01/26/2012 04:36 PM, Michael S. Tsirkin wrote: > > >> On Thu, Jan 26, 2012 at 03:52:27PM +0200, Avi Kivity wrote: > > >>> On 01/26/2012 11:14 AM, Michael S. Tsirkin wrote: > > >>>> On Wed, Jan 25, 2012 at 06:46:03PM +1300, Alexey Korolev wrote: > > >>>>> Hi, > > >>>>> In this post > > >>>>> http://lists.gnu.org/archive/html/qemu-devel/2011-12/msg03171.html I've > > >>>>> mentioned about the issues when 64Bit PCI BAR is present and 32bit > > >>>>> address range is selected for it. > > >>>>> The issue affects all recent qemu releases and all > > >>>>> old and recent guest Linux kernel versions. > > >>>>> > > >>>>> We've done some investigations. Let me explain what happens. > > >>>>> Assume we have 64bit BAR with size 32MB mapped at [0xF0000000 - > > >>>>> 0xF2000000] > > >>>>> > > >>>>> When Linux guest starts it does PCI bus enumeration. > > >>>>> The OS enumerates 64BIT bars using the following procedure. > > >>>>> 1. Write all FF's to lower half of 64bit BAR > > >>>>> 2. Write address back to lower half of 64bit BAR > > >>>>> 3. Write all FF's to higher half of 64bit BAR > > >>>>> 4. Write address back to higher half of 64bit BAR > > >>>>> > > >>>>> Linux code is here: > > >>>>> http://lxr.linux.no/#linux+v3.2.1/drivers/pci/probe.c#L149 > > >>>>> > > >>>>> What does it mean for qemu? > > >>>>> > > >>>>> At step 1. qemu pci_default_write_config() recevies all FFs for lower > > >>>>> part of the 64bit BAR. 
Then it applies the mask and converts the value > > >>>>> to "All FF's - size + 1" (FE000000 if size is 32MB). > > >>>>> Then pci_bar_address() checks if BAR address is valid. Since it is a > > >>>>> 64bit bar it reads 0x00000000FE000000 - this address is valid. So qemu > > >>>>> updates topology and sends request to update mappings in KVM with new > > >>>>> range for the 64bit BAR FE000000 - 0xFFFFFFFF. This usually means kernel > > >>>>> panic on boot, if there is another mapping in the FE000000 - 0xFFFFFFFF > > >>>>> range, which is quite common. > > >>>> Do you know why does it panic? As far as I can see > > >>>> from code at > > >>>> http://lxr.linux.no/#linux+v2.6.35.9/drivers/pci/probe.c#L162 > > >>>> > > >>>> 171 pci_read_config_dword(dev, pos, &l); > > >>>> 172 pci_write_config_dword(dev, pos, l | mask); > > >>>> 173 pci_read_config_dword(dev, pos, &sz); > > >>>> 174 pci_write_config_dword(dev, pos, l); > > >>>> > > >>>> BAR is restored: what triggers an access between lines 172 and 174? > > >>> Random interrupt reading the time, likely. > > >> Weird, what the backtrace shows is init, unrelated > > >> to interrupts. > > >> > > > It's a bug then. qemu doesn't undo the mapping correctly. > > > > > > If you have clear instructions, I'll try to reproduce it. > > > > > Well the easiest way to reproduce this is: > > > > > > 1. Get kernel bzImage (version < 2.6.36) > > 2. 
Apply patch to ivshmem.c > > > > --- > > diff --git a/hw/ivshmem.c b/hw/ivshmem.c > > index 1aa9e3b..71f8c21 100644 > > --- a/hw/ivshmem.c > > +++ b/hw/ivshmem.c > > @@ -341,7 +341,7 @@ static void create_shared_memory_BAR(IVShmemState *s, int fd) { > > memory_region_add_subregion(&s->bar, 0, &s->ivshmem); > > > > /* region for shared memory */ > > - pci_register_bar(&s->dev, 2, PCI_BASE_ADDRESS_SPACE_MEMORY, &s->bar); > > + pci_register_bar(&s->dev, 2, PCI_BASE_ADDRESS_SPACE_MEMORY|PCI_BASE_ADDRESS_MEM_TYPE_64, &s->bar) > > } > > > > static void close_guest_eventfds(IVShmemState *s, int posn) > > --- > > > > 3. Launch qemu with a command like that > > > > /usr/bin/qemu-system-x86_64 -M pc-0.14 -enable-kvm -m 2048 -smp 1,socket=1,cores=1,threads=1 -name centos54 -uuid > > d37daefd-75bd-4387-cee1-7f0b153ee2af -nodefconfig -nodefaults -chardev > > socket,id=charmonitor,path=/var/lib/libvirt/qemu/centos54.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=readline -rtc > > base=utc -drive file=/dev/dock200-1/centos54,if=none,id=drive-ide0-0-0,format=raw -device > > ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -drive > > file=/data/CentOS-5.4-x86_64-bin-DVD.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device > > ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -chardev file,id=charserial0,path=/home/alexey/cent54.log -device > > isa-serial,chardev=charserial0,id=serial0 -usb -vnc 127.0.0.1:0 -k en-us -vga cirrus -device > > virtio-balloon-pci,id=balloon0,bus=pci.0,multifunction=on,addr=0x4.0x0 --device ivshmem,size=32,shm="shm" -kernel bzImage -append > > "root=/dev/hda1 console=ttyS0,115200n8 console=tty0" > > > > in other words add: --device ivshmem,size=32,shm="shm" > > > > That is all. > > > > Note: it won't necessary cause panic message on some kernels it just hangs or reboots. > > > > In fact qemu segfaults for me, since registering a ram region not on a > page boundary is broken. 
This happens when the ivshmem bar is split by > the hpet region, which is less than a page long. > Happens only with qemu-kvm for some reason. Two separate bugs. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present 2012-01-31 9:43 ` Avi Kivity @ 2012-02-01 5:44 ` Alexey Korolev 2012-02-01 7:04 ` Michael S. Tsirkin 0 siblings, 1 reply; 21+ messages in thread From: Alexey Korolev @ 2012-02-01 5:44 UTC (permalink / raw) To: Avi Kivity Cc: sfd, Kevin O'Connor, qemu-devel@nongnu.org, Michael S. Tsirkin On 31/01/12 22:43, Avi Kivity wrote: > On 01/31/2012 11:40 AM, Avi Kivity wrote: >> On 01/27/2012 06:42 AM, Alexey Korolev wrote: >>> On 27/01/12 04:12, Avi Kivity wrote: >>>> On 01/26/2012 04:36 PM, Michael S. Tsirkin wrote: >>>>> On Thu, Jan 26, 2012 at 03:52:27PM +0200, Avi Kivity wrote: >>>>>> On 01/26/2012 11:14 AM, Michael S. Tsirkin wrote: >>>>>>> On Wed, Jan 25, 2012 at 06:46:03PM +1300, Alexey Korolev wrote: >>>>>>>> Hi, >>>>>>>> In this post >>>>>>>> http://lists.gnu.org/archive/html/qemu-devel/2011-12/msg03171.html I've >>>>>>>> mentioned about the issues when 64Bit PCI BAR is present and 32bit >>>>>>>> address range is selected for it. >>>>>>>> The issue affects all recent qemu releases and all >>>>>>>> old and recent guest Linux kernel versions. >>>>>>>> >>>>>>>> We've done some investigations. Let me explain what happens. >>>>>>>> Assume we have 64bit BAR with size 32MB mapped at [0xF0000000 - >>>>>>>> 0xF2000000] >>>>>>>> >>>>>>>> When Linux guest starts it does PCI bus enumeration. >>>>>>>> The OS enumerates 64BIT bars using the following procedure. >>>>>>>> 1. Write all FF's to lower half of 64bit BAR >>>>>>>> 2. Write address back to lower half of 64bit BAR >>>>>>>> 3. Write all FF's to higher half of 64bit BAR >>>>>>>> 4. Write address back to higher half of 64bit BAR >>>>>>>> >>>>>>>> Linux code is here: >>>>>>>> http://lxr.linux.no/#linux+v3.2.1/drivers/pci/probe.c#L149 >>>>>>>> >>>>>>>> What does it mean for qemu? >>>>>>>> >>>>>>>> At step 1. qemu pci_default_write_config() recevies all FFs for lower >>>>>>>> part of the 64bit BAR. 
Then it applies the mask and converts the value >>>>>>>> to "All FF's - size + 1" (FE000000 if size is 32MB). >>>>>>>> Then pci_bar_address() checks if BAR address is valid. Since it is a >>>>>>>> 64bit bar it reads 0x00000000FE000000 - this address is valid. So qemu >>>>>>>> updates topology and sends request to update mappings in KVM with new >>>>>>>> range for the 64bit BAR FE000000 - 0xFFFFFFFF. This usually means kernel >>>>>>>> panic on boot, if there is another mapping in the FE000000 - 0xFFFFFFFF >>>>>>>> range, which is quite common. >>>>>>> Do you know why does it panic? As far as I can see >>>>>>> from code at >>>>>>> http://lxr.linux.no/#linux+v2.6.35.9/drivers/pci/probe.c#L162 >>>>>>> >>>>>>> 171 pci_read_config_dword(dev, pos, &l); >>>>>>> 172 pci_write_config_dword(dev, pos, l | mask); >>>>>>> 173 pci_read_config_dword(dev, pos, &sz); >>>>>>> 174 pci_write_config_dword(dev, pos, l); >>>>>>> >>>>>>> BAR is restored: what triggers an access between lines 172 and 174? >>>>>> Random interrupt reading the time, likely. >>>>> Weird, what the backtrace shows is init, unrelated >>>>> to interrupts. >>>>> >>>> It's a bug then. qemu doesn't undo the mapping correctly. >>>> >>>> If you have clear instructions, I'll try to reproduce it. >>>> >>> Well the easiest way to reproduce this is: >>> >>> >>> 1. Get kernel bzImage (version < 2.6.36) >>> 2. Apply patch to ivshmem.c >>> >>> --- >>> diff --git a/hw/ivshmem.c b/hw/ivshmem.c >>> index 1aa9e3b..71f8c21 100644 >>> --- a/hw/ivshmem.c >>> +++ b/hw/ivshmem.c >>> @@ -341,7 +341,7 @@ static void create_shared_memory_BAR(IVShmemState *s, int fd) { >>> memory_region_add_subregion(&s->bar, 0, &s->ivshmem); >>> >>> /* region for shared memory */ >>> - pci_register_bar(&s->dev, 2, PCI_BASE_ADDRESS_SPACE_MEMORY, &s->bar); >>> + pci_register_bar(&s->dev, 2, PCI_BASE_ADDRESS_SPACE_MEMORY|PCI_BASE_ADDRESS_MEM_TYPE_64, &s->bar) >>> } >>> >>> static void close_guest_eventfds(IVShmemState *s, int posn) >>> --- >>> >>> 3. 
Launch qemu with a command like that >>> >>> /usr/bin/qemu-system-x86_64 -M pc-0.14 -enable-kvm -m 2048 -smp 1,socket=1,cores=1,threads=1 -name centos54 -uuid >>> d37daefd-75bd-4387-cee1-7f0b153ee2af -nodefconfig -nodefaults -chardev >>> socket,id=charmonitor,path=/var/lib/libvirt/qemu/centos54.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=readline -rtc >>> base=utc -drive file=/dev/dock200-1/centos54,if=none,id=drive-ide0-0-0,format=raw -device >>> ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -drive >>> file=/data/CentOS-5.4-x86_64-bin-DVD.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device >>> ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -chardev file,id=charserial0,path=/home/alexey/cent54.log -device >>> isa-serial,chardev=charserial0,id=serial0 -usb -vnc 127.0.0.1:0 -k en-us -vga cirrus -device >>> virtio-balloon-pci,id=balloon0,bus=pci.0,multifunction=on,addr=0x4.0x0 --device ivshmem,size=32,shm="shm" -kernel bzImage -append >>> "root=/dev/hda1 console=ttyS0,115200n8 console=tty0" >>> >>> in other words add: --device ivshmem,size=32,shm="shm" >>> >>> That is all. >>> >>> Note: it won't necessary cause panic message on some kernels it just hangs or reboots. >>> >> In fact qemu segfaults for me, since registering a ram region not on a >> page boundary is broken. This happens when the ivshmem bar is split by >> the hpet region, which is less than page long. >> > Happens only with qemu-kvm for some reason. Two separate bugs. > Well it's quite possible that there are two separate problems. 1. One is page-boundary related. 2. The other is related to an invalid mapping when we request the region size on a 64bit BAR. The patch sent previously addresses this sizing behaviour, and so avoids the mapping error. I am not sure it is valid to temporarily occupy a completely wrong memory region while we request the size of a PCI BAR.
This issue needs to be addressed to allow 64-bit PCI allocations to work correctly with older Linux guest kernels. Will your core rewrite address the invalid mapping issue? Is it possible to have an early version of the new core so we could check the 64bit BAR issues before the release? ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present 2012-02-01 5:44 ` Alexey Korolev @ 2012-02-01 7:04 ` Michael S. Tsirkin 2012-02-02 2:22 ` Alexey Korolev 0 siblings, 1 reply; 21+ messages in thread From: Michael S. Tsirkin @ 2012-02-01 7:04 UTC (permalink / raw) To: Alexey Korolev; +Cc: sfd, Kevin O'Connor, Avi Kivity, qemu-devel@nongnu.org On Wed, Feb 01, 2012 at 06:44:42PM +1300, Alexey Korolev wrote: > On 31/01/12 22:43, Avi Kivity wrote: > > On 01/31/2012 11:40 AM, Avi Kivity wrote: > >> On 01/27/2012 06:42 AM, Alexey Korolev wrote: > >>> On 27/01/12 04:12, Avi Kivity wrote: > >>>> On 01/26/2012 04:36 PM, Michael S. Tsirkin wrote: > >>>>> On Thu, Jan 26, 2012 at 03:52:27PM +0200, Avi Kivity wrote: > >>>>>> On 01/26/2012 11:14 AM, Michael S. Tsirkin wrote: > >>>>>>> On Wed, Jan 25, 2012 at 06:46:03PM +1300, Alexey Korolev wrote: > >>>>>>>> Hi, > >>>>>>>> In this post > >>>>>>>> http://lists.gnu.org/archive/html/qemu-devel/2011-12/msg03171.html I've > >>>>>>>> mentioned about the issues when 64Bit PCI BAR is present and 32bit > >>>>>>>> address range is selected for it. > >>>>>>>> The issue affects all recent qemu releases and all > >>>>>>>> old and recent guest Linux kernel versions. > >>>>>>>> > >>>>>>>> We've done some investigations. Let me explain what happens. > >>>>>>>> Assume we have 64bit BAR with size 32MB mapped at [0xF0000000 - > >>>>>>>> 0xF2000000] > >>>>>>>> > >>>>>>>> When Linux guest starts it does PCI bus enumeration. > >>>>>>>> The OS enumerates 64BIT bars using the following procedure. > >>>>>>>> 1. Write all FF's to lower half of 64bit BAR > >>>>>>>> 2. Write address back to lower half of 64bit BAR > >>>>>>>> 3. Write all FF's to higher half of 64bit BAR > >>>>>>>> 4. Write address back to higher half of 64bit BAR > >>>>>>>> > >>>>>>>> Linux code is here: > >>>>>>>> http://lxr.linux.no/#linux+v3.2.1/drivers/pci/probe.c#L149 > >>>>>>>> > >>>>>>>> What does it mean for qemu? > >>>>>>>> > >>>>>>>> At step 1. 
qemu pci_default_write_config() recevies all FFs for lower > >>>>>>>> part of the 64bit BAR. Then it applies the mask and converts the value > >>>>>>>> to "All FF's - size + 1" (FE000000 if size is 32MB). > >>>>>>>> Then pci_bar_address() checks if BAR address is valid. Since it is a > >>>>>>>> 64bit bar it reads 0x00000000FE000000 - this address is valid. So qemu > >>>>>>>> updates topology and sends request to update mappings in KVM with new > >>>>>>>> range for the 64bit BAR FE000000 - 0xFFFFFFFF. This usually means kernel > >>>>>>>> panic on boot, if there is another mapping in the FE000000 - 0xFFFFFFFF > >>>>>>>> range, which is quite common. > >>>>>>> Do you know why does it panic? As far as I can see > >>>>>>> from code at > >>>>>>> http://lxr.linux.no/#linux+v2.6.35.9/drivers/pci/probe.c#L162 > >>>>>>> > >>>>>>> 171 pci_read_config_dword(dev, pos, &l); > >>>>>>> 172 pci_write_config_dword(dev, pos, l | mask); > >>>>>>> 173 pci_read_config_dword(dev, pos, &sz); > >>>>>>> 174 pci_write_config_dword(dev, pos, l); > >>>>>>> > >>>>>>> BAR is restored: what triggers an access between lines 172 and 174? > >>>>>> Random interrupt reading the time, likely. > >>>>> Weird, what the backtrace shows is init, unrelated > >>>>> to interrupts. > >>>>> > >>>> It's a bug then. qemu doesn't undo the mapping correctly. > >>>> > >>>> If you have clear instructions, I'll try to reproduce it. > >>>> > >>> Well the easiest way to reproduce this is: > >>> > >>> > >>> 1. Get kernel bzImage (version < 2.6.36) > >>> 2. 
Apply patch to ivshmem.c > >>> > >>> --- > >>> diff --git a/hw/ivshmem.c b/hw/ivshmem.c > >>> index 1aa9e3b..71f8c21 100644 > >>> --- a/hw/ivshmem.c > >>> +++ b/hw/ivshmem.c > >>> @@ -341,7 +341,7 @@ static void create_shared_memory_BAR(IVShmemState *s, int fd) { > >>> memory_region_add_subregion(&s->bar, 0, &s->ivshmem); > >>> > >>> /* region for shared memory */ > >>> - pci_register_bar(&s->dev, 2, PCI_BASE_ADDRESS_SPACE_MEMORY, &s->bar); > >>> + pci_register_bar(&s->dev, 2, PCI_BASE_ADDRESS_SPACE_MEMORY|PCI_BASE_ADDRESS_MEM_TYPE_64, &s->bar) > >>> } > >>> > >>> static void close_guest_eventfds(IVShmemState *s, int posn) > >>> --- > >>> > >>> 3. Launch qemu with a command like that > >>> > >>> /usr/bin/qemu-system-x86_64 -M pc-0.14 -enable-kvm -m 2048 -smp 1,socket=1,cores=1,threads=1 -name centos54 -uuid > >>> d37daefd-75bd-4387-cee1-7f0b153ee2af -nodefconfig -nodefaults -chardev > >>> socket,id=charmonitor,path=/var/lib/libvirt/qemu/centos54.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=readline -rtc > >>> base=utc -drive file=/dev/dock200-1/centos54,if=none,id=drive-ide0-0-0,format=raw -device > >>> ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -drive > >>> file=/data/CentOS-5.4-x86_64-bin-DVD.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device > >>> ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -chardev file,id=charserial0,path=/home/alexey/cent54.log -device > >>> isa-serial,chardev=charserial0,id=serial0 -usb -vnc 127.0.0.1:0 -k en-us -vga cirrus -device > >>> virtio-balloon-pci,id=balloon0,bus=pci.0,multifunction=on,addr=0x4.0x0 --device ivshmem,size=32,shm="shm" -kernel bzImage -append > >>> "root=/dev/hda1 console=ttyS0,115200n8 console=tty0" > >>> > >>> in other words add: --device ivshmem,size=32,shm="shm" > >>> > >>> That is all. > >>> > >>> Note: it won't necessary cause panic message on some kernels it just hangs or reboots. 
> >>> > >> In fact qemu segfaults for me, since registering a ram region not on a > >> page boundary is broken. This happens when the ivshmem bar is split by > >> the hpet region, which is less than page long. > >> > > Happens only with qemu-kvm for some reason. Two separate bugs. > > > Well it's quite possible that there are two separate problems. > > 1. Page boundary related > 2. Another is related to invalid mapping, when we request region size on 64bit BAR. > The patch sent previously addresses this sizing behaviour, and so > avoids the mapping error. The patch catches what the specific guest is doing but it's a hack. It's completely OK to write random values into BARs as long as the claimed range is not accessed. > Not sure if it is valid to temporary occupy completely wrong memory region when we request size of PCI BAR. > > This issue needs to be addressed to allow 64-bit PCI allocations to work correctly with older Linux guest kernels. > > Will your core rewrite address the invalid mapping issue? > > Is it possible to have an early version of new core so we could check the 64bit BAR issues before the release. -- MST ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present 2012-02-01 7:04 ` Michael S. Tsirkin @ 2012-02-02 2:22 ` Alexey Korolev 0 siblings, 0 replies; 21+ messages in thread From: Alexey Korolev @ 2012-02-02 2:22 UTC (permalink / raw) To: Michael S. Tsirkin Cc: sfd, Kevin O'Connor, Avi Kivity, qemu-devel@nongnu.org On 01/02/12 20:04, Michael S. Tsirkin wrote: > On Wed, Feb 01, 2012 at 06:44:42PM +1300, Alexey Korolev wrote: >> On 31/01/12 22:43, Avi Kivity wrote: >>> On 01/31/2012 11:40 AM, Avi Kivity wrote: >>>> On 01/27/2012 06:42 AM, Alexey Korolev wrote: >>>>> On 27/01/12 04:12, Avi Kivity wrote: >>>>>> On 01/26/2012 04:36 PM, Michael S. Tsirkin wrote: >>>>>>> On Thu, Jan 26, 2012 at 03:52:27PM +0200, Avi Kivity wrote: >>>>>>>> On 01/26/2012 11:14 AM, Michael S. Tsirkin wrote: >>>>>>>>> On Wed, Jan 25, 2012 at 06:46:03PM +1300, Alexey Korolev wrote: >>>>>>>>>> Hi, >>>>>>>>>> In this post >>>>>>>>>> http://lists.gnu.org/archive/html/qemu-devel/2011-12/msg03171.html I've >>>>>>>>>> mentioned about the issues when 64Bit PCI BAR is present and 32bit >>>>>>>>>> address range is selected for it. >>>>>>>>>> The issue affects all recent qemu releases and all >>>>>>>>>> old and recent guest Linux kernel versions. >>>>>>>>>> >>>>>>>>>> We've done some investigations. Let me explain what happens. >>>>>>>>>> Assume we have 64bit BAR with size 32MB mapped at [0xF0000000 - >>>>>>>>>> 0xF2000000] >>>>>>>>>> >>>>>>>>>> When Linux guest starts it does PCI bus enumeration. >>>>>>>>>> The OS enumerates 64BIT bars using the following procedure. >>>>>>>>>> 1. Write all FF's to lower half of 64bit BAR >>>>>>>>>> 2. Write address back to lower half of 64bit BAR >>>>>>>>>> 3. Write all FF's to higher half of 64bit BAR >>>>>>>>>> 4. Write address back to higher half of 64bit BAR >>>>>>>>>> >>>>>>>>>> Linux code is here: >>>>>>>>>> http://lxr.linux.no/#linux+v3.2.1/drivers/pci/probe.c#L149 >>>>>>>>>> >>>>>>>>>> What does it mean for qemu? 
>>>>>>>>>> >>>>>>>>>> At step 1. qemu pci_default_write_config() recevies all FFs for lower >>>>>>>>>> part of the 64bit BAR. Then it applies the mask and converts the value >>>>>>>>>> to "All FF's - size + 1" (FE000000 if size is 32MB). >>>>>>>>>> Then pci_bar_address() checks if BAR address is valid. Since it is a >>>>>>>>>> 64bit bar it reads 0x00000000FE000000 - this address is valid. So qemu >>>>>>>>>> updates topology and sends request to update mappings in KVM with new >>>>>>>>>> range for the 64bit BAR FE000000 - 0xFFFFFFFF. This usually means kernel >>>>>>>>>> panic on boot, if there is another mapping in the FE000000 - 0xFFFFFFFF >>>>>>>>>> range, which is quite common. >>>>>>>>> Do you know why does it panic? As far as I can see >>>>>>>>> from code at >>>>>>>>> http://lxr.linux.no/#linux+v2.6.35.9/drivers/pci/probe.c#L162 >>>>>>>>> >>>>>>>>> 171 pci_read_config_dword(dev, pos, &l); >>>>>>>>> 172 pci_write_config_dword(dev, pos, l | mask); >>>>>>>>> 173 pci_read_config_dword(dev, pos, &sz); >>>>>>>>> 174 pci_write_config_dword(dev, pos, l); >>>>>>>>> >>>>>>>>> BAR is restored: what triggers an access between lines 172 and 174? >>>>>>>> Random interrupt reading the time, likely. >>>>>>> Weird, what the backtrace shows is init, unrelated >>>>>>> to interrupts. >>>>>>> >>>>>> It's a bug then. qemu doesn't undo the mapping correctly. >>>>>> >>>>>> If you have clear instructions, I'll try to reproduce it. >>>>>> >>>>> Well the easiest way to reproduce this is: >>>>> >>>>> >>>>> 1. Get kernel bzImage (version < 2.6.36) >>>>> 2. 
Apply patch to ivshmem.c >>>>> >>>>> --- >>>>> diff --git a/hw/ivshmem.c b/hw/ivshmem.c >>>>> index 1aa9e3b..71f8c21 100644 >>>>> --- a/hw/ivshmem.c >>>>> +++ b/hw/ivshmem.c >>>>> @@ -341,7 +341,7 @@ static void create_shared_memory_BAR(IVShmemState *s, int fd) { >>>>> memory_region_add_subregion(&s->bar, 0, &s->ivshmem); >>>>> >>>>> /* region for shared memory */ >>>>> - pci_register_bar(&s->dev, 2, PCI_BASE_ADDRESS_SPACE_MEMORY, &s->bar); >>>>> + pci_register_bar(&s->dev, 2, PCI_BASE_ADDRESS_SPACE_MEMORY|PCI_BASE_ADDRESS_MEM_TYPE_64, &s->bar) >>>>> } >>>>> >>>>> static void close_guest_eventfds(IVShmemState *s, int posn) >>>>> --- >>>>> >>>>> 3. Launch qemu with a command like that >>>>> >>>>> /usr/bin/qemu-system-x86_64 -M pc-0.14 -enable-kvm -m 2048 -smp 1,socket=1,cores=1,threads=1 -name centos54 -uuid >>>>> d37daefd-75bd-4387-cee1-7f0b153ee2af -nodefconfig -nodefaults -chardev >>>>> socket,id=charmonitor,path=/var/lib/libvirt/qemu/centos54.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=readline -rtc >>>>> base=utc -drive file=/dev/dock200-1/centos54,if=none,id=drive-ide0-0-0,format=raw -device >>>>> ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -drive >>>>> file=/data/CentOS-5.4-x86_64-bin-DVD.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device >>>>> ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -chardev file,id=charserial0,path=/home/alexey/cent54.log -device >>>>> isa-serial,chardev=charserial0,id=serial0 -usb -vnc 127.0.0.1:0 -k en-us -vga cirrus -device >>>>> virtio-balloon-pci,id=balloon0,bus=pci.0,multifunction=on,addr=0x4.0x0 --device ivshmem,size=32,shm="shm" -kernel bzImage -append >>>>> "root=/dev/hda1 console=ttyS0,115200n8 console=tty0" >>>>> >>>>> in other words add: --device ivshmem,size=32,shm="shm" >>>>> >>>>> That is all. >>>>> >>>>> Note: it won't necessary cause panic message on some kernels it just hangs or reboots. 
>>>>> >>>> In fact qemu segfaults for me, since registering a ram region not on a >>>> page boundary is broken. This happens when the ivshmem bar is split by >>>> the hpet region, which is less than page long. >>>> >>> Happens only with qemu-kvm for some reason. Two separate bugs. >>> >> Well it's quite possible that there are two separate problems. >> >> 1. Page boundary related >> 2. Another is related to invalid mapping, when we request region size on 64bit BAR. >> The patch sent previously addresses this sizing behaviour, and so >> avoids the mapping error. > The patch catches what the specific guest is doing but it's a hack. It's > completely OK to write random values into BARs as long as the claimed > range is not accessed. At the moment, temporarily writing random values into a PCI BAR (both 32bit and 64bit) may have quite bad consequences for the VM. Considering that the core will be rewritten anyway, I just wanted to make sure that these problems will be addressed. Ideally I would like to have the new core before the release to make sure 64bit BAR support is not causing problems. > Not sure if it is valid to temporary occupy completely wrong memory region when we request size of PCI BAR. > > This issue needs to be addressed to allow 64-bit PCI allocations to work correctly with older Linux guest kernels. > > Will your core rewrite address the invalid mapping issue? > > Is it possible to have an early version of new core so we could check the 64bit BAR issues before the release. > ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present 2012-01-27 4:42 ` Alexey Korolev 2012-01-31 9:40 ` Avi Kivity @ 2012-01-31 10:51 ` Avi Kivity 1 sibling, 0 replies; 21+ messages in thread From: Avi Kivity @ 2012-01-31 10:51 UTC (permalink / raw) To: Alexey Korolev Cc: sfd, Kevin O'Connor, qemu-devel@nongnu.org, Michael S. Tsirkin On 01/27/2012 06:42 AM, Alexey Korolev wrote: > On 27/01/12 04:12, Avi Kivity wrote: > > On 01/26/2012 04:36 PM, Michael S. Tsirkin wrote: > >> On Thu, Jan 26, 2012 at 03:52:27PM +0200, Avi Kivity wrote: > >>> On 01/26/2012 11:14 AM, Michael S. Tsirkin wrote: > >>>> On Wed, Jan 25, 2012 at 06:46:03PM +1300, Alexey Korolev wrote: > >>>>> Hi, > >>>>> In this post > >>>>> http://lists.gnu.org/archive/html/qemu-devel/2011-12/msg03171.html I've > >>>>> mentioned about the issues when 64Bit PCI BAR is present and 32bit > >>>>> address range is selected for it. > >>>>> The issue affects all recent qemu releases and all > >>>>> old and recent guest Linux kernel versions. > >>>>> > >>>>> We've done some investigations. Let me explain what happens. > >>>>> Assume we have 64bit BAR with size 32MB mapped at [0xF0000000 - > >>>>> 0xF2000000] > >>>>> > >>>>> When Linux guest starts it does PCI bus enumeration. > >>>>> The OS enumerates 64BIT bars using the following procedure. > >>>>> 1. Write all FF's to lower half of 64bit BAR > >>>>> 2. Write address back to lower half of 64bit BAR > >>>>> 3. Write all FF's to higher half of 64bit BAR > >>>>> 4. Write address back to higher half of 64bit BAR > >>>>> > >>>>> Linux code is here: > >>>>> http://lxr.linux.no/#linux+v3.2.1/drivers/pci/probe.c#L149 > >>>>> > >>>>> What does it mean for qemu? > >>>>> > >>>>> At step 1. qemu pci_default_write_config() recevies all FFs for lower > >>>>> part of the 64bit BAR. Then it applies the mask and converts the value > >>>>> to "All FF's - size + 1" (FE000000 if size is 32MB). > >>>>> Then pci_bar_address() checks if BAR address is valid. 
Since it is a > >>>>> 64bit bar it reads 0x00000000FE000000 - this address is valid. So qemu > >>>>> updates topology and sends request to update mappings in KVM with new > >>>>> range for the 64bit BAR FE000000 - 0xFFFFFFFF. This usually means kernel > >>>>> panic on boot, if there is another mapping in the FE000000 - 0xFFFFFFFF > >>>>> range, which is quite common. > >>>> Do you know why does it panic? As far as I can see > >>>> from code at > >>>> http://lxr.linux.no/#linux+v2.6.35.9/drivers/pci/probe.c#L162 > >>>> > >>>> 171 pci_read_config_dword(dev, pos, &l); > >>>> 172 pci_write_config_dword(dev, pos, l | mask); > >>>> 173 pci_read_config_dword(dev, pos, &sz); > >>>> 174 pci_write_config_dword(dev, pos, l); > >>>> > >>>> BAR is restored: what triggers an access between lines 172 and 174? > >>> Random interrupt reading the time, likely. > >> Weird, what the backtrace shows is init, unrelated > >> to interrupts. > >> > > It's a bug then. qemu doesn't undo the mapping correctly. > > > > If you have clear instructions, I'll try to reproduce it. > > > Well the easiest way to reproduce this is: > > > 1. Get kernel bzImage (version < 2.6.36) > 2. Apply patch to ivshmem.c > > I have some patches that fix this, but they're very hacky since they're dealing with the old and rotten core. I much prefer to let this resolve itself in my continuing rewrite. Is this an urgent problem for you or can you live with this for a while? -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present 2012-01-26 14:36 ` Michael S. Tsirkin 2012-01-26 15:12 ` Avi Kivity @ 2012-01-27 4:40 ` Alexey Korolev 1 sibling, 0 replies; 21+ messages in thread From: Alexey Korolev @ 2012-01-27 4:40 UTC (permalink / raw) To: Michael S. Tsirkin Cc: sfd, Kevin O'Connor, Avi Kivity, qemu-devel@nongnu.org On 27/01/12 03:36, Michael S. Tsirkin wrote: > On Thu, Jan 26, 2012 at 03:52:27PM +0200, Avi Kivity wrote: >> On 01/26/2012 11:14 AM, Michael S. Tsirkin wrote: >>> On Wed, Jan 25, 2012 at 06:46:03PM +1300, Alexey Korolev wrote: >>>> Hi, >>>> In this post >>>> http://lists.gnu.org/archive/html/qemu-devel/2011-12/msg03171.html I've >>>> mentioned about the issues when 64Bit PCI BAR is present and 32bit >>>> address range is selected for it. >>>> The issue affects all recent qemu releases and all >>>> old and recent guest Linux kernel versions. >>>> >>>> We've done some investigations. Let me explain what happens. >>>> Assume we have 64bit BAR with size 32MB mapped at [0xF0000000 - >>>> 0xF2000000] >>>> >>>> When Linux guest starts it does PCI bus enumeration. >>>> The OS enumerates 64BIT bars using the following procedure. >>>> 1. Write all FF's to lower half of 64bit BAR >>>> 2. Write address back to lower half of 64bit BAR >>>> 3. Write all FF's to higher half of 64bit BAR >>>> 4. Write address back to higher half of 64bit BAR >>>> >>>> Linux code is here: >>>> http://lxr.linux.no/#linux+v3.2.1/drivers/pci/probe.c#L149 >>>> >>>> What does it mean for qemu? >>>> >>>> At step 1. qemu pci_default_write_config() recevies all FFs for lower >>>> part of the 64bit BAR. Then it applies the mask and converts the value >>>> to "All FF's - size + 1" (FE000000 if size is 32MB). >>>> Then pci_bar_address() checks if BAR address is valid. Since it is a >>>> 64bit bar it reads 0x00000000FE000000 - this address is valid. 
So qemu >>>> updates topology and sends request to update mappings in KVM with new >>>> range for the 64bit BAR FE000000 - 0xFFFFFFFF. This usually means kernel >>>> panic on boot, if there is another mapping in the FE000000 - 0xFFFFFFFF >>>> range, which is quite common. >>> Do you know why does it panic? As far as I can see >>> from code at >>> http://lxr.linux.no/#linux+v2.6.35.9/drivers/pci/probe.c#L162 >>> >>> 171 pci_read_config_dword(dev, pos, &l); >>> 172 pci_write_config_dword(dev, pos, l | mask); >>> 173 pci_read_config_dword(dev, pos, &sz); >>> 174 pci_write_config_dword(dev, pos, l); >>> >>> BAR is restored: what triggers an access between lines 172 and 174? >> Random interrupt reading the time, likely. > Weird, what the backtrace shows is init, unrelated > to interrupts. Yes, it fails during the ordered late_hpet_init() call, which is part of the kernel fs_initcall list, so no timer interrupts are involved here. Basically, once the region is programmed (even temporarily), the area behind it is lost. I mean, if we even temporarily overlap the HPET region with our BAR, backed by host user space memory, and commit a mapping request to kvm, the information about the old mappings belonging to the HPET is lost, even if we did this only for a short period of time and later restored the original address. >>> Also, what you describe happens on a 32 bit BAR in the same way, no? >> So it seems. Btw, is this procedure correct for sizing a BAR which is >> larger than 4GB? > There's more code sizing 64 bit BARs, but generally > software is allowed to write any junk into enabled BARs > as long as there aren't any memory accesses. ^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2012-02-02 2:22 UTC | newest] Thread overview: 21+ messages -- 2012-01-25 5:46 [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present Alexey Korolev 2012-01-25 12:51 ` Michael S. Tsirkin 2012-01-26 3:20 ` Alexey Korolev 2012-01-25 15:38 ` Michael S. Tsirkin 2012-01-25 18:59 ` Alex Williamson 2012-01-26 3:19 ` Alexey Korolev 2012-01-26 13:51 ` Avi Kivity 2012-01-26 14:05 ` Michael S. Tsirkin 2012-01-26 14:33 ` Avi Kivity 2012-01-26 9:14 ` Michael S. Tsirkin 2012-01-26 13:52 ` Avi Kivity 2012-01-26 14:36 ` Michael S. Tsirkin 2012-01-26 15:12 ` Avi Kivity 2012-01-27 4:42 ` Alexey Korolev 2012-01-31 9:40 ` Avi Kivity 2012-01-31 9:43 ` Avi Kivity 2012-02-01 5:44 ` Alexey Korolev 2012-02-01 7:04 ` Michael S. Tsirkin 2012-02-02 2:22 ` Alexey Korolev 2012-01-31 10:51 ` Avi Kivity 2012-01-27 4:40 ` Alexey Korolev