From: Alexey Korolev <alexey.korolev@endace.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: sfd@endace.com, Kevin O'Connor <kevin@koconnor.net>,
"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
"Michael S. Tsirkin" <mst@redhat.com>
Subject: Re: [Qemu-devel] [RFC/PATCH] Fix guest OS panic when 64bit BAR is present
Date: Thu, 26 Jan 2012 16:19:45 +1300
Message-ID: <4F20C651.2010108@endace.com>
In-Reply-To: <1327517961.26484.124.camel@bling.home>
Hi Alex and Michael
>> For testing, I applied the following patch to qemu,
>> converting msix bar to 64 bit.
>> Guest did not seem to crash.
>> I booted Fedora Live CD 32 bit guest on a 32 bit host
>> to level 3 without crash, and verified that
>> the BAR is a 64 bit one, and that I got assigned an address
>> at fe000000.
>> command line I used:
>> qemu-system-x86_64 -bios /scm/seabios/out/bios.bin -snapshot -drive
>> file=qemu-images/f15-test.qcow2,if=none,id=diskid,cache=unsafe
>> -device virtio-blk-pci,drive=diskid -net user -net nic,model=ne2k_pci
>> -cdrom Fedora-15-i686-Live-LXDE.iso
>>
>> At boot prompt type tab and add '3' to kernel command line
>> to have guest boot into a fast text console instead
>> of a graphical one which is very slow.
>>
>> diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
>> index 2ac87ea..5271394 100644
>> --- a/hw/virtio-pci.c
>> +++ b/hw/virtio-pci.c
>> @@ -711,7 +711,8 @@ void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev)
>> memory_region_init(&proxy->msix_bar, "virtio-msix", 4096);
>> if (vdev->nvectors && !msix_init(&proxy->pci_dev, vdev->nvectors,
>> &proxy->msix_bar, 1, 0)) {
>> - pci_register_bar(&proxy->pci_dev, 1, PCI_BASE_ADDRESS_SPACE_MEMORY,
>> + pci_register_bar(&proxy->pci_dev, 1, PCI_BASE_ADDRESS_SPACE_MEMORY |
>> + PCI_BASE_ADDRESS_MEM_TYPE_64,
>> &proxy->msix_bar);
>> } else
>> vdev->nvectors = 0;
>>
> I was also able to add MEM64 BARs to device assignment pretty trivially
> and it seems to work, guest sees 64bit BARs for an 82576 VF, programs it
> to an fexxxxxx address and it works.
>
> Alex
>
I'd suggest using ivshmem with a 32MB buffer to reproduce the problem, for example in a 2.6.18 guest.
The msix case does not fail because:
1. The buffer size is just 4KB - it will reprogram the range 0xFFFFE000-0xFFFFFFFF, which doesn't overlap any critical resource, so there is no immediate panic.
2. The memory_region_init function doesn't create a backing user memory region, so kvm does no remapping in this case.
Apply the following patch and add this to the qemu command line: --device ivshmem,size=32,shm="shm"
---
diff --git a/hw/ivshmem.c b/hw/ivshmem.c
index 1aa9e3b..71f8c21 100644
--- a/hw/ivshmem.c
+++ b/hw/ivshmem.c
@@ -341,7 +341,7 @@ static void create_shared_memory_BAR(IVShmemState *s, int fd) {
memory_region_add_subregion(&s->bar, 0, &s->ivshmem);
/* region for shared memory */
- pci_register_bar(&s->dev, 2, PCI_BASE_ADDRESS_SPACE_MEMORY, &s->bar);
+ pci_register_bar(&s->dev, 2, PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64, &s->bar);
}
static void close_guest_eventfds(IVShmemState *s, int posn)
---
You should then get the following boot log:
Bootdata ok (command line is root=/dev/hda1 console=ttyS0,115200n8 console=tty0)
Linux version 2.6.18 (root@localhost.localdomain) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-48)) #3 SMP Tue Jan 17 16:37:33 NZDT 2012
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009f400 (usable)
BIOS-e820: 000000000009f400 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000007fffd000 (usable)
BIOS-e820: 000000007fffd000 - 0000000080000000 (reserved)
BIOS-e820: 00000000feffc000 - 00000000ff000000 (reserved)
BIOS-e820: 00000000fffc0000 - 0000000100000000 (reserved)
DMI 2.4 present.
No NUMA configuration found
Faking a node at 0000000000000000-000000007fffd000
Bootmem setup node 0 0000000000000000-000000007fffd000
ACPI: PM-Timer IO Port: 0xb008
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 6:2 APIC version 17
ACPI: IOAPIC (id[0x01] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 1, version 17, address 0xfec00000, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
Setting APIC routing to physical flat
ACPI: HPET id: 0x8086a201 base: 0xfed00000
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at 88000000 (gap: 80000000:7effc000)
SMP: Allowing 1 CPUs, 0 hotplug CPUs
Built 1 zonelists. Total pages: 515393
Kernel command line: root=/dev/hda1 console=ttyS0,115200n8 console=tty0
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 32768 bytes)
time.c: Using 100.000000 MHz WALL HPET GTOD HPET/TSC timer.
time.c: Detected 2500.081 MHz processor.
Console: colour VGA+ 80x25
Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes)
Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes)
Checking aperture...
Memory: 2058096k/2097140k available (3256k kernel code, 38656k reserved, 2266k data, 204k init)
Calibrating delay using timer specific routine.. 5030.07 BogoMIPS (lpj=10060155)
Mount-cache hash table entries: 256
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 4096K
MCE: warning: using only 10 banks
SMP alternatives: switching to UP code
Freeing SMP alternatives: 36k freed
ACPI: Core revision 20060707
activating NMI Watchdog ... done.
Using local APIC timer interrupts.
result 62501506
Detected 62.501 MHz APIC timer.
Brought up 1 CPUs
testing NMI watchdog ... OK.
migration_cost=0
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: Using configuration type 1
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (0000:00)
ACPI: Assume root bridge [\_SB_.PCI0] bus is 0
PCI quirk: region b000-b03f claimed by PIIX4 ACPI
PCI quirk: region b100-b10f claimed by PIIX4 SMB
ACPI: PCI Interrupt Link [LNKA] (IRQs 5 *10 11)
ACPI: PCI Interrupt Link [LNKB] (IRQs 5 *10 11)
ACPI: PCI Interrupt Link [LNKC] (IRQs 5 10 *11)
ACPI: PCI Interrupt Link [LNKD] (IRQs 5 10 *11)
ACPI: PCI Interrupt Link [LNKS] (IRQs 9) *0, disabled.
SCSI subsystem initialized
usbcore: registered new driver usbfs
usbcore: registered new driver hub
PCI: Using ACPI for IRQ routing
PCI: If a device doesn't work, try "pci=routeirq". If it helps, post a report
divide error: 0000 [1] SMP
CPU 0
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.18 #3
RIP: 0010:[<ffffffff80388299>] [<ffffffff80388299>] hpet_alloc+0x12a/0x30c
RSP: 0000:ffff81007e3a1e20 EFLAGS: 00010246
RAX: 00038d7ea4c68000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff8057fc2b
RBP: ffff81007e2e28c0 R08: ffffffff8055b492 R09: ffff81007e39f510
R10: ffff81007e3a1e50 R11: 0000000000000098 R12: ffff81007e3a1e50
R13: 0000000000000000 R14: ffffffffff5fe000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffffffff807fc000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000000201000 CR4: 00000000000006e0
Process swapper (pid: 1, threadinfo ffff81007e3a0000, task ffff81007e39f510)
Stack: 0000000000000000 ffffffff80847470 0000000000000000 0000000000000000
0000000000000000 ffffffff8081e187 00000000fed00000 ffffffffff5fe000
0000000300010001 0000000800000002 0000000000000000 0000000000000000
Call Trace:
[<ffffffff8081e187>] late_hpet_init+0xa7/0xb2
[<ffffffff8020717f>] init+0x139/0x2fe
[<ffffffff8020a5b4>] child_rip+0xa/0x12
DWARF2 unwinder stuck at child_rip+0xa/0x12
Leftover inexact backtrace:
[<ffffffff803544b6>] acpi_ds_init_one_object+0x0/0x82
[<ffffffff80207046>] init+0x0/0x2fe
[<ffffffff8020a5aa>] child_rip+0x0/0x12
Code: 48 f7 f6 83 7d 30 01 8b 75 34 48 89 45 20 49 8b 4c 24 08 48
RIP [<ffffffff80388299>] hpet_alloc+0x12a/0x30c
RSP <ffff81007e3a1e20>
<0>Kernel panic - not syncing: Attempted to kill init!
NMI Watchdog detected LOCKUP on CPU 0
CPU 0
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.18 #3
RIP: 0010:[<ffffffff8033fa93>] [<ffffffff8033fa93>] __delay+0x6/0x10
RSP: 0000:ffff81007e3a1b50 EFLAGS: 00000293
RAX: 00000000000480f3 RBX: 0000000000000000 RCX: 000000008dea8c6a
RDX: 0000000000000002 RSI: 0000000000000000 RDI: 0000000000265e28
RBP: 00000000000009b0 R08: 0000000000000000 R09: ffff8100010503d4
R10: 0000000000000001 R11: ffffffff8034e288 R12: 0000000000000000
R13: 000000000000000b R14: ffffffff8055bc9f R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffffffff807fc000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000000201000 CR4: 00000000000006e0
Process swapper (pid: 1, threadinfo ffff81007e3a0000, task ffff81007e39f510)
Stack: ffffffff80230a09 0000003000000008 ffff81007e3a1c48 ffff81007e3a1b78
0000000000000246 ffffffff8055bc9f 0000000000000246 ffff81007e39f510
0000000000000000 0000000000000000 ffff8100010503d4 0000000000000000
Call Trace:
[<ffffffff80230a09>] panic+0x12c/0x12f
[<ffffffff802338c5>] do_exit+0x85/0x87b
[<ffffffff8020b0df>] kernel_math_error+0x0/0x90
Code: 0f 31 29 c8 48 39 f8 72 f5 c3 65 8b 04 25 2c 00 00 00 48 98
console shuts up ...
<0>Kernel panic - not syncing: Attempted to kill init!
Please look at the HPET lines: the HPET is mapped at 0xfed00000.
The ivshmem BAR is 32MB, so during PCI enumeration it will corrupt the range 0xfe000000-0xffffffff, which overlaps the HPET memory.
When Linux then runs late_hpet_init, it finds garbage there, and this causes the panic.
Thanks,
Alexey