From: "Michael S. Tsirkin" <mst@redhat.com>
To: Laszlo Ersek <lersek@redhat.com>
Cc: alex.williamson@redhat.com, edk2-devel@lists.sourceforge.net,
Wei Liu <wei.liu2@citrix.com>, Gerd Hoffmann <kraxel@redhat.com>,
qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [edk2] [edk2 PATCH 0/1] OvmfPkg: grab ACPI tables from QEMU
Date: Wed, 20 Nov 2013 00:03:06 +0200 [thread overview]
Message-ID: <20131119220306.GF15004@redhat.com> (raw)
In-Reply-To: <528BDA7A.9090701@redhat.com>
On Tue, Nov 19, 2013 at 10:39:06PM +0100, Laszlo Ersek wrote:
> On 11/19/13 22:13, Michael S. Tsirkin wrote:
> > On Tue, Nov 19, 2013 at 01:31:39PM +0100, Laszlo Ersek wrote:
> >> On 11/19/13 09:54, Gerd Hoffmann wrote:
> >>> Hi,
> >>>
> >>>> ACPI PciWindow32: Base=0xA0000000 End=0xFEEFFFFF Length=0x5EF00000
> >>>
> >>>> begin32=c0000000 end32=fec00000 begin64=0 end64=0
> >>>
> >>> qemu & seabios pick a 32bit window size which allows to cover it with
> >>> a single mtrr entry. Size must be a power of two for that to work.
> >>>
> >>> [root@fedora ~]# cat /proc/mtrr
> >>> reg00: base=0x080000000 ( 2048MB), size= 2048MB, count=1: uncachable
> >>>
> >>> So we have three cases for piix (as you've figured in the source code
> >>> already). Start at 0x80000000 (2G window), 0xc0000000 (1G window) and
> >>> 0xe0000000 (512M window).
> >>>
> >>> btw: q35 has a fixed 1G window, max low ram addr is 0xb000000, the
> >>> remaining address space (0xb0000000 -> 0xc0000000) is used for
> >>> mmconfig.
> >>>
> >>>> I guess the range starting at 0xc0000000 is somehow "incompatible"
> >>>> with the EFI memory map. (I can't actually explain this idea because,
> >>>> again, this second range is a proper subset of the former, and its
> >>>> size is still 1004MB.)
> >>>
> >>> Probably efi places the gfx memory bar somewhere between 0xa0000000
> >>> and 0xc0000000. Sets up efifb accordingly. Then the linux kernel
> >>> loads and figures "Oh, that bar is outside the 0xc0000000+ window" and
> >>> goes remap it -> efifb writes go into nowhere.
> >>
> >> Thank you for the explanation.
> >>
> >> How do I fix it though? :) I could change the MMIO HOBs in PEI
> >> (OvmfPkg/PlatformPei/Platform.c):
> >>
> >> //
> >> // Video memory + Legacy BIOS region
> >> //
> >> AddIoMemoryRangeHob (0x0A0000, BASE_1MB);
> >>
> >> //
> >> // address purpose size
> >> // ------------ -------- -------------------------
> >> // max(top, 2g) PCI MMIO 0xFC000000 - max(top, 2g)
> >> // 0xFC000000 gap 44 MB
> >> // 0xFEC00000 IO-APIC 4 KB
> >> // 0xFEC01000 gap 1020 KB
> >> // 0xFED00000 HPET 1 KB
> >> // 0xFED00400 gap 1023 KB
> >> // 0xFEE00000 LAPIC 1 MB
> >> //
> >> AddIoMemoryRangeHob (TopOfMemory < BASE_2GB ? BASE_2GB : TopOfMemory, 0xFC000000);
> >> AddIoMemoryBaseSizeHob (0xFEC00000, SIZE_4KB);
> >> AddIoMemoryBaseSizeHob (0xFED00000, SIZE_1KB);
> >> AddIoMemoryBaseSizeHob (PcdGet32(PcdCpuLocalApicBaseAddress), SIZE_1MB);
> >>
> >> to imitate the same three cases. The HOB with the lowest address
> >> produced here would affect the BAR placement as well.
> >>
> >> ... Yes, I tested the attached patch, and it makes the display work in
> >> both 2560MB guests.
> >>
> >> However, I don't like the idea of hardwiring those boundaries here. (The
> >> current values are also hardwired, but they are better: they are not the
> >> consequence of "SeaBIOS has done it like this forever" -- inter-firmware
> >> dependency, especially when they aren't each other's payloads, is bad
> >> IMHO.) We'd need something dynamic here, like a memory map from qemu.
> >>
> >> ... Which puts us in the same boat with Wei :)
> >>
> >> Thanks
> >> Laszlo
> >
> > So the bug is really qemu being too conservative with it's ACPI
> > tables, isn't it?
>
> Here's what I think happens (and I'll shamelessly steal Gerd's explanation):
>
> (1) OVMF configures its own memory map very early (in PEI). The ranges
> configured are quite static. The only variable that goes into this
> configuration is "top of RAM below 4G", which is capped at 3.5GB by qemu
> (the variable comes from the CMOS):
>
> TopOfMemory <= 2G ---> PCI hole starts at 2G
> TopOfMemory > 2G ---> PCI hole starts at TopOfMemory
> TopOfMemory > 3.5G ---> not possible
>
> The low bound of the 32-bit PCI hole determined thus further affects two
> things (in a later phase (DXE)) of OVMF:
>
> (2) PCI resource allocation -- in practice the video framebuffer gets
> allocated right at the start of the hole,
> (3) the value that OVMF's own ACPI table population code advertises in
> the PCI bridge's _CRS.
>
>
> > (2) PCI resource alloc
> /
> (1) memmap
> \
> > (3) ACPI
> \
> > (4) guest OS access -- match (2)
>
> So, until this patch, everything was in sync. With this patch, the low
> bound exported over ACPI -- in (3) -- got replaced (whole-sale) by
> qemu's ACPI table. Therefore (1) only determined (2), and a conflict
> ensued between (2) and (3). (Which is where I refer back to Gerd's
> explanation -- the guest OS tries to access the framebuffer according to
> (3), the value in the ACPI table, but that's not where the framebuffer
> has actually been configured in (2), which had keyed off (1).)
>
> > (2) PCI resource alloc
> /
> (1) memmap
>
> (X) qemu
> \
> > (3) ACPI
> \
> > (4) guest OS access -- should match (2),
> but doesn't
>
>
> The *real* bug is that OVMF's memory map configuration (1) is not
> *dynamically* synchronized with qemu's ACPI table population logic (X).
> My patch that I attached earlier to this thread fixes this (by simply
> duplicating the same "quantization" logic), but it's ugly:
>
> > (2) PCI resource alloc
> /
> (1) memmap
> ||
> (X) qemu
> \
> > (3) ACPI
> \
> > (4) guest OS access -- matches (2) by way of
> code duplication
> between (1) and (X)
>
> What we really should implement is this:
>
> > (2) PCI resource alloc
> /
> > (1) memmap
> /
> (X) qemu
> \
> > (3) ACPI
> \
> > (4) guest OS access -- matches (2) by (X)
> being the common root
> of everything
>
> That is, OVMF should compose its own memmap in (1) based on dynamic
> information from qemu (importantly, the start address of the 32-bit PCI
> hole that qemu is going to advertise later in the ACPI tables).
>
> > The MTRRs in BIOS don't affect much: AFAIK they are completely ignored
> > if it's exiting to the host, but I'm not sure about assigned
> > devices.
> > Alex, can you remind us all what happens there?
>
> If you simply relax (drop) the stage-wise hole-setting, that will at
> best create a situation where OVMF and qemu duplicate the logic. This
> info needs to be passed down to OVMF (maybe it already is, somewhere in
> fw_cfg!), in some form that's more approachable than an ACPI table. (We
> absolutely don't want to parse ACPI (AML) in OVMF.)
>
> Thanks!
> Laszlo
So the argument of whether we should make qemu control
PCI BAR mapping for guest is a completely separate issue
I think.
Even if we should, I think making ACPI depend on guest
obeying this logic is a mistake.
I think the whole confusion comes from the pci hole terminology.
There's no PCI hole, at least not on PIIX.
On PIIX what's not memory apic etc, is PCI.
So guest can use everything that isn't memory for PCI.
The only issue is _CRS in our ACPI is kind of seabios specific.
So let's just make it generic, make it really describe hardware:
Two possible fixes, both of them in QEMU:
1. declare all available regions (that resolve to PCI) in _CRS
2. detect where did guest put PCI devices and declare the result in _CRS
2 is exactly what we did for 64 bit BARs.
The only issue I worry about is MTRRs.
Maybe we are lucky and they do not matter for KVM.
--
MST
next prev parent reply other threads:[~2013-11-19 22:28 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-11-12 15:11 [Qemu-devel] [edk2 PATCH 0/1] OvmfPkg: grab ACPI tables from QEMU Laszlo Ersek
2013-11-12 15:11 ` [Qemu-devel] [edk2 PATCH 1/1] OvmfPkg/AcpiPlatformDxe: download " Laszlo Ersek
2013-11-22 18:10 ` Jordan Justen
2013-11-22 18:49 ` Laszlo Ersek
2013-11-22 21:44 ` Jordan Justen
2013-11-12 16:05 ` [Qemu-devel] [edk2 PATCH 0/1] OvmfPkg: grab " Igor Mammedov
2013-11-12 16:24 ` Laszlo Ersek
2013-11-18 23:34 ` [Qemu-devel] [edk2] " Laszlo Ersek
2013-11-19 0:04 ` Igor Mammedov
2013-11-19 0:41 ` Laszlo Ersek
2013-11-19 8:54 ` Gerd Hoffmann
2013-11-19 12:31 ` Laszlo Ersek
2013-11-19 20:54 ` Michael S. Tsirkin
2013-11-20 8:18 ` Gerd Hoffmann
2013-11-19 21:13 ` Michael S. Tsirkin
2013-11-19 21:39 ` Laszlo Ersek
2013-11-19 22:03 ` Michael S. Tsirkin [this message]
2013-11-20 8:36 ` Gerd Hoffmann
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20131119220306.GF15004@redhat.com \
--to=mst@redhat.com \
--cc=alex.williamson@redhat.com \
--cc=edk2-devel@lists.sourceforge.net \
--cc=kraxel@redhat.com \
--cc=lersek@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=wei.liu2@citrix.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.