qemu-devel.nongnu.org archive mirror
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Eduardo Habkost <ehabkost@redhat.com>
Cc: Pedro Principeza <pedro.principeza@canonical.com>,
	Dann Frazier <dann.frazier@canonical.com>,
	Guilherme Piccoli <gpiccoli@canonical.com>,
	qemu-devel@nongnu.org,
	Christian Ehrhardt <christian.ehrhardt@canonical.com>,
	Gerd Hoffmann <kraxel@redhat.com>,
	Laszlo Ersek <lersek@redhat.com>,
	fw@gpiccoli.net
Subject: Re: ovmf / PCI passthrough impaired due to very limiting PCI64 aperture
Date: Wed, 17 Jun 2020 17:04:12 +0100
Message-ID: <20200617160412.GG2776@work-vm>
In-Reply-To: <20200617154959.GZ2366737@habkost.net>

* Eduardo Habkost (ehabkost@redhat.com) wrote:
> On Wed, Jun 17, 2020 at 02:46:52PM +0100, Dr. David Alan Gilbert wrote:
> > * Laszlo Ersek (lersek@redhat.com) wrote:
> > > On 06/16/20 19:14, Guilherme Piccoli wrote:
> > > > Thanks Gerd, Dave and Eduardo for the prompt responses!
> > > > 
> > > > So, I understand that when we use "host-phys-bits", we are
> > > > passing the *real* number to the guest, correct? So, in this case we
> > > > can trust that the guest physbits matches the true host physbits.
> > > > 
> > > > What if we then have OVMF rely on the physbits *iff*
> > > > "host-phys-bits" is used (which is the default in RH and a possible
> > > > machine configuration in the libvirt XML on Ubuntu), and have OVMF
> > > > fall back to 36 bits otherwise?
> > > 
> > > I've now read the commit message on QEMU commit 258fe08bd341d, and the
> > > complexity is simply stunning.
> > > 
> > > Right now, OVMF calculates the guest physical address space size from
> > > various range sizes (such as hotplug memory area end, default or
> > > user-configured PCI64 MMIO aperture), and derives the minimum suitable
> > > guest-phys address width from that address space size. This width is
> > > then exposed to the rest of the firmware with the CPU HOB (hand-off
> > > block), which in turn controls how the GCD (global coherency domain)
> > > memory space map is sized. Etc.
> > > 
> > > If QEMU can provide a *reliable* GPA width, in some info channel (CPUID
> > > or even fw_cfg), then the above calculation could be reversed in OVMF.
> > > We could take the width as a given (-> produce the CPU HOB directly),
> > > plus calculate the *remaining* address space between the GPA space size
> > > given by the width, and the end of the memory hotplug area. If the
> > > "remaining size" were negative, then obviously QEMU would have been
> > > misconfigured, so we'd halt the boot. Otherwise, the remaining area
> > > could be used as PCI64 MMIO aperture (PEI memory footprint of DXE page
> > > tables be darned).
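
To make the reversed calculation above concrete, here is a minimal
sketch of how I read it. The function name and parameters are made up
for illustration; this is not OVMF code, and it ignores aperture
alignment and the page table footprint entirely.

```python
from typing import Tuple

GIB = 1 << 30


def remaining_pci64_aperture(gpa_width_bits: int, hotplug_end: int) -> Tuple[int, int]:
    """Return (base, size) of the address space left above the hotplug area.

    gpa_width_bits: guest-physical address width, taken as a given.
    hotplug_end:    first address past the memory hotplug area (hypothetical input).
    """
    gpa_space_size = 1 << gpa_width_bits
    remaining = gpa_space_size - hotplug_end
    if remaining < 0:
        # The hotplug area does not fit in the address space: misconfiguration.
        raise ValueError("memory hotplug area exceeds the guest-physical address space")
    return hotplug_end, remaining


if __name__ == "__main__":
    base, size = remaining_pci64_aperture(39, 64 * GIB)
    print(f"aperture base = {base // GIB} GiB, size = {size // GIB} GiB")
```

With a 39-bit width and a hotplug area ending at 64 GiB, everything
from 64 GiB up to 512 GiB is left over for the aperture.
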
> > > 
> > > > Now, regarding the problem "to trust or not" in the guest's physbits,
> > > > I think it's an orthogonal discussion to some extent. It'd be nice to
> > > > have that check, and as Eduardo said, prevent migration in such cases.
> > > > But it's not really preventing a big OVMF PCI64 aperture if we only
> > > > increase the aperture _when "host-phys-bits" is used_.
> > > 
> > > I don't know what exactly those flags do, but I doubt they are clearly
> > > visible to OVMF in any particular way.
> > 
> > The firmware should trust whatever it reads from CPUID, i.e. whatever
> > qemu tells it; if qemu is doing the wrong thing there then that's our
> > problem and we need to fix it in qemu.
> 
> It is impossible to provide a MAXPHYADDR that the guest can trust
> unconditionally and allow live migration to hosts with different
> sizes at the same time.

It would be nice to get to a point where we could say that the reported
size is no bigger than that of the physical hardware.
The gotcha here is that (upstream) qemu still reports 40 physical address
bits by default, when even modern Intel desktop chips only have 39.
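
For reference, the width in question is what CPUID leaf 0x80000008
returns in EAX: bits 7:0 are the physical address bits, bits 15:8 the
linear address bits. A rough sketch of the decode and of the "no bigger
than the hardware" check follows; the helper names are invented, and
the EAX value is just an example input, not read from a real CPU.

```python
from typing import Tuple


def decode_address_sizes(eax: int) -> Tuple[int, int]:
    """Split CPUID.80000008H:EAX into (physical bits, linear bits)."""
    phys_bits = eax & 0xFF            # EAX[7:0]
    linear_bits = (eax >> 8) & 0xFF   # EAX[15:8]
    return phys_bits, linear_bits


def guest_width_fits_host(guest_phys_bits: int, host_phys_bits: int) -> bool:
    """True if the width reported to the guest is no bigger than the host's."""
    return guest_phys_bits <= host_phys_bits


if __name__ == "__main__":
    host_phys, host_linear = decode_address_sizes(0x3027)  # example value
    print("host physical bits:", host_phys, "linear bits:", host_linear)
    # A guest told 40 bits on such a host would not pass the check.
    print("40-bit guest fits:", guest_width_fits_host(40, host_phys))
```
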

> Unless we want to drop support for live migration to hosts with
> different sizes entirely, we need additional bits to tell the
> guest how much it can trust MAXPHYADDR.

Could we go with host-phys-bits=true by default? That at least means the
normal behaviour is correct; people who want to migrate between
hosts with different sizes should set phys-bits (or
host-phys-bits-limit) to the lowest value in their set of hardware.
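
Roughly, the policy I have in mind could be sketched like this. The
helper and the list of host widths are hypothetical; only the
host-phys-bits and phys-bits CPU properties are real qemu options.

```python
from typing import Iterable, Optional


def cpu_option(model: str, pool_phys_bits: Optional[Iterable[int]] = None) -> str:
    """Build a -cpu argument for the policy described above.

    model:          CPU model name passed to -cpu.
    pool_phys_bits: physical address widths of every host in the migration
                    pool (hypothetical input); None or empty means the guest
                    never leaves this host.
    """
    widths = list(pool_phys_bits or [])
    if not widths:
        # No migration between different hosts: just mirror the host width.
        return f"{model},host-phys-bits=on"
    # Migration pool: pin the width to the smallest host in the set.
    return f"{model},phys-bits={min(widths)}"


if __name__ == "__main__":
    print("-cpu", cpu_option("host"))
    print("-cpu", cpu_option("Skylake-Client", [39, 46, 40]))
```

The point is that the person who knows the set of hosts picks the
width, rather than qemu guessing 40.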

Dave
> -- 
> Eduardo
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


