qemu-devel.nongnu.org archive mirror
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Eduardo Habkost <ehabkost@redhat.com>
Cc: "Liu Yi L" <yi.l.liu@intel.com>,
	"Pedro Principeza" <pedro.principeza@canonical.com>,
	"Like Xu" <like.xu@linux.intel.com>,
	"Dann Frazier" <dann.frazier@canonical.com>,
	"Guilherme Piccoli" <gpiccoli@canonical.com>,
	qemu-devel@nongnu.org,
	"Christian Ehrhardt" <christian.ehrhardt@canonical.com>,
	"Robert Hoo" <robert.hu@linux.intel.com>,
	"Babu Moger" <babu.moger@amd.com>,
	"Gerd Hoffmann" <kraxel@redhat.com>,
	"Chenyi Qiang" <chenyi.qiang@intel.com>,
	"Daniel P. Berrangé" <berrange@redhat.com>,
	"Laszlo Ersek" <lersek@redhat.com>,
	fw@gpiccoli.net
Subject: Re: ovmf / PCI passthrough impaired due to very limiting PCI64 aperture
Date: Wed, 17 Jun 2020 17:41:41 +0100
Message-ID: <20200617164141.GH2776@work-vm>
In-Reply-To: <20200617162243.GB2366737@habkost.net>

* Eduardo Habkost (ehabkost@redhat.com) wrote:
> On Wed, Jun 17, 2020 at 05:17:17PM +0100, Daniel P. Berrangé wrote:
> > On Wed, Jun 17, 2020 at 05:04:12PM +0100, Dr. David Alan Gilbert wrote:
> > > * Eduardo Habkost (ehabkost@redhat.com) wrote:
> > > > On Wed, Jun 17, 2020 at 02:46:52PM +0100, Dr. David Alan Gilbert wrote:
> > > > > * Laszlo Ersek (lersek@redhat.com) wrote:
> > > > > > On 06/16/20 19:14, Guilherme Piccoli wrote:
> > > > > > > Thanks Gerd, Dave and Eduardo for the prompt responses!
> > > > > > > 
> > > > > > > So, I understand that when we use "-host-phys-bits", we are
> > > > > > > passing the *real* number to the guest, correct? So, in this case we
> > > > > > > can trust that the guest physbits matches the true host physbits.
> > > > > > > 
> > > > > > > What if we then have OVMF rely on the physbits *iff*
> > > > > > > "-host-phys-bits" is used (which is the default in RH and a possible
> > > > > > > machine configuration in the libvirt XML on Ubuntu), and have OVMF
> > > > > > > fall back to 36 bits otherwise?
> > > > > > 
> > > > > > I've now read the commit message on QEMU commit 258fe08bd341d, and the
> > > > > > complexity is simply stunning.
> > > > > > 
> > > > > > Right now, OVMF calculates the guest physical address space size from
> > > > > > various range sizes (such as hotplug memory area end, default or
> > > > > > user-configured PCI64 MMIO aperture), and derives the minimum suitable
> > > > > > guest-phys address width from that address space size. This width is
> > > > > > then exposed to the rest of the firmware with the CPU HOB (hand-off
> > > > > > block), which in turn controls how the GCD (global coherency domain)
> > > > > > memory space map is sized. Etc.
> > > > > > 
> > > > > > If QEMU can provide a *reliable* GPA width, in some info channel (CPUID
> > > > > > or even fw_cfg), then the above calculation could be reversed in OVMF.
> > > > > > We could take the width as a given (-> produce the CPU HOB directly),
> > > > > > plus calculate the *remaining* address space between the GPA space size
> > > > > > given by the width, and the end of the memory hotplug area. If the
> > > > > > "remaining size" were negative, then obviously QEMU would have been
> > > > > > misconfigured, so we'd halt the boot. Otherwise, the remaining area
> > > > > > could be used as PCI64 MMIO aperture (PEI memory footprint of DXE page
> > > > > > tables be darned).
> > > > > > 
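
(For concreteness, the "reversed" calculation described above boils down
to simple arithmetic; a minimal sketch in C with purely illustrative
numbers and variable names, not actual OVMF code:)

    /* Sketch of the reverse calculation: take a trusted guest phys-bits
     * value as given, subtract everything below the end of the memory
     * hotplug area, and hand the remainder to the PCI64 MMIO aperture.
     * All values below are made-up examples. */
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        unsigned phys_bits = 40;                     /* as reported by QEMU */
        uint64_t gpa_space_end = 1ULL << phys_bits;  /* 1 TiB of GPA space  */
        uint64_t hotplug_end = 384ULL << 30;         /* example: 384 GiB    */

        if (hotplug_end > gpa_space_end) {
            /* "Remaining size" would be negative: QEMU is misconfigured,
             * so real firmware would halt the boot here. */
            fprintf(stderr, "hotplug area does not fit in GPA space\n");
            return 1;
        }
        printf("PCI64 aperture: %llu GiB\n",
               (unsigned long long)((gpa_space_end - hotplug_end) >> 30));
        return 0;
    }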
> > > > > > > Now, regarding the problem of whether to trust the guest's physbits,
> > > > > > > I think it's an orthogonal discussion to some extent. It'd be nice to
> > > > > > > have that check and, as Eduardo said, prevent migration in such cases.
> > > > > > > But it doesn't really prevent OVMF's big PCI64 aperture if we only
> > > > > > > increase the aperture _when "-host-phys-bits" is used_.
> > > > > > 
> > > > > > I don't know what exactly those flags do, but I doubt they are clearly
> > > > > > visible to OVMF in any particular way.
> > > > > 
> > > > > The firmware should trust whatever it reads from CPUID, and thus
> > > > > whatever qemu tells it; if qemu is doing the wrong thing there then
> > > > > that's our problem and we need to fix it in qemu.
> > > > 
> > > > It is impossible to provide a MAXPHYADDR that the guest can trust
> > > > unconditionally and allow live migration to hosts with different
> > > > sizes at the same time.
> > > 
> > > It would be nice to get to a point where we could say that the reported
> > > size is no bigger than the physical hardware.
> > > The gotcha here is that (upstream) qemu is still reporting 40 by default
> > > when even modern Intel desktop chips are 39.
> > > 
> > > > Unless we want to drop support live migration to hosts with
> > > > different sizes entirely, we need additional bits to tell the
> > > > guest how much it can trust MAXPHYADDR.
> > > 
> > > Could we go with host-phys-bits=true by default? That at least means
> > > the normal behaviour is correct; if people want to migrate between
> > > hosts with different sizes they should set phys-bits (or
> > > host-phys-bits-limit) to the lowest value in their set of hardware.
> > 
> > Is there any sense in picking the default value based on -cpu selection?
> > 
> > If the user has asked for -cpu host, there's no downside to
> > host-phys-bits=true, as the user has already intentionally traded off
> > live migration portability.
> 
> Setting host-phys-bits=true when using -cpu host makes a lot of
> sense, and we could start doing that immediately.
> 
> > 
> > If the user asks for -cpu $MODEL, could we then set phys-bits=NNN for some
> > NNN that is the lowest value among CPUs capable of running $MODEL?
> > Or will that get too complicated with the wide range of SKU variants, in
> > particular server vs desktop CPUs?
> 
> This makes sense too.  We need some help from CPU vendors to get
> this data added to our CPU model table.  I'm CCing some Intel
> and AMD people who could help us.
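
For reference, the knobs being discussed here are existing x86 CPU
properties in QEMU; a minimal sketch of how they might be set on the
command line (assuming a reasonably recent QEMU; exact property
availability can vary by version):

    # Pass the host's real MAXPHYADDR through to the guest; this ties
    # the guest to hosts with at least that width:
    qemu-system-x86_64 -cpu host,host-phys-bits=on ...

    # Or pin an explicit width that is <= every host in the migration
    # pool (39 is only an example value):
    qemu-system-x86_64 -cpu Icelake-Server,phys-bits=39 ...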

Picking per-model values worries me, because physical address width is
SKU dependent and has been for a long time (on Intel at least), and we
don't even have CPU models for all Intel devices (my laptop, for
example, is a Kaby Lake with 39 physical bits).  Maybe it works for the
more modern ones, where we have 'Icelake-Client' and 'Icelake-Server'.
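
(As a quick way to check what a given host actually advertises: the
physical address width comes from CPUID leaf 0x80000008, EAX bits 7:0,
the same value Linux reports as "address sizes" in /proc/cpuinfo. A
minimal sketch, not tied to any QEMU or OVMF code:)

    /* phys_bits.c: print this CPU's physical address width (MAXPHYADDR).
     * Build with: gcc -o phys_bits phys_bits.c                          */
    #include <cpuid.h>
    #include <stdio.h>

    int main(void)
    {
        unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

        /* Leaf 0x80000008: EAX[7:0] = physical address bits,
         * EAX[15:8] = linear (virtual) address bits.          */
        if (!__get_cpuid(0x80000008, &eax, &ebx, &ecx, &edx)) {
            fprintf(stderr, "CPUID leaf 0x80000008 not supported\n");
            return 1;
        }
        printf("physical address bits: %u (%llu GiB addressable)\n",
               eax & 0xff, (1ULL << (eax & 0xff)) >> 30);
        return 0;
    }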

Dave

> -- 
> Eduardo
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




Thread overview: 40+ messages
2020-06-16 15:16 ovmf / PCI passthrough impaired due to very limiting PCI64 aperture Guilherme G. Piccoli
2020-06-16 16:50 ` Gerd Hoffmann
2020-06-16 16:57   ` Dr. David Alan Gilbert
2020-06-16 17:10     ` Eduardo Habkost
2020-06-17  8:17       ` Christophe de Dinechin
2020-06-17 16:25         ` Eduardo Habkost
2020-06-17  8:50       ` Daniel P. Berrangé
2020-06-17 10:28         ` Dr. David Alan Gilbert
2020-06-17 14:11         ` Eduardo Habkost
2020-06-16 17:10     ` Gerd Hoffmann
2020-06-16 17:16       ` Dr. David Alan Gilbert
2020-06-16 17:14     ` Guilherme Piccoli
2020-06-17  6:40       ` Gerd Hoffmann
2020-06-17 13:25         ` Laszlo Ersek
2020-06-17 13:26         ` Laszlo Ersek
2020-06-17 13:22       ` Laszlo Ersek
2020-06-17 13:43         ` Guilherme Piccoli
2020-06-17 15:57           ` Laszlo Ersek
2020-06-17 16:01             ` Guilherme Piccoli
2020-06-18  7:56               ` Laszlo Ersek
2020-06-17 13:46         ` Dr. David Alan Gilbert
2020-06-17 15:49           ` Eduardo Habkost
2020-06-17 15:57             ` Guilherme Piccoli
2020-06-17 16:33               ` Eduardo Habkost
2020-06-17 16:40                 ` Guilherme Piccoli
2020-06-18  8:00                 ` Laszlo Ersek
2020-06-17 16:04             ` Dr. David Alan Gilbert
2020-06-17 16:17               ` Daniel P. Berrangé
2020-06-17 16:22                 ` Eduardo Habkost
2020-06-17 16:41                   ` Dr. David Alan Gilbert [this message]
2020-06-17 17:17                     ` Daniel P. Berrangé
2020-06-17 17:23                       ` Dr. David Alan Gilbert
2020-06-17 16:28               ` Eduardo Habkost
2020-06-19 16:13               ` Dr. David Alan Gilbert
2020-06-17 16:14           ` Laszlo Ersek
2020-06-17 16:43             ` Laszlo Ersek
2020-06-17 17:02               ` Eduardo Habkost
2020-06-18  8:29                 ` Laszlo Ersek
2020-06-17  8:16   ` Christophe de Dinechin
2020-06-17 10:12     ` Gerd Hoffmann
