All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Roger Pau Monné" <roger.pau@citrix.com>
To: Tu Dinh <ngoc-tu.dinh@vates.tech>
Cc: xen-devel@lists.xenproject.org,
	Anthony PERARD <anthony.perard@vates.tech>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Michal Orzel <michal.orzel@amd.com>,
	Jan Beulich <jbeulich@suse.com>, Julien Grall <julien@xen.org>,
	Stefano Stabellini <sstabellini@kernel.org>,
	Juergen Gross <jgross@suse.com>,
	Anthoine Bourgeois <anthoine.bourgeois@vates.tech>
Subject: Re: [PATCH v3] x86/hvmloader: select xenpci MMIO BAR UC or WB MTRR cache attribute
Date: Fri, 6 Jun 2025 15:23:07 +0200	[thread overview]
Message-ID: <aELru4IZmqHilNiN@macbook.local> (raw)
In-Reply-To: <1b290503-078e-491a-8552-b884df7ac747@vates.tech>

On Fri, Jun 06, 2025 at 01:00:19PM +0000, Tu Dinh wrote:
> Hi Roger,
> 
> On 05/06/2025 18:20, Roger Pau Monne wrote:
> > The Xen PCI device (vendor ID 0x5853) exposed to x86 HVM guests doesn't
> > have the functionality of a traditional PCI device.  The exposed MMIO BAR
> > is used by some guests (including Linux) as a safe place to map foreign
> > memory, including the grant table itself.
> >
> > Traditionally BARs from devices have the uncacheable (UC) cache attribute
> > from the MTRR, to ensure correct functionality of such devices.  hvmloader
> > mimics this behavior and sets the MTRR attributes of both the low and high
> > PCI MMIO windows (where BARs of PCI devices reside) as UC in MTRR.
> >
> > This however causes performance issues for users of the Xen PCI device BAR,
> > as for the purposes of mapping remote memory there's no need to use the UC
> > attribute.  On Intel systems this is worked around by using iPAT, that
> > allows the hypervisor to force the effective cache attribute of a p2m entry
> > regardless of the guest PAT value.  AMD however doesn't have an equivalent
> > of iPAT, and guest PAT values are always considered.
> >
> > Linux commit:
> >
> > 41925b105e34 xen: replace xen_remap() with memremap()
> >
> > Attempted to mitigate this by forcing mappings of the grant-table to use
> > the write-back (WB) cache attribute.  However Linux memremap() takes MTRRs
> > into account to calculate which PAT type to use, and seeing the MTRR cache
> > attribute for the region being UC the PAT also ends up as UC, regardless of
> > the caller having requested WB.
> >
> > As a workaround to allow current Linux to map the grant-table as WB using
> > memremap() introduce an xl.cfg option (xenpci_bar_uc=0) that can be used to
> > select whether the Xen PCI device BAR will have the UC attribute in MTRR.
> > Such workaround in hvmloader should also be paired with a fix for Linux so
> > it attempts to change the MTRR of the Xen PCI device BAR to WB by itself.
> >
> > Overall, the long term solution would be to provide the guest with a safe
> > range in the guest physical address space where mappings to foreign pages
> > can be created.
> >
> > Some vif throughput performance figures provided by Anthoine from a 8
> > vCPUs, 4GB of RAM HVM guest(s) running on AMD hardware:
> >
> > Without this patch:
> > vm -> dom0: 1.1Gb/s
> > vm -> vm:   5.0Gb/s
> >
> > With the patch:
> > vm -> dom0: 4.5Gb/s
> > vm -> vm:   7.0Gb/s
> >
> > Reported-by: Anthoine Bourgeois <anthoine.bourgeois@vates.tech>
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > ---
> > Changes since v2:
> >   - Add default value in xl.cfg.
> >   - List xenstore path in the pandoc file.
> >   - Adjust comment in hvmloader.
> >   - Fix commit message MIO -> MMIO.
> >
> > Changes since v1:
> >   - Leave the xenpci BAR as UC by default.
> >   - Introduce an option to not set it as UC.
> > ---
> >   docs/man/xl.cfg.5.pod.in                |  8 ++++
> >   docs/misc/xenstore-paths.pandoc         |  5 +++
> >   tools/firmware/hvmloader/config.h       |  2 +-
> >   tools/firmware/hvmloader/pci.c          | 49 ++++++++++++++++++++++++-
> >   tools/firmware/hvmloader/util.c         |  2 +-
> >   tools/include/libxl.h                   |  9 +++++
> >   tools/libs/light/libxl_create.c         |  1 +
> >   tools/libs/light/libxl_dom.c            |  9 +++++
> >   tools/libs/light/libxl_types.idl        |  1 +
> >   tools/xl/xl_parse.c                     |  2 +
> >   xen/include/public/hvm/hvm_xs_strings.h |  2 +
> >   11 files changed, 86 insertions(+), 4 deletions(-)
> >
> > diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
> > index c388899306c2..ddbff6fffc16 100644
> > --- a/docs/man/xl.cfg.5.pod.in
> > +++ b/docs/man/xl.cfg.5.pod.in
> > @@ -2351,6 +2351,14 @@ Windows L<https://xenproject.org/windows-pv-drivers/>.
> >   Setting B<xen_platform_pci=0> with the default device_model "qemu-xen"
> >   requires at least QEMU 1.6.
> >
> > +
> > +=item B<xenpci_bar_uc=BOOLEAN>
> > +
> > +B<x86 only:> Select whether the memory BAR of the Xen PCI device should have
> > +uncacheable (UC) cache attribute set in MTRR.
> > +
> > +Default is B<true>.
> > +
> >   =item B<viridian=[ "GROUP", "GROUP", ...]> or B<viridian=BOOLEAN>
> >
> >   The groups of Microsoft Hyper-V (AKA viridian) compatible enlightenments
> > diff --git a/docs/misc/xenstore-paths.pandoc b/docs/misc/xenstore-paths.pandoc
> > index 01a340fafcbe..073bed91eec1 100644
> > --- a/docs/misc/xenstore-paths.pandoc
> > +++ b/docs/misc/xenstore-paths.pandoc
> > @@ -234,6 +234,11 @@ These xenstore values are used to override some of the default string
> >   values in the SMBIOS table constructed in hvmloader. See the SMBIOS
> >   table specification at http://www.dmtf.org/standards/smbios/
> >
> > +#### ~/hvmloader/pci/xenpci-bar-uc = ("1"|"0") [HVM,INTERNAL]
> > +
> > +Select whether the Xen PCI device MMIO BAR will have the uncacheable cache
> > +attribute set in the MTRRs by hvmloader.
> > +
> >   #### ~/bios-strings/oem-* = STRING [HVM,INTERNAL]
> >
> >   1 to 99 OEM strings can be set in xenstore using values of the form
> > diff --git a/tools/firmware/hvmloader/config.h b/tools/firmware/hvmloader/config.h
> > index 6e1da137d779..c159db30eea9 100644
> > --- a/tools/firmware/hvmloader/config.h
> > +++ b/tools/firmware/hvmloader/config.h
> > @@ -58,7 +58,7 @@ extern uint32_t *cpu_to_apicid;
> >   #define ACPI_TIS_HDR_ADDRESS 0xFED40F00UL
> >
> >   extern uint32_t pci_mem_start;
> > -extern const uint32_t pci_mem_end;
> > +extern uint32_t pci_mem_end;
> >   extern uint64_t pci_hi_mem_start, pci_hi_mem_end;
> >
> >   extern bool acpi_enabled;
> > diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
> > index cc67b18c0361..747f6cfb6794 100644
> > --- a/tools/firmware/hvmloader/pci.c
> > +++ b/tools/firmware/hvmloader/pci.c
> > @@ -30,7 +30,7 @@
> >   #include <xen/hvm/e820.h>
> >
> >   uint32_t pci_mem_start = HVM_BELOW_4G_MMIO_START;
> > -const uint32_t pci_mem_end = RESERVED_MEMBASE;
> > +uint32_t pci_mem_end = RESERVED_MEMBASE;
> >   uint64_t pci_hi_mem_start = 0, pci_hi_mem_end = 0;
> >
> >   /*
> > @@ -116,6 +116,8 @@ void pci_setup(void)
> >        * experience the memory relocation bug described below.
> >        */
> >       bool allow_memory_relocate = 1;
> > +    /* Select the MTRR cache attribute of the xenpci device BAR. */
> > +    bool xenpci_bar_uc = false;
> 
> Since this is meant to be a workaround, I wonder if it makes more sense
> to flip the setting (`xenpci_bar_wb`) and make it 0 by default?

I originally didn't want to go that route, because while it's true
that the default MTRR type is set to WB, and so any memory not covered
by a MTRR range will default to that memory type I got the impression
this was inferring too much.

Overall my intention would be for inverting the default long term, and
libxl setting build_info->u.hvm.xenpci_bar_uc = false by default,
which then makes all the naming nicer IMO.

> It also
> simplifies the logic for both hvmloader and the consumer (no need for
> double negatives).

I don't think there are double negatives?  That would happen if the
variable was named xenpci_bar_no_uc or similar?

Thanks, Roger.


  reply	other threads:[~2025-06-06 13:23 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-06-05 16:16 [PATCH v3] x86/hvmloader: select xenpci MMIO BAR UC or WB MTRR cache attribute Roger Pau Monne
2025-06-06 13:00 ` Tu Dinh
2025-06-06 13:23   ` Roger Pau Monné [this message]
2025-06-06 13:37     ` Tu Dinh
2025-06-06 13:41 ` Roger Pau Monné
2025-06-09 10:29   ` Oleksii Kurochko
2025-06-10 15:45 ` Jan Beulich
2025-06-10 16:00   ` Roger Pau Monné

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aELru4IZmqHilNiN@macbook.local \
    --to=roger.pau@citrix.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=anthoine.bourgeois@vates.tech \
    --cc=anthony.perard@vates.tech \
    --cc=jbeulich@suse.com \
    --cc=jgross@suse.com \
    --cc=julien@xen.org \
    --cc=michal.orzel@amd.com \
    --cc=ngoc-tu.dinh@vates.tech \
    --cc=sstabellini@kernel.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.