From: Vitaly Kuznetsov <vkuznets@redhat.com>
To: Ajay Kaher <akaher@vmware.com>
Cc: x86@kernel.org, hpa@zytor.com, linux-pci@vger.kernel.org,
linux-kernel@vger.kernel.org, rostedt@goodmis.org,
srivatsab@vmware.com, srivatsa@csail.mit.edu,
amakhalov@vmware.com, vsirnapalli@vmware.com,
er.ajay.kaher@gmail.com, willy@infradead.org, namit@vmware.com,
linux-hyperv@vger.kernel.org, kvm@vger.kernel.org,
jailhouse-dev@googlegroups.com, xen-devel@lists.xenproject.org,
acrn-dev@lists.projectacrn.org, helgaas@kernel.org,
bhelgaas@google.com, tglx@linutronix.de, mingo@redhat.com,
bp@alien8.de, dave.hansen@linux.intel.com
Subject: Re: [PATCH v2] x86/PCI: Prefer MMIO over PIO on VMware hypervisor
Date: Wed, 07 Sep 2022 17:20:00 +0200
Message-ID: <8735d3rz33.fsf@redhat.com>
In-Reply-To: <1662448117-10807-1-git-send-email-akaher@vmware.com>
Ajay Kaher <akaher@vmware.com> writes:
> During boot there are many PCI config reads; these can be performed
> either using port I/O instructions (PIO) or memory-mapped I/O (MMIO).
>
> PIO is less efficient than MMIO: it requires twice as many PCI
> accesses, and PIO instructions are serializing. As a result, MMIO
> should be preferred over PIO when possible.
>
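(For reference, the legacy conf1 mechanism is what makes PIO doubly
expensive in a guest: each config read is a CONFIG_ADDRESS write to port
0xCF8 followed by a CONFIG_DATA access on port 0xCFC, i.e. two trapped
port operations, whereas an ECAM/MMCONFIG read is a single memory load.
A rough userspace sketch of the two address encodings, purely for
illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Legacy conf1 CONFIG_ADDRESS encoding (PCI spec):
 * bit 31: enable, bits 23-16: bus, bits 15-8: devfn, bits 7-2: register.
 * A config read is outl(addr, 0xCF8) followed by inl(0xCFC): two port
 * operations, each of which traps to the hypervisor in a guest. */
static uint32_t conf1_address(uint32_t bus, uint32_t devfn, uint32_t reg)
{
	return 0x80000000u | (bus << 16) | (devfn << 8) | (reg & 0xFCu);
}

/* ECAM/MMCONFIG, by contrast, locates the same register at a fixed
 * offset from the ECAM base, reachable with one memory load. */
static uint32_t ecam_offset(uint32_t bus, uint32_t devfn, uint32_t reg)
{
	return (bus << 20) | (devfn << 12) | reg;
}
```
)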
> Virtual Machine test result using VMware hypervisor
> 100,000 reads using raw_pci_read() took:
> PIO: 12.809 seconds
> MMIO: 8.517 seconds (~33.5% faster than PIO)
>
> Currently, when these reads are performed by a virtual machine, they all
> cause a VM-exit, and therefore each one of them induces a considerable
> overhead.
>
> This overhead can be further reduced by mapping the MMIO region of the
> virtual machine to a memory area that holds the values the "emulated
> hardware" is supposed to return. The memory region is mapped as
> "read-only" in the NPT/EPT, so reads from these regions are treated as
> regular memory reads. Writes are still trapped and emulated by the
> hypervisor.
>
> Virtual Machine test result with the above changes in VMware hypervisor
> 100,000 reads using raw_pci_read() took:
> PIO: 12.809 seconds
> MMIO: 0.010 seconds
>
> This reduces virtual machine PCI scan and initialization time by
> ~65%. In our case it dropped from ~55 ms to ~18 ms.
>
> MMIO is also faster than PIO on bare-metal systems, but due to some bugs
> with legacy hardware and the smaller gains on bare-metal, it seems prudent
> not to change bare-metal behavior.
Out of curiosity, are we sure MMIO *always* works for hypervisors
besides VMware? Various Hyper-V versions can probably be tested (were
they?), but with KVM it's much harder, as PCI is emulated in the VMM
and there's certainly more than one VMM in existence...
>
> Signed-off-by: Ajay Kaher <akaher@vmware.com>
> ---
> v1 -> v2:
> Limit changes to apply only to VMs [Matthew W.]
> ---
> arch/x86/pci/common.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 45 insertions(+)
>
> diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
> index ddb7986..1e5a8f7 100644
> --- a/arch/x86/pci/common.c
> +++ b/arch/x86/pci/common.c
> @@ -20,6 +20,7 @@
> #include <asm/pci_x86.h>
> #include <asm/setup.h>
> #include <asm/irqdomain.h>
> +#include <asm/hypervisor.h>
>
> unsigned int pci_probe = PCI_PROBE_BIOS | PCI_PROBE_CONF1 | PCI_PROBE_CONF2 |
> PCI_PROBE_MMCONF;
> @@ -57,14 +58,58 @@ int raw_pci_write(unsigned int domain, unsigned int bus, unsigned int devfn,
> return -EINVAL;
> }
>
> +#ifdef CONFIG_HYPERVISOR_GUEST
> +static int vm_raw_pci_read(unsigned int domain, unsigned int bus, unsigned int devfn,
> + int reg, int len, u32 *val)
> +{
> + if (raw_pci_ext_ops)
> + return raw_pci_ext_ops->read(domain, bus, devfn, reg, len, val);
> + if (domain == 0 && reg < 256 && raw_pci_ops)
> + return raw_pci_ops->read(domain, bus, devfn, reg, len, val);
> + return -EINVAL;
> +}
> +
> +static int vm_raw_pci_write(unsigned int domain, unsigned int bus, unsigned int devfn,
> + int reg, int len, u32 val)
> +{
> + if (raw_pci_ext_ops)
> + return raw_pci_ext_ops->write(domain, bus, devfn, reg, len, val);
> + if (domain == 0 && reg < 256 && raw_pci_ops)
> + return raw_pci_ops->write(domain, bus, devfn, reg, len, val);
> + return -EINVAL;
> +}
These look exactly like raw_pci_read()/raw_pci_write() but with inverted
priority. We could have added a parameter, but to be more flexible, I'd
suggest adding a 'priority' field to 'struct pci_raw_ops' and making
raw_pci_read()/raw_pci_write() check it before deciding which backend to
try first. To be on the safe side, you can leave raw_pci_ops' priority
higher than raw_pci_ext_ops' by default and only tweak it in
arch/x86/kernel/cpu/vmware.c.
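
Roughly what I have in mind, as a userspace sketch (the 'priority'
field and the dummy backends are made up for illustration; none of this
is existing kernel API):

```c
#include <assert.h>
#include <stddef.h>

/* Sketch: give 'struct pci_raw_ops' a priority field so raw_pci_read()
 * itself picks the backend, instead of duplicating the function behind
 * an #ifdef.  Names are illustrative only. */
struct pci_raw_ops {
	int priority;
	int (*read)(unsigned int domain, unsigned int bus,
		    unsigned int devfn, int reg, int len, unsigned int *val);
};

/* Dummy backends standing in for PIO (conf1) and MMIO (MMCONFIG). */
static int pio_read(unsigned int domain, unsigned int bus,
		    unsigned int devfn, int reg, int len, unsigned int *val)
{
	*val = 0x1;	/* pretend: value read via ports 0xCF8/0xCFC */
	return 0;
}

static int mmio_read(unsigned int domain, unsigned int bus,
		     unsigned int devfn, int reg, int len, unsigned int *val)
{
	*val = 0x2;	/* pretend: value read via MMCONFIG */
	return 0;
}

static struct pci_raw_ops pio_ops  = { .priority = 1, .read = pio_read };
static struct pci_raw_ops mmio_ops = { .priority = 0, .read = mmio_read };

static struct pci_raw_ops *raw_pci_ops = &pio_ops;	/* domain 0, reg < 256 */
static struct pci_raw_ops *raw_pci_ext_ops = &mmio_ops;	/* extended config */

static int raw_pci_read(unsigned int domain, unsigned int bus,
			unsigned int devfn, int reg, int len,
			unsigned int *val)
{
	/* Prefer the extended ops only when their priority says so. */
	if (raw_pci_ext_ops && (!raw_pci_ops ||
	    raw_pci_ext_ops->priority > raw_pci_ops->priority))
		return raw_pci_ext_ops->read(domain, bus, devfn, reg, len, val);
	if (domain == 0 && reg < 256 && raw_pci_ops)
		return raw_pci_ops->read(domain, bus, devfn, reg, len, val);
	if (raw_pci_ext_ops)
		return raw_pci_ext_ops->read(domain, bus, devfn, reg, len, val);
	return -22;	/* -EINVAL */
}
```

Then arch/x86/kernel/cpu/vmware.c would only need to bump the
raw_pci_ext_ops priority during platform setup, and the #ifdef'ed
wrappers above go away entirely.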
> +#endif /* CONFIG_HYPERVISOR_GUEST */
> +
> static int pci_read(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 *value)
> {
> +#ifdef CONFIG_HYPERVISOR_GUEST
> + /*
> + * MMIO is faster than PIO, but due to some bugs with legacy
> + * hardware, it seems prudent to prefer MMIO for VMs and PIO
> + * for bare-metal.
> + */
> + if (!hypervisor_is_type(X86_HYPER_NATIVE))
> + return vm_raw_pci_read(pci_domain_nr(bus), bus->number,
> + devfn, where, size, value);
> +#endif /* CONFIG_HYPERVISOR_GUEST */
> +
> return raw_pci_read(pci_domain_nr(bus), bus->number,
> devfn, where, size, value);
> }
>
> static int pci_write(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 value)
> {
> +#ifdef CONFIG_HYPERVISOR_GUEST
> + /*
> + * MMIO is faster than PIO, but due to some bugs with legacy
> + * hardware, it seems prudent to prefer MMIO for VMs and PIO
> + * for bare-metal.
> + */
> + if (!hypervisor_is_type(X86_HYPER_NATIVE))
> + return vm_raw_pci_write(pci_domain_nr(bus), bus->number,
> + devfn, where, size, value);
> +#endif /* CONFIG_HYPERVISOR_GUEST */
> +
> return raw_pci_write(pci_domain_nr(bus), bus->number,
> devfn, where, size, value);
> }
--
Vitaly
Thread overview: 3+ messages
2022-09-06 7:08 [PATCH v2] x86/PCI: Prefer MMIO over PIO on VMware hypervisor Ajay Kaher
2022-09-07 15:20 ` Vitaly Kuznetsov [this message]
2022-09-12 15:17 ` Wei Liu