From mboxrd@z Thu Jan 1 00:00:00 1970 From: Alex Williamson Subject: Re: [RFC PATCH v2 1/3] PCI: Add support for enforcing all MMIO BARs to be page aligned Date: Mon, 04 Jan 2016 13:47:40 -0700 Message-ID: <1451940460.6585.6.camel@redhat.com> References: <1451551844-11732-1-git-send-email-xyjxie@linux.vnet.ibm.com> <1451551844-11732-2-git-send-email-xyjxie@linux.vnet.ibm.com> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: bhelgaas@google.com, corbet@lwn.net, aik@ozlabs.ru, benh@kernel.crashing.org, paulus@samba.org, mpe@ellerman.id.au, warrier@linux.vnet.ibm.com, zhong@linux.vnet.ibm.com, nikunj@linux.vnet.ibm.com To: Yongji Xie , kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-doc@vger.kernel.org Return-path: In-Reply-To: <1451551844-11732-2-git-send-email-xyjxie@linux.vnet.ibm.com> Sender: linux-pci-owner@vger.kernel.org List-Id: kvm.vger.kernel.org On Thu, 2015-12-31 at 16:50 +0800, Yongji Xie wrote: > When vfio passthrough a PCI device of which MMIO BARs > are smaller than PAGE_SIZE, guest will not handle the > mmio accesses to the BARs which leads to mmio emulations > in host. >=20 > This is because vfio will not allow to passthrough one > BAR's mmio page which may be shared with other BARs. >=20 > To solve this performance issue, this patch adds a kernel > parameter "pci=3Dresource_page_aligned=3Don" to enforce > the alignments of all MMIO BARs to be at least PAGE_SIZE, > so that one BAR's mmio page would not be shared with other > BARs. We can also disable it through kernel parameter > "pci=3Dresource_page_aligned=3Doff". Shouldn't this somehow be associated with the realloc option? =C2=A0I d= on't think PCI code will attempt to reprogram anything unless it needs to otherwise. > For the default value of this parameter, we think it should be > arch-independent, so we add a macro PCI_RESOURCE_PAGE_ALIGNED > to change it. And we define this macro to enable this parameter > by default on PPC64 platform which can easily hit this > performance issue because its PAGE_SIZE is 64KB. >=20 > Signed-off-by: Yongji Xie > --- > =C2=A0Documentation/kernel-parameters.txt |=C2=A0=C2=A0=C2=A0=C2=A04 = ++++ > =C2=A0arch/powerpc/include/asm/pci.h=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0|=C2=A0=C2=A0=C2=A011 +++++++++++ > =C2=A0drivers/pci/pci.c=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0|=C2=A0= =C2=A0=C2=A017 +++++++++++++++++ > =C2=A0drivers/pci/pci.h=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0|=C2=A0= =C2=A0=C2=A0=C2=A07 ++++++- > =C2=A0include/linux/pci.h=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0|=C2=A0=C2=A0=C2= =A0=C2=A02 ++ > =C2=A05 files changed, 40 insertions(+), 1 deletion(-) >=20 > diff --git a/Documentation/kernel-parameters.txt b/Documentation/kern= el-parameters.txt > index 742f69d..a53aaee 100644 > --- a/Documentation/kernel-parameters.txt > +++ b/Documentation/kernel-parameters.txt > @@ -2857,6 +2857,10 @@ bytes respectively. Such letter suffixes can a= lso be entirely omitted. > =C2=A0 PAGE_SIZE is used as alignment. > =C2=A0 PCI-PCI bridge can be specified, if resource > =C2=A0 windows need to be expanded. > + resource_page_aligned=3D Enable/disable enforcing the alignment > + of all PCI devices' memory resources to be > + at least PAGE_SIZE. > + Format: { "on" | "off" } > =C2=A0 ecrc=3D Enable/disable PCIe ECRC (transaction layer > =C2=A0 end-to-end CRC checking). > =C2=A0 bios: Use BIOS/firmware settings. This is the > diff --git a/arch/powerpc/include/asm/pci.h b/arch/powerpc/include/as= m/pci.h > index 3453bd8..27bff59 100644 > --- a/arch/powerpc/include/asm/pci.h > +++ b/arch/powerpc/include/asm/pci.h > @@ -136,6 +136,17 @@ extern pgprot_t pci_phys_mem_access_prot(struct = file *file, > =C2=A0 =C2=A0unsigned long pfn, > =C2=A0 =C2=A0unsigned long size, > =C2=A0 =C2=A0pgprot_t prot); > +#ifdef CONFIG_PPC64 > + > +/* For PPC64, We enforce all PCI MMIO BARs to be page aligned > + * by default. This would be helpful to improve performance > + * when we passthrough a PCI device of which BARs are smaller > + * than PAGE_SIZE(64KB). And we can use bootcmd > + * "pci=3Dresource_page_aligned=3Doff" to disable it. > + */ > +#define PCI_ENABLE_RESOURCE_PAGE_ALIGNED > + > +#endif This should be done with something like=C2=A0HAVE_PCI_DEFAULT_RESOURCE_= PAGE_ ALIGNED in arch/powerpc/include/asm > =C2=A0#define HAVE_ARCH_PCI_RESOURCE_TO_USER > =C2=A0extern void pci_resource_to_user(const struct pci_dev *dev, int= bar, > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > index 314db8c..9f14ba5 100644 > --- a/drivers/pci/pci.c > +++ b/drivers/pci/pci.c > @@ -99,6 +99,13 @@ u8 pci_cache_line_size; > =C2=A0 */ > =C2=A0unsigned int pcibios_max_latency =3D 255; > =C2=A0 > +#ifdef PCI_ENABLE_RESOURCE_PAGE_ALIGNED > +bool pci_resource_page_aligned =3D true; > +#else > +bool pci_resource_page_aligned; > +#endif > +EXPORT_SYMBOL(pci_resource_page_aligned); Couldn't this be done in a single line with IS_ENABLED() macro? Should this symbol be GPL-only? > + > =C2=A0/* If set, the PCIe ARI capability will not be used. */ > =C2=A0static bool pcie_ari_disabled; > =C2=A0 > @@ -4746,6 +4753,14 @@ static ssize_t pci_resource_alignment_store(st= ruct bus_type *bus, > =C2=A0BUS_ATTR(resource_alignment, 0644, pci_resource_alignment_show, > =C2=A0 pci_resource_alignment_store); > =C2=A0 > +static void pci_resource_get_page_aligned(char *str) > +{ > + if (!strncmp(str, "off", 3)) > + pci_resource_page_aligned =3D false; > + else if (!strncmp(str, "on", 2)) > + pci_resource_page_aligned =3D true; > +} > + > =C2=A0static int __init pci_resource_alignment_sysfs_init(void) > =C2=A0{ > =C2=A0 return bus_create_file(&pci_bus_type, > @@ -4859,6 +4874,8 @@ static int __init pci_setup(char *str) > =C2=A0 } else if (!strncmp(str, "resource_alignment=3D", 19)) { > =C2=A0 pci_set_resource_alignment_param(str + 19, > =C2=A0 strlen(str + 19)); > + } else if (!strncmp(str, "resource_page_aligned=3D", 22)) { > + pci_resource_get_page_aligned(str + 22); > =C2=A0 } else if (!strncmp(str, "ecrc=3D", 5)) { > =C2=A0 pcie_ecrc_get_policy(str + 5); > =C2=A0 } else if (!strncmp(str, "hpiosize=3D", 9)) { > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h > index d390fc1..e16e48c 100644 > --- a/drivers/pci/pci.h > +++ b/drivers/pci/pci.h > @@ -312,11 +312,16 @@ static inline resource_size_t pci_resource_alig= nment(struct pci_dev *dev, > =C2=A0#ifdef CONFIG_PCI_IOV > =C2=A0 int resno =3D res - dev->resource; > =C2=A0 > - if (resno >=3D PCI_IOV_RESOURCES && resno <=3D PCI_IOV_RESOURCE_END= ) > + if (resno >=3D PCI_IOV_RESOURCES && resno <=3D PCI_IOV_RESOURCE_END= ) { > + if (pci_resource_page_aligned && res->flags & IORESOURCE_MEM) > + return PAGE_ALIGN(pci_sriov_resource_alignment(dev, resno)); > =C2=A0 return pci_sriov_resource_alignment(dev, resno); > + } > =C2=A0#endif > =C2=A0 if (dev->class >> 8=C2=A0=C2=A0=3D=3D PCI_CLASS_BRIDGE_CARDBUS= ) > =C2=A0 return pci_cardbus_resource_alignment(res); > + if (pci_resource_page_aligned && res->flags & IORESOURCE_MEM) > + return PAGE_ALIGN(resource_alignment(res)); > =C2=A0 return resource_alignment(res); > =C2=A0} Why are we calling resource_alignment() and then doing another alignment on top of that? =C2=A0Shouldn't this alignment be done there?= =C2=A0I still don't really understand if this is enough to produce a reallocation if the alignment is not sufficient, vfio would need to know if=C2=A0pci_resource_page_aligned has any loopholes. > =C2=A0 > diff --git a/include/linux/pci.h b/include/linux/pci.h > index 6ae25aa..0ca57f1 100644 > --- a/include/linux/pci.h > +++ b/include/linux/pci.h > @@ -1517,6 +1517,8 @@ static inline int pci_get_new_domain_nr(void) {= return -ENOSYS; } > =C2=A0 > =C2=A0#include=20 > =C2=A0 > +extern bool pci_resource_page_aligned; > + > =C2=A0/* these helpers provide future and backwards compatibility > =C2=A0 * for accessing popular PCI BAR info */ > =C2=A0#define pci_resource_start(dev, bar) ((dev)->resource[(bar)].st= art)