From mboxrd@z Thu Jan 1 00:00:00 1970 From: Alex Williamson Subject: Re: [RFC PATCH v3 1/5] PCI: Add support for enforcing all MMIO BARs to be page aligned Date: Fri, 29 Jan 2016 12:01:08 -0700 Message-ID: <1454094068.8133.1.camel@redhat.com> References: <1452841574-2781-1-git-send-email-xyjxie@linux.vnet.ibm.com> <1452841574-2781-2-git-send-email-xyjxie@linux.vnet.ibm.com> <1454021169.23148.5.camel@redhat.com> <56AB40DC.4000302@linux.vnet.ibm.com> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: bhelgaas@google.com, corbet@lwn.net, aik@ozlabs.ru, benh@kernel.crashing.org, paulus@samba.org, mpe@ellerman.id.au, warrier@linux.vnet.ibm.com, zhong@linux.vnet.ibm.com, nikunj@linux.vnet.ibm.com To: Yongji Xie , kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-doc@vger.kernel.org Return-path: In-Reply-To: <56AB40DC.4000302@linux.vnet.ibm.com> Sender: linux-doc-owner@vger.kernel.org List-Id: kvm.vger.kernel.org On Fri, 2016-01-29 at 18:37 +0800, Yongji Xie wrote: > On 2016/1/29 6:46, Alex Williamson wrote: > > On Fri, 2016-01-15 at 15:06 +0800, Yongji Xie wrote: > > > When vfio passthrough a PCI device of which MMIO BARs > > > are smaller than PAGE_SIZE, guest will not handle the > > > mmio accesses to the BARs which leads to mmio emulations > > > in host. > > > =C2=A0=C2=A0 > > > This is because vfio will not allow to passthrough one > > > BAR's mmio page which may be shared with other BARs. > > > =C2=A0=C2=A0 > > > To solve this performance issue, this patch adds a kernel > > > parameter "pci=3Dresource_page_aligned=3Don" to enforce > > > the alignment of all MMIO BARs to be at least PAGE_SIZE, > > > so that one BAR's mmio page would not be shared with other > > > BARs. We can also disable it through kernel parameter > > > "pci=3Dresource_page_aligned=3Doff". > > > =C2=A0=C2=A0 > > > For the default value of the parameter, we think it should be > > > arch-independent, so we add a macro > > > HAVE_PCI_DEFAULT_RESOURCES_PAGE_ALIGNED to change it. And we > > > define this macro to enable this parameter by default on PPC64 > > > platform which can easily hit this performance issue because > > > its PAGE_SIZE is 64KB. > > > =C2=A0=C2=A0 > > > Note that the kernel parameter won't works if kernel doesn't do > > > resources reallocation. > > And where do you account for this so that we know whether it's real= ly in > > effect? >=C2=A0 > We can check the flag PCI_PROBE_ONLY to know whether kernel do > resources reallocation. Then we know if the kernel parameter is reall= y > in effect. >=C2=A0 > enum { > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0/* Force re-assigning all resources (ig= nore firmware > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* setup completely) > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0*/ > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0PCI_REASSIGN_ALL_RSRC=C2=A0=C2=A0=C2=A0= =C2=A0=3D 0x00000001, >=C2=A0 > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0/* Re-assign all bus numbers */ > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0PCI_REASSIGN_ALL_BUS=C2=A0=C2=A0=C2=A0=C2= =A0=3D 0x00000002, >=C2=A0 > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0/* Do not try to assign, just use exist= ing setup */ > --->=C2=A0=C2=A0=C2=A0=C2=A0PCI_PROBE_ONLY=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=3D 0x00000004, >=C2=A0 > And I will add this to commit log. We need more than a commit log entry for this, what's the purpose of th= e pci_resources_share_page() function if we don't know if this is in effect? > > > Signed-off-by: Yongji Xie > > > --- > > > =C2=A0 Documentation/kernel-parameters.txt |=C2=A0=C2=A0=C2=A0=C2= =A05 +++++ > > > =C2=A0 arch/powerpc/include/asm/pci.h=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0|=C2=A0=C2=A0=C2=A011 +++++++++++ > > > =C2=A0 drivers/pci/pci.c=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= |=C2=A0=C2=A0=C2=A035 +++++++++++++++++++++++++++++++++++ > > > =C2=A0 drivers/pci/pci.h=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= |=C2=A0=C2=A0=C2=A0=C2=A08 +++++++- > > > =C2=A0 include/linux/pci.h=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0|=C2=A0=C2= =A0=C2=A0=C2=A04 ++++ > > > =C2=A0 5 files changed, 62 insertions(+), 1 deletion(-) > > > =C2=A0=C2=A0 > > > diff --git a/Documentation/kernel-parameters.txt b/Documentation/= kernel-parameters.txt > > > index 742f69d..3f2a7c9 100644 > > > --- a/Documentation/kernel-parameters.txt > > > +++ b/Documentation/kernel-parameters.txt > > > @@ -2857,6 +2857,11 @@ bytes respectively. Such letter suffixes c= an also be entirely omitted. > > > =C2=A0=C2=A0 PAGE_SIZE is used as alignment. > > > =C2=A0=C2=A0 PCI-PCI bridge can be specified, if resource > > > =C2=A0=C2=A0 windows need to be expanded. > > > + resource_page_aligned=3D Enable/disable enforcing the alignmen= t > > > + of all PCI devices' memory resources to be > > > + at least PAGE_SIZE if resources reallocation > > > + is done by kernel. > > > + Format: { "on" | "off" } > > > =C2=A0=C2=A0 ecrc=3D Enable/disable PCIe ECRC (transaction laye= r > > > =C2=A0=C2=A0 end-to-end CRC checking). > > > =C2=A0=C2=A0 bios: Use BIOS/firmware settings. This is the > > > diff --git a/arch/powerpc/include/asm/pci.h b/arch/powerpc/includ= e/asm/pci.h > > > index 3453bd8..2d2b3ef 100644 > > > --- a/arch/powerpc/include/asm/pci.h > > > +++ b/arch/powerpc/include/asm/pci.h > > > @@ -136,6 +136,17 @@ extern pgprot_t pci_phys_mem_access_prot(str= uct file *file, > > > =C2=A0=C2=A0 =C2=A0unsigned long pfn, > > > =C2=A0=C2=A0 =C2=A0unsigned long size, > > > =C2=A0=C2=A0 =C2=A0pgprot_t prot); > > > +#ifdef CONFIG_PPC64 > > > + > > > +/* For PPC64, We enforce all PCI MMIO BARs to be page aligned > > > + * by default. This would be helpful to improve performance > > > + * when we passthrough a PCI device of which BARs are smaller > > > + * than PAGE_SIZE(64KB). And we can use kernel parameter > > > + * "pci=3Dresource_page_aligned=3Doff" to disable it. > > > + */ > > > +#define HAVE_PCI_DEFAULT_RESOURCES_PAGE_ALIGNED 1 > > > + > > > +#endif > > > =C2=A0=C2=A0 > > > =C2=A0 #define HAVE_ARCH_PCI_RESOURCE_TO_USER > > > =C2=A0 extern void pci_resource_to_user(const struct pci_dev *dev= , int bar, > > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > > > index 314db8c..7b21238 100644 > > > --- a/drivers/pci/pci.c > > > +++ b/drivers/pci/pci.c > > > @@ -99,6 +99,9 @@ u8 pci_cache_line_size; > > > =C2=A0=C2=A0=C2=A0*/ > > > =C2=A0 unsigned int pcibios_max_latency =3D 255; > > > =C2=A0=C2=A0 > > > +bool pci_resources_page_aligned =3D > > > + IS_ENABLED(HAVE_PCI_DEFAULT_RESOURCES_PAGE_ALIGNED); > > I don't think this is proper use of IS_ENABLED, which seems to be > > targeted at CONFIG_ type options.=C2=A0=C2=A0You could define this = as that in an > > arch Kconfig. >=C2=A0 > Is it better that we define this as a pci Kconfig and select it in ar= ch=C2=A0 > Kconfig? If you want to use IS_ENABLE here, I would think so. > > > + > > > =C2=A0 /* If set, the PCIe ARI capability will not be used. */ > > > =C2=A0 static bool pcie_ari_disabled; > > > =C2=A0=C2=A0 > > > @@ -4746,6 +4749,35 @@ static ssize_t pci_resource_alignment_stor= e(struct bus_type *bus, > > > =C2=A0 BUS_ATTR(resource_alignment, 0644, pci_resource_alignment_= show, > > > =C2=A0=C2=A0 pci_resource_alignment_store); > > > =C2=A0=C2=A0 > > > +static void pci_resources_get_page_aligned(char *str) > > > +{ > > > + if (!strncmp(str, "off", 3)) > > > + pci_resources_page_aligned =3D false; > > > + else if (!strncmp(str, "on", 2)) > > > + pci_resources_page_aligned =3D true; > > > +} > > "get"? >=C2=A0 > How about pci_resources_parse_page_aligned_param()? Better. > > > + > > > +/* > > > + * This function checks whether PCI BARs' mmio page will be shar= ed > > > + * with other BARs. > > > + */ > > > +bool pci_resources_share_page(struct pci_dev *dev, int resno) > > > +{ > > > + struct resource *res =3D dev->resource + resno; > > > + > > > + if (resource_size(res) >=3D PAGE_SIZE) > > > + return false; > > > + if (pci_resources_page_aligned && !(res->start & ~PAGE_MASK) && > > > + =C2=A0=C2=A0=C2=A0=C2=A0res->flags & IORESOURCE_MEM) { > > > + if (res->sibling) > > > + return (res->sibling->start & ~PAGE_MASK); > > > + else > > > + return false; > > > + } > > > + return true; > > > +} > > > +EXPORT_SYMBOL_GPL(pci_resources_share_page); > > > + > > > =C2=A0 static int __init pci_resource_alignment_sysfs_init(void) > > > =C2=A0 { > > > =C2=A0=C2=A0 return bus_create_file(&pci_bus_type, > > > @@ -4859,6 +4891,9 @@ static int __init pci_setup(char *str) > > > =C2=A0=C2=A0 } else if (!strncmp(str, "resource_alignment=3D", = 19)) { > > > =C2=A0=C2=A0 pci_set_resource_alignment_param(str + 19, > > > =C2=A0=C2=A0 strlen(str + 19)); > > > + } else if (!strncmp(str, "resource_page_aligned=3D", > > > + =C2=A0=C2=A0=C2=A022)) { > > > + pci_resources_get_page_aligned(str + 22); > > Doesn't this seem rather redundant with the option right above it, > > resource_alignment=3D?=C2=A0=C2=A0Why not modify that to support sy= ntax where all > > devices get the same alignment? > >=C2=A0 >=C2=A0 > The kernel option will be used to do two things. >=C2=A0 > Firstly, the option can be used to enable all devices to be page alig= ned. >=C2=A0 > Secondly, we can use the option to disable it when the Kconfig option > mentioned above enable all devices to be page aligned by default. >=C2=A0 > We can easily modify this option "resource_alignment=3D" to do the fi= rst > thing. But I didn't find a proper way to modify it to do the second t= hing. You could allow an arch specified default which is overridden by specifying a resource_alignment=3D value.=C2=A0=C2=A0Then you need a wa= y to disable it, which you could simply do by making pci_set_resource_alignment_param() able to parse something like resource_alignment=3Doff. > > > =C2=A0=C2=A0 } else if (!strncmp(str, "ecrc=3D", 5)) { > > > =C2=A0=C2=A0 pcie_ecrc_get_policy(str + 5); > > > =C2=A0=C2=A0 } else if (!strncmp(str, "hpiosize=3D", 9)) { > > > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h > > > index d390fc1..b9b333d 100644 > > > --- a/drivers/pci/pci.h > > > +++ b/drivers/pci/pci.h > > > @@ -312,11 +312,17 @@ static inline resource_size_t pci_resource_= alignment(struct pci_dev *dev, > > > =C2=A0 #ifdef CONFIG_PCI_IOV > > > =C2=A0=C2=A0 int resno =3D res - dev->resource; > > > =C2=A0=C2=A0 > > > - if (resno >=3D PCI_IOV_RESOURCES && resno <=3D PCI_IOV_RESOURCE= _END) > > > + if (resno >=3D PCI_IOV_RESOURCES && resno <=3D PCI_IOV_RESOURCE= _END) { > > > + if (pci_resources_page_aligned && res->flags & IORESOURCE_MEM) > > > + return PAGE_ALIGN(pci_sriov_resource_alignment(dev, > > > + =C2=A0=C2=A0resno)); > > > =C2=A0=C2=A0 return pci_sriov_resource_alignment(dev, resno); > > > + } > > > =C2=A0 #endif > > > =C2=A0=C2=A0 if (dev->class >> 8=C2=A0=C2=A0=3D=3D PCI_CLASS_BRID= GE_CARDBUS) > > > =C2=A0=C2=A0 return pci_cardbus_resource_alignment(res); > > > + if (pci_resources_page_aligned && res->flags & IORESOURCE_MEM) > > > + return PAGE_ALIGN(resource_alignment(res)); > > > =C2=A0=C2=A0 return resource_alignment(res); > > > =C2=A0 } > >=C2=A0 > > Since we already have resource_alignment=3D, shouldn't we already h= ave the > > code in place to re-align? >=C2=A0 > Yes, this code can do the re-aligning. But we can't reuse the code be= cause > it re-align device's bars by changing their sizes, which can potentia= lly=C2=A0 > break > some drivers. >=C2=A0 > I'm thinking if we can use IORESOURCE_STARTALIGN for this. Thanks. Shouldn't we fix resource_alignment=3D then to make it behave in a more compatible way then?=C2=A0=C2=A0resource_alignment=3D64k,resource_resiz= e=3Doff? Thanks, Alex