From mboxrd@z Thu Jan 1 00:00:00 1970
References: <1470913557-4355-1-git-send-email-xyjxie@linux.vnet.ibm.com>
 <20160913165504.3e1b9707@t450s.home>
From: Yongji Xie
Date: Wed, 14 Sep 2016 13:04:44 +0800
MIME-Version: 1.0
In-Reply-To: <20160913165504.3e1b9707@t450s.home>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Message-Id: 
Subject: Re: [Qemu-devel] [PATCH v2] vfio: Add support for mmapping sub-page MMIO BARs
To: Alex Williamson
Cc: aik@ozlabs.ru, zhong@linux.vnet.ibm.com, qemu-devel@nongnu.org,
 kvm@vger.kernel.org, gwshan@linux.vnet.ibm.com, Paolo Bonzini

On 2016/9/14 6:55, Alex Williamson wrote:
> [cc +Paolo]
>
> On Thu, 11 Aug 2016 19:05:57 +0800
> Yongji Xie wrote:
>
>> Now the kernel commit 05f0c03fbac1 ("vfio-pci: Allow to mmap
>> sub-page MMIO BARs if the mmio page is exclusive") allows VFIO
>> to mmap sub-page BARs. This is the corresponding QEMU patch.
>> With those patches applied, we could passthrough sub-page BARs
>> to guest, which can help to improve IO performance for some devices.
>>
>> In this patch, we expand MemoryRegions of these sub-page
>> MMIO BARs to PAGE_SIZE in vfio_pci_write_config(), so that
>> the BARs could be passed to KVM ioctl KVM_SET_USER_MEMORY_REGION
>> with a valid size.
>> The expanding size will be recovered when
>> the base address of sub-page BAR is changed and not page aligned
>> any more in guest. And we also set the priority of these BARs'
>> memory regions to zero in case of overlap with BARs which share
>> the same page with sub-page BARs in guest.
>>
>> Signed-off-by: Yongji Xie
>> ---
>>  hw/vfio/common.c |  3 +--
>>  hw/vfio/pci.c    | 76 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 77 insertions(+), 2 deletions(-)
> Hi Yongji,
>
> There was an automated patch checker reply to this patch already, see:
>
> https://patchwork.kernel.org/patch/9275077/
>
> Looks like there's a trivial whitespace issue with using a tab. Also
> another QEMU style issue noted below.

Yes, I saw it. I'll fix it in the next version. Thanks for the reminder.

>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index b313e7c..1a70307 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -661,8 +661,7 @@ int vfio_region_setup(Object *obj, VFIODevice *vbasedev, VFIORegion *region,
>>                               region, name, region->size);
>>
>>      if (!vbasedev->no_mmap &&
>> -        region->flags & VFIO_REGION_INFO_FLAG_MMAP &&
>> -        !(region->size & ~qemu_real_host_page_mask)) {
>> +        region->flags & VFIO_REGION_INFO_FLAG_MMAP) {
>>
>>          vfio_setup_region_sparse_mmaps(region, info);
>>
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index 7bfa17c..7035617 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -1057,6 +1057,65 @@ static const MemoryRegionOps vfio_vga_ops = {
>>  };
>>
>>  /*
>> + * Expand memory region of sub-page(size < PAGE_SIZE) MMIO BAR to page
>> + * size if the BAR is in an exclusive page in host so that we could map
>> + * this BAR to guest. But this sub-page BAR may not occupy an exclusive
>> + * page in guest. So we should set the priority of the expanded memory
>> + * region to zero in case of overlap with BARs which share the same page
>> + * with the sub-page BAR in guest.
>> + * Besides, we should also recover the
>> + * size of this sub-page BAR when its base address is changed in guest
>> + * and not page aligned any more.
>> + */
>> +static void vfio_sub_page_bar_update_mapping(PCIDevice *pdev, int bar)
>> +{
>> +    VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
>> +    VFIORegion *region = &vdev->bars[bar].region;
>> +    MemoryRegion *mmap_mr, *mr;
>> +    PCIIORegion *r;
>> +    pcibus_t bar_addr;
>> +
>> +    /* Make sure that the whole region is allowed to be mmapped */
>> +    if (!(region->nr_mmaps == 1 &&
>> +          region->mmaps[0].size == region->size)) {
>> +        return;
>> +    }
>> +
>> +    r = &pdev->io_regions[bar];
>> +    bar_addr = r->addr;
>> +    if (bar_addr == PCI_BAR_UNMAPPED) {
>> +        return;
>> +    }
>> +
>> +    mr = region->mem;
>> +    mmap_mr = &region->mmaps[0].mem;
>> +    memory_region_transaction_begin();
>> +    if (memory_region_size(mr) < qemu_real_host_page_size) {
>> +        if (!(bar_addr & ~qemu_real_host_page_mask) &&
>> +            memory_region_is_mapped(mr) && region->mmaps[0].mmap) {
>> +            /* Expand memory region to page size and set priority */
>> +            memory_region_del_subregion(r->address_space, mr);
>> +            memory_region_set_size(mr, qemu_real_host_page_size);
>> +            memory_region_set_size(mmap_mr, qemu_real_host_page_size);
>> +            memory_region_add_subregion_overlap(r->address_space,
>> +                                                bar_addr, mr, 0);
>> +        }
>> +    } else {
>> +        /* This case would happen when guest rescan one PCI device */
>> +        if (bar_addr & ~qemu_real_host_page_mask) {
>> +            /* Recover the size of memory region */
>> +            memory_region_set_size(mr, r->size);
>> +            memory_region_set_size(mmap_mr, r->size);
>> +        } else if (memory_region_is_mapped(mr)) {
>> +            /* Set the priority of memory region to zero */
>> +            memory_region_del_subregion(r->address_space, mr);
>> +            memory_region_add_subregion_overlap(r->address_space,
>> +                                                bar_addr, mr, 0);
>> +        }
>> +    }
>> +    memory_region_transaction_commit();
> Paolo, as the reigning memory API expert, do you see any issues with
> this? Expanding the size of a container MemoryRegion and the contained
> mmap'd subregion and manipulating priorities so that we don't interfere
> with other MemoryRegions.
>
>> +}
>> +
>> +/*
>>   * PCI config space
>>   */
>>  uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len)
>> @@ -1139,6 +1198,23 @@ void vfio_pci_write_config(PCIDevice *pdev,
>>      } else if (was_enabled && !is_enabled) {
>>          vfio_msix_disable(vdev);
>>      }
>> +    } else if (ranges_overlap(addr, len, PCI_BASE_ADDRESS_0, 24) ||
>> +               range_covers_byte(addr, len, PCI_COMMAND)) {
>> +        pcibus_t old_addr[PCI_NUM_REGIONS - 1];
>> +        int bar;
>> +
>> +        for (bar = 0; bar < PCI_ROM_SLOT; bar++) {
>> +            old_addr[bar] = pdev->io_regions[bar].addr;
>> +        }
>> +
>> +        pci_default_write_config(pdev, addr, val, len);
>> +
>> +        for (bar = 0; bar < PCI_ROM_SLOT; bar++) {
>> +            if (old_addr[bar] != pdev->io_regions[bar].addr &&
>> +                pdev->io_regions[bar].size > 0 &&
>> +                pdev->io_regions[bar].size < qemu_real_host_page_size)
> This requires {} per QEMU coding standards.

Will do.

Thanks,
Yongji