From mboxrd@z Thu Jan 1 00:00:00 1970
References: <1470913557-4355-1-git-send-email-xyjxie@linux.vnet.ibm.com>
 <20160913165504.3e1b9707@t450s.home>
From: Yongji Xie
Date: Wed, 14 Sep 2016 13:04:44 +0800
MIME-Version: 1.0
In-Reply-To: <20160913165504.3e1b9707@t450s.home>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Message-Id: 
Subject: Re: [Qemu-devel] [PATCH v2] vfio: Add support for mmapping sub-page MMIO BARs
To: Alex Williamson
Cc: aik@ozlabs.ru, zhong@linux.vnet.ibm.com, qemu-devel@nongnu.org,
 kvm@vger.kernel.org, gwshan@linux.vnet.ibm.com, Paolo Bonzini

On 2016/9/14 6:55, Alex Williamson wrote:
> [cc +Paolo]
>
> On Thu, 11 Aug 2016 19:05:57 +0800
> Yongji Xie wrote:
>
>> Now the kernel commit 05f0c03fbac1 ("vfio-pci: Allow to mmap
>> sub-page MMIO BARs if the mmio page is exclusive") allows VFIO
>> to mmap sub-page BARs. This is the corresponding QEMU patch.
>> With those patches applied, we could passthrough sub-page BARs
>> to guest, which can help to improve IO performance for some devices.
>>
>> In this patch, we expand MemoryRegions of these sub-page
>> MMIO BARs to PAGE_SIZE in vfio_pci_write_config(), so that
>> the BARs could be passed to KVM ioctl KVM_SET_USER_MEMORY_REGION
>> with a valid size.
>> The expanding size will be recovered when
>> the base address of sub-page BAR is changed and not page aligned
>> any more in guest. And we also set the priority of these BARs'
>> memory regions to zero in case of overlap with BARs which share
>> the same page with sub-page BARs in guest.
>>
>> Signed-off-by: Yongji Xie
>> ---
>>  hw/vfio/common.c |  3 +--
>>  hw/vfio/pci.c    | 76 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 77 insertions(+), 2 deletions(-)
> Hi Yongji,
>
> There was an automated patch checker reply to this patch already, see:
>
> https://patchwork.kernel.org/patch/9275077/
>
> Looks like there's a trivial whitespace issue with using a tab. Also
> another QEMU style issue noted below.

Yes, I saw it. I'll fix it in the next version. Thanks for the reminder.

>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index b313e7c..1a70307 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -661,8 +661,7 @@ int vfio_region_setup(Object *obj, VFIODevice *vbasedev, VFIORegion *region,
>>                               region, name, region->size);
>>
>>      if (!vbasedev->no_mmap &&
>> -        region->flags & VFIO_REGION_INFO_FLAG_MMAP &&
>> -        !(region->size & ~qemu_real_host_page_mask)) {
>> +        region->flags & VFIO_REGION_INFO_FLAG_MMAP) {
>>
>>          vfio_setup_region_sparse_mmaps(region, info);
>>
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index 7bfa17c..7035617 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -1057,6 +1057,65 @@ static const MemoryRegionOps vfio_vga_ops = {
>>  };
>>
>>  /*
>> + * Expand memory region of sub-page(size < PAGE_SIZE) MMIO BAR to page
>> + * size if the BAR is in an exclusive page in host so that we could map
>> + * this BAR to guest. But this sub-page BAR may not occupy an exclusive
>> + * page in guest. So we should set the priority of the expanded memory
>> + * region to zero in case of overlap with BARs which share the same page
>> + * with the sub-page BAR in guest.
>> + * Besides, we should also recover the
>> + * size of this sub-page BAR when its base address is changed in guest
>> + * and not page aligned any more.
>> + */
>> +static void vfio_sub_page_bar_update_mapping(PCIDevice *pdev, int bar)
>> +{
>> +    VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
>> +    VFIORegion *region = &vdev->bars[bar].region;
>> +    MemoryRegion *mmap_mr, *mr;
>> +    PCIIORegion *r;
>> +    pcibus_t bar_addr;
>> +
>> +    /* Make sure that the whole region is allowed to be mmapped */
>> +    if (!(region->nr_mmaps == 1 &&
>> +          region->mmaps[0].size == region->size)) {
>> +        return;
>> +    }
>> +
>> +    r = &pdev->io_regions[bar];
>> +    bar_addr = r->addr;
>> +    if (bar_addr == PCI_BAR_UNMAPPED) {
>> +        return;
>> +    }
>> +
>> +    mr = region->mem;
>> +    mmap_mr = &region->mmaps[0].mem;
>> +    memory_region_transaction_begin();
>> +    if (memory_region_size(mr) < qemu_real_host_page_size) {
>> +        if (!(bar_addr & ~qemu_real_host_page_mask) &&
>> +            memory_region_is_mapped(mr) && region->mmaps[0].mmap) {
>> +            /* Expand memory region to page size and set priority */
>> +            memory_region_del_subregion(r->address_space, mr);
>> +            memory_region_set_size(mr, qemu_real_host_page_size);
>> +            memory_region_set_size(mmap_mr, qemu_real_host_page_size);
>> +            memory_region_add_subregion_overlap(r->address_space,
>> +                                                bar_addr, mr, 0);
>> +        }
>> +    } else {
>> +        /* This case would happen when guest rescan one PCI device */
>> +        if (bar_addr & ~qemu_real_host_page_mask) {
>> +            /* Recover the size of memory region */
>> +            memory_region_set_size(mr, r->size);
>> +            memory_region_set_size(mmap_mr, r->size);
>> +        } else if (memory_region_is_mapped(mr)) {
>> +            /* Set the priority of memory region to zero */
>> +            memory_region_del_subregion(r->address_space, mr);
>> +            memory_region_add_subregion_overlap(r->address_space,
>> +                                                bar_addr, mr, 0);
>> +        }
>> +    }
>> +    memory_region_transaction_commit();
> Paolo, as the reigning memory API expert, do you see any issues with
> this? Expanding the size of a container MemoryRegion and the contained
> mmap'd subregion and manipulating priorities so that we don't interfere
> with other MemoryRegions.
>
>> +}
>> +
>> +/*
>>   * PCI config space
>>   */
>>  uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len)
>> @@ -1139,6 +1198,23 @@ void vfio_pci_write_config(PCIDevice *pdev,
>>      } else if (was_enabled && !is_enabled) {
>>          vfio_msix_disable(vdev);
>>      }
>> +    } else if (ranges_overlap(addr, len, PCI_BASE_ADDRESS_0, 24) ||
>> +               range_covers_byte(addr, len, PCI_COMMAND)) {
>> +        pcibus_t old_addr[PCI_NUM_REGIONS - 1];
>> +        int bar;
>> +
>> +        for (bar = 0; bar < PCI_ROM_SLOT; bar++) {
>> +            old_addr[bar] = pdev->io_regions[bar].addr;
>> +        }
>> +
>> +        pci_default_write_config(pdev, addr, val, len);
>> +
>> +        for (bar = 0; bar < PCI_ROM_SLOT; bar++) {
>> +            if (old_addr[bar] != pdev->io_regions[bar].addr &&
>> +                pdev->io_regions[bar].size > 0 &&
>> +                pdev->io_regions[bar].size < qemu_real_host_page_size)
> This requires {} per QEMU coding standards.

Will do.

Thanks,
Yongji