From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:39480)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <agraf@suse.de>) id 1XH8Wu-0004Ge-GQ
	for qemu-devel@nongnu.org; Tue, 12 Aug 2014 05:38:14 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <agraf@suse.de>) id 1XH8Wo-0001ST-K7
	for qemu-devel@nongnu.org; Tue, 12 Aug 2014 05:38:08 -0400
Message-ID: <53E9E077.1040804@suse.de>
Date: Tue, 12 Aug 2014 11:37:59 +0200
From: Alexander Graf <agraf@suse.de>
MIME-Version: 1.0
References: <1406799254-25223-1-git-send-email-aik@ozlabs.ru>
	<1406799254-25223-10-git-send-email-aik@ozlabs.ru>
	<53E8B0BB.60006@suse.de> <53E8DAC2.1020307@ozlabs.ru>
	<53E8FDCC.7060308@suse.de> <53E959DA.3020206@ozlabs.ru>
In-Reply-To: <53E959DA.3020206@ozlabs.ru>
Content-Type: text/plain; charset=koi8-r; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [RFC PATCH 09/10] spapr_pci_vfio: Enable DDW
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Alexey Kardashevskiy <aik@ozlabs.ru>, qemu-devel@nongnu.org
Cc: Alex Williamson <alex.williamson@redhat.com>, qemu-ppc@nongnu.org


On 12.08.14 02:03, Alexey Kardashevskiy wrote:
> On 08/12/2014 03:30 AM, Alexander Graf wrote:
>> On 11.08.14 17:01, Alexey Kardashevskiy wrote:
>>> On 08/11/2014 10:02 PM, Alexander Graf wrote:
>>>> On 31.07.14 11:34, Alexey Kardashevskiy wrote:
>>>>> This implements DDW for VFIO. Host kernel support is required for this.
>>>>>
>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>> ---
>>>>>     hw/ppc/spapr_pci_vfio.c | 75
>>>>> +++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>     1 file changed, 75 insertions(+)
>>>>>
>>>>> diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c
>>>>> index d3bddf2..dc443e2 100644
>>>>> --- a/hw/ppc/spapr_pci_vfio.c
>>>>> +++ b/hw/ppc/spapr_pci_vfio.c
>>>>> @@ -69,6 +69,77 @@ static void
>>>>> spapr_phb_vfio_finish_realize(sPAPRPHBState *sphb, Error **errp)
>>>>>         /* Register default 32bit DMA window */
>>>>>         memory_region_add_subregion(&sphb->iommu_root, tcet->bus_offset,
>>>>>                                     spapr_tce_get_iommu(tcet));
>>>>> +
>>>>> +    sphb->ddw_supported = !!(info.flags & VFIO_IOMMU_SPAPR_TCE_FLAG_DDW);
>>>>> +}
>>>>> +
>>>>> +static int spapr_pci_vfio_ddw_query(sPAPRPHBState *sphb,
>>>>> +                                    uint32_t *windows_available,
>>>>> +                                    uint32_t *page_size_mask)
>>>>> +{
>>>>> +    sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb);
>>>>> +    struct vfio_iommu_spapr_tce_query query = { .argsz = sizeof(query) };
>>>>> +    int ret;
>>>>> +
>>>>> +    ret = vfio_container_ioctl(&sphb->iommu_as, svphb->iommugroupid,
>>>>> +                               VFIO_IOMMU_SPAPR_TCE_QUERY, &query);
>>>>> +    if (ret) {
>>>>> +        return ret;
>>>>> +    }
>>>>> +
>>>>> +    *windows_available = query.windows_available;
>>>>> +    *page_size_mask = query.page_size_mask;
>>>>> +
>>>>> +    return ret;
>>>>> +}
>>>>> +
>>>>> +static int spapr_pci_vfio_ddw_create(sPAPRPHBState *sphb, uint32_t
>>>>> page_shift,
>>>>> +                                     uint32_t window_shift, uint32_t
>>>>> liobn,
>>>>> +                                     sPAPRTCETable **ptcet)
>>>>> +{
>>>>> +    sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb);
>>>>> +    struct vfio_iommu_spapr_tce_create create = {
>>>>> +        .argsz = sizeof(create),
>>>>> +        .page_shift = page_shift,
>>>>> +        .window_shift = window_shift,
>>>>> +        .start_addr = 0
>>>>> +    };
>>>>> +    int ret;
>>>>> +
>>>>> +    ret = vfio_container_ioctl(&sphb->iommu_as, svphb->iommugroupid,
>>>>> +                               VFIO_IOMMU_SPAPR_TCE_CREATE, &create);
>>>>> +    if (ret) {
>>>>> +        return ret;
>>>>> +    }
>>>>> +
>>>>> +    *ptcet = spapr_tce_new_table(DEVICE(sphb), liobn, create.start_addr,
>>>>> +                                 page_shift, 1 << (window_shift -
>>>>> page_shift),
>>>> I spot a 1 without ULL again - this time it might work out ok, but please
>>>> just always use ULL when you pass around addresses.
>>> My bad. I keep forgetting this, I'll adjust my own checkpatch.py :)
>>>
>>>
>>>> Please walk me though the abstraction levels on what each page size
>>>> honoration means. If I use THP, what page size granularity can I use for
>>>> TCE entries?
>>> [RFC PATCH 06/10] spapr_rtas: Add Dynamic DMA windows (DDW) RTAS calls
>>> support
>>>
>>> +        const struct { int shift; uint32_t mask; } masks[] = {
>>> +            { 12, DDW_PGSIZE_4K },
>>> +            { 16, DDW_PGSIZE_64K },
>>> +            { 24, DDW_PGSIZE_16M },
>>> +            { 25, DDW_PGSIZE_32M },
>>> +            { 26, DDW_PGSIZE_64M },
>>> +            { 27, DDW_PGSIZE_128M },
>>> +            { 28, DDW_PGSIZE_256M },
>>> +            { 34, DDW_PGSIZE_16G },
>>> +        };
>>>
>>>
>>> Supported page sizes are returned by the host kernel via "query". For 16MB
>>> pages, page shift will return DDW_PGSIZE_4K|DDW_PGSIZE_64K|DDW_PGSIZE_16M.
>>> Or I did not understand the question...
>> Why do we care about the sizes? Anything bigger than what we support should
>> always work, no? What happens if the guest creates a 16MB map but my pages
>> are 4kb mapped? Wouldn't the same logic be able to deal with 16G pages?
> It is DMA memory, if I split "virtual" 16M page to a bunch of real 4K
> pages, I have to make sure these 16M are continuous - there will be one TCE
> entry for it and no more translations besides IOMMU. What do I miss now?

Who does the shadow translation where? Does it exist at all?


Alex