From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:39480) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XH8Wu-0004Ge-GQ for qemu-devel@nongnu.org; Tue, 12 Aug 2014 05:38:14 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1XH8Wo-0001ST-K7 for qemu-devel@nongnu.org; Tue, 12 Aug 2014 05:38:08 -0400 Message-ID: <53E9E077.1040804@suse.de> Date: Tue, 12 Aug 2014 11:37:59 +0200 From: Alexander Graf MIME-Version: 1.0 References: <1406799254-25223-1-git-send-email-aik@ozlabs.ru> <1406799254-25223-10-git-send-email-aik@ozlabs.ru> <53E8B0BB.60006@suse.de> <53E8DAC2.1020307@ozlabs.ru> <53E8FDCC.7060308@suse.de> <53E959DA.3020206@ozlabs.ru> In-Reply-To: <53E959DA.3020206@ozlabs.ru> Content-Type: text/plain; charset=koi8-r; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [RFC PATCH 09/10] spapr_pci_vfio: Enable DDW List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Alexey Kardashevskiy , qemu-devel@nongnu.org Cc: Alex Williamson , qemu-ppc@nongnu.org On 12.08.14 02:03, Alexey Kardashevskiy wrote: > On 08/12/2014 03:30 AM, Alexander Graf wrote: >> On 11.08.14 17:01, Alexey Kardashevskiy wrote: >>> On 08/11/2014 10:02 PM, Alexander Graf wrote: >>>> On 31.07.14 11:34, Alexey Kardashevskiy wrote: >>>>> This implements DDW for VFIO. Host kernel support is required for this. >>>>> >>>>> Signed-off-by: Alexey Kardashevskiy >>>>> --- >>>>> hw/ppc/spapr_pci_vfio.c | 75 >>>>> +++++++++++++++++++++++++++++++++++++++++++++++++ >>>>> 1 file changed, 75 insertions(+) >>>>> >>>>> diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c >>>>> index d3bddf2..dc443e2 100644 >>>>> --- a/hw/ppc/spapr_pci_vfio.c >>>>> +++ b/hw/ppc/spapr_pci_vfio.c >>>>> @@ -69,6 +69,77 @@ static void >>>>> spapr_phb_vfio_finish_realize(sPAPRPHBState *sphb, Error **errp) >>>>> /* Register default 32bit DMA window */ >>>>> memory_region_add_subregion(&sphb->iommu_root, tcet->bus_offset, >>>>> spapr_tce_get_iommu(tcet)); >>>>> + >>>>> + sphb->ddw_supported = !!(info.flags & VFIO_IOMMU_SPAPR_TCE_FLAG_DDW); >>>>> +} >>>>> + >>>>> +static int spapr_pci_vfio_ddw_query(sPAPRPHBState *sphb, >>>>> + uint32_t *windows_available, >>>>> + uint32_t *page_size_mask) >>>>> +{ >>>>> + sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb); >>>>> + struct vfio_iommu_spapr_tce_query query = { .argsz = sizeof(query) }; >>>>> + int ret; >>>>> + >>>>> + ret = vfio_container_ioctl(&sphb->iommu_as, svphb->iommugroupid, >>>>> + VFIO_IOMMU_SPAPR_TCE_QUERY, &query); >>>>> + if (ret) { >>>>> + return ret; >>>>> + } >>>>> + >>>>> + *windows_available = query.windows_available; >>>>> + *page_size_mask = query.page_size_mask; >>>>> + >>>>> + return ret; >>>>> +} >>>>> + >>>>> +static int spapr_pci_vfio_ddw_create(sPAPRPHBState *sphb, uint32_t >>>>> page_shift, >>>>> + uint32_t window_shift, uint32_t >>>>> liobn, >>>>> + sPAPRTCETable **ptcet) >>>>> +{ >>>>> + sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb); >>>>> + struct vfio_iommu_spapr_tce_create create = { >>>>> + .argsz = sizeof(create), >>>>> + .page_shift = page_shift, >>>>> + .window_shift = window_shift, >>>>> + .start_addr = 0 >>>>> + }; >>>>> + int ret; >>>>> + >>>>> + ret = vfio_container_ioctl(&sphb->iommu_as, svphb->iommugroupid, >>>>> + VFIO_IOMMU_SPAPR_TCE_CREATE, &create); >>>>> + if (ret) { >>>>> + return ret; >>>>> + } >>>>> + >>>>> + *ptcet = spapr_tce_new_table(DEVICE(sphb), liobn, create.start_addr, >>>>> + page_shift, 1 << (window_shift - >>>>> page_shift), >>>> I spot a 1 without ULL again - this time it might work out ok, but please >>>> just always use ULL when you pass around addresses. >>> My bad. I keep forgetting this, I'll adjust my own checkpatch.py :) >>> >>> >>>> Please walk me though the abstraction levels on what each page size >>>> honoration means. If I use THP, what page size granularity can I use for >>>> TCE entries? >>> [RFC PATCH 06/10] spapr_rtas: Add Dynamic DMA windows (DDW) RTAS calls >>> support >>> >>> + const struct { int shift; uint32_t mask; } masks[] = { >>> + { 12, DDW_PGSIZE_4K }, >>> + { 16, DDW_PGSIZE_64K }, >>> + { 24, DDW_PGSIZE_16M }, >>> + { 25, DDW_PGSIZE_32M }, >>> + { 26, DDW_PGSIZE_64M }, >>> + { 27, DDW_PGSIZE_128M }, >>> + { 28, DDW_PGSIZE_256M }, >>> + { 34, DDW_PGSIZE_16G }, >>> + }; >>> >>> >>> Supported page sizes are returned by the host kernel via "query". For 16MB >>> pages, page shift will return DDW_PGSIZE_4K|DDW_PGSIZE_64K|DDW_PGSIZE_16M. >>> Or I did not understand the question... >> Why do we care about the sizes? Anything bigger than what we support should >> always work, no? What happens if the guest creates a 16MB map but my pages >> are 4kb mapped? Wouldn't the same logic be able to deal with 16G pages? > It is DMA memory, if I split "virtual" 16M page to a bunch of real 4K > pages, I have to make sure these 16M are continuous - there will be one TCE > entry for it and no more translations besides IOMMU. What do I miss now? Who does the shadow translation where? Does it exist at all? Alex