From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 19 Dec 2024 14:24:05 +0000
From: Mostafa Saleh
To: Robin Murphy
Cc: iommu@lists.linux.dev, kvmarm@lists.linux.dev, linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, catalin.marinas@arm.com, will@kernel.org,
	maz@kernel.org, oliver.upton@linux.dev, joey.gouly@arm.com, suzuki.poulose@arm.com,
	yuzenghui@huawei.com, robdclark@gmail.com, joro@8bytes.org, jean-philippe@linaro.org,
	jgg@ziepe.ca, nicolinc@nvidia.com, vdonnefort@google.com, qperret@google.com,
	tabba@google.com, danielmentz@google.com, tzukui@google.com
Subject: Re: [RFC PATCH v2 55/58] drivers/iommu: Add deferred map_sg operations
References: <20241212180423.1578358-1-smostafa@google.com>
 <20241212180423.1578358-56-smostafa@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Robin,

On Thu, Dec 19, 2024 at 12:48:27PM +0000, Robin Murphy wrote:
> On 2024-12-12 6:04 pm, Mostafa Saleh wrote:
> > With the pKVM SMMUv3 driver, which para-virtualizes the IOMMU in the
> > hypervisor, map_sg has extra overhead, as it loops over iommu_map,
> > and each map requires a context switch, disabling interrupts...
> >
> > Instead, add new domain operations:
> > - alloc_cookie_sg: Allocate a new deferred sg cookie
> > - add_deferred_map_sg: Add a mapping to the cookie
> > - consume_deferred_map_sg: Consume and release the cookie
> >
> > Alternatively, we could pass the sg list as is. However, that would
> > duplicate some of the logic, and it makes more sense to consolidate
> > all the sg list parsing for IOMMU drivers in one place.
>
> But why bother with fiddly overly-specific machinery at all when you can
> already make ->map_pages asynchronous and consolidate the expensive part
> into ->iotlb_sync_map in general, like s390 does?

This was my initial idea too. But I believe there is not enough context in
iotlb_sync_map, so we would either have to create a per-domain deferred_map
list that is synced on any iotlb_sync_map, which would require locking the
map operation and hence hurt concurrency; or we would have to use some
complex logic to extract context from iotlb_sync_map (something like an
iova range tree populated in map, so that on sync we can retrieve it).

That’s why I proposed this approach, where the IOMMU subsystem is aware of
the semantics by design and “helps” by providing the right data
structures/calls.

I had a quick look now at s390, and it seems a bit different, as it only
notifies the hypervisor about the iova range being changed and doesn’t
need to provide the iova->paddr mapping, which pKVM does.

Thanks,
Mostafa

>
> Thanks,
> Robin.
>
> > virtio-iommu is another IOMMU that could benefit from this, but it
> > would need a new operation that standardizes passing an sglist
> > based on these ops.
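To make the intended flow concrete, here is a rough userspace sketch of the
cookie pattern (illustrative names and types only — this is not the actual
pKVM driver or hypercall interface): ranges are accumulated cheaply into a
cookie, and the single expensive step happens once at consume time.

```c
/* Standalone sketch, not kernel code: models the proposed cookie flow.
 * struct map_cookie stands in for iommu_map_cookie_sg; all names are
 * illustrative. */
#include <stddef.h>
#include <stdlib.h>

struct sg_entry {                 /* one contiguous physical range to map */
	unsigned long long paddr;
	size_t pgsize;
	size_t pgcount;
};

struct map_cookie {
	unsigned long iova;       /* base iova for the whole sg list */
	int prot;
	size_t nents, used;
	struct sg_entry *ents;
};

static struct map_cookie *alloc_cookie_sg(unsigned long iova, int prot,
					  size_t nents)
{
	struct map_cookie *c = calloc(1, sizeof(*c));

	if (!c)
		return NULL;
	c->ents = calloc(nents, sizeof(*c->ents));
	if (!c->ents) {
		free(c);
		return NULL;
	}
	c->iova = iova;
	c->prot = prot;
	c->nents = nents;
	return c;
}

static int add_deferred_map_sg(struct map_cookie *c, unsigned long long paddr,
			       size_t pgsize, size_t pgcount)
{
	if (c->used == c->nents)
		return -1;
	c->ents[c->used++] = (struct sg_entry){ paddr, pgsize, pgcount };
	return 0;
}

/* In the real driver this would be the one expensive step: a single
 * hypercall covering the whole list instead of one per range.
 * Here it just totals the mapped bytes and releases the cookie. */
static size_t consume_deferred_map_sg(struct map_cookie *c)
{
	size_t mapped = 0;

	for (size_t i = 0; i < c->used; i++)
		mapped += c->ents[i].pgsize * c->ents[i].pgcount;
	free(c->ents);
	free(c);
	return mapped;
}
```

The point of the shape is that add_deferred_map_sg stays trivially cheap,
so the per-segment loop in iommu_map_sg costs nothing extra, and all the
context-switch overhead is paid once in consume.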
> >
> > Signed-off-by: Mostafa Saleh
> > ---
> >  drivers/iommu/iommu.c | 53 +++++++++++++++++++++++++++++++++++++++++--
> >  include/linux/iommu.h | 19 ++++++++++++++++
> >  2 files changed, 70 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > index 83c8e617a2c5..3a3c48631dd6 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -2608,6 +2608,37 @@ size_t iommu_unmap_fast(struct iommu_domain *domain,
> >  }
> >  EXPORT_SYMBOL_GPL(iommu_unmap_fast);
> >
> > +static int __iommu_add_sg(struct iommu_map_cookie_sg *cookie_sg,
> > +			  unsigned long iova, phys_addr_t paddr, size_t size)
> > +{
> > +	struct iommu_domain *domain = cookie_sg->domain;
> > +	const struct iommu_domain_ops *ops = domain->ops;
> > +	unsigned int min_pagesz;
> > +	size_t pgsize, count;
> > +
> > +	if (unlikely(!(domain->type & __IOMMU_DOMAIN_PAGING)))
> > +		return -EINVAL;
> > +
> > +	if (WARN_ON(domain->pgsize_bitmap == 0UL))
> > +		return -ENODEV;
> > +
> > +	/* find out the minimum page size supported */
> > +	min_pagesz = 1 << __ffs(domain->pgsize_bitmap);
> > +
> > +	/*
> > +	 * both the virtual address and the physical one, as well as
> > +	 * the size of the mapping, must be aligned (at least) to the
> > +	 * size of the smallest page supported by the hardware
> > +	 */
> > +	if (!IS_ALIGNED(iova | paddr | size, min_pagesz)) {
> > +		pr_err("unaligned: iova 0x%lx pa %pa size 0x%zx min_pagesz 0x%x\n",
> > +		       iova, &paddr, size, min_pagesz);
> > +		return -EINVAL;
> > +	}
> > +	pgsize = iommu_pgsize(domain, iova, paddr, size, &count);
> > +	return ops->add_deferred_map_sg(cookie_sg, paddr, pgsize, count);
> > +}
> > +
> >  ssize_t iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
> >  		     struct scatterlist *sg, unsigned int nents, int prot,
> >  		     gfp_t gfp)
> > @@ -2617,6 +2648,9 @@ ssize_t iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
> >  	phys_addr_t start;
> >  	unsigned int i = 0;
> >  	int ret;
> > +	bool deferred_sg = ops->alloc_cookie_sg && ops->add_deferred_map_sg &&
> > +			   ops->consume_deferred_map_sg;
> > +	struct iommu_map_cookie_sg *cookie_sg;
> >
> >  	might_sleep_if(gfpflags_allow_blocking(gfp));
> >
> > @@ -2625,12 +2659,24 @@ ssize_t iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
> >  					   __GFP_HIGHMEM)))
> >  		return -EINVAL;
> >
> > +	if (deferred_sg) {
> > +		cookie_sg = ops->alloc_cookie_sg(iova, prot, nents, gfp);
> > +		if (!cookie_sg) {
> > +			pr_err("iommu: failed alloc cookie\n");
> > +			return -ENOMEM;
> > +		}
> > +		cookie_sg->domain = domain;
> > +	}
> > +
> >  	while (i <= nents) {
> >  		phys_addr_t s_phys = sg_phys(sg);
> >
> >  		if (len && s_phys != start + len) {
> > -			ret = __iommu_map(domain, iova + mapped, start,
> > -					  len, prot, gfp);
> > +			if (deferred_sg)
> > +				ret = __iommu_add_sg(cookie_sg, iova + mapped, start, len);
> > +			else
> > +				ret = __iommu_map(domain, iova + mapped, start,
> > +						  len, prot, gfp);
> >  			if (ret)
> >  				goto out_err;
> >
> > @@ -2654,6 +2700,9 @@ ssize_t iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
> >  		sg = sg_next(sg);
> >  	}
> >
> > +	if (deferred_sg)
> > +		ops->consume_deferred_map_sg(cookie_sg);
> > +
> >  	if (ops->iotlb_sync_map) {
> >  		ret = ops->iotlb_sync_map(domain, iova, mapped);
> >  		if (ret)
> > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > index c75877044185..5e60ac349228 100644
> > --- a/include/linux/iommu.h
> > +++ b/include/linux/iommu.h
> > @@ -601,6 +601,14 @@ struct iommu_ops {
> >  	u8 user_pasid_table:1;
> >  };
> >
> > +/**
> > + * struct iommu_map_cookie_sg - Cookie for a deferred map sg
> > + * @domain: Domain for the sg list
> > + */
> > +struct iommu_map_cookie_sg {
> > +	struct iommu_domain *domain;
> > +};
> > +
> >  /**
> >   * struct iommu_domain_ops - domain specific operations
> >   * @attach_dev: attach an iommu domain to a device
> > @@ -638,6 +646,11 @@ struct iommu_ops {
> >   * @enable_nesting: Enable nesting
> >   * @set_pgtable_quirks: Set io page table quirks (IO_PGTABLE_QUIRK_*)
> >   * @free: Release the domain after use.
> > + * @alloc_cookie_sg: Allocate a cookie that will be used to build
> > + *                   an sg list, filled from the next functions
> > + * @add_deferred_map_sg: Add a mapping to a cookie of an sg list.
> > + * @consume_deferred_map_sg: Consume the sg list once all mappings are added;
> > + *                           it should also release the cookie, as it's no longer used.
> >   */
> >  struct iommu_domain_ops {
> >  	int (*attach_dev)(struct iommu_domain *domain, struct device *dev);
> > @@ -668,6 +681,12 @@ struct iommu_domain_ops {
> >  				    unsigned long quirks);
> >  	void (*free)(struct iommu_domain *domain);
> > +
> > +	struct iommu_map_cookie_sg *(*alloc_cookie_sg)(unsigned long iova, int prot,
> > +						       unsigned int nents, gfp_t gfp);
> > +	int (*add_deferred_map_sg)(struct iommu_map_cookie_sg *cookie,
> > +				   phys_addr_t paddr, size_t pgsize, size_t pgcount);
> > +	int (*consume_deferred_map_sg)(struct iommu_map_cookie_sg *cookie);
> >  };
> >
> >  /**
>
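As a side note, the alignment check in __iommu_add_sg above is compact but
easy to misread: the minimum supported page size is the lowest set bit of
pgsize_bitmap, and ORing iova, paddr and size lets one mask test cover all
three. A small userspace sketch of just that check (using __builtin_ctzl as
a stand-in for the kernel's __ffs; values below are illustrative):

```c
/* Illustrative userspace model of the min_pagesz / IS_ALIGNED logic
 * in __iommu_add_sg; not kernel code. */
#include <stdbool.h>
#include <stddef.h>

static unsigned int min_pagesz(unsigned long pgsize_bitmap)
{
	/* lowest set bit == smallest supported page size; __builtin_ctzl
	 * plays the role of the kernel's __ffs() here */
	return 1u << __builtin_ctzl(pgsize_bitmap);
}

static bool sg_range_aligned(unsigned long pgsize_bitmap, unsigned long iova,
			     unsigned long long paddr, size_t size)
{
	unsigned int min = min_pagesz(pgsize_bitmap);

	/* same trick as the patch: OR the three values, test alignment once */
	return ((iova | paddr | size) & (min - 1)) == 0;
}
```

For a bitmap advertising 4K, 2M and 1G pages, min_pagesz is 4K, so every
segment's iova, paddr and length must be 4K-aligned before it can be added
to the cookie.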