* [PATCH v2 0/2] iommu/virtio: Enable IOMMU_CAP_DERRED_FLUSH
@ 2023-09-18 11:51 Niklas Schnelle
2023-09-18 11:51 ` [PATCH v2 1/2] iommu/virtio: Make use of ops->iotlb_sync_map Niklas Schnelle
2023-09-18 11:51 ` [PATCH v2 2/2] iommu/virtio: Add ops->flush_iotlb_all and enable deferred flush Niklas Schnelle
0 siblings, 2 replies; 22+ messages in thread
From: Niklas Schnelle @ 2023-09-18 11:51 UTC (permalink / raw)
To: Jean-Philippe Brucker, Joerg Roedel, Will Deacon, Robin Murphy
Cc: virtualization, iommu, linux-kernel, Niklas Schnelle
Hi All,
Previously I used virtio-iommu as a non-s390x test vehicle[0] for the
single queue flushing scheme introduced by my s390x DMA API conversion
series[1]. For this I modified virtio-iommu to a) use .iotlb_sync_map
and b) enable IOMMU_CAP_DEFERRED_FLUSH. It turned out that deferred
flush and even just the introduction of ops->iotlb_sync_map yield
performance uplift[2] even with per-CPU queues. So here is a small
series of these two changes. This still applies on top of my series[1]
because its first patch titled "iommu: Allow .iotlb_sync_map to fail and
handle s390's -ENOMEM return" enable ops->iotlb_sync_map to return
errors and virtio-iommu's sync can fail. This also makes sure there is
no merge conflict with that series.
The code is also available on the b4/viommu-deferred-flush branch of my
kernel.org git repository[3]
Thanks,
Niklas
[0] https://lore.kernel.org/lkml/20230726111433.1105665-1-schnelle@linux.ibm.com/
[1] https://lore.kernel.org/lkml/20230825-dma_iommu-v12-0-4134455994a7@linux.ibm.com/
[2] https://lore.kernel.org/lkml/20230802123612.GA6142@myrica/
[3] https://git.kernel.org/pub/scm/linux/kernel/git/niks/linux.git/log/?h=b4/viommu-deferred-flush
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
---
Changes in v2:
- Check for viommu == NULL in viommu_sync_req() instead of for
0 endpoints in ops (Jean-Philippe)
- Added comment where viommu can be NULL (me)
- Link to v1: https://lore.kernel.org/r/20230825-viommu-sync-map-v1-0-56bdcfaa29ec@linux.ibm.com
---
Niklas Schnelle (2):
iommu/virtio: Make use of ops->iotlb_sync_map
iommu/virtio: Add ops->flush_iotlb_all and enable deferred flush
drivers/iommu/virtio-iommu.c | 27 ++++++++++++++++++++++++++-
1 file changed, 26 insertions(+), 1 deletion(-)
---
base-commit: e165388f6d32dc8a49f49ef6e80584ad3def3d78
change-id: 20230825-viommu-sync-map-1bf0cc4fdc15
Best regards,
--
Niklas Schnelle
Linux on Z Development
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Gregor Pillen
Geschäftsführung: David Faller
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294
IBM Data Privacy Statement - https://www.ibm.com/privacy
^ permalink raw reply [flat|nested] 22+ messages in thread* [PATCH v2 1/2] iommu/virtio: Make use of ops->iotlb_sync_map 2023-09-18 11:51 [PATCH v2 0/2] iommu/virtio: Enable IOMMU_CAP_DERRED_FLUSH Niklas Schnelle @ 2023-09-18 11:51 ` Niklas Schnelle 2023-09-18 15:58 ` Jean-Philippe Brucker 2023-09-18 16:37 ` Robin Murphy 2023-09-18 11:51 ` [PATCH v2 2/2] iommu/virtio: Add ops->flush_iotlb_all and enable deferred flush Niklas Schnelle 1 sibling, 2 replies; 22+ messages in thread From: Niklas Schnelle @ 2023-09-18 11:51 UTC (permalink / raw) To: Jean-Philippe Brucker, Joerg Roedel, Will Deacon, Robin Murphy Cc: virtualization, iommu, linux-kernel, Niklas Schnelle Pull out the sync operation from viommu_map_pages() by implementing ops->iotlb_sync_map. This allows the common IOMMU code to map multiple elements of an sg with a single sync (see iommu_map_sg()). Furthermore, it is also a requirement for IOMMU_CAP_DEFERRED_FLUSH. Link: https://lore.kernel.org/lkml/20230726111433.1105665-1-schnelle@linux.ibm.com/ Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> --- drivers/iommu/virtio-iommu.c | 17 ++++++++++++++++- 1 file changed, 16 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c index 17dcd826f5c2..3649586f0e5c 100644 --- a/drivers/iommu/virtio-iommu.c +++ b/drivers/iommu/virtio-iommu.c @@ -189,6 +189,12 @@ static int viommu_sync_req(struct viommu_dev *viommu) int ret; unsigned long flags; + /* + * .iotlb_sync_map and .flush_iotlb_all may be called before the viommu + * is initialized e.g. via iommu_create_device_direct_mappings() + */ + if (!viommu) + return 0; spin_lock_irqsave(&viommu->request_lock, flags); ret = __viommu_sync_req(viommu); if (ret) @@ -843,7 +849,7 @@ static int viommu_map_pages(struct iommu_domain *domain, unsigned long iova, .flags = cpu_to_le32(flags), }; - ret = viommu_send_req_sync(vdomain->viommu, &map, sizeof(map)); + ret = viommu_add_req(vdomain->viommu, &map, sizeof(map)); if (ret) { viommu_del_mappings(vdomain, iova, end); return ret; @@ -912,6 +918,14 @@ static void viommu_iotlb_sync(struct iommu_domain *domain, viommu_sync_req(vdomain->viommu); } +static int viommu_iotlb_sync_map(struct iommu_domain *domain, + unsigned long iova, size_t size) +{ + struct viommu_domain *vdomain = to_viommu_domain(domain); + + return viommu_sync_req(vdomain->viommu); +} + static void viommu_get_resv_regions(struct device *dev, struct list_head *head) { struct iommu_resv_region *entry, *new_entry, *msi = NULL; @@ -1058,6 +1072,7 @@ static struct iommu_ops viommu_ops = { .unmap_pages = viommu_unmap_pages, .iova_to_phys = viommu_iova_to_phys, .iotlb_sync = viommu_iotlb_sync, + .iotlb_sync_map = viommu_iotlb_sync_map, .free = viommu_domain_free, } }; -- 2.39.2 ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH v2 1/2] iommu/virtio: Make use of ops->iotlb_sync_map 2023-09-18 11:51 ` [PATCH v2 1/2] iommu/virtio: Make use of ops->iotlb_sync_map Niklas Schnelle @ 2023-09-18 15:58 ` Jean-Philippe Brucker 2023-09-18 16:37 ` Robin Murphy 1 sibling, 0 replies; 22+ messages in thread From: Jean-Philippe Brucker @ 2023-09-18 15:58 UTC (permalink / raw) To: Niklas Schnelle Cc: Joerg Roedel, Will Deacon, Robin Murphy, virtualization, iommu, linux-kernel On Mon, Sep 18, 2023 at 01:51:43PM +0200, Niklas Schnelle wrote: > Pull out the sync operation from viommu_map_pages() by implementing > ops->iotlb_sync_map. This allows the common IOMMU code to map multiple > elements of an sg with a single sync (see iommu_map_sg()). Furthermore, > it is also a requirement for IOMMU_CAP_DEFERRED_FLUSH. > > Link: https://lore.kernel.org/lkml/20230726111433.1105665-1-schnelle@linux.ibm.com/ > Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> This must be merged after "iommu/dma: s390 DMA API conversion and optimized IOTLB flushing" because of the updated iotlb_sync_map() prototype. Thanks, Jean > --- > drivers/iommu/virtio-iommu.c | 17 ++++++++++++++++- > 1 file changed, 16 insertions(+), 1 deletion(-) > > diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c > index 17dcd826f5c2..3649586f0e5c 100644 > --- a/drivers/iommu/virtio-iommu.c > +++ b/drivers/iommu/virtio-iommu.c > @@ -189,6 +189,12 @@ static int viommu_sync_req(struct viommu_dev *viommu) > int ret; > unsigned long flags; > > + /* > + * .iotlb_sync_map and .flush_iotlb_all may be called before the viommu > + * is initialized e.g. via iommu_create_device_direct_mappings() > + */ > + if (!viommu) > + return 0; > spin_lock_irqsave(&viommu->request_lock, flags); > ret = __viommu_sync_req(viommu); > if (ret) > @@ -843,7 +849,7 @@ static int viommu_map_pages(struct iommu_domain *domain, unsigned long iova, > .flags = cpu_to_le32(flags), > }; > > - ret = viommu_send_req_sync(vdomain->viommu, &map, sizeof(map)); > + ret = viommu_add_req(vdomain->viommu, &map, sizeof(map)); > if (ret) { > viommu_del_mappings(vdomain, iova, end); > return ret; > @@ -912,6 +918,14 @@ static void viommu_iotlb_sync(struct iommu_domain *domain, > viommu_sync_req(vdomain->viommu); > } > > +static int viommu_iotlb_sync_map(struct iommu_domain *domain, > + unsigned long iova, size_t size) > +{ > + struct viommu_domain *vdomain = to_viommu_domain(domain); > + > + return viommu_sync_req(vdomain->viommu); > +} > + > static void viommu_get_resv_regions(struct device *dev, struct list_head *head) > { > struct iommu_resv_region *entry, *new_entry, *msi = NULL; > @@ -1058,6 +1072,7 @@ static struct iommu_ops viommu_ops = { > .unmap_pages = viommu_unmap_pages, > .iova_to_phys = viommu_iova_to_phys, > .iotlb_sync = viommu_iotlb_sync, > + .iotlb_sync_map = viommu_iotlb_sync_map, > .free = viommu_domain_free, > } > }; > > -- > 2.39.2 > ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2 1/2] iommu/virtio: Make use of ops->iotlb_sync_map 2023-09-18 11:51 ` [PATCH v2 1/2] iommu/virtio: Make use of ops->iotlb_sync_map Niklas Schnelle 2023-09-18 15:58 ` Jean-Philippe Brucker @ 2023-09-18 16:37 ` Robin Murphy 2023-09-19 8:00 ` Niklas Schnelle 2023-09-19 8:15 ` Jean-Philippe Brucker 1 sibling, 2 replies; 22+ messages in thread From: Robin Murphy @ 2023-09-18 16:37 UTC (permalink / raw) To: Niklas Schnelle, Jean-Philippe Brucker, Joerg Roedel, Will Deacon Cc: virtualization, iommu, linux-kernel On 2023-09-18 12:51, Niklas Schnelle wrote: > Pull out the sync operation from viommu_map_pages() by implementing > ops->iotlb_sync_map. This allows the common IOMMU code to map multiple > elements of an sg with a single sync (see iommu_map_sg()). Furthermore, > it is also a requirement for IOMMU_CAP_DEFERRED_FLUSH. Is it really a requirement? Deferred flush only deals with unmapping. Or are you just trying to say that it's not too worthwhile to try doing more for unmapping performance while obvious mapping performance is still left on the table? > Link: https://lore.kernel.org/lkml/20230726111433.1105665-1-schnelle@linux.ibm.com/ > Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> > --- > drivers/iommu/virtio-iommu.c | 17 ++++++++++++++++- > 1 file changed, 16 insertions(+), 1 deletion(-) > > diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c > index 17dcd826f5c2..3649586f0e5c 100644 > --- a/drivers/iommu/virtio-iommu.c > +++ b/drivers/iommu/virtio-iommu.c > @@ -189,6 +189,12 @@ static int viommu_sync_req(struct viommu_dev *viommu) > int ret; > unsigned long flags; > > + /* > + * .iotlb_sync_map and .flush_iotlb_all may be called before the viommu > + * is initialized e.g. via iommu_create_device_direct_mappings() > + */ > + if (!viommu) > + return 0; Minor nit: I'd be inclined to make that check explicitly in the places where it definitely is expected, rather than allowing *any* sync to silently do nothing if called incorrectly. Plus then they could use vdomain->nr_endpoints for consistency with the equivalent checks elsewhere (it did take me a moment to figure out how we could get to .iotlb_sync_map with a NULL viommu without viommu_map_pages() blowing up first...) Thanks, Robin. > spin_lock_irqsave(&viommu->request_lock, flags); > ret = __viommu_sync_req(viommu); > if (ret) > @@ -843,7 +849,7 @@ static int viommu_map_pages(struct iommu_domain *domain, unsigned long iova, > .flags = cpu_to_le32(flags), > }; > > - ret = viommu_send_req_sync(vdomain->viommu, &map, sizeof(map)); > + ret = viommu_add_req(vdomain->viommu, &map, sizeof(map)); > if (ret) { > viommu_del_mappings(vdomain, iova, end); > return ret; > @@ -912,6 +918,14 @@ static void viommu_iotlb_sync(struct iommu_domain *domain, > viommu_sync_req(vdomain->viommu); > } > > +static int viommu_iotlb_sync_map(struct iommu_domain *domain, > + unsigned long iova, size_t size) > +{ > + struct viommu_domain *vdomain = to_viommu_domain(domain); > + > + return viommu_sync_req(vdomain->viommu); > +} > + > static void viommu_get_resv_regions(struct device *dev, struct list_head *head) > { > struct iommu_resv_region *entry, *new_entry, *msi = NULL; > @@ -1058,6 +1072,7 @@ static struct iommu_ops viommu_ops = { > .unmap_pages = viommu_unmap_pages, > .iova_to_phys = viommu_iova_to_phys, > .iotlb_sync = viommu_iotlb_sync, > + .iotlb_sync_map = viommu_iotlb_sync_map, > .free = viommu_domain_free, > } > }; > ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2 1/2] iommu/virtio: Make use of ops->iotlb_sync_map 2023-09-18 16:37 ` Robin Murphy @ 2023-09-19 8:00 ` Niklas Schnelle 2023-09-19 8:15 ` Jean-Philippe Brucker 1 sibling, 0 replies; 22+ messages in thread From: Niklas Schnelle @ 2023-09-19 8:00 UTC (permalink / raw) To: Robin Murphy, Jean-Philippe Brucker, Joerg Roedel, Will Deacon Cc: virtualization, iommu, linux-kernel On Mon, 2023-09-18 at 17:37 +0100, Robin Murphy wrote: > On 2023-09-18 12:51, Niklas Schnelle wrote: > > Pull out the sync operation from viommu_map_pages() by implementing > > ops->iotlb_sync_map. This allows the common IOMMU code to map multiple > > elements of an sg with a single sync (see iommu_map_sg()). Furthermore, > > it is also a requirement for IOMMU_CAP_DEFERRED_FLUSH. > > Is it really a requirement? Deferred flush only deals with unmapping. Or > are you just trying to say that it's not too worthwhile to try doing > more for unmapping performance while obvious mapping performance is > still left on the table? > You're right there is no hard requirement. I somehow thought that iommu_create_device_direct_map() relied on it because it does flush_iotlb_all() and iommu_map() but that doesn't seem to be true. If you want I can resend with the last sentence removed. > > Link: https://lore.kernel.org/lkml/20230726111433.1105665-1-schnelle@linux.ibm.com/ > > Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> > > --- > > drivers/iommu/virtio-iommu.c | 17 ++++++++++++++++- > > 1 file changed, 16 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c > > index 17dcd826f5c2..3649586f0e5c 100644 > > --- a/drivers/iommu/virtio-iommu.c > > +++ b/drivers/iommu/virtio-iommu.c > > @@ -189,6 +189,12 @@ static int viommu_sync_req(struct viommu_dev *viommu) > > int ret; > > unsigned long flags; > > > > + /* > > + * .iotlb_sync_map and .flush_iotlb_all may be called before the viommu > > + * is initialized e.g. via iommu_create_device_direct_mappings() > > + */ > > + if (!viommu) > > + return 0; > > Minor nit: I'd be inclined to make that check explicitly in the places > where it definitely is expected, rather than allowing *any* sync to > silently do nothing if called incorrectly. Plus then they could use > vdomain->nr_endpoints for consistency with the equivalent checks > elsewhere (it did take me a moment to figure out how we could get to > .iotlb_sync_map with a NULL viommu without viommu_map_pages() blowing up > first...) > > Thanks, > Robin. That's what I had in v1. I think this is a matter of taste and Jean- Philippe pointed me to moving the check into viommu_sync_req() I added a comment because it really is not entirely obvious. > > > spin_lock_irqsave(&viommu->request_lock, flags); > > ret = __viommu_sync_req(viommu); > > if (ret) > > @@ -843,7 +849,7 @@ static int viommu_map_pages(struct iommu_domain *domain, unsigned long iova, > > .flags = cpu_to_le32(flags), > > }; > > > > - ret = viommu_send_req_sync(vdomain->viommu, &map, sizeof(map)); > > + ret = viommu_add_req(vdomain->viommu, &map, sizeof(map)); > > if (ret) { > > viommu_del_mappings(vdomain, iova, end); > > return ret; > > @@ -912,6 +918,14 @@ static void viommu_iotlb_sync(struct iommu_domain *domain, > > viommu_sync_req(vdomain->viommu); > > } > > > > +static int viommu_iotlb_sync_map(struct iommu_domain *domain, > > + unsigned long iova, size_t size) > > +{ > > + struct viommu_domain *vdomain = to_viommu_domain(domain); > > + > > + return viommu_sync_req(vdomain->viommu); > > +} > > + > > static void viommu_get_resv_regions(struct device *dev, struct list_head *head) > > { > > struct iommu_resv_region *entry, *new_entry, *msi = NULL; > > @@ -1058,6 +1072,7 @@ static struct iommu_ops viommu_ops = { > > .unmap_pages = viommu_unmap_pages, > > .iova_to_phys = viommu_iova_to_phys, > > .iotlb_sync = viommu_iotlb_sync, > > + .iotlb_sync_map = viommu_iotlb_sync_map, > > .free = viommu_domain_free, > > } > > }; > > ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2 1/2] iommu/virtio: Make use of ops->iotlb_sync_map 2023-09-18 16:37 ` Robin Murphy 2023-09-19 8:00 ` Niklas Schnelle @ 2023-09-19 8:15 ` Jean-Philippe Brucker 2023-09-19 8:28 ` Robin Murphy 2023-09-19 14:46 ` Jason Gunthorpe 1 sibling, 2 replies; 22+ messages in thread From: Jean-Philippe Brucker @ 2023-09-19 8:15 UTC (permalink / raw) To: Robin Murphy Cc: Niklas Schnelle, Joerg Roedel, Will Deacon, virtualization, iommu, linux-kernel On Mon, Sep 18, 2023 at 05:37:47PM +0100, Robin Murphy wrote: > > diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c > > index 17dcd826f5c2..3649586f0e5c 100644 > > --- a/drivers/iommu/virtio-iommu.c > > +++ b/drivers/iommu/virtio-iommu.c > > @@ -189,6 +189,12 @@ static int viommu_sync_req(struct viommu_dev *viommu) > > int ret; > > unsigned long flags; > > + /* > > + * .iotlb_sync_map and .flush_iotlb_all may be called before the viommu > > + * is initialized e.g. via iommu_create_device_direct_mappings() > > + */ > > + if (!viommu) > > + return 0; > > Minor nit: I'd be inclined to make that check explicitly in the places where > it definitely is expected, rather than allowing *any* sync to silently do > nothing if called incorrectly. Plus then they could use > vdomain->nr_endpoints for consistency with the equivalent checks elsewhere > (it did take me a moment to figure out how we could get to .iotlb_sync_map > with a NULL viommu without viommu_map_pages() blowing up first...) They're not strictly equivalent: this check works around a temporary issue with the IOMMU core, which calls map/unmap before the domain is finalized. Once we merge domain_alloc() and finalize(), then this check disappears, but we still need to test nr_endpoints in map/unmap to handle detached domains (and we still need to fix the synchronization of nr_endpoints against attach/detach). That's why I preferred doing this on viommu and keeping it in one place. Thanks, Jean ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2 1/2] iommu/virtio: Make use of ops->iotlb_sync_map 2023-09-19 8:15 ` Jean-Philippe Brucker @ 2023-09-19 8:28 ` Robin Murphy 2023-09-22 7:52 ` Jean-Philippe Brucker 2023-09-19 14:46 ` Jason Gunthorpe 1 sibling, 1 reply; 22+ messages in thread From: Robin Murphy @ 2023-09-19 8:28 UTC (permalink / raw) To: Jean-Philippe Brucker Cc: Niklas Schnelle, Joerg Roedel, Will Deacon, virtualization, iommu, linux-kernel On 2023-09-19 09:15, Jean-Philippe Brucker wrote: > On Mon, Sep 18, 2023 at 05:37:47PM +0100, Robin Murphy wrote: >>> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c >>> index 17dcd826f5c2..3649586f0e5c 100644 >>> --- a/drivers/iommu/virtio-iommu.c >>> +++ b/drivers/iommu/virtio-iommu.c >>> @@ -189,6 +189,12 @@ static int viommu_sync_req(struct viommu_dev *viommu) >>> int ret; >>> unsigned long flags; >>> + /* >>> + * .iotlb_sync_map and .flush_iotlb_all may be called before the viommu >>> + * is initialized e.g. via iommu_create_device_direct_mappings() >>> + */ >>> + if (!viommu) >>> + return 0; >> >> Minor nit: I'd be inclined to make that check explicitly in the places where >> it definitely is expected, rather than allowing *any* sync to silently do >> nothing if called incorrectly. Plus then they could use >> vdomain->nr_endpoints for consistency with the equivalent checks elsewhere >> (it did take me a moment to figure out how we could get to .iotlb_sync_map >> with a NULL viommu without viommu_map_pages() blowing up first...) > > They're not strictly equivalent: this check works around a temporary issue > with the IOMMU core, which calls map/unmap before the domain is finalized. > Once we merge domain_alloc() and finalize(), then this check disappears, > but we still need to test nr_endpoints in map/unmap to handle detached > domains (and we still need to fix the synchronization of nr_endpoints > against attach/detach). That's why I preferred doing this on viommu and > keeping it in one place. Fair enough - it just seems to me that in both cases it's a detached domain, so its previous history of whether it's ever been otherwise or not shouldn't matter. Even once viommu is initialised, does it really make sense to send sync commands for a mapping on a detached domain where we haven't actually sent any map/unmap commands? Thanks, Robin. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2 1/2] iommu/virtio: Make use of ops->iotlb_sync_map 2023-09-19 8:28 ` Robin Murphy @ 2023-09-22 7:52 ` Jean-Philippe Brucker 0 siblings, 0 replies; 22+ messages in thread From: Jean-Philippe Brucker @ 2023-09-22 7:52 UTC (permalink / raw) To: Robin Murphy Cc: Niklas Schnelle, Joerg Roedel, Will Deacon, virtualization, iommu, linux-kernel On Tue, Sep 19, 2023 at 09:28:08AM +0100, Robin Murphy wrote: > On 2023-09-19 09:15, Jean-Philippe Brucker wrote: > > On Mon, Sep 18, 2023 at 05:37:47PM +0100, Robin Murphy wrote: > > > > diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c > > > > index 17dcd826f5c2..3649586f0e5c 100644 > > > > --- a/drivers/iommu/virtio-iommu.c > > > > +++ b/drivers/iommu/virtio-iommu.c > > > > @@ -189,6 +189,12 @@ static int viommu_sync_req(struct viommu_dev *viommu) > > > > int ret; > > > > unsigned long flags; > > > > + /* > > > > + * .iotlb_sync_map and .flush_iotlb_all may be called before the viommu > > > > + * is initialized e.g. via iommu_create_device_direct_mappings() > > > > + */ > > > > + if (!viommu) > > > > + return 0; > > > > > > Minor nit: I'd be inclined to make that check explicitly in the places where > > > it definitely is expected, rather than allowing *any* sync to silently do > > > nothing if called incorrectly. Plus then they could use > > > vdomain->nr_endpoints for consistency with the equivalent checks elsewhere > > > (it did take me a moment to figure out how we could get to .iotlb_sync_map > > > with a NULL viommu without viommu_map_pages() blowing up first...) > > > > They're not strictly equivalent: this check works around a temporary issue > > with the IOMMU core, which calls map/unmap before the domain is finalized. > > Once we merge domain_alloc() and finalize(), then this check disappears, > > but we still need to test nr_endpoints in map/unmap to handle detached > > domains (and we still need to fix the synchronization of nr_endpoints > > against attach/detach). That's why I preferred doing this on viommu and > > keeping it in one place. > > Fair enough - it just seems to me that in both cases it's a detached domain, > so its previous history of whether it's ever been otherwise or not shouldn't > matter. Even once viommu is initialised, does it really make sense to send > sync commands for a mapping on a detached domain where we haven't actually > sent any map/unmap commands? If no requests were added by map/unmap, then viommu_sync_req() is essentially a nop because virtio doesn't use sync commands (and virtqueue_kick() only kicks the host when the queue's not empty, if I remember correctly). It still does a bit of work so is less efficient than a preliminary check on nr_endpoints, but it feels nicer to streamline the case where the domain is attached, since map/unmap on detached domains happens rarely, if ever. Either is fine by me. An extra test won't make much difference performance wise, and I guess will look less confusing. Niklas, do you mind resending the version with nr_endpoints check (and updated commit messages)? Thanks, Jean ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2 1/2] iommu/virtio: Make use of ops->iotlb_sync_map 2023-09-19 8:15 ` Jean-Philippe Brucker 2023-09-19 8:28 ` Robin Murphy @ 2023-09-19 14:46 ` Jason Gunthorpe 2023-09-22 7:57 ` Jean-Philippe Brucker 1 sibling, 1 reply; 22+ messages in thread From: Jason Gunthorpe @ 2023-09-19 14:46 UTC (permalink / raw) To: Jean-Philippe Brucker Cc: Robin Murphy, Niklas Schnelle, Joerg Roedel, Will Deacon, virtualization, iommu, linux-kernel On Tue, Sep 19, 2023 at 09:15:19AM +0100, Jean-Philippe Brucker wrote: > On Mon, Sep 18, 2023 at 05:37:47PM +0100, Robin Murphy wrote: > > > diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c > > > index 17dcd826f5c2..3649586f0e5c 100644 > > > --- a/drivers/iommu/virtio-iommu.c > > > +++ b/drivers/iommu/virtio-iommu.c > > > @@ -189,6 +189,12 @@ static int viommu_sync_req(struct viommu_dev *viommu) > > > int ret; > > > unsigned long flags; > > > + /* > > > + * .iotlb_sync_map and .flush_iotlb_all may be called before the viommu > > > + * is initialized e.g. via iommu_create_device_direct_mappings() > > > + */ > > > + if (!viommu) > > > + return 0; > > > > Minor nit: I'd be inclined to make that check explicitly in the places where > > it definitely is expected, rather than allowing *any* sync to silently do > > nothing if called incorrectly. Plus then they could use > > vdomain->nr_endpoints for consistency with the equivalent checks elsewhere > > (it did take me a moment to figure out how we could get to .iotlb_sync_map > > with a NULL viommu without viommu_map_pages() blowing up first...) This makes more sense to me Ultimately this driver should reach a point where every iommu_domain always has a non-null domain->viommu because it will be set during alloc. But it can still have nr_endpoints == 0, doesn't it make sense to avoid sync in this case? (btw this driver is missing locking around vdomain->nr_endpoints) > They're not strictly equivalent: this check works around a temporary issue > with the IOMMU core, which calls map/unmap before the domain is > finalized. Where? The above points to iommu_create_device_direct_mappings() but it doesn't because the pgsize_bitmap == 0: static int iommu_create_device_direct_mappings(struct iommu_domain *domain, struct device *dev) { struct iommu_resv_region *entry; struct list_head mappings; unsigned long pg_size; int ret = 0; pg_size = domain->pgsize_bitmap ? 1UL << __ffs(domain->pgsize_bitmap) : 0; INIT_LIST_HEAD(&mappings); if (WARN_ON_ONCE(iommu_is_dma_domain(domain) && !pg_size)) Indeed, the driver should be failing all map's until the domain is finalized because it has no way to check the IOVA matches the eventual aperture. Jason ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2 1/2] iommu/virtio: Make use of ops->iotlb_sync_map 2023-09-19 14:46 ` Jason Gunthorpe @ 2023-09-22 7:57 ` Jean-Philippe Brucker 2023-09-22 12:41 ` Jason Gunthorpe 0 siblings, 1 reply; 22+ messages in thread From: Jean-Philippe Brucker @ 2023-09-22 7:57 UTC (permalink / raw) To: Jason Gunthorpe Cc: Robin Murphy, Niklas Schnelle, Joerg Roedel, Will Deacon, virtualization, iommu, linux-kernel On Tue, Sep 19, 2023 at 11:46:49AM -0300, Jason Gunthorpe wrote: > On Tue, Sep 19, 2023 at 09:15:19AM +0100, Jean-Philippe Brucker wrote: > > On Mon, Sep 18, 2023 at 05:37:47PM +0100, Robin Murphy wrote: > > > > diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c > > > > index 17dcd826f5c2..3649586f0e5c 100644 > > > > --- a/drivers/iommu/virtio-iommu.c > > > > +++ b/drivers/iommu/virtio-iommu.c > > > > @@ -189,6 +189,12 @@ static int viommu_sync_req(struct viommu_dev *viommu) > > > > int ret; > > > > unsigned long flags; > > > > + /* > > > > + * .iotlb_sync_map and .flush_iotlb_all may be called before the viommu > > > > + * is initialized e.g. via iommu_create_device_direct_mappings() > > > > + */ > > > > + if (!viommu) > > > > + return 0; > > > > > > Minor nit: I'd be inclined to make that check explicitly in the places where > > > it definitely is expected, rather than allowing *any* sync to silently do > > > nothing if called incorrectly. Plus then they could use > > > vdomain->nr_endpoints for consistency with the equivalent checks elsewhere > > > (it did take me a moment to figure out how we could get to .iotlb_sync_map > > > with a NULL viommu without viommu_map_pages() blowing up first...) > > This makes more sense to me > > Ultimately this driver should reach a point where every iommu_domain > always has a non-null domain->viommu because it will be set during > alloc. > > But it can still have nr_endpoints == 0, doesn't it make sense to > avoid sync in this case? > > (btw this driver is missing locking around vdomain->nr_endpoints) Yes, that's on my list > > > They're not strictly equivalent: this check works around a temporary issue > > with the IOMMU core, which calls map/unmap before the domain is > > finalized. > > Where? The above points to iommu_create_device_direct_mappings() but > it doesn't because the pgsize_bitmap == 0: __iommu_domain_alloc() sets pgsize_bitmap in this case: /* * If not already set, assume all sizes by default; the driver * may override this later */ if (!domain->pgsize_bitmap) domain->pgsize_bitmap = bus->iommu_ops->pgsize_bitmap; Thanks, Jean ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2 1/2] iommu/virtio: Make use of ops->iotlb_sync_map 2023-09-22 7:57 ` Jean-Philippe Brucker @ 2023-09-22 12:41 ` Jason Gunthorpe 2023-09-22 13:13 ` Robin Murphy 0 siblings, 1 reply; 22+ messages in thread From: Jason Gunthorpe @ 2023-09-22 12:41 UTC (permalink / raw) To: Jean-Philippe Brucker Cc: Robin Murphy, Niklas Schnelle, Joerg Roedel, Will Deacon, virtualization, iommu, linux-kernel On Fri, Sep 22, 2023 at 08:57:19AM +0100, Jean-Philippe Brucker wrote: > > > They're not strictly equivalent: this check works around a temporary issue > > > with the IOMMU core, which calls map/unmap before the domain is > > > finalized. > > > > Where? The above points to iommu_create_device_direct_mappings() but > > it doesn't because the pgsize_bitmap == 0: > > __iommu_domain_alloc() sets pgsize_bitmap in this case: > > /* > * If not already set, assume all sizes by default; the driver > * may override this later > */ > if (!domain->pgsize_bitmap) > domain->pgsize_bitmap = bus->iommu_ops->pgsize_bitmap; Dirver's shouldn't do that. The core code was fixed to try again with mapping reserved regions to support these kinds of drivers. Jason ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2 1/2] iommu/virtio: Make use of ops->iotlb_sync_map 2023-09-22 12:41 ` Jason Gunthorpe @ 2023-09-22 13:13 ` Robin Murphy 2023-09-22 16:27 ` Jason Gunthorpe 0 siblings, 1 reply; 22+ messages in thread From: Robin Murphy @ 2023-09-22 13:13 UTC (permalink / raw) To: Jason Gunthorpe, Jean-Philippe Brucker Cc: Niklas Schnelle, Joerg Roedel, Will Deacon, virtualization, iommu, linux-kernel On 22/09/2023 1:41 pm, Jason Gunthorpe wrote: > On Fri, Sep 22, 2023 at 08:57:19AM +0100, Jean-Philippe Brucker wrote: >>>> They're not strictly equivalent: this check works around a temporary issue >>>> with the IOMMU core, which calls map/unmap before the domain is >>>> finalized. >>> >>> Where? The above points to iommu_create_device_direct_mappings() but >>> it doesn't because the pgsize_bitmap == 0: >> >> __iommu_domain_alloc() sets pgsize_bitmap in this case: >> >> /* >> * If not already set, assume all sizes by default; the driver >> * may override this later >> */ >> if (!domain->pgsize_bitmap) >> domain->pgsize_bitmap = bus->iommu_ops->pgsize_bitmap; > > Dirver's shouldn't do that. > > The core code was fixed to try again with mapping reserved regions to > support these kinds of drivers. This is still the "normal" code path, really; I think it's only AMD that started initialising the domain bitmap "early" and warranted making it conditional. However we *do* ultimately want all the drivers to do the same, so we can get rid of ops->pgsize_bitmap, because it's already pretty redundant and meaningless in the face of per-domain pagetable formats. Thanks, Robin. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2 1/2] iommu/virtio: Make use of ops->iotlb_sync_map 2023-09-22 13:13 ` Robin Murphy @ 2023-09-22 16:27 ` Jason Gunthorpe 2023-09-22 18:07 ` Robin Murphy 0 siblings, 1 reply; 22+ messages in thread From: Jason Gunthorpe @ 2023-09-22 16:27 UTC (permalink / raw) To: Robin Murphy Cc: Jean-Philippe Brucker, Niklas Schnelle, Joerg Roedel, Will Deacon, virtualization, iommu, linux-kernel On Fri, Sep 22, 2023 at 02:13:18PM +0100, Robin Murphy wrote: > On 22/09/2023 1:41 pm, Jason Gunthorpe wrote: > > On Fri, Sep 22, 2023 at 08:57:19AM +0100, Jean-Philippe Brucker wrote: > > > > > They're not strictly equivalent: this check works around a temporary issue > > > > > with the IOMMU core, which calls map/unmap before the domain is > > > > > finalized. > > > > > > > > Where? The above points to iommu_create_device_direct_mappings() but > > > > it doesn't because the pgsize_bitmap == 0: > > > > > > __iommu_domain_alloc() sets pgsize_bitmap in this case: > > > > > > /* > > > * If not already set, assume all sizes by default; the driver > > > * may override this later > > > */ > > > if (!domain->pgsize_bitmap) > > > domain->pgsize_bitmap = bus->iommu_ops->pgsize_bitmap; > > > > Dirver's shouldn't do that. > > > > The core code was fixed to try again with mapping reserved regions to > > support these kinds of drivers. > > This is still the "normal" code path, really; I think it's only AMD that > started initialising the domain bitmap "early" and warranted making it > conditional. My main point was that iommu_create_device_direct_mappings() should fail for unfinalized domains, setting pgsize_bitmap to allow it to succeed is not a nice hack, and not necessary now. What do you think about something like this to replace iommu_create_device_direct_mappings(), that does enforce things properly? static int resv_cmp(void *priv, const struct list_head *llhs, const struct list_head *lrhs) { struct iommu_resv_region *lhs = list_entry(llhs, struct iommu_resv_region, list); struct iommu_resv_region *rhs = list_entry(lrhs, struct iommu_resv_region, list); if (lhs->start == rhs->start) return 0; if (lhs->start < rhs->start) return -1; return 1; } static int iommu_create_device_direct_mappings(struct iommu_domain *domain, struct device *dev) { struct iommu_resv_region *entry; struct iommu_resv_region *tmp; struct list_head mappings; struct list_head direct; phys_addr_t cur = 0; int ret = 0; INIT_LIST_HEAD(&mappings); INIT_LIST_HEAD(&direct); iommu_get_resv_regions(dev, &mappings); list_for_each_entry_safe(entry, tmp, &mappings, list) { if (entry->type == IOMMU_RESV_DIRECT) dev->iommu->require_direct = 1; if ((domain->type & __IOMMU_DOMAIN_PAGING) && (entry->type == IOMMU_RESV_DIRECT || entry->type == IOMMU_RESV_DIRECT_RELAXABLE)) { if (domain->geometry.aperture_start > entry->start || domain->geometry.aperture_end == 0 || (domain->geometry.aperture_end - 1) < (entry->start + entry->length - 1)) { ret = -EINVAL; goto out; } list_move(&entry->list, &direct); } } if (list_empty(&direct)) goto out; /* * FW can have overlapping ranges, sort the list by start address * and map any duplicated IOVA only once. */ list_sort(NULL, &direct, resv_cmp); list_for_each_entry(entry, &direct, list) { phys_addr_t start_pfn = entry->start / PAGE_SIZE; phys_addr_t last_pfn = (entry->length - 1 + entry->start) / PAGE_SIZE; if (start_pfn < cur) start_pfn = cur; if (start_pfn <= last_pfn) { ret = iommu_map(domain, start_pfn * PAGE_SIZE, start_pfn * PAGE_SIZE, (last_pfn - start_pfn + 1) * PAGE_SIZE, entry->prot, GFP_KERNEL); if (ret) goto out; cur = last_pfn + 1; } } out: list_splice(&direct, &mappings); iommu_put_resv_regions(dev, &mappings); return ret; } ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2 1/2] iommu/virtio: Make use of ops->iotlb_sync_map 2023-09-22 16:27 ` Jason Gunthorpe @ 2023-09-22 18:07 ` Robin Murphy 2023-09-22 23:33 ` Jason Gunthorpe 0 siblings, 1 reply; 22+ messages in thread From: Robin Murphy @ 2023-09-22 18:07 UTC (permalink / raw) To: Jason Gunthorpe Cc: Jean-Philippe Brucker, Niklas Schnelle, Joerg Roedel, Will Deacon, virtualization, iommu, linux-kernel On 22/09/2023 5:27 pm, Jason Gunthorpe wrote: > On Fri, Sep 22, 2023 at 02:13:18PM +0100, Robin Murphy wrote: >> On 22/09/2023 1:41 pm, Jason Gunthorpe wrote: >>> On Fri, Sep 22, 2023 at 08:57:19AM +0100, Jean-Philippe Brucker wrote: >>>>>> They're not strictly equivalent: this check works around a temporary issue >>>>>> with the IOMMU core, which calls map/unmap before the domain is >>>>>> finalized. >>>>> >>>>> Where? The above points to iommu_create_device_direct_mappings() but >>>>> it doesn't because the pgsize_bitmap == 0: >>>> >>>> __iommu_domain_alloc() sets pgsize_bitmap in this case: >>>> >>>> /* >>>> * If not already set, assume all sizes by default; the driver >>>> * may override this later >>>> */ >>>> if (!domain->pgsize_bitmap) >>>> domain->pgsize_bitmap = bus->iommu_ops->pgsize_bitmap; >>> >>> Dirver's shouldn't do that. >>> >>> The core code was fixed to try again with mapping reserved regions to >>> support these kinds of drivers. >> >> This is still the "normal" code path, really; I think it's only AMD that >> started initialising the domain bitmap "early" and warranted making it >> conditional. > > My main point was that iommu_create_device_direct_mappings() should > fail for unfinalized domains, setting pgsize_bitmap to allow it to > succeed is not a nice hack, and not necessary now. Sure, but it's the whole "unfinalised domains" and rewriting domain->pgsize_bitmap after attach thing that is itself the massive hack. AMD doesn't do that, and doesn't need to; it knows the appropriate format at allocation time and can quite happily return a fully working domain which allows map before attach, but the old ops->pgsize_bitmap mechanism fundamentally doesn't work for multiple formats with different page sizes. The only thing I'd accuse it of doing wrong is the weird half-and-half thing of having one format as a default via one mechanism, and the other as an override through the other, rather than setting both explicitly. virtio isn't setting ops->pgsize_bitmap for the sake of direct mappings either; it sets it once it's discovered any instance, since apparently it's assuming that all instances must support identical page sizes, and thus once it's seen one it can work "normally" per the core code's assumptions. It's also I think the only driver which has a "finalise" bodge but *can* still properly support map-before-attach, by virtue of having to replay mappings to every new endpoint anyway. > What do you think about something like this to replace > iommu_create_device_direct_mappings(), that does enforce things > properly? I fail to see how that would make any practical difference. Either the mappings can be correctly set up in a pagetable *before* the relevant device is attached to that pagetable, or they can't (if the driver doesn't have enough information to be able to do so) and we just have to really hope nothing blows up in the race window between attaching the device to an empty pagetable and having a second try at iommu_create_device_direct_mappings(). That's a driver-level issue and has nothing to do with pgsize_bitmap either way. Thanks, Robin. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2 1/2] iommu/virtio: Make use of ops->iotlb_sync_map 2023-09-22 18:07 ` Robin Murphy @ 2023-09-22 23:33 ` Jason Gunthorpe 2023-09-25 2:48 ` Baolu Lu 2023-09-25 13:07 ` Robin Murphy 0 siblings, 2 replies; 22+ messages in thread From: Jason Gunthorpe @ 2023-09-22 23:33 UTC (permalink / raw) To: Robin Murphy Cc: Jean-Philippe Brucker, Niklas Schnelle, Joerg Roedel, Will Deacon, virtualization, iommu, linux-kernel On Fri, Sep 22, 2023 at 07:07:40PM +0100, Robin Murphy wrote: > virtio isn't setting ops->pgsize_bitmap for the sake of direct mappings > either; it sets it once it's discovered any instance, since apparently it's > assuming that all instances must support identical page sizes, and thus once > it's seen one it can work "normally" per the core code's assumptions. It's > also I think the only driver which has a "finalise" bodge but *can* still > properly support map-before-attach, by virtue of having to replay mappings > to every new endpoint anyway. Well it can't quite do that since it doesn't know the geometry - it all is sort of guessing and hoping it doesn't explode on replay. If it knows the geometry it wouldn't need finalize... > > What do you think about something like this to replace > > iommu_create_device_direct_mappings(), that does enforce things > > properly? > > I fail to see how that would make any practical difference. Either the > mappings can be correctly set up in a pagetable *before* the relevant device > is attached to that pagetable, or they can't (if the driver doesn't have > enough information to be able to do so) and we just have to really hope > nothing blows up in the race window between attaching the device to an empty > pagetable and having a second try at iommu_create_device_direct_mappings(). > That's a driver-level issue and has nothing to do with pgsize_bitmap either > way. Except we don't detect this in the core code correctly, that is my point. We should detect the aperture conflict, not pgsize_bitmap to check if it is the first or second try. Jason ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2 1/2] iommu/virtio: Make use of ops->iotlb_sync_map 2023-09-22 23:33 ` Jason Gunthorpe @ 2023-09-25 2:48 ` Baolu Lu 2023-09-25 12:40 ` Jason Gunthorpe 2023-09-25 13:07 ` Robin Murphy 1 sibling, 1 reply; 22+ messages in thread From: Baolu Lu @ 2023-09-25 2:48 UTC (permalink / raw) To: Jason Gunthorpe, Robin Murphy Cc: baolu.lu, Jean-Philippe Brucker, Niklas Schnelle, Joerg Roedel, Will Deacon, virtualization, iommu, linux-kernel On 9/23/23 7:33 AM, Jason Gunthorpe wrote: > On Fri, Sep 22, 2023 at 07:07:40PM +0100, Robin Murphy wrote: > >> virtio isn't setting ops->pgsize_bitmap for the sake of direct mappings >> either; it sets it once it's discovered any instance, since apparently it's >> assuming that all instances must support identical page sizes, and thus once >> it's seen one it can work "normally" per the core code's assumptions. It's >> also I think the only driver which has a "finalise" bodge but*can* still >> properly support map-before-attach, by virtue of having to replay mappings >> to every new endpoint anyway. > Well it can't quite do that since it doesn't know the geometry - it > all is sort of guessing and hoping it doesn't explode on replay. If it > knows the geometry it wouldn't need finalize... The ultimate solution to this problem seems to be to add device pointer to the parameter of ops->domain_alloc. So once the domain is allocated, it is fully initialized. Attaching this domain to a device that is not compatible will return -EINVAL, then the caller has to allocate a new domain for this device. I feel that this is not an AMD specific problem, other iommu drivers will also encounter the similar problem sooner or later. Best regards, baolu ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2 1/2] iommu/virtio: Make use of ops->iotlb_sync_map 2023-09-25 2:48 ` Baolu Lu @ 2023-09-25 12:40 ` Jason Gunthorpe 0 siblings, 0 replies; 22+ messages in thread From: Jason Gunthorpe @ 2023-09-25 12:40 UTC (permalink / raw) To: Baolu Lu Cc: Robin Murphy, Jean-Philippe Brucker, Niklas Schnelle, Joerg Roedel, Will Deacon, virtualization, iommu, linux-kernel On Mon, Sep 25, 2023 at 10:48:21AM +0800, Baolu Lu wrote: > On 9/23/23 7:33 AM, Jason Gunthorpe wrote: > > On Fri, Sep 22, 2023 at 07:07:40PM +0100, Robin Murphy wrote: > > > > > virtio isn't setting ops->pgsize_bitmap for the sake of direct mappings > > > either; it sets it once it's discovered any instance, since apparently it's > > > assuming that all instances must support identical page sizes, and thus once > > > it's seen one it can work "normally" per the core code's assumptions. It's > > > also I think the only driver which has a "finalise" bodge but*can* still > > > properly support map-before-attach, by virtue of having to replay mappings > > > to every new endpoint anyway. > > Well it can't quite do that since it doesn't know the geometry - it > > all is sort of guessing and hoping it doesn't explode on replay. If it > > knows the geometry it wouldn't need finalize... > > The ultimate solution to this problem seems to be to add device pointer > to the parameter of ops->domain_alloc. So once the domain is allocated, > it is fully initialized. Attaching this domain to a device that is not > compatible will return -EINVAL, then the caller has to allocate a new > domain for this device. Sure, it looks like this, and it works already for default domain setup. diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c index 8599394c9157d7..1697f5a2370785 100644 --- a/drivers/iommu/virtio-iommu.c +++ b/drivers/iommu/virtio-iommu.c @@ -637,15 +637,10 @@ static void viommu_event_handler(struct virtqueue *vq) /* IOMMU API */ -static struct iommu_domain *viommu_domain_alloc(unsigned type) +static struct viommu_domain *__viommu_domain_alloc(void) { struct viommu_domain *vdomain; - if (type != IOMMU_DOMAIN_UNMANAGED && - type != IOMMU_DOMAIN_DMA && - type != IOMMU_DOMAIN_IDENTITY) - return NULL; - vdomain = kzalloc(sizeof(*vdomain), GFP_KERNEL); if (!vdomain) return NULL; @@ -654,16 +649,32 @@ static struct iommu_domain *viommu_domain_alloc(unsigned type) spin_lock_init(&vdomain->mappings_lock); vdomain->mappings = RB_ROOT_CACHED; + return vdomain; +} + +static struct iommu_domain *viommu_domain_alloc(unsigned type) +{ + struct viommu_domain *vdomain; + + /* + * viommu sometimes creates an identity domain out of a + * paging domain, that can't use the global static. + * During attach it will fill in an identity page table. + */ + if (type != IOMMU_DOMAIN_IDENTITY) + return NULL; + vdomain = __viommu_domain_alloc(); + if (!vdomain) + return NULL; return &vdomain->domain; } static int viommu_domain_finalise(struct viommu_endpoint *vdev, - struct iommu_domain *domain) + struct viommu_domain *vdomain) { int ret; unsigned long viommu_page_size; struct viommu_dev *viommu = vdev->viommu; - struct viommu_domain *vdomain = to_viommu_domain(domain); viommu_page_size = 1UL << __ffs(viommu->pgsize_bitmap); if (viommu_page_size > PAGE_SIZE) { @@ -680,13 +691,13 @@ static int viommu_domain_finalise(struct viommu_endpoint *vdev, vdomain->id = (unsigned int)ret; - domain->pgsize_bitmap = viommu->pgsize_bitmap; - domain->geometry = viommu->geometry; + vdomain->domain.pgsize_bitmap = viommu->pgsize_bitmap; + vdomain->domain.geometry = viommu->geometry; vdomain->map_flags = viommu->map_flags; vdomain->viommu = viommu; - if (domain->type == IOMMU_DOMAIN_IDENTITY) { + if (vdomain->domain.type == IOMMU_DOMAIN_IDENTITY) { if (virtio_has_feature(viommu->vdev, VIRTIO_IOMMU_F_BYPASS_CONFIG)) { vdomain->bypass = true; @@ -717,6 +728,24 @@ static void viommu_domain_free(struct iommu_domain *domain) kfree(vdomain); } +static struct iommu_domain *viommu_domain_alloc_paging(struct device *dev) +{ + struct viommu_domain *vdomain; + vdomain = __viommu_domain_alloc(); + if (!vdomain) + return NULL; + + if (dev) { + struct viommu_endpoint *vdev = dev_iommu_priv_get(dev); + + if (viommu_domain_finalise(vdev, vdomain)) { + viommu_domain_free(&vdomain->domain); + return NULL; + } + } + return &vdomain->domain; +} + static int viommu_attach_dev(struct iommu_domain *domain, struct device *dev) { int i; @@ -732,7 +761,7 @@ static int viommu_attach_dev(struct iommu_domain *domain, struct device *dev) * Properly initialize the domain now that we know which viommu * owns it. */ - ret = viommu_domain_finalise(vdev, domain); + ret = viommu_domain_finalise(vdev, vdomain); } else if (vdomain->viommu != vdev->viommu) { ret = -EINVAL; } @@ -1045,6 +1074,7 @@ static bool viommu_capable(struct device *dev, enum iommu_cap cap) static struct iommu_ops viommu_ops = { .capable = viommu_capable, .domain_alloc = viommu_domain_alloc, + .domain_alloc_paging = viommu_domain_alloc_paging, .probe_device = viommu_probe_device, .probe_finalize = viommu_probe_finalize, .release_device = viommu_release_device, ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH v2 1/2] iommu/virtio: Make use of ops->iotlb_sync_map 2023-09-22 23:33 ` Jason Gunthorpe 2023-09-25 2:48 ` Baolu Lu @ 2023-09-25 13:07 ` Robin Murphy 2023-09-25 13:29 ` Jason Gunthorpe 1 sibling, 1 reply; 22+ messages in thread From: Robin Murphy @ 2023-09-25 13:07 UTC (permalink / raw) To: Jason Gunthorpe Cc: Jean-Philippe Brucker, Niklas Schnelle, Joerg Roedel, Will Deacon, virtualization, iommu, linux-kernel On 2023-09-23 00:33, Jason Gunthorpe wrote: > On Fri, Sep 22, 2023 at 07:07:40PM +0100, Robin Murphy wrote: > >> virtio isn't setting ops->pgsize_bitmap for the sake of direct mappings >> either; it sets it once it's discovered any instance, since apparently it's >> assuming that all instances must support identical page sizes, and thus once >> it's seen one it can work "normally" per the core code's assumptions. It's >> also I think the only driver which has a "finalise" bodge but *can* still >> properly support map-before-attach, by virtue of having to replay mappings >> to every new endpoint anyway. > > Well it can't quite do that since it doesn't know the geometry - it > all is sort of guessing and hoping it doesn't explode on replay. If it > knows the geometry it wouldn't need finalize... I think it's entirely reasonable to assume that any direct mappings specified for a device are valid for that device and its IOMMU. However, in the particular case of virtio, it really shouldn't ever have direct mappings anyway, since even if the underlying hardware did have any, the host can enforce the actual direct-mapping aspect itself, and just present them as unusable regions to the guest. >>> What do you think about something like this to replace >>> iommu_create_device_direct_mappings(), that does enforce things >>> properly? >> >> I fail to see how that would make any practical difference. Either the >> mappings can be correctly set up in a pagetable *before* the relevant device >> is attached to that pagetable, or they can't (if the driver doesn't have >> enough information to be able to do so) and we just have to really hope >> nothing blows up in the race window between attaching the device to an empty >> pagetable and having a second try at iommu_create_device_direct_mappings(). >> That's a driver-level issue and has nothing to do with pgsize_bitmap either >> way. > > Except we don't detect this in the core code correctly, that is my > point. We should detect the aperture conflict, not pgsize_bitmap to > check if it is the first or second try. Again, that's irrelevant. It can only be about whether the actual ->map_pages call succeeds or not. A driver could well know up-front that all instances support the same pgsize_bitmap and aperture, and set both at ->domain_alloc time, yet still be unable to handle an actual mapping without knowing which instance(s) that needs to interact with (e.g. omap-iommu). Thanks, Robin. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2 1/2] iommu/virtio: Make use of ops->iotlb_sync_map 2023-09-25 13:07 ` Robin Murphy @ 2023-09-25 13:29 ` Jason Gunthorpe 2023-09-25 17:23 ` Robin Murphy 0 siblings, 1 reply; 22+ messages in thread From: Jason Gunthorpe @ 2023-09-25 13:29 UTC (permalink / raw) To: Robin Murphy Cc: Jean-Philippe Brucker, Niklas Schnelle, Joerg Roedel, Will Deacon, virtualization, iommu, linux-kernel On Mon, Sep 25, 2023 at 02:07:50PM +0100, Robin Murphy wrote: > On 2023-09-23 00:33, Jason Gunthorpe wrote: > > On Fri, Sep 22, 2023 at 07:07:40PM +0100, Robin Murphy wrote: > > > > > virtio isn't setting ops->pgsize_bitmap for the sake of direct mappings > > > either; it sets it once it's discovered any instance, since apparently it's > > > assuming that all instances must support identical page sizes, and thus once > > > it's seen one it can work "normally" per the core code's assumptions. It's > > > also I think the only driver which has a "finalise" bodge but *can* still > > > properly support map-before-attach, by virtue of having to replay mappings > > > to every new endpoint anyway. > > > > Well it can't quite do that since it doesn't know the geometry - it > > all is sort of guessing and hoping it doesn't explode on replay. If it > > knows the geometry it wouldn't need finalize... > > I think it's entirely reasonable to assume that any direct mappings > specified for a device are valid for that device and its IOMMU. However, in > the particular case of virtio, it really shouldn't ever have direct mappings > anyway, since even if the underlying hardware did have any, the host can > enforce the actual direct-mapping aspect itself, and just present them as > unusable regions to the guest. I assume this machinery is for the ARM GIC ITS page.... > Again, that's irrelevant. It can only be about whether the actual > ->map_pages call succeeds or not. A driver could well know up-front that all > instances support the same pgsize_bitmap and aperture, and set both at > ->domain_alloc time, yet still be unable to handle an actual mapping without > knowing which instance(s) that needs to interact with (e.g. omap-iommu). I think this is a different issue. The domain is supposed to represent the actual io pte storage, and the storage is supposed to exist even when the domain is not attached to anything. As we said with tegra-gart, it is a bug in the driver if all the mappings disappear when the last device is detached from the domain. Driver bugs like this turn into significant issues with vfio/iommufd as this will result in warn_on's and memory leaking. So, I disagree that this is something we should be allowing in the API design. map_pages should succeed (memory allocation failures aside) if a IOVA within the aperture and valid flags are presented. Regardless of the attachment status. Calling map_pages with an IOVA outside the aperture should be a caller bug. It looks omap is just mis-designed to store the pgd in the omap_iommu, not the omap_iommu_domain :( pgd is clearly a per-domain object in our API. And why does every instance need its own copy of the identical pgd? Jason ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2 1/2] iommu/virtio: Make use of ops->iotlb_sync_map 2023-09-25 13:29 ` Jason Gunthorpe @ 2023-09-25 17:23 ` Robin Murphy 0 siblings, 0 replies; 22+ messages in thread From: Robin Murphy @ 2023-09-25 17:23 UTC (permalink / raw) To: Jason Gunthorpe Cc: Jean-Philippe Brucker, Niklas Schnelle, Joerg Roedel, Will Deacon, virtualization, iommu, linux-kernel On 2023-09-25 14:29, Jason Gunthorpe wrote: > On Mon, Sep 25, 2023 at 02:07:50PM +0100, Robin Murphy wrote: >> On 2023-09-23 00:33, Jason Gunthorpe wrote: >>> On Fri, Sep 22, 2023 at 07:07:40PM +0100, Robin Murphy wrote: >>> >>>> virtio isn't setting ops->pgsize_bitmap for the sake of direct mappings >>>> either; it sets it once it's discovered any instance, since apparently it's >>>> assuming that all instances must support identical page sizes, and thus once >>>> it's seen one it can work "normally" per the core code's assumptions. It's >>>> also I think the only driver which has a "finalise" bodge but *can* still >>>> properly support map-before-attach, by virtue of having to replay mappings >>>> to every new endpoint anyway. >>> >>> Well it can't quite do that since it doesn't know the geometry - it >>> all is sort of guessing and hoping it doesn't explode on replay. If it >>> knows the geometry it wouldn't need finalize... >> >> I think it's entirely reasonable to assume that any direct mappings >> specified for a device are valid for that device and its IOMMU. However, in >> the particular case of virtio, it really shouldn't ever have direct mappings >> anyway, since even if the underlying hardware did have any, the host can >> enforce the actual direct-mapping aspect itself, and just present them as >> unusable regions to the guest. > > I assume this machinery is for the ARM GIC ITS page.... > >> Again, that's irrelevant. It can only be about whether the actual >> ->map_pages call succeeds or not. A driver could well know up-front that all >> instances support the same pgsize_bitmap and aperture, and set both at >> ->domain_alloc time, yet still be unable to handle an actual mapping without >> knowing which instance(s) that needs to interact with (e.g. omap-iommu). > > I think this is a different issue. The domain is supposed to represent > the actual io pte storage, and the storage is supposed to exist even > when the domain is not attached to anything. > > As we said with tegra-gart, it is a bug in the driver if all the > mappings disappear when the last device is detached from the domain. > Driver bugs like this turn into significant issues with vfio/iommufd > as this will result in warn_on's and memory leaking. > > So, I disagree that this is something we should be allowing in the API > design. map_pages should succeed (memory allocation failures aside) if > a IOVA within the aperture and valid flags are presented. Regardless > of the attachment status. Calling map_pages with an IOVA outside the > aperture should be a caller bug. > > It looks omap is just mis-designed to store the pgd in the omap_iommu, > not the omap_iommu_domain :( pgd is clearly a per-domain object in our > API. And why does every instance need its own copy of the identical > pgd? The point wasn't that it was necessarily a good and justifiable example, just that it is one that exists, to demonstrate that in general we have no reasonable heuristic for guessing whether ->map_pages is going to succeed or not other than by calling it and seeing if it succeeds or not. And IMO it's a complete waste of time thinking about ways to make such a heuristic possible instead of just getting on with fixing iommu_domain_alloc() to make the problem disappear altogether. Once Joerg pushes out the current queue I'll rebase and resend v4 of the bus ops removal, then hopefully get back to despairing at the hideous pile of WIP iommu_domain_alloc() patches I currently have on top of it... Thanks, Robin. ^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH v2 2/2] iommu/virtio: Add ops->flush_iotlb_all and enable deferred flush 2023-09-18 11:51 [PATCH v2 0/2] iommu/virtio: Enable IOMMU_CAP_DERRED_FLUSH Niklas Schnelle 2023-09-18 11:51 ` [PATCH v2 1/2] iommu/virtio: Make use of ops->iotlb_sync_map Niklas Schnelle @ 2023-09-18 11:51 ` Niklas Schnelle 2023-09-18 15:59 ` Jean-Philippe Brucker 1 sibling, 1 reply; 22+ messages in thread From: Niklas Schnelle @ 2023-09-18 11:51 UTC (permalink / raw) To: Jean-Philippe Brucker, Joerg Roedel, Will Deacon, Robin Murphy Cc: virtualization, iommu, linux-kernel, Niklas Schnelle Add ops->flush_iotlb_all operation to enable virtio-iommu for the dma-iommu deferred flush scheme. This results in a significant increase in performance in exchange for a window in which devices can still access previously IOMMU mapped memory when running with CONFIG_IOMMU_DEFAULT_DMA_LAZY. The previous strict behavior can be achieved with iommu.strict=1 on the kernel command line or CONFIG_IOMMU_DEFAULT_DMA_STRICT. Link: https://lore.kernel.org/lkml/20230802123612.GA6142@myrica/ Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> --- drivers/iommu/virtio-iommu.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c index 3649586f0e5c..4dd330fbcbdd 100644 --- a/drivers/iommu/virtio-iommu.c +++ b/drivers/iommu/virtio-iommu.c @@ -926,6 +926,13 @@ static int viommu_iotlb_sync_map(struct iommu_domain *domain, return viommu_sync_req(vdomain->viommu); } +static void viommu_flush_iotlb_all(struct iommu_domain *domain) +{ + struct viommu_domain *vdomain = to_viommu_domain(domain); + + viommu_sync_req(vdomain->viommu); +} + static void viommu_get_resv_regions(struct device *dev, struct list_head *head) { struct iommu_resv_region *entry, *new_entry, *msi = NULL; @@ -1051,6 +1058,8 @@ static bool viommu_capable(struct device *dev, enum iommu_cap cap) switch (cap) { case IOMMU_CAP_CACHE_COHERENCY: return true; + case IOMMU_CAP_DEFERRED_FLUSH: + return true; default: return false; } @@ -1071,6 +1080,7 @@ static struct iommu_ops viommu_ops = { .map_pages = viommu_map_pages, .unmap_pages = viommu_unmap_pages, .iova_to_phys = viommu_iova_to_phys, + .flush_iotlb_all = viommu_flush_iotlb_all, .iotlb_sync = viommu_iotlb_sync, .iotlb_sync_map = viommu_iotlb_sync_map, .free = viommu_domain_free, -- 2.39.2 ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH v2 2/2] iommu/virtio: Add ops->flush_iotlb_all and enable deferred flush 2023-09-18 11:51 ` [PATCH v2 2/2] iommu/virtio: Add ops->flush_iotlb_all and enable deferred flush Niklas Schnelle @ 2023-09-18 15:59 ` Jean-Philippe Brucker 0 siblings, 0 replies; 22+ messages in thread From: Jean-Philippe Brucker @ 2023-09-18 15:59 UTC (permalink / raw) To: Niklas Schnelle Cc: Joerg Roedel, Will Deacon, Robin Murphy, virtualization, iommu, linux-kernel On Mon, Sep 18, 2023 at 01:51:44PM +0200, Niklas Schnelle wrote: > Add ops->flush_iotlb_all operation to enable virtio-iommu for the > dma-iommu deferred flush scheme. This results in a significant increase > in performance in exchange for a window in which devices can still > access previously IOMMU mapped memory when running with > CONFIG_IOMMU_DEFAULT_DMA_LAZY. The previous strict behavior can be > achieved with iommu.strict=1 on the kernel command line or > CONFIG_IOMMU_DEFAULT_DMA_STRICT. > > Link: https://lore.kernel.org/lkml/20230802123612.GA6142@myrica/ > Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> > --- > drivers/iommu/virtio-iommu.c | 10 ++++++++++ > 1 file changed, 10 insertions(+) > > diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c > index 3649586f0e5c..4dd330fbcbdd 100644 > --- a/drivers/iommu/virtio-iommu.c > +++ b/drivers/iommu/virtio-iommu.c > @@ -926,6 +926,13 @@ static int viommu_iotlb_sync_map(struct iommu_domain *domain, > return viommu_sync_req(vdomain->viommu); > } > > +static void viommu_flush_iotlb_all(struct iommu_domain *domain) > +{ > + struct viommu_domain *vdomain = to_viommu_domain(domain); > + > + viommu_sync_req(vdomain->viommu); > +} > + > static void viommu_get_resv_regions(struct device *dev, struct list_head *head) > { > struct iommu_resv_region *entry, *new_entry, *msi = NULL; > @@ -1051,6 +1058,8 @@ static bool viommu_capable(struct device *dev, enum iommu_cap cap) > switch (cap) { > case IOMMU_CAP_CACHE_COHERENCY: > return true; > + case IOMMU_CAP_DEFERRED_FLUSH: > + return true; > default: > return false; > } > @@ -1071,6 +1080,7 @@ static struct iommu_ops viommu_ops = { > .map_pages = viommu_map_pages, > .unmap_pages = viommu_unmap_pages, > .iova_to_phys = viommu_iova_to_phys, > + .flush_iotlb_all = viommu_flush_iotlb_all, > .iotlb_sync = viommu_iotlb_sync, > .iotlb_sync_map = viommu_iotlb_sync_map, > .free = viommu_domain_free, > > -- > 2.39.2 > ^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2023-09-25 17:24 UTC | newest] Thread overview: 22+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-09-18 11:51 [PATCH v2 0/2] iommu/virtio: Enable IOMMU_CAP_DERRED_FLUSH Niklas Schnelle 2023-09-18 11:51 ` [PATCH v2 1/2] iommu/virtio: Make use of ops->iotlb_sync_map Niklas Schnelle 2023-09-18 15:58 ` Jean-Philippe Brucker 2023-09-18 16:37 ` Robin Murphy 2023-09-19 8:00 ` Niklas Schnelle 2023-09-19 8:15 ` Jean-Philippe Brucker 2023-09-19 8:28 ` Robin Murphy 2023-09-22 7:52 ` Jean-Philippe Brucker 2023-09-19 14:46 ` Jason Gunthorpe 2023-09-22 7:57 ` Jean-Philippe Brucker 2023-09-22 12:41 ` Jason Gunthorpe 2023-09-22 13:13 ` Robin Murphy 2023-09-22 16:27 ` Jason Gunthorpe 2023-09-22 18:07 ` Robin Murphy 2023-09-22 23:33 ` Jason Gunthorpe 2023-09-25 2:48 ` Baolu Lu 2023-09-25 12:40 ` Jason Gunthorpe 2023-09-25 13:07 ` Robin Murphy 2023-09-25 13:29 ` Jason Gunthorpe 2023-09-25 17:23 ` Robin Murphy 2023-09-18 11:51 ` [PATCH v2 2/2] iommu/virtio: Add ops->flush_iotlb_all and enable deferred flush Niklas Schnelle 2023-09-18 15:59 ` Jean-Philippe Brucker
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox