* [PATCH v3 01/19] vfio/iova_bitmap: Export more API symbols
2023-09-23 1:24 [PATCH v3 00/19] IOMMUFD Dirty Tracking Joao Martins
@ 2023-09-23 1:24 ` Joao Martins
2023-10-13 15:43 ` Jason Gunthorpe
2023-09-23 1:24 ` [PATCH v3 02/19] vfio: Move iova_bitmap into iommu core Joao Martins
` (19 subsequent siblings)
20 siblings, 1 reply; 140+ messages in thread
From: Joao Martins @ 2023-09-23 1:24 UTC (permalink / raw)
To: iommu
Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel,
Suravee Suthikulpanit, Will Deacon, Robin Murphy, Alex Williamson,
kvm, Joao Martins
In preparation to move iova_bitmap into iommufd, export the rest of the API
symbols that could be used by modules, namely:
iova_bitmap_alloc
iova_bitmap_free
iova_bitmap_for_each
Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
drivers/vfio/iova_bitmap.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/vfio/iova_bitmap.c b/drivers/vfio/iova_bitmap.c
index 0848f920efb7..f54b56388e00 100644
--- a/drivers/vfio/iova_bitmap.c
+++ b/drivers/vfio/iova_bitmap.c
@@ -268,6 +268,7 @@ struct iova_bitmap *iova_bitmap_alloc(unsigned long iova, size_t length,
iova_bitmap_free(bitmap);
return ERR_PTR(rc);
}
+EXPORT_SYMBOL_GPL(iova_bitmap_alloc);
/**
* iova_bitmap_free() - Frees an IOVA bitmap object
@@ -289,6 +290,7 @@ void iova_bitmap_free(struct iova_bitmap *bitmap)
kfree(bitmap);
}
+EXPORT_SYMBOL_GPL(iova_bitmap_free);
/*
* Returns the remaining bitmap indexes from mapped_total_index to process for
@@ -387,6 +389,7 @@ int iova_bitmap_for_each(struct iova_bitmap *bitmap, void *opaque,
return ret;
}
+EXPORT_SYMBOL_GPL(iova_bitmap_for_each);
/**
* iova_bitmap_set() - Records an IOVA range in bitmap
--
2.17.2
^ permalink raw reply related [flat|nested] 140+ messages in thread
* Re: [PATCH v3 01/19] vfio/iova_bitmap: Export more API symbols
2023-09-23 1:24 ` [PATCH v3 01/19] vfio/iova_bitmap: Export more API symbols Joao Martins
@ 2023-10-13 15:43 ` Jason Gunthorpe
2023-10-13 15:57 ` Joao Martins
0 siblings, 1 reply; 140+ messages in thread
From: Jason Gunthorpe @ 2023-10-13 15:43 UTC (permalink / raw)
To: Joao Martins
Cc: iommu, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu, Yi Liu,
Yi Y Sun, Nicolin Chen, Joerg Roedel, Suravee Suthikulpanit,
Will Deacon, Robin Murphy, Alex Williamson, kvm
On Sat, Sep 23, 2023 at 02:24:53AM +0100, Joao Martins wrote:
> In preparation to move iova_bitmap into iommufd, export the rest of API
> symbols that will be used in what could be used by modules, namely:
>
> iova_bitmap_alloc
> iova_bitmap_free
> iova_bitmap_for_each
>
> Suggested-by: Alex Williamson <alex.williamson@redhat.com>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
> drivers/vfio/iova_bitmap.c | 3 +++
> 1 file changed, 3 insertions(+)
All iommufd symbols should be exported more like:
drivers/iommu/iommufd/device.c:EXPORT_SYMBOL_NS_GPL(iommufd_device_replace, IOMMUFD);
Including these. So please fix them all here too
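For context, a namespaced export ties the symbol to a named export namespace, and every consumer has to opt in explicitly; roughly (illustrative fragment of the pattern, not the actual patch):

```c
/* Exporting side (e.g. iova_bitmap.c), placing the symbol in the
 * IOMMUFD namespace instead of the global one: */
EXPORT_SYMBOL_NS_GPL(iova_bitmap_alloc, IOMMUFD);

/* Any module calling it must import the namespace, otherwise modpost
 * warns at build time and the module is rejected at load time: */
MODULE_IMPORT_NS(IOMMUFD);
```

This keeps the API from being silently picked up by unrelated modules, which is why iommufd uses it for its whole symbol surface.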
Jason
* Re: [PATCH v3 01/19] vfio/iova_bitmap: Export more API symbols
2023-10-13 15:43 ` Jason Gunthorpe
@ 2023-10-13 15:57 ` Joao Martins
2023-10-13 16:03 ` Jason Gunthorpe
0 siblings, 1 reply; 140+ messages in thread
From: Joao Martins @ 2023-10-13 15:57 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: iommu, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu, Yi Liu,
Yi Y Sun, Nicolin Chen, Joerg Roedel, Suravee Suthikulpanit,
Will Deacon, Robin Murphy, Alex Williamson, kvm
On 13/10/2023 16:43, Jason Gunthorpe wrote:
> On Sat, Sep 23, 2023 at 02:24:53AM +0100, Joao Martins wrote:
>> In preparation to move iova_bitmap into iommufd, export the rest of API
>> symbols that will be used in what could be used by modules, namely:
>>
>> iova_bitmap_alloc
>> iova_bitmap_free
>> iova_bitmap_for_each
>>
>> Suggested-by: Alex Williamson <alex.williamson@redhat.com>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>> drivers/vfio/iova_bitmap.c | 3 +++
>> 1 file changed, 3 insertions(+)
>
> All iommufd symbols should be exported more like:
>
> drivers/iommu/iommufd/device.c:EXPORT_SYMBOL_NS_GPL(iommufd_device_replace, IOMMUFD);
>
> Including these. So please fix them all here too
OK. Given your comment on the next patch, I'll move this into IOMMUFD.
The IOMMU core didn't export symbols that way, so I adhered to the style already in use.
* Re: [PATCH v3 01/19] vfio/iova_bitmap: Export more API symbols
2023-10-13 15:57 ` Joao Martins
@ 2023-10-13 16:03 ` Jason Gunthorpe
2023-10-13 16:22 ` Joao Martins
0 siblings, 1 reply; 140+ messages in thread
From: Jason Gunthorpe @ 2023-10-13 16:03 UTC (permalink / raw)
To: Joao Martins
Cc: iommu, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu, Yi Liu,
Yi Y Sun, Nicolin Chen, Joerg Roedel, Suravee Suthikulpanit,
Will Deacon, Robin Murphy, Alex Williamson, kvm
On Fri, Oct 13, 2023 at 04:57:53PM +0100, Joao Martins wrote:
> On 13/10/2023 16:43, Jason Gunthorpe wrote:
> > On Sat, Sep 23, 2023 at 02:24:53AM +0100, Joao Martins wrote:
> >> In preparation to move iova_bitmap into iommufd, export the rest of API
> >> symbols that will be used in what could be used by modules, namely:
> >>
> >> iova_bitmap_alloc
> >> iova_bitmap_free
> >> iova_bitmap_for_each
> >>
> >> Suggested-by: Alex Williamson <alex.williamson@redhat.com>
> >> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> >> ---
> >> drivers/vfio/iova_bitmap.c | 3 +++
> >> 1 file changed, 3 insertions(+)
> >
> > All iommufd symbols should be exported more like:
> >
> > drivers/iommu/iommufd/device.c:EXPORT_SYMBOL_NS_GPL(iommufd_device_replace, IOMMUFD);
> >
> > Including these. So please fix them all here too
>
> OK, Provided your comment on the next patch to move this into IOMMUFD.
>
> The IOMMU core didn't exported symbols that way, so I adhered to the style in-use.
Well, this commit message says "move iova_bitmap into iommufd" :)
Jason
* Re: [PATCH v3 01/19] vfio/iova_bitmap: Export more API symbols
2023-10-13 16:03 ` Jason Gunthorpe
@ 2023-10-13 16:22 ` Joao Martins
0 siblings, 0 replies; 140+ messages in thread
From: Joao Martins @ 2023-10-13 16:22 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: iommu, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu, Yi Liu,
Yi Y Sun, Nicolin Chen, Joerg Roedel, Suravee Suthikulpanit,
Will Deacon, Robin Murphy, Alex Williamson, kvm
On 13/10/2023 17:03, Jason Gunthorpe wrote:
> On Fri, Oct 13, 2023 at 04:57:53PM +0100, Joao Martins wrote:
>> On 13/10/2023 16:43, Jason Gunthorpe wrote:
>>> On Sat, Sep 23, 2023 at 02:24:53AM +0100, Joao Martins wrote:
>>>> In preparation to move iova_bitmap into iommufd, export the rest of API
>>>> symbols that will be used in what could be used by modules, namely:
>>>>
>>>> iova_bitmap_alloc
>>>> iova_bitmap_free
>>>> iova_bitmap_for_each
>>>>
>>>> Suggested-by: Alex Williamson <alex.williamson@redhat.com>
>>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>>> ---
>>>> drivers/vfio/iova_bitmap.c | 3 +++
>>>> 1 file changed, 3 insertions(+)
>>>
>>> All iommufd symbols should be exported more like:
>>>
>>> drivers/iommu/iommufd/device.c:EXPORT_SYMBOL_NS_GPL(iommufd_device_replace, IOMMUFD);
>>>
>>> Including these. So please fix them all here too
>>
>> OK, Provided your comment on the next patch to move this into IOMMUFD.
>>
>> The IOMMU core didn't exported symbols that way, so I adhered to the style in-use.
>
> Well, this commit message says "move iova_bitmap into iommufd" :)
:( It should have said 'into iommu core', just like the next patch which says
"vfio: Move iova_bitmap into iommu core". Let me fix that, though there's no
point now if the idea is to actually move it into iommufd
* [PATCH v3 02/19] vfio: Move iova_bitmap into iommu core
2023-09-23 1:24 [PATCH v3 00/19] IOMMUFD Dirty Tracking Joao Martins
2023-09-23 1:24 ` [PATCH v3 01/19] vfio/iova_bitmap: Export more API symbols Joao Martins
@ 2023-09-23 1:24 ` Joao Martins
2023-10-13 15:48 ` Jason Gunthorpe
2023-09-23 1:24 ` [PATCH v3 03/19] iommu: Add iommu_domain ops for dirty tracking Joao Martins
` (18 subsequent siblings)
20 siblings, 1 reply; 140+ messages in thread
From: Joao Martins @ 2023-09-23 1:24 UTC (permalink / raw)
To: iommu
Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel,
Suravee Suthikulpanit, Will Deacon, Robin Murphy, Alex Williamson,
kvm, Joao Martins
Both VFIO and IOMMUFD will need the iova bitmap for storing dirties and walking
the user bitmaps, so move the common dependency into the IOMMU core. IOMMUFD
can't exactly host it, given that VFIO dirty tracking can be used without
IOMMUFD.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
drivers/iommu/Makefile | 1 +
drivers/{vfio => iommu}/iova_bitmap.c | 0
drivers/vfio/Makefile | 3 +--
3 files changed, 2 insertions(+), 2 deletions(-)
rename drivers/{vfio => iommu}/iova_bitmap.c (100%)
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 769e43d780ce..9d9dfbd2dfc2 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -10,6 +10,7 @@ obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
obj-$(CONFIG_IOMMU_IO_PGTABLE_DART) += io-pgtable-dart.o
obj-$(CONFIG_IOMMU_IOVA) += iova.o
+obj-$(CONFIG_IOMMU_IOVA) += iova_bitmap.o
obj-$(CONFIG_OF_IOMMU) += of_iommu.o
obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o
obj-$(CONFIG_IPMMU_VMSA) += ipmmu-vmsa.o
diff --git a/drivers/vfio/iova_bitmap.c b/drivers/iommu/iova_bitmap.c
similarity index 100%
rename from drivers/vfio/iova_bitmap.c
rename to drivers/iommu/iova_bitmap.c
diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
index c82ea032d352..68c05705200f 100644
--- a/drivers/vfio/Makefile
+++ b/drivers/vfio/Makefile
@@ -1,8 +1,7 @@
# SPDX-License-Identifier: GPL-2.0
obj-$(CONFIG_VFIO) += vfio.o
-vfio-y += vfio_main.o \
- iova_bitmap.o
+vfio-y += vfio_main.o
vfio-$(CONFIG_VFIO_DEVICE_CDEV) += device_cdev.o
vfio-$(CONFIG_VFIO_GROUP) += group.o
vfio-$(CONFIG_IOMMUFD) += iommufd.o
--
2.17.2
* Re: [PATCH v3 02/19] vfio: Move iova_bitmap into iommu core
2023-09-23 1:24 ` [PATCH v3 02/19] vfio: Move iova_bitmap into iommu core Joao Martins
@ 2023-10-13 15:48 ` Jason Gunthorpe
2023-10-13 16:00 ` Joao Martins
0 siblings, 1 reply; 140+ messages in thread
From: Jason Gunthorpe @ 2023-10-13 15:48 UTC (permalink / raw)
To: Joao Martins
Cc: iommu, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu, Yi Liu,
Yi Y Sun, Nicolin Chen, Joerg Roedel, Suravee Suthikulpanit,
Will Deacon, Robin Murphy, Alex Williamson, kvm
On Sat, Sep 23, 2023 at 02:24:54AM +0100, Joao Martins wrote:
> Both VFIO and IOMMUFD will need iova bitmap for storing dirties and walking
> the user bitmaps, so move to the common dependency into IOMMU core. IOMMUFD
> can't exactly host it given that VFIO dirty tracking can be used without
> IOMMUFD.
Hum, this seems strange. Why not just make those VFIO drivers depends
on iommufd? That seems harmless to me.
However, I think the real issue is that iommu drivers need to use this
API too for their part?
IMHO what I would like to get to is a part of iommufd that is used by
iommu drivers (and thus built-in) and the current part that is
modular.
Basically, I think you should put this in the iommufd directory. Make
the vfio side kconfig depend on iommufd at this point
Later when the iommu drivers need it make some
CONFIG_IOMMUFD_DRIVER_SUPPORT to build another module (that will be
built in) and make the drivers that need it select it so it becomes
built in.
Jason
* Re: [PATCH v3 02/19] vfio: Move iova_bitmap into iommu core
2023-10-13 15:48 ` Jason Gunthorpe
@ 2023-10-13 16:00 ` Joao Martins
2023-10-13 16:04 ` Jason Gunthorpe
2023-10-13 17:10 ` Joao Martins
0 siblings, 2 replies; 140+ messages in thread
From: Joao Martins @ 2023-10-13 16:00 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: iommu, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu, Yi Liu,
Yi Y Sun, Nicolin Chen, Joerg Roedel, Suravee Suthikulpanit,
Will Deacon, Robin Murphy, Alex Williamson, kvm
On 13/10/2023 16:48, Jason Gunthorpe wrote:
> On Sat, Sep 23, 2023 at 02:24:54AM +0100, Joao Martins wrote:
>> Both VFIO and IOMMUFD will need iova bitmap for storing dirties and walking
>> the user bitmaps, so move to the common dependency into IOMMU core. IOMMUFD
>> can't exactly host it given that VFIO dirty tracking can be used without
>> IOMMUFD.
>
> Hum, this seems strange. Why not just make those VFIO drivers depends
> on iommufd? That seems harmless to me.
>
IF you and Alex are OK with it then I can move to IOMMUFD.
> However, I think the real issue is that iommu drivers need to use this
> API too for their part?
>
Exactly.
> IMHO would I would like to get to is a part of iommufd that used by
> iommu drivers (and thus built-in) and the current part that is
> modular.
>
> Basically, I think you should put this in the iommufd directory. Make
> the vfio side kconfig depend on iommufd at this point
>
> Later when the iommu drivers need it make some
> CONFIG_IOMMUFD_DRIVER_SUPPORT to build another module (that will be
> built in) and make the drivers that need it select it so it becomes
> built in.
That's a good idea; do you want me to do this (CONFIG_IOMMUFD_DRIVER_SUPPORT) in
the context of this series, or as a follow-up (assuming I make it depend on
iommufd as you suggested earlier)?
* Re: [PATCH v3 02/19] vfio: Move iova_bitmap into iommu core
2023-10-13 16:00 ` Joao Martins
@ 2023-10-13 16:04 ` Jason Gunthorpe
2023-10-13 16:23 ` Joao Martins
2023-10-13 17:10 ` Joao Martins
1 sibling, 1 reply; 140+ messages in thread
From: Jason Gunthorpe @ 2023-10-13 16:04 UTC (permalink / raw)
To: Joao Martins
Cc: iommu, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu, Yi Liu,
Yi Y Sun, Nicolin Chen, Joerg Roedel, Suravee Suthikulpanit,
Will Deacon, Robin Murphy, Alex Williamson, kvm
On Fri, Oct 13, 2023 at 05:00:14PM +0100, Joao Martins wrote:
> > Later when the iommu drivers need it make some
> > CONFIG_IOMMUFD_DRIVER_SUPPORT to build another module (that will be
> > built in) and make the drivers that need it select it so it becomes
> > built in.
>
> That's a good idea; you want me to do this (CONFIG_IOMMUFD_DRIVER_SUPPORT) in
> the context of this series, or as a follow-up (assuming I make it depend on
> iommufd as you suggested earlier) ?
I think it needs to be in this series, maybe as the next patch since
io pagetable code starts to use it?
Jason
* Re: [PATCH v3 02/19] vfio: Move iova_bitmap into iommu core
2023-10-13 16:04 ` Jason Gunthorpe
@ 2023-10-13 16:23 ` Joao Martins
0 siblings, 0 replies; 140+ messages in thread
From: Joao Martins @ 2023-10-13 16:23 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: iommu, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu, Yi Liu,
Yi Y Sun, Nicolin Chen, Joerg Roedel, Suravee Suthikulpanit,
Will Deacon, Robin Murphy, Alex Williamson, kvm
On 13/10/2023 17:04, Jason Gunthorpe wrote:
> On Fri, Oct 13, 2023 at 05:00:14PM +0100, Joao Martins wrote:
>
>>> Later when the iommu drivers need it make some
>>> CONFIG_IOMMUFD_DRIVER_SUPPORT to build another module (that will be
>>> built in) and make the drivers that need it select it so it becomes
>>> built in.
>>
>> That's a good idea; you want me to do this (CONFIG_IOMMUFD_DRIVER_SUPPORT) in
>> the context of this series, or as a follow-up (assuming I make it depend on
>> iommufd as you suggested earlier) ?
>
> I think it needs to be in this series, maybe as the next patch since
> io pagetable code starts to use it?
OK, I'll see how it looks.
* Re: [PATCH v3 02/19] vfio: Move iova_bitmap into iommu core
2023-10-13 16:00 ` Joao Martins
2023-10-13 16:04 ` Jason Gunthorpe
@ 2023-10-13 17:10 ` Joao Martins
2023-10-13 17:16 ` Jason Gunthorpe
1 sibling, 1 reply; 140+ messages in thread
From: Joao Martins @ 2023-10-13 17:10 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: iommu, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu, Yi Liu,
Yi Y Sun, Nicolin Chen, Joerg Roedel, Suravee Suthikulpanit,
Will Deacon, Robin Murphy, Alex Williamson, kvm
On 13/10/2023 17:00, Joao Martins wrote:
> On 13/10/2023 16:48, Jason Gunthorpe wrote:
>> On Sat, Sep 23, 2023 at 02:24:54AM +0100, Joao Martins wrote:
>>> Both VFIO and IOMMUFD will need iova bitmap for storing dirties and walking
>>> the user bitmaps, so move to the common dependency into IOMMU core. IOMMUFD
>>> can't exactly host it given that VFIO dirty tracking can be used without
>>> IOMMUFD.
>>
>> Hum, this seems strange. Why not just make those VFIO drivers depends
>> on iommufd? That seems harmless to me.
>>
>
> IF you and Alex are OK with it then I can move to IOMMUFD.
>
>> However, I think the real issue is that iommu drivers need to use this
>> API too for their part?
>>
>
> Exactly.
>
My other concern with moving this into IOMMUFD instead of the core was
VFIO_IOMMU_TYPE1: if we always make it depend on IOMMUFD, then we lose what is
supported today with migration drivers (i.e. vfio-iommu-type1 together with the
live migration stuff).
But if an IOMMUFD_DRIVER kconfig exists, then VFIO_CONTAINER can instead
select IOMMUFD_DRIVER alone, so long as CONFIG_IOMMUFD isn't required. I am
essentially talking about:
# SPDX-License-Identifier: GPL-2.0-only
menuconfig VFIO
	tristate "VFIO Non-Privileged userspace driver framework"
	select IOMMU_API
	depends on IOMMUFD || !IOMMUFD
	select INTERVAL_TREE
	select VFIO_GROUP if SPAPR_TCE_IOMMU || IOMMUFD=n
	select VFIO_DEVICE_CDEV if !VFIO_GROUP
	select VFIO_CONTAINER if IOMMUFD=n
	help
	  VFIO provides a framework for secure userspace device drivers.
	  See Documentation/driver-api/vfio.rst for more details.

	  If you don't know what to do here, say N.
... and the fact that VFIO_IOMMU_TYPE1 requires VFIO_GROUP:
config VFIO_CONTAINER
	bool "Support for the VFIO container /dev/vfio/vfio"
	select VFIO_IOMMU_TYPE1 if MMU && (X86 || S390 || ARM || ARM64)
	depends on VFIO_GROUP
	default y
	help
	  The VFIO container is the classic interface to VFIO for establishing
	  IOMMU mappings. If N is selected here then IOMMUFD must be used to
	  manage the mappings.

	  Unless testing IOMMUFD say Y here.

if VFIO_CONTAINER

config VFIO_IOMMU_TYPE1
	tristate
	default n
* Re: [PATCH v3 02/19] vfio: Move iova_bitmap into iommu core
2023-10-13 17:10 ` Joao Martins
@ 2023-10-13 17:16 ` Jason Gunthorpe
2023-10-13 17:23 ` Joao Martins
0 siblings, 1 reply; 140+ messages in thread
From: Jason Gunthorpe @ 2023-10-13 17:16 UTC (permalink / raw)
To: Joao Martins
Cc: iommu, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu, Yi Liu,
Yi Y Sun, Nicolin Chen, Joerg Roedel, Suravee Suthikulpanit,
Will Deacon, Robin Murphy, Alex Williamson, kvm
On Fri, Oct 13, 2023 at 06:10:04PM +0100, Joao Martins wrote:
> On 13/10/2023 17:00, Joao Martins wrote:
> > On 13/10/2023 16:48, Jason Gunthorpe wrote:
> >> On Sat, Sep 23, 2023 at 02:24:54AM +0100, Joao Martins wrote:
> >>> Both VFIO and IOMMUFD will need iova bitmap for storing dirties and walking
> >>> the user bitmaps, so move to the common dependency into IOMMU core. IOMMUFD
> >>> can't exactly host it given that VFIO dirty tracking can be used without
> >>> IOMMUFD.
> >>
> >> Hum, this seems strange. Why not just make those VFIO drivers depends
> >> on iommufd? That seems harmless to me.
> >>
> >
> > IF you and Alex are OK with it then I can move to IOMMUFD.
> >
> >> However, I think the real issue is that iommu drivers need to use this
> >> API too for their part?
> >>
> >
> > Exactly.
> >
>
> My other concern into moving to IOMMUFD instead of core was VFIO_IOMMU_TYPE1,
> and if we always make it depend on IOMMUFD then we can't have what is today
> something supported because of VFIO_IOMMU_TYPE1 stuff with migration drivers
> (i.e. vfio-iommu-type1 with the live migration stuff).
I plan to remove the live migration stuff from vfio-iommu-type1, it is
all dead code now.
> But if it's exists an IOMMUFD_DRIVER kconfig, then VFIO_CONTAINER can instead
> select the IOMMUFD_DRIVER alone so long as CONFIG_IOMMUFD isn't required? I am
> essentially talking about:
Not VFIO_CONTAINER, the dirty tracking code is in vfio_main:
vfio_main.c:#include <linux/iova_bitmap.h>
vfio_main.c:static int vfio_device_log_read_and_clear(struct iova_bitmap *iter,
vfio_main.c: struct iova_bitmap *iter;
vfio_main.c: iter = iova_bitmap_alloc(report.iova, report.length,
vfio_main.c: ret = iova_bitmap_for_each(iter, device,
vfio_main.c: iova_bitmap_free(iter);
And in various vfio device drivers.
So the various drivers can select IOMMUFD_DRIVER
And the core code should just gain a
if (!IS_ENABLED(CONFIG_IOMMUFD_DRIVER))
	return -EOPNOTSUPP;

on the two functions grep found above, so the compiler eliminates all
the symbols. No kconfig needed.
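The dead-code-elimination trick relies on the kernel's IS_ENABLED() macro being a compile-time constant. A minimal standalone re-creation of the machinery from include/linux/kconfig.h shows the shape of the guard (CONFIG_IOMMUFD_DRIVER is defined by hand here purely for illustration; in the kernel, Kconfig emits the define):

```c
#include <errno.h>

/* Re-creation of IS_ENABLED() from include/linux/kconfig.h: Kconfig
 * emits "#define CONFIG_FOO 1" for built-in options and no define at
 * all for disabled ones, and the macros below turn that into a
 * constant 1 or 0 the optimizer can fold. */
#define __ARG_PLACEHOLDER_1 0,
#define __take_second_arg(__ignored, val, ...) val
#define ____is_defined(arg1_or_junk) __take_second_arg(arg1_or_junk 1, 0)
#define ___is_defined(val) ____is_defined(__ARG_PLACEHOLDER_##val)
#define __is_defined(x) ___is_defined(x)
#define IS_ENABLED(option) __is_defined(option)

/* Pretend the new option is enabled for this example. */
#define CONFIG_IOMMUFD_DRIVER 1

/* Shape of the guard Jason suggests for the two vfio_main.c entry
 * points: when the option is off, IS_ENABLED() is the constant 0, the
 * early return becomes unconditional, and the compiler drops every
 * iova_bitmap_* call site, so no undefined symbols remain. */
int vfio_device_log_read_and_clear(void)
{
	if (!IS_ENABLED(CONFIG_IOMMUFD_DRIVER))
		return -EOPNOTSUPP;
	return 0; /* would call iova_bitmap_alloc()/_for_each() here */
}
```

With the option undefined, `IS_ENABLED(CONFIG_IOMMUFD_DRIVER)` evaluates to 0 and the function body after the guard is unreachable, which is exactly why no extra Kconfig plumbing is needed in the core.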
Jason
* Re: [PATCH v3 02/19] vfio: Move iova_bitmap into iommu core
2023-10-13 17:16 ` Jason Gunthorpe
@ 2023-10-13 17:23 ` Joao Martins
2023-10-13 17:28 ` Jason Gunthorpe
2023-10-13 20:41 ` Alex Williamson
0 siblings, 2 replies; 140+ messages in thread
From: Joao Martins @ 2023-10-13 17:23 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: iommu, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu, Yi Liu,
Yi Y Sun, Nicolin Chen, Joerg Roedel, Suravee Suthikulpanit,
Will Deacon, Robin Murphy, Alex Williamson, kvm
On 13/10/2023 18:16, Jason Gunthorpe wrote:
> On Fri, Oct 13, 2023 at 06:10:04PM +0100, Joao Martins wrote:
>> On 13/10/2023 17:00, Joao Martins wrote:
>>> On 13/10/2023 16:48, Jason Gunthorpe wrote:
>>>> On Sat, Sep 23, 2023 at 02:24:54AM +0100, Joao Martins wrote:
>>>>> Both VFIO and IOMMUFD will need iova bitmap for storing dirties and walking
>>>>> the user bitmaps, so move to the common dependency into IOMMU core. IOMMUFD
>>>>> can't exactly host it given that VFIO dirty tracking can be used without
>>>>> IOMMUFD.
>>>>
>>>> Hum, this seems strange. Why not just make those VFIO drivers depends
>>>> on iommufd? That seems harmless to me.
>>>>
>>>
>>> IF you and Alex are OK with it then I can move to IOMMUFD.
>>>
>>>> However, I think the real issue is that iommu drivers need to use this
>>>> API too for their part?
>>>>
>>>
>>> Exactly.
>>>
>>
>> My other concern into moving to IOMMUFD instead of core was VFIO_IOMMU_TYPE1,
>> and if we always make it depend on IOMMUFD then we can't have what is today
>> something supported because of VFIO_IOMMU_TYPE1 stuff with migration drivers
>> (i.e. vfio-iommu-type1 with the live migration stuff).
>
> I plan to remove the live migration stuff from vfio-iommu-type1, it is
> all dead code now.
>
I wasn't referring to the type1 dirty tracking stuff -- I was referring the
stuff related to vfio devices, used *together* with type1 (for DMA map/unmap).
>> But if it's exists an IOMMUFD_DRIVER kconfig, then VFIO_CONTAINER can instead
>> select the IOMMUFD_DRIVER alone so long as CONFIG_IOMMUFD isn't required? I am
>> essentially talking about:
>
> Not VFIO_CONTAINER, the dirty tracking code is in vfio_main:
>
> vfio_main.c:#include <linux/iova_bitmap.h>
> vfio_main.c:static int vfio_device_log_read_and_clear(struct iova_bitmap *iter,
> vfio_main.c: struct iova_bitmap *iter;
> vfio_main.c: iter = iova_bitmap_alloc(report.iova, report.length,
> vfio_main.c: ret = iova_bitmap_for_each(iter, device,
> vfio_main.c: iova_bitmap_free(iter);
>
> And in various vfio device drivers.
>
> So the various drivers can select IOMMUFD_DRIVER
>
It isn't so much that type1 requires IOMMUFD, but more that it is used together
with the core code that allows the vfio drivers to do migration. So the concern
is that if we make the VFIO core depend on IOMMUFD, we prevent
VFIO_CONTAINER/VFIO_GROUP from being selected. My kconfig reading was that we
either select VFIO_GROUP or VFIO_DEVICE_CDEV, but not both
> And the core code should just gain a
>
> if (!IS_SUPPORTED(CONFIG_IOMMUFD_DRIVER))
> return -EOPNOTSUPP
>
> On the two functions grep found above so the compiler eliminates all
> the symbols. No kconfig needed.
>
> Jason
* Re: [PATCH v3 02/19] vfio: Move iova_bitmap into iommu core
2023-10-13 17:23 ` Joao Martins
@ 2023-10-13 17:28 ` Jason Gunthorpe
2023-10-13 17:32 ` Joao Martins
2023-10-13 20:41 ` Alex Williamson
1 sibling, 1 reply; 140+ messages in thread
From: Jason Gunthorpe @ 2023-10-13 17:28 UTC (permalink / raw)
To: Joao Martins
Cc: iommu, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu, Yi Liu,
Yi Y Sun, Nicolin Chen, Joerg Roedel, Suravee Suthikulpanit,
Will Deacon, Robin Murphy, Alex Williamson, kvm
On Fri, Oct 13, 2023 at 06:23:09PM +0100, Joao Martins wrote:
>
>
> On 13/10/2023 18:16, Jason Gunthorpe wrote:
> > On Fri, Oct 13, 2023 at 06:10:04PM +0100, Joao Martins wrote:
> >> On 13/10/2023 17:00, Joao Martins wrote:
> >>> On 13/10/2023 16:48, Jason Gunthorpe wrote:
> >>>> On Sat, Sep 23, 2023 at 02:24:54AM +0100, Joao Martins wrote:
> >>>>> Both VFIO and IOMMUFD will need iova bitmap for storing dirties and walking
> >>>>> the user bitmaps, so move to the common dependency into IOMMU core. IOMMUFD
> >>>>> can't exactly host it given that VFIO dirty tracking can be used without
> >>>>> IOMMUFD.
> >>>>
> >>>> Hum, this seems strange. Why not just make those VFIO drivers depends
> >>>> on iommufd? That seems harmless to me.
> >>>>
> >>>
> >>> IF you and Alex are OK with it then I can move to IOMMUFD.
> >>>
> >>>> However, I think the real issue is that iommu drivers need to use this
> >>>> API too for their part?
> >>>>
> >>>
> >>> Exactly.
> >>>
> >>
> >> My other concern into moving to IOMMUFD instead of core was VFIO_IOMMU_TYPE1,
> >> and if we always make it depend on IOMMUFD then we can't have what is today
> >> something supported because of VFIO_IOMMU_TYPE1 stuff with migration drivers
> >> (i.e. vfio-iommu-type1 with the live migration stuff).
> >
> > I plan to remove the live migration stuff from vfio-iommu-type1, it is
> > all dead code now.
> >
>
> I wasn't referring to the type1 dirty tracking stuff -- I was referring the
> stuff related to vfio devices, used *together* with type1 (for DMA
> map/unmap).
Ah, well, I guess that is true
> >> But if it's exists an IOMMUFD_DRIVER kconfig, then VFIO_CONTAINER can instead
> >> select the IOMMUFD_DRIVER alone so long as CONFIG_IOMMUFD isn't required? I am
> >> essentially talking about:
> >
> > Not VFIO_CONTAINER, the dirty tracking code is in vfio_main:
> >
> > vfio_main.c:#include <linux/iova_bitmap.h>
> > vfio_main.c:static int vfio_device_log_read_and_clear(struct iova_bitmap *iter,
> > vfio_main.c: struct iova_bitmap *iter;
> > vfio_main.c: iter = iova_bitmap_alloc(report.iova, report.length,
> > vfio_main.c: ret = iova_bitmap_for_each(iter, device,
> > vfio_main.c: iova_bitmap_free(iter);
> >
> > And in various vfio device drivers.
> >
> > So the various drivers can select IOMMUFD_DRIVER
> >
>
> It isn't so much that type1 requires IOMMUFD, but more that it is used together
> with the core code that allows the vfio drivers to do migration. So the concern
> is if we make VFIO core depend on IOMMU that we prevent
> VFIO_CONTAINER/VFIO_GROUP to not be selected. My kconfig read was that we either
> select VFIO_GROUP or VFIO_DEVICE_CDEV but not both
Doing it as I said is still the right thing.
If someone has turned on one of the drivers that actually implements
dirty tracking it will turn on IOMMUFD_DRIVER and that will cause the
supporting core code to compile in the support functions.
Jason
* Re: [PATCH v3 02/19] vfio: Move iova_bitmap into iommu core
2023-10-13 17:28 ` Jason Gunthorpe
@ 2023-10-13 17:32 ` Joao Martins
0 siblings, 0 replies; 140+ messages in thread
From: Joao Martins @ 2023-10-13 17:32 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: iommu, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu, Yi Liu,
Yi Y Sun, Nicolin Chen, Joerg Roedel, Suravee Suthikulpanit,
Will Deacon, Robin Murphy, Alex Williamson, kvm
On 13/10/2023 18:28, Jason Gunthorpe wrote:
> On Fri, Oct 13, 2023 at 06:23:09PM +0100, Joao Martins wrote:
>> On 13/10/2023 18:16, Jason Gunthorpe wrote:
>>> On Fri, Oct 13, 2023 at 06:10:04PM +0100, Joao Martins wrote:
>>>> On 13/10/2023 17:00, Joao Martins wrote:
>>>>> On 13/10/2023 16:48, Jason Gunthorpe wrote:
>>>>>> On Sat, Sep 23, 2023 at 02:24:54AM +0100, Joao Martins wrote:
>>>> But if it's exists an IOMMUFD_DRIVER kconfig, then VFIO_CONTAINER can instead
>>>> select the IOMMUFD_DRIVER alone so long as CONFIG_IOMMUFD isn't required? I am
>>>> essentially talking about:
>>>
>>> Not VFIO_CONTAINER, the dirty tracking code is in vfio_main:
>>>
>>> vfio_main.c:#include <linux/iova_bitmap.h>
>>> vfio_main.c:static int vfio_device_log_read_and_clear(struct iova_bitmap *iter,
>>> vfio_main.c: struct iova_bitmap *iter;
>>> vfio_main.c: iter = iova_bitmap_alloc(report.iova, report.length,
>>> vfio_main.c: ret = iova_bitmap_for_each(iter, device,
>>> vfio_main.c: iova_bitmap_free(iter);
>>>
>>> And in various vfio device drivers.
>>>
>>> So the various drivers can select IOMMUFD_DRIVER
>>>
>>
>> It isn't so much that type1 requires IOMMUFD, but more that it is used together
>> with the core code that allows the vfio drivers to do migration. So the concern
>> is if we make VFIO core depend on IOMMU that we prevent
>> VFIO_CONTAINER/VFIO_GROUP to not be selected. My kconfig read was that we either
>> select VFIO_GROUP or VFIO_DEVICE_CDEV but not both
>
> Doing it as I said is still the right thing.
>
> If someone has turned on one of the drivers that actually implements
> dirty tracking it will turn on IOMMUFD_DRIVER and that will cause the
> supporting core code to compile in the support functions.
>
Yeap. And as long as CONFIG_IOMMUFD_DRIVER does not select/enable CONFIG_IOMMUFD,
we should be fine.
* Re: [PATCH v3 02/19] vfio: Move iova_bitmap into iommu core
2023-10-13 17:23 ` Joao Martins
2023-10-13 17:28 ` Jason Gunthorpe
@ 2023-10-13 20:41 ` Alex Williamson
2023-10-13 21:20 ` Joao Martins
1 sibling, 1 reply; 140+ messages in thread
From: Alex Williamson @ 2023-10-13 20:41 UTC (permalink / raw)
To: Joao Martins
Cc: Jason Gunthorpe, iommu, Kevin Tian, Shameerali Kolothum Thodi,
Lu Baolu, Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel,
Suravee Suthikulpanit, Will Deacon, Robin Murphy, kvm
On Fri, 13 Oct 2023 18:23:09 +0100
Joao Martins <joao.m.martins@oracle.com> wrote:
> On 13/10/2023 18:16, Jason Gunthorpe wrote:
> > On Fri, Oct 13, 2023 at 06:10:04PM +0100, Joao Martins wrote:
> >> On 13/10/2023 17:00, Joao Martins wrote:
> >>> On 13/10/2023 16:48, Jason Gunthorpe wrote:
> >>>> On Sat, Sep 23, 2023 at 02:24:54AM +0100, Joao Martins wrote:
> >>>>> Both VFIO and IOMMUFD will need iova bitmap for storing dirties and walking
> >>>>> the user bitmaps, so move to the common dependency into IOMMU core. IOMMUFD
> >>>>> can't exactly host it given that VFIO dirty tracking can be used without
> >>>>> IOMMUFD.
> >>>>
> >>>> Hum, this seems strange. Why not just make those VFIO drivers depends
> >>>> on iommufd? That seems harmless to me.
> >>>>
> >>>
> >>> IF you and Alex are OK with it then I can move to IOMMUFD.
It's only strange in that we don't actually have a hard dependency on
IOMMUFD currently and won't until we remove container support, which is
some ways down the road. Ultimately we expect to get to the same
place, so I don't have a particular issue with it.
> >>>> However, I think the real issue is that iommu drivers need to use this
> >>>> API too for their part?
> >>>>
> >>>
> >>> Exactly.
> >>>
> >>
> >> My other concern into moving to IOMMUFD instead of core was VFIO_IOMMU_TYPE1,
> >> and if we always make it depend on IOMMUFD then we can't have what is today
> >> something supported because of VFIO_IOMMU_TYPE1 stuff with migration drivers
> >> (i.e. vfio-iommu-type1 with the live migration stuff).
> >
> > I plan to remove the live migration stuff from vfio-iommu-type1, it is
> > all dead code now.
> >
>
> I wasn't referring to the type1 dirty tracking stuff -- I was referring the
> stuff related to vfio devices, used *together* with type1 (for DMA map/unmap).
>
> >> But if it's exists an IOMMUFD_DRIVER kconfig, then VFIO_CONTAINER can instead
> >> select the IOMMUFD_DRIVER alone so long as CONFIG_IOMMUFD isn't required? I am
> >> essentially talking about:
> >
> > Not VFIO_CONTAINER, the dirty tracking code is in vfio_main:
> >
> > vfio_main.c:#include <linux/iova_bitmap.h>
> > vfio_main.c:static int vfio_device_log_read_and_clear(struct iova_bitmap *iter,
> > vfio_main.c: struct iova_bitmap *iter;
> > vfio_main.c: iter = iova_bitmap_alloc(report.iova, report.length,
> > vfio_main.c: ret = iova_bitmap_for_each(iter, device,
> > vfio_main.c: iova_bitmap_free(iter);
> >
> > And in various vfio device drivers.
> >
> > So the various drivers can select IOMMUFD_DRIVER
> >
>
> It isn't so much that type1 requires IOMMUFD, but more that it is used together
> with the core code that allows the vfio drivers to do migration. So the concern
> is if we make VFIO core depend on IOMMU that we prevent
> VFIO_CONTAINER/VFIO_GROUP to not be selected. My kconfig read was that we either
> select VFIO_GROUP or VFIO_DEVICE_CDEV but not both
That's not true. We can have both. In fact we rely on having both to
support a smooth transition to the cdev interface. Thanks,
Alex
* Re: [PATCH v3 02/19] vfio: Move iova_bitmap into iommu core
2023-10-13 20:41 ` Alex Williamson
@ 2023-10-13 21:20 ` Joao Martins
2023-10-13 21:51 ` Alex Williamson
0 siblings, 1 reply; 140+ messages in thread
From: Joao Martins @ 2023-10-13 21:20 UTC (permalink / raw)
To: Alex Williamson
Cc: Jason Gunthorpe, iommu, Kevin Tian, Shameerali Kolothum Thodi,
Lu Baolu, Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel,
Suravee Suthikulpanit, Will Deacon, Robin Murphy, kvm
On 13/10/2023 21:41, Alex Williamson wrote:
> On Fri, 13 Oct 2023 18:23:09 +0100
> Joao Martins <joao.m.martins@oracle.com> wrote:
>
>> On 13/10/2023 18:16, Jason Gunthorpe wrote:
>>> On Fri, Oct 13, 2023 at 06:10:04PM +0100, Joao Martins wrote:
>>>> On 13/10/2023 17:00, Joao Martins wrote:
>>>>> On 13/10/2023 16:48, Jason Gunthorpe wrote:
>>>> But if it's exists an IOMMUFD_DRIVER kconfig, then VFIO_CONTAINER can instead
>>>> select the IOMMUFD_DRIVER alone so long as CONFIG_IOMMUFD isn't required? I am
>>>> essentially talking about:
>>>
>>> Not VFIO_CONTAINER, the dirty tracking code is in vfio_main:
>>>
>>> vfio_main.c:#include <linux/iova_bitmap.h>
>>> vfio_main.c:static int vfio_device_log_read_and_clear(struct iova_bitmap *iter,
>>> vfio_main.c: struct iova_bitmap *iter;
>>> vfio_main.c: iter = iova_bitmap_alloc(report.iova, report.length,
>>> vfio_main.c: ret = iova_bitmap_for_each(iter, device,
>>> vfio_main.c: iova_bitmap_free(iter);
>>>
>>> And in various vfio device drivers.
>>>
>>> So the various drivers can select IOMMUFD_DRIVER
>>>
>>
>> It isn't so much that type1 requires IOMMUFD, but more that it is used together
>> with the core code that allows the vfio drivers to do migration. So the concern
>> is if we make VFIO core depend on IOMMU that we prevent
>> VFIO_CONTAINER/VFIO_GROUP to not be selected. My kconfig read was that we either
>> select VFIO_GROUP or VFIO_DEVICE_CDEV but not both
>
> That's not true. We can have both. In fact we rely on having both to
> support a smooth transition to the cdev interface. Thanks,
On a triple look, mixing defaults[0] vs manual config: with IOMMUFD=y|m today it
won't select VFIO_CONTAINER by default, but nothing stops one from actually
selecting both. Unless I missed something
[0] Ref:
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/vfio/Kconfig
menuconfig VFIO
[...]
select VFIO_GROUP if SPAPR_TCE_IOMMU || IOMMUFD=n
select VFIO_DEVICE_CDEV if !VFIO_GROUP
select VFIO_CONTAINER if IOMMUFD=n
[...]
if VFIO
config VFIO_DEVICE_CDEV
[...]
depends on IOMMUFD && !SPAPR_TCE_IOMMU
default !VFIO_GROUP
[...]
config VFIO_GROUP
default y
[...]
config VFIO_CONTAINER
[...]
select VFIO_IOMMU_TYPE1 if MMU && (X86 || S390 || ARM || ARM64)
depends on VFIO_GROUP
default y
* Re: [PATCH v3 02/19] vfio: Move iova_bitmap into iommu core
2023-10-13 21:20 ` Joao Martins
@ 2023-10-13 21:51 ` Alex Williamson
2023-10-14 0:02 ` Jason Gunthorpe
0 siblings, 1 reply; 140+ messages in thread
From: Alex Williamson @ 2023-10-13 21:51 UTC (permalink / raw)
To: Joao Martins
Cc: Jason Gunthorpe, iommu, Kevin Tian, Shameerali Kolothum Thodi,
Lu Baolu, Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel,
Suravee Suthikulpanit, Will Deacon, Robin Murphy, kvm
On Fri, 13 Oct 2023 22:20:31 +0100
Joao Martins <joao.m.martins@oracle.com> wrote:
> On 13/10/2023 21:41, Alex Williamson wrote:
> > On Fri, 13 Oct 2023 18:23:09 +0100
> > Joao Martins <joao.m.martins@oracle.com> wrote:
> >
> >> On 13/10/2023 18:16, Jason Gunthorpe wrote:
> >>> On Fri, Oct 13, 2023 at 06:10:04PM +0100, Joao Martins wrote:
> >>>> On 13/10/2023 17:00, Joao Martins wrote:
> >>>>> On 13/10/2023 16:48, Jason Gunthorpe wrote:
> >>>> But if it's exists an IOMMUFD_DRIVER kconfig, then VFIO_CONTAINER can instead
> >>>> select the IOMMUFD_DRIVER alone so long as CONFIG_IOMMUFD isn't required? I am
> >>>> essentially talking about:
> >>>
> >>> Not VFIO_CONTAINER, the dirty tracking code is in vfio_main:
> >>>
> >>> vfio_main.c:#include <linux/iova_bitmap.h>
> >>> vfio_main.c:static int vfio_device_log_read_and_clear(struct iova_bitmap *iter,
> >>> vfio_main.c: struct iova_bitmap *iter;
> >>> vfio_main.c: iter = iova_bitmap_alloc(report.iova, report.length,
> >>> vfio_main.c: ret = iova_bitmap_for_each(iter, device,
> >>> vfio_main.c: iova_bitmap_free(iter);
> >>>
> >>> And in various vfio device drivers.
> >>>
> >>> So the various drivers can select IOMMUFD_DRIVER
> >>>
> >>
> >> It isn't so much that type1 requires IOMMUFD, but more that it is used together
> >> with the core code that allows the vfio drivers to do migration. So the concern
> >> is if we make VFIO core depend on IOMMU that we prevent
> >> VFIO_CONTAINER/VFIO_GROUP to not be selected. My kconfig read was that we either
> >> select VFIO_GROUP or VFIO_DEVICE_CDEV but not both
> >
> > That's not true. We can have both. In fact we rely on having both to
> > support a smooth transition to the cdev interface. Thanks,
>
> On a triple look, mixed defaults[0] vs manual config: having IOMMUFD=y|m today
> it won't select VFIO_CONTAINER, nobody stops one from actually selecting it
> both. Unless I missed something
Oh! I misunderstood your comment, you're referring to default
selections rather than possible selections. So yes, if VFIO depends on
IOMMUFD then suddenly our default configs shift to IOMMUFD/CDEV rather
than legacy CONTAINER/GROUP. So perhaps if VFIO selects IOMMUFD, that's
not exactly harmless currently.
I think Jason is describing this would eventually be in a built-in
portion of IOMMUFD, but I think currently that built-in portion is
IOMMU. So until we have this IOMMUFD_DRIVER that enables that built-in
portion, it seems unnecessarily disruptive to make VFIO select IOMMUFD
to get this iova bitmap support. Thanks,
Alex
> [0] Ref:
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/vfio/Kconfig
>
> menuconfig VFIO
> [...]
> select VFIO_GROUP if SPAPR_TCE_IOMMU || IOMMUFD=n
> select VFIO_DEVICE_CDEV if !VFIO_GROUP
> select VFIO_CONTAINER if IOMMUFD=n
> [...]
>
> if VFIO
> config VFIO_DEVICE_CDEV
> [...]
> depends on IOMMUFD && !SPAPR_TCE_IOMMU
> default !VFIO_GROUP
> [...]
> config VFIO_GROUP
> default y
> [...]
> config VFIO_CONTAINER
> [...]
> select VFIO_IOMMU_TYPE1 if MMU && (X86 || S390 || ARM || ARM64)
> depends on VFIO_GROUP
> default y
>
* Re: [PATCH v3 02/19] vfio: Move iova_bitmap into iommu core
2023-10-13 21:51 ` Alex Williamson
@ 2023-10-14 0:02 ` Jason Gunthorpe
2023-10-16 16:25 ` Joao Martins
0 siblings, 1 reply; 140+ messages in thread
From: Jason Gunthorpe @ 2023-10-14 0:02 UTC (permalink / raw)
To: Alex Williamson
Cc: Joao Martins, iommu, Kevin Tian, Shameerali Kolothum Thodi,
Lu Baolu, Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel,
Suravee Suthikulpanit, Will Deacon, Robin Murphy, kvm
On Fri, Oct 13, 2023 at 03:51:34PM -0600, Alex Williamson wrote:
> I think Jason is describing this would eventually be in a built-in
> portion of IOMMUFD, but I think currently that built-in portion is
> IOMMU. So until we have this IOMMUFD_DRIVER that enables that built-in
> portion, it seems unnecessarily disruptive to make VFIO select IOMMUFD
> to get this iova bitmap support. Thanks,
Right, I'm saying Joao may as well make IOMMUFD_DRIVER right now for
this
Jason
* Re: [PATCH v3 02/19] vfio: Move iova_bitmap into iommu core
2023-10-14 0:02 ` Jason Gunthorpe
@ 2023-10-16 16:25 ` Joao Martins
2023-10-16 16:34 ` Jason Gunthorpe
0 siblings, 1 reply; 140+ messages in thread
From: Joao Martins @ 2023-10-16 16:25 UTC (permalink / raw)
To: Jason Gunthorpe, Alex Williamson
Cc: iommu, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu, Yi Liu,
Yi Y Sun, Nicolin Chen, Joerg Roedel, Suravee Suthikulpanit,
Will Deacon, Robin Murphy, kvm
On 14/10/2023 01:02, Jason Gunthorpe wrote:
> On Fri, Oct 13, 2023 at 03:51:34PM -0600, Alex Williamson wrote:
>
>> I think Jason is describing this would eventually be in a built-in
>> portion of IOMMUFD, but I think currently that built-in portion is
>> IOMMU. So until we have this IOMMUFD_DRIVER that enables that built-in
>> portion, it seems unnecessarily disruptive to make VFIO select IOMMUFD
>> to get this iova bitmap support. Thanks,
>
> Right, I'm saying Joao may as well make IOMMUFD_DRIVER right now for
> this
So far I have this snip at the end.
Though given that there are struct iommu_domain changes that set dirty_ops
(which require the iova bitmap): do we just ifdef around IOMMUFD_DRIVER, or do
we always include it if CONFIG_IOMMU_API=y? Thus far I'm going towards the latter
diff --git a/drivers/iommu/iommufd/Kconfig b/drivers/iommu/iommufd/Kconfig
index 99d4b075df49..96ec013d1192 100644
--- a/drivers/iommu/iommufd/Kconfig
+++ b/drivers/iommu/iommufd/Kconfig
@@ -11,6 +11,13 @@ config IOMMUFD
If you don't know what to do here, say N.
+config IOMMUFD_DRIVER
+ bool "IOMMUFD provides iommu drivers supporting functions"
+ default IOMMU_API
+ help
+ IOMMUFD provides supporting data structures and helpers to IOMMU
+ drivers.
+
if IOMMUFD
config IOMMUFD_VFIO_CONTAINER
bool "IOMMUFD provides the VFIO container /dev/vfio/vfio"
diff --git a/drivers/iommu/iommufd/Makefile b/drivers/iommu/iommufd/Makefile
index 8aeba81800c5..34b446146961 100644
--- a/drivers/iommu/iommufd/Makefile
+++ b/drivers/iommu/iommufd/Makefile
@@ -11,3 +11,4 @@ iommufd-y := \
iommufd-$(CONFIG_IOMMUFD_TEST) += selftest.o
obj-$(CONFIG_IOMMUFD) += iommufd.o
+obj-$(CONFIG_IOMMUFD_DRIVER) += iova_bitmap.o
diff --git a/drivers/vfio/iova_bitmap.c b/drivers/iommu/iommufd/iova_bitmap.c
similarity index 100%
rename from drivers/vfio/iova_bitmap.c
rename to drivers/iommu/iommufd/iova_bitmap.c
diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index 6bda6dbb4878..1db519cce815 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -7,6 +7,7 @@ menuconfig VFIO
select VFIO_GROUP if SPAPR_TCE_IOMMU || IOMMUFD=n
select VFIO_DEVICE_CDEV if !VFIO_GROUP
select VFIO_CONTAINER if IOMMUFD=n
+ select IOMMUFD_DRIVER
help
VFIO provides a framework for secure userspace device drivers.
See Documentation/driver-api/vfio.rst for more details.
diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
index c82ea032d352..68c05705200f 100644
--- a/drivers/vfio/Makefile
+++ b/drivers/vfio/Makefile
@@ -1,8 +1,7 @@
# SPDX-License-Identifier: GPL-2.0
obj-$(CONFIG_VFIO) += vfio.o
-vfio-y += vfio_main.o \
- iova_bitmap.o
+vfio-y += vfio_main.o
vfio-$(CONFIG_VFIO_DEVICE_CDEV) += device_cdev.o
vfio-$(CONFIG_VFIO_GROUP) += group.o
* Re: [PATCH v3 02/19] vfio: Move iova_bitmap into iommu core
2023-10-16 16:25 ` Joao Martins
@ 2023-10-16 16:34 ` Jason Gunthorpe
2023-10-16 17:52 ` Joao Martins
0 siblings, 1 reply; 140+ messages in thread
From: Jason Gunthorpe @ 2023-10-16 16:34 UTC (permalink / raw)
To: Joao Martins
Cc: Alex Williamson, iommu, Kevin Tian, Shameerali Kolothum Thodi,
Lu Baolu, Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel,
Suravee Suthikulpanit, Will Deacon, Robin Murphy, kvm
On Mon, Oct 16, 2023 at 05:25:16PM +0100, Joao Martins wrote:
> >> I think Jason is describing this would eventually be in a built-in
> >> portion of IOMMUFD, but I think currently that built-in portion is
> >> IOMMU. So until we have this IOMMUFD_DRIVER that enables that built-in
> >> portion, it seems unnecessarily disruptive to make VFIO select IOMMUFD
> >> to get this iova bitmap support. Thanks,
> >
> > Right, I'm saying Joao may as well make IOMMUFD_DRIVER right now for
> > this
>
> So far I have this snip at the end.
>
> Though given that there are struct iommu_domain changes that set a dirty_ops
> (which require iova-bitmap).
Drivers which set those ops need to select IOMMUFD_DRIVER..
Perhaps (at least for ARM) they should even be coded
select IOMMUFD_DRIVER if IOMMUFD
And then #ifdef out the dirty tracking bits so embedded systems don't
get the bloat with !IOMMUFD
> diff --git a/drivers/iommu/iommufd/Kconfig b/drivers/iommu/iommufd/Kconfig
> index 99d4b075df49..96ec013d1192 100644
> --- a/drivers/iommu/iommufd/Kconfig
> +++ b/drivers/iommu/iommufd/Kconfig
> @@ -11,6 +11,13 @@ config IOMMUFD
>
> If you don't know what to do here, say N.
>
> +config IOMMUFD_DRIVER
> + bool "IOMMUFD provides iommu drivers supporting functions"
> + default IOMMU_API
> + help
> + IOMMUFD will provides supporting data structures and helpers to IOMMU
> + drivers.
It is not a 'user selectable' kconfig, just make it
config IOMMUFD_DRIVER
tristate
default n
ie the only way to get it is to build a driver that will consume it.
> diff --git a/drivers/iommu/iommufd/Makefile b/drivers/iommu/iommufd/Makefile
> index 8aeba81800c5..34b446146961 100644
> --- a/drivers/iommu/iommufd/Makefile
> +++ b/drivers/iommu/iommufd/Makefile
> @@ -11,3 +11,4 @@ iommufd-y := \
> iommufd-$(CONFIG_IOMMUFD_TEST) += selftest.o
>
> obj-$(CONFIG_IOMMUFD) += iommufd.o
> +obj-$(CONFIG_IOMMUFD_DRIVER) += iova_bitmap.o
Right..
> diff --git a/drivers/vfio/iova_bitmap.c b/drivers/iommu/iommufd/iova_bitmap.c
> similarity index 100%
> rename from drivers/vfio/iova_bitmap.c
> rename to drivers/iommu/iommufd/iova_bitmap.c
> diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
> index 6bda6dbb4878..1db519cce815 100644
> --- a/drivers/vfio/Kconfig
> +++ b/drivers/vfio/Kconfig
> @@ -7,6 +7,7 @@ menuconfig VFIO
> select VFIO_GROUP if SPAPR_TCE_IOMMU || IOMMUFD=n
> select VFIO_DEVICE_CDEV if !VFIO_GROUP
> select VFIO_CONTAINER if IOMMUFD=n
> + select IOMMUFD_DRIVER
As discussed use an if (IS_ENABLED) here and just disable the
bitmap code if something else didn't enable it.
VFIO isn't a consumer of it
The question you are asking is on the driver side implementing it, and
it should be conditional if IOMMUFD is turned on.
Jason
* Re: [PATCH v3 02/19] vfio: Move iova_bitmap into iommu core
2023-10-16 16:34 ` Jason Gunthorpe
@ 2023-10-16 17:52 ` Joao Martins
2023-10-16 18:05 ` Jason Gunthorpe
0 siblings, 1 reply; 140+ messages in thread
From: Joao Martins @ 2023-10-16 17:52 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Alex Williamson, iommu, Kevin Tian, Shameerali Kolothum Thodi,
Lu Baolu, Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel,
Suravee Suthikulpanit, Will Deacon, Robin Murphy, kvm
On 16/10/2023 17:34, Jason Gunthorpe wrote:
> On Mon, Oct 16, 2023 at 05:25:16PM +0100, Joao Martins wrote:
>>>> I think Jason is describing this would eventually be in a built-in
>>>> portion of IOMMUFD, but I think currently that built-in portion is
>>>> IOMMU. So until we have this IOMMUFD_DRIVER that enables that built-in
>>>> portion, it seems unnecessarily disruptive to make VFIO select IOMMUFD
>>>> to get this iova bitmap support. Thanks,
>>>
>>> Right, I'm saying Joao may as well make IOMMUFD_DRIVER right now for
>>> this
>>
>> So far I have this snip at the end.
>>
>> Though given that there are struct iommu_domain changes that set a dirty_ops
>> (which require iova-bitmap).
>
> Drivers which set those ops need to select IOMMUFD_DRIVER..
>
My problem is more with the generic/VFIO side (headers and structures of the
iommu core), not really the IOMMU driver nor IOMMUFD.
> Perhaps (at least for ARM) they should even be coded
>
> select IOMMUFD_DRIVER if IOMMUFD
>
> And then #ifdef out the dirty tracking bits so embedded systems don't
> get the bloat with !IOMMUFD
>
Right
>> diff --git a/drivers/iommu/iommufd/Kconfig b/drivers/iommu/iommufd/Kconfig
>> index 99d4b075df49..96ec013d1192 100644
>> --- a/drivers/iommu/iommufd/Kconfig
>> +++ b/drivers/iommu/iommufd/Kconfig
>> @@ -11,6 +11,13 @@ config IOMMUFD
>>
>> If you don't know what to do here, say N.
>>
>> +config IOMMUFD_DRIVER
>> + bool "IOMMUFD provides iommu drivers supporting functions"
>> + default IOMMU_API
>> + help
>> + IOMMUFD will provides supporting data structures and helpers to IOMMU
>> + drivers.
>
> It is not a 'user selectable' kconfig, just make it
>
> config IOMMUFD_DRIVER
> tristate
> default n
>
tristate? More like a bool as IOMMU drivers aren't modloadable
> ie the only way to get it is to build a driver that will consume it.
>
>> diff --git a/drivers/iommu/iommufd/Makefile b/drivers/iommu/iommufd/Makefile
>> index 8aeba81800c5..34b446146961 100644
>> --- a/drivers/iommu/iommufd/Makefile
>> +++ b/drivers/iommu/iommufd/Makefile
>> @@ -11,3 +11,4 @@ iommufd-y := \
>> iommufd-$(CONFIG_IOMMUFD_TEST) += selftest.o
>>
>> obj-$(CONFIG_IOMMUFD) += iommufd.o
>> +obj-$(CONFIG_IOMMUFD_DRIVER) += iova_bitmap.o
>
> Right..
>
>> diff --git a/drivers/vfio/iova_bitmap.c b/drivers/iommu/iommufd/iova_bitmap.c
>> similarity index 100%
>> rename from drivers/vfio/iova_bitmap.c
>> rename to drivers/iommu/iommufd/iova_bitmap.c
>> diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
>> index 6bda6dbb4878..1db519cce815 100644
>> --- a/drivers/vfio/Kconfig
>> +++ b/drivers/vfio/Kconfig
>> @@ -7,6 +7,7 @@ menuconfig VFIO
>> select VFIO_GROUP if SPAPR_TCE_IOMMU || IOMMUFD=n
>> select VFIO_DEVICE_CDEV if !VFIO_GROUP
>> select VFIO_CONTAINER if IOMMUFD=n
>> + select IOMMUFD_DRIVER
>
> As discussed use a if (IS_ENABLED) here and just disable the
> bitmap code if something else didn't enable it.
>
I'm adding this to vfio_main:
if (!IS_ENABLED(CONFIG_IOMMUFD_DRIVER))
return -EOPNOTSUPP;
> VFIO isn't a consumer of it
>
(...) The select IOMMUFD_DRIVER was there because of the VFIO PCI vendor drivers,
not VFIO core. For the 'disable bitmap code' I can add ifdef-ry in iova_bitmap.h
to add scaffold definitions that error-out/nop if CONFIG_IOMMUFD_DRIVER=n when
moving to iommufd/
> The question you are asking is on the driver side implementing it, and
> it should be conditional if IOMMUFD is turned on.
For the IOMMU driver side, sure, IOMMUFD should be required as it's the sole
entry point of this.
But as discussed earlier, the IOMMUFD=m|y dependency really only matters for
IOMMU dirty tracking, not for VFIO device dirty tracking.
For your suggested scheme to work, VFIO PCI drivers still need to select
IOMMUFD_DRIVER as they require it.
* Re: [PATCH v3 02/19] vfio: Move iova_bitmap into iommu core
2023-10-16 17:52 ` Joao Martins
@ 2023-10-16 18:05 ` Jason Gunthorpe
2023-10-16 18:15 ` Joao Martins
2023-10-18 10:19 ` Joao Martins
0 siblings, 2 replies; 140+ messages in thread
From: Jason Gunthorpe @ 2023-10-16 18:05 UTC (permalink / raw)
To: Joao Martins
Cc: Alex Williamson, iommu, Kevin Tian, Shameerali Kolothum Thodi,
Lu Baolu, Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel,
Suravee Suthikulpanit, Will Deacon, Robin Murphy, kvm
On Mon, Oct 16, 2023 at 06:52:50PM +0100, Joao Martins wrote:
> On 16/10/2023 17:34, Jason Gunthorpe wrote:
> > On Mon, Oct 16, 2023 at 05:25:16PM +0100, Joao Martins wrote:
> >>>> I think Jason is describing this would eventually be in a built-in
> >>>> portion of IOMMUFD, but I think currently that built-in portion is
> >>>> IOMMU. So until we have this IOMMUFD_DRIVER that enables that built-in
> >>>> portion, it seems unnecessarily disruptive to make VFIO select IOMMUFD
> >>>> to get this iova bitmap support. Thanks,
> >>>
> >>> Right, I'm saying Joao may as well make IOMMUFD_DRIVER right now for
> >>> this
> >>
> >> So far I have this snip at the end.
> >>
> >> Though given that there are struct iommu_domain changes that set a dirty_ops
> >> (which require iova-bitmap).
> >
> > Drivers which set those ops need to select IOMMUFD_DRIVER..
> >
>
> My problem is more of the generic/vfio side (headers and structures of iommu
> core) not really IOMMU driver nor IOMMUFD.
As I said, just don't compile that stuff. If nothing else selects
IOMMUFD_DRIVER then the core code has nothing to do.
> >> diff --git a/drivers/iommu/iommufd/Kconfig b/drivers/iommu/iommufd/Kconfig
> >> index 99d4b075df49..96ec013d1192 100644
> >> --- a/drivers/iommu/iommufd/Kconfig
> >> +++ b/drivers/iommu/iommufd/Kconfig
> >> @@ -11,6 +11,13 @@ config IOMMUFD
> >>
> >> If you don't know what to do here, say N.
> >>
> >> +config IOMMUFD_DRIVER
> >> + bool "IOMMUFD provides iommu drivers supporting functions"
> >> + default IOMMU_API
> >> + help
> >> + IOMMUFD will provides supporting data structures and helpers to IOMMU
> >> + drivers.
> >
> > It is not a 'user selectable' kconfig, just make it
> >
> > config IOMMUFD_DRIVER
> > tristate
> > default n
> >
> tristate? More like a bool as IOMMU drivers aren't modloadable
tristate, who knows what people will select. If the modular drivers
use it then it is forced to a Y not a M. It is the right way to use kconfig..
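To make those semantics concrete, here is a sketch (symbol names hypothetical,
not the actual patch): a prompt-less tristate is invisible in menuconfig, can
only be set through select, and takes the strongest value among its selectors:

```kconfig
# Hidden helper: no prompt string, so it never shows up in menuconfig
# and can only be enabled by a "select" elsewhere.
config HELPER_LIB
	tristate

config DRIVER_A
	bool "Built-in driver A"
	select HELPER_LIB	# DRIVER_A=y forces HELPER_LIB=y

config DRIVER_B
	tristate "Modular driver B"
	select HELPER_LIB	# alone, DRIVER_B=m only forces HELPER_LIB=m
```

With DRIVER_A=y the helper is built in even if DRIVER_B=m; with only DRIVER_B=m
it is built as a module; with neither, it is not built at all.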
> >> --- a/drivers/vfio/Kconfig
> >> +++ b/drivers/vfio/Kconfig
> >> @@ -7,6 +7,7 @@ menuconfig VFIO
> >> select VFIO_GROUP if SPAPR_TCE_IOMMU || IOMMUFD=n
> >> select VFIO_DEVICE_CDEV if !VFIO_GROUP
> >> select VFIO_CONTAINER if IOMMUFD=n
> >> + select IOMMUFD_DRIVER
> >
> > As discussed use a if (IS_ENABLED) here and just disable the
> > bitmap code if something else didn't enable it.
> >
>
> I'm adding this to vfio_main:
>
> if (!IS_ENABLED(CONFIG_IOMMUFD_DRIVER))
> return -EOPNOTSUPP;
Seems right
> > VFIO isn't a consumer of it
> >
>
> (...) The select IOMMUFD_DRIVER was there because of VFIO PCI vendor drivers not
> VFIO core.
Those drivers should individually select IOMMUFD_DRIVER
> for the 'disable bitmap code' I can add ifdef-ry in iova_bitmap.h to
> add scaffold definitions to error-out/nop if CONFIG_IOMMUFD_DRIVER=n when moving
> to iommufd/
Yes that could also be a good approach
> For your suggested scheme to work VFIO PCI drivers still need to select
> IOMMUFD_DRIVER as they require it
Yes, of course
Jason
* Re: [PATCH v3 02/19] vfio: Move iova_bitmap into iommu core
2023-10-16 18:05 ` Jason Gunthorpe
@ 2023-10-16 18:15 ` Joao Martins
2023-10-16 18:20 ` Jason Gunthorpe
2023-10-18 10:19 ` Joao Martins
1 sibling, 1 reply; 140+ messages in thread
From: Joao Martins @ 2023-10-16 18:15 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Alex Williamson, iommu, Kevin Tian, Shameerali Kolothum Thodi,
Lu Baolu, Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel,
Suravee Suthikulpanit, Will Deacon, Robin Murphy, kvm
On 16/10/2023 19:05, Jason Gunthorpe wrote:
> On Mon, Oct 16, 2023 at 06:52:50PM +0100, Joao Martins wrote:
>> On 16/10/2023 17:34, Jason Gunthorpe wrote:
>>> On Mon, Oct 16, 2023 at 05:25:16PM +0100, Joao Martins wrote:
>>>> diff --git a/drivers/iommu/iommufd/Kconfig b/drivers/iommu/iommufd/Kconfig
>>>> index 99d4b075df49..96ec013d1192 100644
>>>> --- a/drivers/iommu/iommufd/Kconfig
>>>> +++ b/drivers/iommu/iommufd/Kconfig
>>>> @@ -11,6 +11,13 @@ config IOMMUFD
>>>>
>>>> If you don't know what to do here, say N.
>>>>
>>>> +config IOMMUFD_DRIVER
>>>> + bool "IOMMUFD provides iommu drivers supporting functions"
>>>> + default IOMMU_API
>>>> + help
>>>> + IOMMUFD will provides supporting data structures and helpers to IOMMU
>>>> + drivers.
>>>
>>> It is not a 'user selectable' kconfig, just make it
>>>
>>> config IOMMUFD_DRIVER
>>> tristate
>>> default n
>>>
>> tristate? More like a bool as IOMMU drivers aren't modloadable
>
> tristate, who knows what people will select. If the modular drivers
> use it then it is forced to a Y not a M. It is the right way to use kconfig..
>
Got it (and thanks for the patience)
>>>> --- a/drivers/vfio/Kconfig
>>>> +++ b/drivers/vfio/Kconfig
>>>> @@ -7,6 +7,7 @@ menuconfig VFIO
>>>> select VFIO_GROUP if SPAPR_TCE_IOMMU || IOMMUFD=n
>>>> select VFIO_DEVICE_CDEV if !VFIO_GROUP
>>>> select VFIO_CONTAINER if IOMMUFD=n
>>>> + select IOMMUFD_DRIVER
>>>
>>> As discussed use a if (IS_ENABLED) here and just disable the
>>> bitmap code if something else didn't enable it.
>>>
>>
>> I'm adding this to vfio_main:
>>
>> if (!IS_ENABLED(CONFIG_IOMMUFD_DRIVER))
>> return -EOPNOTSUPP;
>
> Seems right
>
>>> VFIO isn't a consumer of it
>>>
>>
>> (...) The select IOMMUFD_DRIVER was there because of VFIO PCI vendor drivers not
>> VFIO core.
>
> Those driver should individually select IOMMUFD_DRIVER
>
OK -- this is the part that wasn't clear straight away.
So individually per driver, not on VFIO_PCI_CORE, on which these drivers
depend? A lot of the dirty tracking stuff gets steered via what VFIO_PCI_CORE
allows, perhaps I can put the IOMMUFD_DRIVER selection there?
>> for the 'disable bitmap code' I can add ifdef-ry in iova_bitmap.h to
>> add scalfold definitions to error-out/nop if CONFIG_IOMMUFD_DRIVER=n when moving
>> to iommufd/
>
> Yes that could also be a good approach
>
>> For your suggested scheme to work VFIO PCI drivers still need to select
>> IOMMUFD_DRIVER as they require it
>
> Yes, of course
Here's a diff, naturally AMD/Intel kconfigs would get a select IOMMUFD_DRIVER as
well later in the series
diff --git a/drivers/iommu/iommufd/Kconfig b/drivers/iommu/iommufd/Kconfig
index 99d4b075df49..4c6cb96a4b28 100644
--- a/drivers/iommu/iommufd/Kconfig
+++ b/drivers/iommu/iommufd/Kconfig
@@ -11,6 +11,10 @@ config IOMMUFD
If you don't know what to do here, say N.
+config IOMMUFD_DRIVER
+ tristate
+ default n
+
if IOMMUFD
config IOMMUFD_VFIO_CONTAINER
bool "IOMMUFD provides the VFIO container /dev/vfio/vfio"
diff --git a/drivers/iommu/iommufd/Makefile b/drivers/iommu/iommufd/Makefile
index 8aeba81800c5..34b446146961 100644
--- a/drivers/iommu/iommufd/Makefile
+++ b/drivers/iommu/iommufd/Makefile
@@ -11,3 +11,4 @@ iommufd-y := \
iommufd-$(CONFIG_IOMMUFD_TEST) += selftest.o
obj-$(CONFIG_IOMMUFD) += iommufd.o
+obj-$(CONFIG_IOMMUFD_DRIVER) += iova_bitmap.o
diff --git a/drivers/vfio/iova_bitmap.c b/drivers/iommu/iommufd/iova_bitmap.c
similarity index 100%
rename from drivers/vfio/iova_bitmap.c
rename to drivers/iommu/iommufd/iova_bitmap.c
diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
index c82ea032d352..68c05705200f 100644
--- a/drivers/vfio/Makefile
+++ b/drivers/vfio/Makefile
@@ -1,8 +1,7 @@
# SPDX-License-Identifier: GPL-2.0
obj-$(CONFIG_VFIO) += vfio.o
-vfio-y += vfio_main.o \
- iova_bitmap.o
+vfio-y += vfio_main.o
vfio-$(CONFIG_VFIO_DEVICE_CDEV) += device_cdev.o
vfio-$(CONFIG_VFIO_GROUP) += group.o
vfio-$(CONFIG_IOMMUFD) += iommufd.o
diff --git a/drivers/vfio/pci/mlx5/Kconfig b/drivers/vfio/pci/mlx5/Kconfig
index 7088edc4fb28..c3ced56b7787 100644
--- a/drivers/vfio/pci/mlx5/Kconfig
+++ b/drivers/vfio/pci/mlx5/Kconfig
@@ -3,6 +3,7 @@ config MLX5_VFIO_PCI
tristate "VFIO support for MLX5 PCI devices"
depends on MLX5_CORE
select VFIO_PCI_CORE
+ select IOMMUFD_DRIVER
help
This provides migration support for MLX5 devices using the VFIO
framework.
diff --git a/drivers/vfio/pci/pds/Kconfig b/drivers/vfio/pci/pds/Kconfig
index 407b3fd32733..fff368a8183b 100644
--- a/drivers/vfio/pci/pds/Kconfig
+++ b/drivers/vfio/pci/pds/Kconfig
@@ -5,6 +5,7 @@ config PDS_VFIO_PCI
tristate "VFIO support for PDS PCI devices"
depends on PDS_CORE
select VFIO_PCI_CORE
+ select IOMMUFD_DRIVER
help
This provides generic PCI support for PDS devices using the VFIO
framework.
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 40732e8ed4c6..93b0c2b377e1 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -1095,6 +1095,9 @@ static int vfio_device_log_read_and_clear(struct iova_bitmap *iter,
{
struct vfio_device *device = opaque;
+ if (!IS_ENABLED(CONFIG_IOMMUFD_DRIVER))
+ return -EOPNOTSUPP;
+
return device->log_ops->log_read_and_clear(device, iova, length, iter);
}
@@ -1111,6 +1114,9 @@ vfio_ioctl_device_feature_logging_report(struct vfio_device *device,
u64 iova_end;
int ret;
+ if (!IS_ENABLED(CONFIG_IOMMUFD_DRIVER))
+ return -EOPNOTSUPP;
+
if (!device->log_ops)
return -ENOTTY;
diff --git a/include/linux/iova_bitmap.h b/include/linux/iova_bitmap.h
index c006cf0a25f3..1c338f5e5b7a 100644
--- a/include/linux/iova_bitmap.h
+++ b/include/linux/iova_bitmap.h
@@ -7,6 +7,7 @@
#define _IOVA_BITMAP_H_
#include <linux/types.h>
+#include <linux/errno.h>
struct iova_bitmap;
@@ -14,6 +15,7 @@ typedef int (*iova_bitmap_fn_t)(struct iova_bitmap *bitmap,
unsigned long iova, size_t length,
void *opaque);
+#if IS_ENABLED(CONFIG_IOMMUFD_DRIVER)
struct iova_bitmap *iova_bitmap_alloc(unsigned long iova, size_t length,
unsigned long page_size,
u64 __user *data);
@@ -22,5 +24,29 @@ int iova_bitmap_for_each(struct iova_bitmap *bitmap, void *opaque,
iova_bitmap_fn_t fn);
void iova_bitmap_set(struct iova_bitmap *bitmap,
unsigned long iova, size_t length);
+#else
+static inline struct iova_bitmap *iova_bitmap_alloc(unsigned long iova,
+ size_t length,
+ unsigned long page_size,
+ u64 __user *data)
+{
+ return NULL;
+}
+
+static inline void iova_bitmap_free(struct iova_bitmap *bitmap)
+{
+}
+
+static inline int iova_bitmap_for_each(struct iova_bitmap *bitmap, void *opaque,
+ iova_bitmap_fn_t fn)
+{
+ return -EOPNOTSUPP;
+}
+
+static inline void iova_bitmap_set(struct iova_bitmap *bitmap,
+ unsigned long iova, size_t length)
+{
+}
+#endif
#endif
* Re: [PATCH v3 02/19] vfio: Move iova_bitmap into iommu core
2023-10-16 18:15 ` Joao Martins
@ 2023-10-16 18:20 ` Jason Gunthorpe
2023-10-16 18:37 ` Joao Martins
0 siblings, 1 reply; 140+ messages in thread
From: Jason Gunthorpe @ 2023-10-16 18:20 UTC (permalink / raw)
To: Joao Martins
Cc: Alex Williamson, iommu, Kevin Tian, Shameerali Kolothum Thodi,
Lu Baolu, Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel,
Suravee Suthikulpanit, Will Deacon, Robin Murphy, kvm
On Mon, Oct 16, 2023 at 07:15:10PM +0100, Joao Martins wrote:
> Here's a diff, naturally AMD/Intel kconfigs would get a select IOMMUFD_DRIVER as
> well later in the series
It looks OK, the IS_ENABLED()s are probably overkill once you have
changed the .h file, just saves a few code bytes, not sure we care?
Jason
* Re: [PATCH v3 02/19] vfio: Move iova_bitmap into iommu core
2023-10-16 18:20 ` Jason Gunthorpe
@ 2023-10-16 18:37 ` Joao Martins
2023-10-16 18:50 ` Joao Martins
0 siblings, 1 reply; 140+ messages in thread
From: Joao Martins @ 2023-10-16 18:37 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Alex Williamson, iommu, Kevin Tian, Shameerali Kolothum Thodi,
Lu Baolu, Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel,
Suravee Suthikulpanit, Will Deacon, Robin Murphy, kvm
On 16/10/2023 19:20, Jason Gunthorpe wrote:
> On Mon, Oct 16, 2023 at 07:15:10PM +0100, Joao Martins wrote:
>
>> Here's a diff, naturally AMD/Intel kconfigs would get a select IOMMUFD_DRIVER as
>> well later in the series
>
> It looks OK, the IS_ENABLED()s are probably overkill once you have
> changed the .h file, just saves a few code bytes, not sure we care?
I can remove them
* Re: [PATCH v3 02/19] vfio: Move iova_bitmap into iommu core
2023-10-16 18:37 ` Joao Martins
@ 2023-10-16 18:50 ` Joao Martins
2023-10-17 12:58 ` Jason Gunthorpe
0 siblings, 1 reply; 140+ messages in thread
From: Joao Martins @ 2023-10-16 18:50 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Alex Williamson, iommu, Kevin Tian, Shameerali Kolothum Thodi,
Lu Baolu, Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel,
Suravee Suthikulpanit, Will Deacon, Robin Murphy, kvm
On 16/10/2023 19:37, Joao Martins wrote:
> On 16/10/2023 19:20, Jason Gunthorpe wrote:
>> On Mon, Oct 16, 2023 at 07:15:10PM +0100, Joao Martins wrote:
>>
>>> Here's a diff, naturally AMD/Intel kconfigs would get a select IOMMUFD_DRIVER as
>>> well later in the series
>>
>> It looks OK, the IS_ENABLED()s are probably overkill once you have
>> changed the .h file, just saves a few code bytes, not sure we care?
>
> I can remove them
Additionally, I don't think I can use the symbol namespace for IOMMUFD, as
iova-bitmap can be built builtin with a module iommufd, otherwise we get into
errors like this:
ERROR: modpost: module iommufd uses symbol iova_bitmap_for_each from namespace
IOMMUFD, but does not import it.
ERROR: modpost: module iommufd uses symbol iova_bitmap_free from namespace
IOMMUFD, but does not import it.
ERROR: modpost: module iommufd uses symbol iova_bitmap_alloc from namespace
IOMMUFD, but does not import it.
* Re: [PATCH v3 02/19] vfio: Move iova_bitmap into iommu core
2023-10-16 18:50 ` Joao Martins
@ 2023-10-17 12:58 ` Jason Gunthorpe
2023-10-17 15:20 ` Joao Martins
0 siblings, 1 reply; 140+ messages in thread
From: Jason Gunthorpe @ 2023-10-17 12:58 UTC (permalink / raw)
To: Joao Martins
Cc: Alex Williamson, iommu, Kevin Tian, Shameerali Kolothum Thodi,
Lu Baolu, Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel,
Suravee Suthikulpanit, Will Deacon, Robin Murphy, kvm
On Mon, Oct 16, 2023 at 07:50:25PM +0100, Joao Martins wrote:
> On 16/10/2023 19:37, Joao Martins wrote:
> > On 16/10/2023 19:20, Jason Gunthorpe wrote:
> >> On Mon, Oct 16, 2023 at 07:15:10PM +0100, Joao Martins wrote:
> >>
> >>> Here's a diff, naturally AMD/Intel kconfigs would get a select IOMMUFD_DRIVER as
> >>> well later in the series
> >>
> >> It looks OK, the IS_ENABLED()s are probably overkill once you have
> >> changed the .h file, just saves a few code bytes, not sure we care?
> >
> > I can remove them
>
> Additionally, I don't think I can use the symbol namespace for IOMMUFD, as
> iova-bitmap can be built builtin with a module iommufd, otherwise we get into
> errors like this:
>
> ERROR: modpost: module iommufd uses symbol iova_bitmap_for_each from namespace
> IOMMUFD, but does not import it.
> ERROR: modpost: module iommufd uses symbol iova_bitmap_free from namespace
> IOMMUFD, but does not import it.
> ERROR: modpost: module iommufd uses symbol iova_bitmap_alloc from namespace
> IOMMUFD, but does not import it.
You cannot self-import the namespace? I'm not that familiar with this stuff
Jason
* Re: [PATCH v3 02/19] vfio: Move iova_bitmap into iommu core
2023-10-17 12:58 ` Jason Gunthorpe
@ 2023-10-17 15:20 ` Joao Martins
2023-10-17 15:23 ` Jason Gunthorpe
0 siblings, 1 reply; 140+ messages in thread
From: Joao Martins @ 2023-10-17 15:20 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Alex Williamson, iommu, Kevin Tian, Shameerali Kolothum Thodi,
Lu Baolu, Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel,
Suravee Suthikulpanit, Will Deacon, Robin Murphy, kvm
On 17/10/2023 13:58, Jason Gunthorpe wrote:
> On Mon, Oct 16, 2023 at 07:50:25PM +0100, Joao Martins wrote:
>> On 16/10/2023 19:37, Joao Martins wrote:
>>> On 16/10/2023 19:20, Jason Gunthorpe wrote:
>>>> On Mon, Oct 16, 2023 at 07:15:10PM +0100, Joao Martins wrote:
>>>>
>>>>> Here's a diff, naturally AMD/Intel kconfigs would get a select IOMMUFD_DRIVER as
>>>>> well later in the series
>>>>
>>>> It looks OK, the IS_ENABLED()s are probably overkill once you have
>>>> changed the .h file, just saves a few code bytes, not sure we care?
>>>
>>> I can remove them
>>
>> Additionally, I don't think I can use the symbol namespace for IOMMUFD, as
>> iova-bitmap can be built builtin with a module iommufd, otherwise we get into
>> errors like this:
>>
>> ERROR: modpost: module iommufd uses symbol iova_bitmap_for_each from namespace
>> IOMMUFD, but does not import it.
>> ERROR: modpost: module iommufd uses symbol iova_bitmap_free from namespace
>> IOMMUFD, but does not import it.
>> ERROR: modpost: module iommufd uses symbol iova_bitmap_alloc from namespace
>> IOMMUFD, but does not import it.
>
> You cannot self-import the namespace? I'm not that familiar with this stuff
Neither am I. But self-importing looks to work. An alternative is to have a
separate namespace (e.g. IOMMUFD_DRIVER), in similar fashion to IOMMUFD_INTERNAL.
But I fear this patch is already doing too much at this late stage. Are you keen
on getting this moved with namespaces right now, or can it be a post-merge cleanup?
diff --git a/drivers/iommu/iommufd/iova_bitmap.c b/drivers/iommu/iommufd/iova_bitmap.c
index f54b56388e00..0aaf2218bf61 100644
--- a/drivers/iommu/iommufd/iova_bitmap.c
+++ b/drivers/iommu/iommufd/iova_bitmap.c
@@ -7,6 +7,7 @@
#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/highmem.h>
+#include <linux/module.h>
#define BITS_PER_PAGE (PAGE_SIZE * BITS_PER_BYTE)
@@ -268,7 +269,7 @@ struct iova_bitmap *iova_bitmap_alloc(unsigned long iova, size_t length,
iova_bitmap_free(bitmap);
return ERR_PTR(rc);
}
-EXPORT_SYMBOL_GPL(iova_bitmap_alloc);
+EXPORT_SYMBOL_NS_GPL(iova_bitmap_alloc, IOMMUFD);
/**
* iova_bitmap_free() - Frees an IOVA bitmap object
@@ -290,7 +291,7 @@ void iova_bitmap_free(struct iova_bitmap *bitmap)
kfree(bitmap);
}
-EXPORT_SYMBOL_GPL(iova_bitmap_free);
+EXPORT_SYMBOL_NS_GPL(iova_bitmap_free, IOMMUFD);
/*
* Returns the remaining bitmap indexes from mapped_total_index to process for
@@ -389,7 +390,7 @@ int iova_bitmap_for_each(struct iova_bitmap *bitmap, void *opaque,
return ret;
}
-EXPORT_SYMBOL_GPL(iova_bitmap_for_each);
+EXPORT_SYMBOL_NS_GPL(iova_bitmap_for_each, IOMMUFD);
/**
* iova_bitmap_set() - Records an IOVA range in bitmap
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 25cfbf67031b..30f1656ac5da 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -558,5 +558,6 @@ MODULE_ALIAS_MISCDEV(VFIO_MINOR);
MODULE_ALIAS("devname:vfio/vfio");
#endif
MODULE_IMPORT_NS(IOMMUFD_INTERNAL);
+MODULE_IMPORT_NS(IOMMUFD);
MODULE_DESCRIPTION("I/O Address Space Management for passthrough devices");
MODULE_LICENSE("GPL");
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 40732e8ed4c6..a96d97da367d 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -1693,6 +1693,7 @@ static void __exit vfio_cleanup(void)
module_init(vfio_init);
module_exit(vfio_cleanup);
+MODULE_IMPORT_NS(IOMMUFD);
MODULE_VERSION(DRIVER_VERSION);
MODULE_LICENSE("GPL v2");
MODULE_AUTHOR(DRIVER_AUTHOR);
* Re: [PATCH v3 02/19] vfio: Move iova_bitmap into iommu core
2023-10-17 15:20 ` Joao Martins
@ 2023-10-17 15:23 ` Jason Gunthorpe
2023-10-17 15:44 ` Joao Martins
0 siblings, 1 reply; 140+ messages in thread
From: Jason Gunthorpe @ 2023-10-17 15:23 UTC (permalink / raw)
To: Joao Martins
Cc: Alex Williamson, iommu, Kevin Tian, Shameerali Kolothum Thodi,
Lu Baolu, Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel,
Suravee Suthikulpanit, Will Deacon, Robin Murphy, kvm
On Tue, Oct 17, 2023 at 04:20:22PM +0100, Joao Martins wrote:
> On 17/10/2023 13:58, Jason Gunthorpe wrote:
> > On Mon, Oct 16, 2023 at 07:50:25PM +0100, Joao Martins wrote:
> >> On 16/10/2023 19:37, Joao Martins wrote:
> >>> On 16/10/2023 19:20, Jason Gunthorpe wrote:
> >>>> On Mon, Oct 16, 2023 at 07:15:10PM +0100, Joao Martins wrote:
> >>>>
> >>>>> Here's a diff, naturally AMD/Intel kconfigs would get a select IOMMUFD_DRIVER as
> >>>>> well later in the series
> >>>>
> >>>> It looks OK, the IS_ENABLED()s are probably overkill once you have
> >>>> changed the .h file, just saves a few code bytes, not sure we care?
> >>>
> >>> I can remove them
> >>
> >> Additionally, I don't think I can use the symbol namespace for IOMMUFD, as
> >> iova-bitmap can be built builtin with a module iommufd, otherwise we get into
> >> errors like this:
> >>
> >> ERROR: modpost: module iommufd uses symbol iova_bitmap_for_each from namespace
> >> IOMMUFD, but does not import it.
> >> ERROR: modpost: module iommufd uses symbol iova_bitmap_free from namespace
> >> IOMMUFD, but does not import it.
> >> ERROR: modpost: module iommufd uses symbol iova_bitmap_alloc from namespace
> >> IOMMUFD, but does not import it.
> >
> > You cannot self-import the namespace? I'm not that familiar with this stuff
>
> Neither am I. But self-importing looks to work. An alternative is to have a
> separate namespace (e.g. IOMMUFD_DRIVER), in similar fashion to IOMMUFD_INTERNAL.
>
> But I fear this patch is already doing too much at this late stage. Are you keen
> on getting this moved with namespaces right now, or can it be a post-merge cleanup?
It is our standard; if you want to make two patches in this series, that is OK too
Jason
* Re: [PATCH v3 02/19] vfio: Move iova_bitmap into iommu core
2023-10-17 15:23 ` Jason Gunthorpe
@ 2023-10-17 15:44 ` Joao Martins
0 siblings, 0 replies; 140+ messages in thread
From: Joao Martins @ 2023-10-17 15:44 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Alex Williamson, iommu, Kevin Tian, Shameerali Kolothum Thodi,
Lu Baolu, Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel,
Suravee Suthikulpanit, Will Deacon, Robin Murphy, kvm
On 17/10/2023 16:23, Jason Gunthorpe wrote:
> On Tue, Oct 17, 2023 at 04:20:22PM +0100, Joao Martins wrote:
>> On 17/10/2023 13:58, Jason Gunthorpe wrote:
>>> On Mon, Oct 16, 2023 at 07:50:25PM +0100, Joao Martins wrote:
>>>> On 16/10/2023 19:37, Joao Martins wrote:
>>>>> On 16/10/2023 19:20, Jason Gunthorpe wrote:
>>>>>> On Mon, Oct 16, 2023 at 07:15:10PM +0100, Joao Martins wrote:
>>>>>>
>>>>>>> Here's a diff, naturally AMD/Intel kconfigs would get a select IOMMUFD_DRIVER as
>>>>>>> well later in the series
>>>>>>
>>>>>> It looks OK, the IS_ENABLED()s are probably overkill once you have
>>>>>> changed the .h file, just saves a few code bytes, not sure we care?
>>>>>
>>>>> I can remove them
>>>>
>>>> Additionally, I don't think I can use the symbol namespace for IOMMUFD, as
>>>> iova-bitmap can be built builtin with a module iommufd, otherwise we get into
>>>> errors like this:
>>>>
>>>> ERROR: modpost: module iommufd uses symbol iova_bitmap_for_each from namespace
>>>> IOMMUFD, but does not import it.
>>>> ERROR: modpost: module iommufd uses symbol iova_bitmap_free from namespace
>>>> IOMMUFD, but does not import it.
>>>> ERROR: modpost: module iommufd uses symbol iova_bitmap_alloc from namespace
>>>> IOMMUFD, but does not import it.
>>>
>>> You cannot self-import the namespace? I'm not that familiar with this stuff
>>
>> Neither am I. But self-importing looks to work. An alternative is to have a
>> separate namespace (e.g. IOMMUFD_DRIVER), in similar fashion to IOMMUFD_INTERNAL.
>>
>> But I fear this patch is already doing too much at this late stage. Are you keen
>> on getting this moved with namespaces right now, or can it be a post-merge cleanup?
>
> It is our standard; if you want to make two patches in this series, that is OK too
>
This is what I have now (as a separate patch).
It is a little more intrusive, as I need to change existing module users
(mlx5-vfio-pci, pds, vfio).
--->8---
From: Joao Martins <joao.m.martins@oracle.com>
Date: Tue, 17 Oct 2023 11:12:28 -0400
Subject: [PATCH v4 03/20] iommufd/iova_bitmap: Move symbols to IOMMUFD namespace
The IOVA bitmap helpers were not using any namespace, so to adhere to the
IOMMUFD symbol export convention, move them into the IOMMUFD namespace and
import it in the right places. Today this means self-importing in
iommufd/main.c, and importing in VFIO and the vfio-pci drivers that use
iova_bitmap_set().
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
drivers/iommu/iommufd/iova_bitmap.c | 8 ++++----
drivers/iommu/iommufd/main.c | 1 +
drivers/vfio/pci/mlx5/main.c | 1 +
drivers/vfio/pci/pds/pci_drv.c | 1 +
drivers/vfio/vfio_main.c | 1 +
5 files changed, 8 insertions(+), 4 deletions(-)
diff --git a/drivers/iommu/iommufd/iova_bitmap.c b/drivers/iommu/iommufd/iova_bitmap.c
index f54b56388e00..0a92c9eeaf7f 100644
--- a/drivers/iommu/iommufd/iova_bitmap.c
+++ b/drivers/iommu/iommufd/iova_bitmap.c
@@ -268,7 +268,7 @@ struct iova_bitmap *iova_bitmap_alloc(unsigned long iova, size_t length,
iova_bitmap_free(bitmap);
return ERR_PTR(rc);
}
-EXPORT_SYMBOL_GPL(iova_bitmap_alloc);
+EXPORT_SYMBOL_NS_GPL(iova_bitmap_alloc, IOMMUFD);
/**
* iova_bitmap_free() - Frees an IOVA bitmap object
@@ -290,7 +290,7 @@ void iova_bitmap_free(struct iova_bitmap *bitmap)
kfree(bitmap);
}
-EXPORT_SYMBOL_GPL(iova_bitmap_free);
+EXPORT_SYMBOL_NS_GPL(iova_bitmap_free, IOMMUFD);
/*
* Returns the remaining bitmap indexes from mapped_total_index to process for
@@ -389,7 +389,7 @@ int iova_bitmap_for_each(struct iova_bitmap *bitmap, void *opaque,
return ret;
}
-EXPORT_SYMBOL_GPL(iova_bitmap_for_each);
+EXPORT_SYMBOL_NS_GPL(iova_bitmap_for_each, IOMMUFD);
/**
* iova_bitmap_set() - Records an IOVA range in bitmap
@@ -423,4 +423,4 @@ void iova_bitmap_set(struct iova_bitmap *bitmap,
cur_bit += nbits;
} while (cur_bit <= last_bit);
}
-EXPORT_SYMBOL_GPL(iova_bitmap_set);
+EXPORT_SYMBOL_NS_GPL(iova_bitmap_set, IOMMUFD);
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index e71523cbd0de..9b2c18d7af1e 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -552,5 +552,6 @@ MODULE_ALIAS_MISCDEV(VFIO_MINOR);
MODULE_ALIAS("devname:vfio/vfio");
#endif
MODULE_IMPORT_NS(IOMMUFD_INTERNAL);
+MODULE_IMPORT_NS(IOMMUFD);
MODULE_DESCRIPTION("I/O Address Space Management for passthrough devices");
MODULE_LICENSE("GPL");
diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c
index 42ec574a8622..5cf2b491d15a 100644
--- a/drivers/vfio/pci/mlx5/main.c
+++ b/drivers/vfio/pci/mlx5/main.c
@@ -1376,6 +1376,7 @@ static struct pci_driver mlx5vf_pci_driver = {
module_pci_driver(mlx5vf_pci_driver);
+MODULE_IMPORT_NS(IOMMUFD);
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Max Gurtovoy <mgurtovoy@nvidia.com>");
MODULE_AUTHOR("Yishai Hadas <yishaih@nvidia.com>");
diff --git a/drivers/vfio/pci/pds/pci_drv.c b/drivers/vfio/pci/pds/pci_drv.c
index ab4b5958e413..dd8c00c895a2 100644
--- a/drivers/vfio/pci/pds/pci_drv.c
+++ b/drivers/vfio/pci/pds/pci_drv.c
@@ -204,6 +204,7 @@ static struct pci_driver pds_vfio_pci_driver = {
module_pci_driver(pds_vfio_pci_driver);
+MODULE_IMPORT_NS(IOMMUFD);
MODULE_DESCRIPTION(PDS_VFIO_DRV_DESCRIPTION);
MODULE_AUTHOR("Brett Creeley <brett.creeley@amd.com>");
MODULE_LICENSE("GPL");
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 40732e8ed4c6..a96d97da367d 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -1693,6 +1693,7 @@ static void __exit vfio_cleanup(void)
module_init(vfio_init);
module_exit(vfio_cleanup);
+MODULE_IMPORT_NS(IOMMUFD);
MODULE_VERSION(DRIVER_VERSION);
MODULE_LICENSE("GPL v2");
MODULE_AUTHOR(DRIVER_AUTHOR);
--
2.17.2
* Re: [PATCH v3 02/19] vfio: Move iova_bitmap into iommu core
2023-10-16 18:05 ` Jason Gunthorpe
2023-10-16 18:15 ` Joao Martins
@ 2023-10-18 10:19 ` Joao Martins
2023-10-18 12:03 ` Jason Gunthorpe
1 sibling, 1 reply; 140+ messages in thread
From: Joao Martins @ 2023-10-18 10:19 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Alex Williamson, iommu, Kevin Tian, Shameerali Kolothum Thodi,
Lu Baolu, Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel,
Suravee Suthikulpanit, Will Deacon, Robin Murphy, kvm
On 16/10/2023 19:05, Jason Gunthorpe wrote:
> On Mon, Oct 16, 2023 at 06:52:50PM +0100, Joao Martins wrote:
>> On 16/10/2023 17:34, Jason Gunthorpe wrote:
>>> On Mon, Oct 16, 2023 at 05:25:16PM +0100, Joao Martins wrote:
>>>> diff --git a/drivers/iommu/iommufd/Kconfig b/drivers/iommu/iommufd/Kconfig
>>>> index 99d4b075df49..96ec013d1192 100644
>>>> --- a/drivers/iommu/iommufd/Kconfig
>>>> +++ b/drivers/iommu/iommufd/Kconfig
>>>> @@ -11,6 +11,13 @@ config IOMMUFD
>>>>
>>>> If you don't know what to do here, say N.
>>>>
>>>> +config IOMMUFD_DRIVER
>>>> + bool "IOMMUFD provides iommu drivers supporting functions"
>>>> + default IOMMU_API
>>>> + help
>>>> + IOMMUFD will provides supporting data structures and helpers to IOMMU
>>>> + drivers.
>>>
>>> It is not a 'user selectable' kconfig, just make it
>>>
>>> config IOMMUFD_DRIVER
>>> tristate
>>> default n
>>>
>> tristate? More like a bool as IOMMU drivers aren't modloadable
>
> tristate, who knows what people will select. If the modular drivers
> use it then it is forced to a Y not a M. It is the right way to use kconfig..
>
Making it tristate will break build bisection in this module with errors like this:
[I say bisection, because afterwards when we put IOMMU drivers in the mix, these
are always builtin, so it ends up selecting IOMMUFD_DRIVER=y.]
ERROR: modpost: missing MODULE_LICENSE() in drivers/iommu/iommufd/iova_bitmap.o
iova_bitmap is not a module, and making it tristate allows it to be built as a module
as long as one of its selectors is a module. 'bool' is actually more accurate
about whether it is builtin or not.
Joao
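The select semantics being argued about can be sketched in a purely illustrative Kconfig fragment (the symbol names are made up for the example; this is not the tree's actual layout): a tristate symbol selected only by modular users resolves to =m, while any builtin selector forces it to =y.

```kconfig
# Illustrative fragment only
config HELPER
	tristate            # resolves to y, m, or n depending on its selectors

config BUILTIN_USER
	bool "builtin user of HELPER"
	select HELPER       # BUILTIN_USER=y forces HELPER=y

config MODULAR_USER
	tristate "modular user of HELPER"
	select HELPER       # MODULAR_USER=m alone yields HELPER=m
```

When HELPER resolves to =m its objects are built as a standalone module, which is exactly why modpost starts demanding a MODULE_LICENSE() in iova_bitmap.o; a bool symbol can only ever be builtin or absent.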
* Re: [PATCH v3 02/19] vfio: Move iova_bitmap into iommu core
2023-10-18 10:19 ` Joao Martins
@ 2023-10-18 12:03 ` Jason Gunthorpe
2023-10-18 12:48 ` Joao Martins
0 siblings, 1 reply; 140+ messages in thread
From: Jason Gunthorpe @ 2023-10-18 12:03 UTC (permalink / raw)
To: Joao Martins
Cc: Alex Williamson, iommu, Kevin Tian, Shameerali Kolothum Thodi,
Lu Baolu, Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel,
Suravee Suthikulpanit, Will Deacon, Robin Murphy, kvm
On Wed, Oct 18, 2023 at 11:19:07AM +0100, Joao Martins wrote:
> On 16/10/2023 19:05, Jason Gunthorpe wrote:
> > On Mon, Oct 16, 2023 at 06:52:50PM +0100, Joao Martins wrote:
> >> On 16/10/2023 17:34, Jason Gunthorpe wrote:
> >>> On Mon, Oct 16, 2023 at 05:25:16PM +0100, Joao Martins wrote:
> >>>> diff --git a/drivers/iommu/iommufd/Kconfig b/drivers/iommu/iommufd/Kconfig
> >>>> index 99d4b075df49..96ec013d1192 100644
> >>>> --- a/drivers/iommu/iommufd/Kconfig
> >>>> +++ b/drivers/iommu/iommufd/Kconfig
> >>>> @@ -11,6 +11,13 @@ config IOMMUFD
> >>>>
> >>>> If you don't know what to do here, say N.
> >>>>
> >>>> +config IOMMUFD_DRIVER
> >>>> + bool "IOMMUFD provides iommu drivers supporting functions"
> >>>> + default IOMMU_API
> >>>> + help
> >>>> + IOMMUFD will provides supporting data structures and helpers to IOMMU
> >>>> + drivers.
> >>>
> >>> It is not a 'user selectable' kconfig, just make it
> >>>
> >>> config IOMMUFD_DRIVER
> >>> tristate
> >>> default n
> >>>
> >> tristate? More like a bool as IOMMU drivers aren't modloadable
> >
> > tristate, who knows what people will select. If the modular drivers
> > use it then it is forced to a Y not a M. It is the right way to use kconfig..
> >
> Making it tristate will break build bisection in this module with errors like this:
>
> [I say bisection, because afterwards when we put IOMMU drivers in the mix, these
> are always builtin, so it ends up selecting IOMMUFD_DRIVER=y.]
>
> ERROR: modpost: missing MODULE_LICENSE() in drivers/iommu/iommufd/iova_bitmap.o
>
> iova_bitmap is not a module, and making it tristate allows it to be built as a module
> as long as one of its selectors is a module. 'bool' is actually more accurate
> about whether it is builtin or not.
It is a module if you make it tristate, add the MODULE_LICENSE
Jason
* Re: [PATCH v3 02/19] vfio: Move iova_bitmap into iommu core
2023-10-18 12:03 ` Jason Gunthorpe
@ 2023-10-18 12:48 ` Joao Martins
2023-10-18 14:23 ` Jason Gunthorpe
0 siblings, 1 reply; 140+ messages in thread
From: Joao Martins @ 2023-10-18 12:48 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Alex Williamson, iommu, Kevin Tian, Shameerali Kolothum Thodi,
Lu Baolu, Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel,
Suravee Suthikulpanit, Will Deacon, Robin Murphy, kvm
On 18/10/2023 13:03, Jason Gunthorpe wrote:
> On Wed, Oct 18, 2023 at 11:19:07AM +0100, Joao Martins wrote:
>> On 16/10/2023 19:05, Jason Gunthorpe wrote:
>>> On Mon, Oct 16, 2023 at 06:52:50PM +0100, Joao Martins wrote:
>>>> On 16/10/2023 17:34, Jason Gunthorpe wrote:
>>>>> On Mon, Oct 16, 2023 at 05:25:16PM +0100, Joao Martins wrote:
>>>>>> diff --git a/drivers/iommu/iommufd/Kconfig b/drivers/iommu/iommufd/Kconfig
>>>>>> index 99d4b075df49..96ec013d1192 100644
>>>>>> --- a/drivers/iommu/iommufd/Kconfig
>>>>>> +++ b/drivers/iommu/iommufd/Kconfig
>>>>>> @@ -11,6 +11,13 @@ config IOMMUFD
>>>>>>
>>>>>> If you don't know what to do here, say N.
>>>>>>
>>>>>> +config IOMMUFD_DRIVER
>>>>>> + bool "IOMMUFD provides iommu drivers supporting functions"
>>>>>> + default IOMMU_API
>>>>>> + help
>>>>>> + IOMMUFD will provides supporting data structures and helpers to IOMMU
>>>>>> + drivers.
>>>>>
>>>>> It is not a 'user selectable' kconfig, just make it
>>>>>
>>>>> config IOMMUFD_DRIVER
>>>>> tristate
>>>>> default n
>>>>>
>>>> tristate? More like a bool as IOMMU drivers aren't modloadable
>>>
>>> tristate, who knows what people will select. If the modular drivers
>>> use it then it is forced to a Y not a M. It is the right way to use kconfig..
>>>
>> Making it tristate will break build bisection in this module with errors like this:
>>
>> [I say bisection, because afterwards when we put IOMMU drivers in the mix, these
>> are always builtin, so it ends up selecting IOMMUFD_DRIVER=y.]
>>
>> ERROR: modpost: missing MODULE_LICENSE() in drivers/iommu/iommufd/iova_bitmap.o
>>
>> iova_bitmap is not a module, and making it tristate allows it to be built as a module
>> as long as one of its selectors is a module. 'bool' is actually more accurate
>> about whether it is builtin or not.
>
> It is a module if you make it tristate, add the MODULE_LICENSE
It's not just that. It can't work as a module when CONFIG_VFIO=y and another
user is CONFIG_MLX5_VFIO_PCI=m. CONFIG_VFIO uses the API, so this is the case
where IS_ENABLED(CONFIG_IOMMUFD_DRIVER) evaluates to true but the API is only
technically used by a module, so it doesn't get linked in. You could have the header
file test for it being a module instead of just IS_ENABLED(), but
that is useless, as the CONFIG_VFIO code is what drives the whole VFIO driver
dirty tracking, so it must override it to =y.
This extra kconfig change at the end should fix it. But I am a bit skeptical of
these last-minute module-user changes, as it is getting a bit too nuanced for
what was a relatively non-invasive change.
I would like to reiterate that there's no actual module user; making it a bool is a
bit clearer about what it actually is (you would need IOMMU drivers
to be modules, which I think is a big gamble to happen anytime soon?)
diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index 6bda6dbb4878..1db519cce815 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -7,6 +7,7 @@ menuconfig VFIO
select VFIO_GROUP if SPAPR_TCE_IOMMU || IOMMUFD=n
select VFIO_DEVICE_CDEV if !VFIO_GROUP
select VFIO_CONTAINER if IOMMUFD=n
+ select IOMMUFD_DRIVER
help
VFIO provides a framework for secure userspace device drivers.
See Documentation/driver-api/vfio.rst for more details.
diff --git a/drivers/iommu/iommufd/iova_bitmap.c b/drivers/iommu/iommufd/iova_bitmap.c
index f54b56388e00..350f6b615e91 100644
--- a/drivers/iommu/iommufd/iova_bitmap.c
+++ b/drivers/iommu/iommufd/iova_bitmap.c
@@ -7,6 +7,7 @@
#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/highmem.h>
+#include <linux/module.h>
#define BITS_PER_PAGE (PAGE_SIZE * BITS_PER_BYTE)
@@ -424,3 +425,5 @@ void iova_bitmap_set(struct iova_bitmap *bitmap,
} while (cur_bit <= last_bit);
}
EXPORT_SYMBOL_GPL(iova_bitmap_set);
+
+MODULE_LICENSE("GPL v2");
* Re: [PATCH v3 02/19] vfio: Move iova_bitmap into iommu core
2023-10-18 12:48 ` Joao Martins
@ 2023-10-18 14:23 ` Jason Gunthorpe
2023-10-18 15:34 ` Joao Martins
0 siblings, 1 reply; 140+ messages in thread
From: Jason Gunthorpe @ 2023-10-18 14:23 UTC (permalink / raw)
To: Joao Martins
Cc: Alex Williamson, iommu, Kevin Tian, Shameerali Kolothum Thodi,
Lu Baolu, Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel,
Suravee Suthikulpanit, Will Deacon, Robin Murphy, kvm
On Wed, Oct 18, 2023 at 01:48:04PM +0100, Joao Martins wrote:
> On 18/10/2023 13:03, Jason Gunthorpe wrote:
> > On Wed, Oct 18, 2023 at 11:19:07AM +0100, Joao Martins wrote:
> >> On 16/10/2023 19:05, Jason Gunthorpe wrote:
> >>> On Mon, Oct 16, 2023 at 06:52:50PM +0100, Joao Martins wrote:
> >>>> On 16/10/2023 17:34, Jason Gunthorpe wrote:
> >>>>> On Mon, Oct 16, 2023 at 05:25:16PM +0100, Joao Martins wrote:
> >>>>>> diff --git a/drivers/iommu/iommufd/Kconfig b/drivers/iommu/iommufd/Kconfig
> >>>>>> index 99d4b075df49..96ec013d1192 100644
> >>>>>> --- a/drivers/iommu/iommufd/Kconfig
> >>>>>> +++ b/drivers/iommu/iommufd/Kconfig
> >>>>>> @@ -11,6 +11,13 @@ config IOMMUFD
> >>>>>>
> >>>>>> If you don't know what to do here, say N.
> >>>>>>
> >>>>>> +config IOMMUFD_DRIVER
> >>>>>> + bool "IOMMUFD provides iommu drivers supporting functions"
> >>>>>> + default IOMMU_API
> >>>>>> + help
> >>>>>> + IOMMUFD will provides supporting data structures and helpers to IOMMU
> >>>>>> + drivers.
> >>>>>
> >>>>> It is not a 'user selectable' kconfig, just make it
> >>>>>
> >>>>> config IOMMUFD_DRIVER
> >>>>> tristate
> >>>>> default n
> >>>>>
> >>>> tristate? More like a bool as IOMMU drivers aren't modloadable
> >>>
> >>> tristate, who knows what people will select. If the modular drivers
> >>> use it then it is forced to a Y not a M. It is the right way to use kconfig..
> >>>
> >> Making it tristate will break build bisection in this module with errors like this:
> >>
> >> [I say bisection, because afterwards when we put IOMMU drivers in the mix, these
> >> are always builtin, so it ends up selecting IOMMUFD_DRIVER=y.]
> >>
> >> ERROR: modpost: missing MODULE_LICENSE() in drivers/iommu/iommufd/iova_bitmap.o
> >>
> >> iova_bitmap is not a module, and making it tristate allows it to be built as a module
> >> as long as one of its selectors is a module. 'bool' is actually more accurate
> >> about whether it is builtin or not.
> >
> > It is a module if you make it tristate, add the MODULE_LICENSE
>
> It's not just that. It can't work as a module when CONFIG_VFIO=y and another
> user is CONFIG_MLX5_VFIO_PCI=m. CONFIG_VFIO uses the API, so this is the case
> where IS_ENABLED(CONFIG_IOMMUFD_DRIVER) evaluates to true but the API is only
> technically used by a module, so it doesn't get linked in.
Ah! There is a well-known kconfig technique for this too:
depends on m || IOMMUFD_DRIVER != m
or
depends on IOMMUFD_DRIVER || IOMMUFD_DRIVER = n
On the VFIO module.
> I would like to reiterate that there's no actual module user; making it a bool is a
> bit clearer about what it actually is (you would need IOMMU drivers
> to be modules, which I think is a big gamble to happen anytime soon?)
This is all true too, but my thinking was to allow VFIO to use it
without having an IOMMU driver compiled in that supports dirty
tracking, e.g. for embedded cases.
Jason
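The "depends on" trick suggested above can be sketched as a hypothetical fragment on the consumer side (this illustrates the idiom; it is not the change that was ultimately merged):

```kconfig
# On the consumer (e.g. VFIO): allowed if the consumer may itself be
# modular ("m"), or if the helper is anything but a module (=y or =n,
# the =n case being covered by the inline stubs in the header).
config VFIO
	tristate "VFIO Non-Privileged userspace driver framework"
	depends on m || IOMMUFD_DRIVER != m
```

Read `m || IOMMUFD_DRIVER != m` as: either this symbol is being evaluated as modular, or IOMMUFD_DRIVER is not a module. The one combination it forbids is a builtin consumer paired with a modular helper, which is the unlinkable case described earlier in the thread.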
* Re: [PATCH v3 02/19] vfio: Move iova_bitmap into iommu core
2023-10-18 14:23 ` Jason Gunthorpe
@ 2023-10-18 15:34 ` Joao Martins
2023-10-18 15:43 ` Jason Gunthorpe
0 siblings, 1 reply; 140+ messages in thread
From: Joao Martins @ 2023-10-18 15:34 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Alex Williamson, iommu, Kevin Tian, Shameerali Kolothum Thodi,
Lu Baolu, Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel,
Suravee Suthikulpanit, Will Deacon, Robin Murphy, kvm
On 18/10/2023 15:23, Jason Gunthorpe wrote:
> On Wed, Oct 18, 2023 at 01:48:04PM +0100, Joao Martins wrote:
>> On 18/10/2023 13:03, Jason Gunthorpe wrote:
>>> On Wed, Oct 18, 2023 at 11:19:07AM +0100, Joao Martins wrote:
>>>> On 16/10/2023 19:05, Jason Gunthorpe wrote:
>>>>> On Mon, Oct 16, 2023 at 06:52:50PM +0100, Joao Martins wrote:
>>>>>> On 16/10/2023 17:34, Jason Gunthorpe wrote:
>>>>>>> On Mon, Oct 16, 2023 at 05:25:16PM +0100, Joao Martins wrote:
>>>>>>>> diff --git a/drivers/iommu/iommufd/Kconfig b/drivers/iommu/iommufd/Kconfig
>>>>>>>> index 99d4b075df49..96ec013d1192 100644
>>>>>>>> --- a/drivers/iommu/iommufd/Kconfig
>>>>>>>> +++ b/drivers/iommu/iommufd/Kconfig
>>>>>>>> @@ -11,6 +11,13 @@ config IOMMUFD
>>>>>>>>
>>>>>>>> If you don't know what to do here, say N.
>>>>>>>>
>>>>>>>> +config IOMMUFD_DRIVER
>>>>>>>> + bool "IOMMUFD provides iommu drivers supporting functions"
>>>>>>>> + default IOMMU_API
>>>>>>>> + help
>>>>>>>> + IOMMUFD will provides supporting data structures and helpers to IOMMU
>>>>>>>> + drivers.
>>>>>>>
>>>>>>> It is not a 'user selectable' kconfig, just make it
>>>>>>>
>>>>>>> config IOMMUFD_DRIVER
>>>>>>> tristate
>>>>>>> default n
>>>>>>>
>>>>>> tristate? More like a bool as IOMMU drivers aren't modloadable
>>>>>
>>>>> tristate, who knows what people will select. If the modular drivers
>>>>> use it then it is forced to a Y not a M. It is the right way to use kconfig..
>>>>>
>>>> Making it tristate will break build bisection in this module with errors like this:
>>>>
>>>> [I say bisection, because afterwards when we put IOMMU drivers in the mix, these
>>>> are always builtin, so it ends up selecting IOMMUFD_DRIVER=y.]
>>>>
>>>> ERROR: modpost: missing MODULE_LICENSE() in drivers/iommu/iommufd/iova_bitmap.o
>>>>
>>>> iova_bitmap is not a module, and making it tristate allows it to be built as a module
>>>> as long as one of its selectors is a module. 'bool' is actually more accurate
>>>> about whether it is builtin or not.
>>>
>>> It is a module if you make it tristate, add the MODULE_LICENSE
>>
>> It's not just that. It can't work as a module when CONFIG_VFIO=y and another
>> user is CONFIG_MLX5_VFIO_PCI=m. CONFIG_VFIO uses the API, so this is the case
>> where IS_ENABLED(CONFIG_IOMMUFD_DRIVER) evaluates to true but the API is only
>> technically used by a module, so it doesn't get linked in.
>
> Ah! There is a well-known kconfig technique for this too:
> depends on m || IOMMUFD_DRIVER != m
> or
> depends on IOMMUFD_DRIVER || IOMMUFD_DRIVER = n
>
These two lead to a recursive dependency:
drivers/vfio/Kconfig:2:error: recursive dependency detected!
drivers/vfio/Kconfig:2: symbol VFIO depends on IOMMUFD_DRIVER
drivers/iommu/iommufd/Kconfig:14: symbol IOMMUFD_DRIVER is selected by
MLX5_VFIO_PCI
drivers/vfio/pci/mlx5/Kconfig:2: symbol MLX5_VFIO_PCI depends on VFIO
For a resolution refer to Documentation/kbuild/kconfig-language.rst
subsection "Kconfig recursive dependency limitations"
This is because the end drivers are the ones actually selecting IOMMUFD_DRIVER. But
if we remove those, then there is no VF dirty tracking either.
> On the VFIO module.
>
The problem is the VFIO module being =y with IOMMUFD_DRIVER=m, because its end user
is a module (MLX5_VFIO_PCI=m) which depends on it. If IOMMUFD_DRIVER
>> I would like to reiterate that there's no actual module user, making a bool is a
>> bit more clear on its usage on what it actually is (you would need IOMMU drivers
>> to be modules, which I think is a big gamble that is happening anytime soon?)
>
> This is all true too, but my thinking was to allow VFIO to use it
> without having an IOMMU driver compiled in that supports dirty
> tracking. eg for embedded cases.
OK, I see where you are coming from.
Honestly I think a simple 'select IOMMUFD_DRIVER' would fix it. If the selector is a module it will select it as a module, and later if some IOMMU driver (where supported) selects it, it will be made builtin. The only case it doesn't cover is when no VFIO PCI core drivers are built, or when none of the VFIO PCI core drivers do dirty tracking, i.e. it will still build it into CONFIG_VFIO -- but nothing new here, as this is how it works today.
I could restrict the scope with the diff below, which would avoid having to do the select in the mlx5/pds drivers:
diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index 1db519cce815..2abd1c598b65 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -7,7 +7,7 @@ menuconfig VFIO
select VFIO_GROUP if SPAPR_TCE_IOMMU || IOMMUFD=n
select VFIO_DEVICE_CDEV if !VFIO_GROUP
select VFIO_CONTAINER if IOMMUFD=n
- select IOMMUFD_DRIVER
+ select IOMMUFD_DRIVER if VFIO_PCI_CORE
help
VFIO provides a framework for secure userspace device drivers.
See Documentation/driver-api/vfio.rst for more details.
* Re: [PATCH v3 02/19] vfio: Move iova_bitmap into iommu core
2023-10-18 15:34 ` Joao Martins
@ 2023-10-18 15:43 ` Jason Gunthorpe
0 siblings, 0 replies; 140+ messages in thread
From: Jason Gunthorpe @ 2023-10-18 15:43 UTC (permalink / raw)
To: Joao Martins
Cc: Alex Williamson, iommu, Kevin Tian, Shameerali Kolothum Thodi,
Lu Baolu, Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel,
Suravee Suthikulpanit, Will Deacon, Robin Murphy, kvm
On Wed, Oct 18, 2023 at 04:34:17PM +0100, Joao Martins wrote:
> These two lead to a recursive dependency:
>
> drivers/vfio/Kconfig:2:error: recursive dependency detected!
> drivers/vfio/Kconfig:2: symbol VFIO depends on IOMMUFD_DRIVER
> drivers/iommu/iommufd/Kconfig:14: symbol IOMMUFD_DRIVER is selected by
> MLX5_VFIO_PCI
> drivers/vfio/pci/mlx5/Kconfig:2: symbol MLX5_VFIO_PCI depends on VFIO
> For a resolution refer to Documentation/kbuild/kconfig-language.rst
> subsection "Kconfig recursive dependency limitations"
>
> Due to the end drivers being the ones actually selecting IOMMUFD_DRIVER. But
> well, if we remove those, then no VF dirty tracking either.
Oh I'm surprised by this
> diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
> index 1db519cce815..2abd1c598b65 100644
> --- a/drivers/vfio/Kconfig
> +++ b/drivers/vfio/Kconfig
> @@ -7,7 +7,7 @@ menuconfig VFIO
> select VFIO_GROUP if SPAPR_TCE_IOMMU || IOMMUFD=n
> select VFIO_DEVICE_CDEV if !VFIO_GROUP
> select VFIO_CONTAINER if IOMMUFD=n
> - select IOMMUFD_DRIVER
> + select IOMMUFD_DRIVER if VFIO_PCI_CORE
I think you are better off sticking with the non-modular bool in this case
Jason
* [PATCH v3 03/19] iommu: Add iommu_domain ops for dirty tracking
2023-09-23 1:24 [PATCH v3 00/19] IOMMUFD Dirty Tracking Joao Martins
2023-09-23 1:24 ` [PATCH v3 01/19] vfio/iova_bitmap: Export more API symbols Joao Martins
2023-09-23 1:24 ` [PATCH v3 02/19] vfio: Move iova_bitmap into iommu core Joao Martins
@ 2023-09-23 1:24 ` Joao Martins
2023-10-13 16:05 ` Jason Gunthorpe
2023-09-23 1:24 ` [PATCH v3 04/19] iommufd: Add a flag to enforce dirty tracking on attach Joao Martins
` (17 subsequent siblings)
20 siblings, 1 reply; 140+ messages in thread
From: Joao Martins @ 2023-09-23 1:24 UTC (permalink / raw)
To: iommu
Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel,
Suravee Suthikulpanit, Will Deacon, Robin Murphy, Alex Williamson,
kvm, Joao Martins
Add to the iommu domain operations a set of callbacks to perform dirty
tracking, particularly to start and stop tracking and to read and clear the
dirty data.
Drivers are generally expected to dynamically change their translation
structures to toggle the tracking and flush some form of control state
structure that stands in the IOVA translation path. Though it's not
mandatory, as drivers can also enable dirty tracking at boot, and just
clear the dirty bits before setting dirty tracking. For each of the newly
added IOMMU core APIs:
iommu_cap::IOMMU_CAP_DIRTY: new device iommu_capable value when probing for
capabilities of the device.
.set_dirty_tracking(): an iommu driver is expected to change its
translation structures and enable dirty tracking for the devices in the
iommu_domain. For drivers making dirty tracking always-enabled, it should
just return 0.
.read_and_clear_dirty(): an iommu driver is expected to walk the pagetables
for the iova range passed in and use iommu_dirty_bitmap_record() to record
dirty info per IOVA. When detecting that a given IOVA is dirty it should
also clear its dirty state from the PTE, *unless* the flag
IOMMU_DIRTY_NO_CLEAR is passed in -- flushing is steered from the caller of
the domain_op via iotlb_gather. The iommu core APIs use the same data
structure in use for dirty tracking for VFIO device dirty (struct
iova_bitmap) abstracted by iommu_dirty_bitmap_record() helper function.
domain::dirty_ops: IOMMU domains will store the dirty ops depending on
whether the iommu device supports dirty tracking or not. iommu drivers can
then use this field to figure out if dirty tracking is supported+enforced
on attach. The enforcement is enabled via domain_alloc_user(), which is
done via an IOMMUFD hwpt flag introduced later.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
include/linux/io-pgtable.h | 4 +++
include/linux/iommu.h | 56 ++++++++++++++++++++++++++++++++++++++
2 files changed, 60 insertions(+)
diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index 1b7a44b35616..25142a0e2fc2 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -166,6 +166,10 @@ struct io_pgtable_ops {
struct iommu_iotlb_gather *gather);
phys_addr_t (*iova_to_phys)(struct io_pgtable_ops *ops,
unsigned long iova);
+ int (*read_and_clear_dirty)(struct io_pgtable_ops *ops,
+ unsigned long iova, size_t size,
+ unsigned long flags,
+ struct iommu_dirty_bitmap *dirty);
};
/**
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 660dc1931dc9..f3c27b1f2510 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -13,6 +13,7 @@
#include <linux/errno.h>
#include <linux/err.h>
#include <linux/of.h>
+#include <linux/iova_bitmap.h>
#include <uapi/linux/iommu.h>
#define IOMMU_READ (1 << 0)
@@ -37,6 +38,7 @@ struct bus_type;
struct device;
struct iommu_domain;
struct iommu_domain_ops;
+struct iommu_dirty_ops;
struct notifier_block;
struct iommu_sva;
struct iommu_fault_event;
@@ -95,6 +97,8 @@ struct iommu_domain_geometry {
struct iommu_domain {
unsigned type;
const struct iommu_domain_ops *ops;
+ const struct iommu_dirty_ops *dirty_ops;
+
unsigned long pgsize_bitmap; /* Bitmap of page sizes in use */
struct iommu_domain_geometry geometry;
struct iommu_dma_cookie *iova_cookie;
@@ -133,6 +137,7 @@ enum iommu_cap {
* usefully support the non-strict DMA flush queue.
*/
IOMMU_CAP_DEFERRED_FLUSH,
+ IOMMU_CAP_DIRTY, /* IOMMU supports dirty tracking */
};
/* These are the possible reserved region types */
@@ -227,6 +232,32 @@ struct iommu_iotlb_gather {
bool queued;
};
+/**
+ * struct iommu_dirty_bitmap - Dirty IOVA bitmap state
+ * @bitmap: IOVA bitmap
+ * @gather: Range information for a pending IOTLB flush
+ */
+struct iommu_dirty_bitmap {
+ struct iova_bitmap *bitmap;
+ struct iommu_iotlb_gather *gather;
+};
+
+/**
+ * struct iommu_dirty_ops - domain specific dirty tracking operations
+ * @set_dirty_tracking: Enable or Disable dirty tracking on the iommu domain
+ * @read_and_clear_dirty: Walk IOMMU page tables for dirtied PTEs marshalled
+ * into a bitmap, with a bit represented as a page.
+ * Reads the dirty PTE bits and clears it from IO
+ * pagetables.
+ */
+struct iommu_dirty_ops {
+ int (*set_dirty_tracking)(struct iommu_domain *domain, bool enabled);
+ int (*read_and_clear_dirty)(struct iommu_domain *domain,
+ unsigned long iova, size_t size,
+ unsigned long flags,
+ struct iommu_dirty_bitmap *dirty);
+};
+
/**
* struct iommu_ops - iommu ops and capabilities
* @capable: check capability
@@ -640,6 +671,28 @@ static inline bool iommu_iotlb_gather_queued(struct iommu_iotlb_gather *gather)
return gather && gather->queued;
}
+static inline void iommu_dirty_bitmap_init(struct iommu_dirty_bitmap *dirty,
+ struct iova_bitmap *bitmap,
+ struct iommu_iotlb_gather *gather)
+{
+ if (gather)
+ iommu_iotlb_gather_init(gather);
+
+ dirty->bitmap = bitmap;
+ dirty->gather = gather;
+}
+
+static inline void
+iommu_dirty_bitmap_record(struct iommu_dirty_bitmap *dirty, unsigned long iova,
+ unsigned long length)
+{
+ if (dirty->bitmap)
+ iova_bitmap_set(dirty->bitmap, iova, length);
+
+ if (dirty->gather)
+ iommu_iotlb_gather_add_range(dirty->gather, iova, length);
+}
+
/* PCI device grouping function */
extern struct iommu_group *pci_device_group(struct device *dev);
/* Generic device grouping function */
@@ -670,6 +723,9 @@ struct iommu_fwspec {
/* ATS is supported */
#define IOMMU_FWSPEC_PCI_RC_ATS (1 << 0)
+/* Read but do not clear any dirty bits */
+#define IOMMU_DIRTY_NO_CLEAR (1 << 0)
+
/**
* struct iommu_sva - handle to a device-mm bond
*/
--
2.17.2
* Re: [PATCH v3 03/19] iommu: Add iommu_domain ops for dirty tracking
2023-09-23 1:24 ` [PATCH v3 03/19] iommu: Add iommu_domain ops for dirty tracking Joao Martins
@ 2023-10-13 16:05 ` Jason Gunthorpe
2023-10-13 16:27 ` Joao Martins
0 siblings, 1 reply; 140+ messages in thread
From: Jason Gunthorpe @ 2023-10-13 16:05 UTC (permalink / raw)
To: Joao Martins
Cc: iommu, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu, Yi Liu,
Yi Y Sun, Nicolin Chen, Joerg Roedel, Suravee Suthikulpanit,
Will Deacon, Robin Murphy, Alex Williamson, kvm
On Sat, Sep 23, 2023 at 02:24:55AM +0100, Joao Martins wrote:
> +/**
> + * struct iommu_dirty_bitmap - Dirty IOVA bitmap state
> + * @bitmap: IOVA bitmap
> + * @gather: Range information for a pending IOTLB flush
> + */
> +struct iommu_dirty_bitmap {
> + struct iova_bitmap *bitmap;
> + struct iommu_iotlb_gather *gather;
> +};
Why the struct ?
> +
> +/**
> + * struct iommu_dirty_ops - domain specific dirty tracking operations
> + * @set_dirty_tracking: Enable or Disable dirty tracking on the iommu domain
> + * @read_and_clear_dirty: Walk IOMMU page tables for dirtied PTEs marshalled
> + * into a bitmap, with a bit represented as a page.
> + * Reads the dirty PTE bits and clears it from IO
> + * pagetables.
> + */
> +struct iommu_dirty_ops {
> + int (*set_dirty_tracking)(struct iommu_domain *domain, bool enabled);
> + int (*read_and_clear_dirty)(struct iommu_domain *domain,
> + unsigned long iova, size_t size,
> + unsigned long flags,
> + struct iommu_dirty_bitmap *dirty);
> +};
vs 1 more parameter here?
vs putting more stuff in the struct?
Jason
* Re: [PATCH v3 03/19] iommu: Add iommu_domain ops for dirty tracking
2023-10-13 16:05 ` Jason Gunthorpe
@ 2023-10-13 16:27 ` Joao Martins
0 siblings, 0 replies; 140+ messages in thread
From: Joao Martins @ 2023-10-13 16:27 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: iommu, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu, Yi Liu,
Yi Y Sun, Nicolin Chen, Joerg Roedel, Suravee Suthikulpanit,
Will Deacon, Robin Murphy, Alex Williamson, kvm
On 13/10/2023 17:05, Jason Gunthorpe wrote:
> On Sat, Sep 23, 2023 at 02:24:55AM +0100, Joao Martins wrote:
>
>> +/**
>> + * struct iommu_dirty_bitmap - Dirty IOVA bitmap state
>> + * @bitmap: IOVA bitmap
>> + * @gather: Range information for a pending IOTLB flush
>> + */
>> +struct iommu_dirty_bitmap {
>> + struct iova_bitmap *bitmap;
>> + struct iommu_iotlb_gather *gather;
>> +};
>
> Why the struct ?
>
...
>> +
>> +/**
>> + * struct iommu_dirty_ops - domain specific dirty tracking operations
>> + * @set_dirty_tracking: Enable or Disable dirty tracking on the iommu domain
>> + * @read_and_clear_dirty: Walk IOMMU page tables for dirtied PTEs marshalled
>> + * into a bitmap, with a bit represented as a page.
>> + * Reads the dirty PTE bits and clears it from IO
>> + * pagetables.
>> + */
>> +struct iommu_dirty_ops {
>> + int (*set_dirty_tracking)(struct iommu_domain *domain, bool enabled);
>> + int (*read_and_clear_dirty)(struct iommu_domain *domain,
>> + unsigned long iova, size_t size,
>> + unsigned long flags,
>> + struct iommu_dirty_bitmap *dirty);
>> +};
>
> vs 1 more parameter here?
>
.. I was just trying to avoid one extra parameter, and I wanted to abstract the
iotlb_gather away from the iommu driver, to simplify things for it.
But honestly I was quite undecided: 5 args vs 6 args sounded like stretching
to the max, and then the other simplification sort of made sense to consolidate.
Is there one you prefer?
> vs putting more stuff in the struct?
>
This I sort of disagree with, as the iova_bitmap has no knowledge of the IOTLB --
it's just serializing bits into the bitmap
> Jason
* [PATCH v3 04/19] iommufd: Add a flag to enforce dirty tracking on attach
2023-09-23 1:24 [PATCH v3 00/19] IOMMUFD Dirty Tracking Joao Martins
` (2 preceding siblings ...)
2023-09-23 1:24 ` [PATCH v3 03/19] iommu: Add iommu_domain ops for dirty tracking Joao Martins
@ 2023-09-23 1:24 ` Joao Martins
2023-10-13 15:52 ` Jason Gunthorpe
2023-09-23 1:24 ` [PATCH v3 05/19] iommufd/selftest: Expand mock_domain with dev_flags Joao Martins
` (16 subsequent siblings)
20 siblings, 1 reply; 140+ messages in thread
From: Joao Martins @ 2023-09-23 1:24 UTC (permalink / raw)
To: iommu
Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel,
Suravee Suthikulpanit, Will Deacon, Robin Murphy, Alex Williamson,
kvm, Joao Martins
Throughout the lifetime of an IOMMU domain that wants to use dirty tracking,
some guarantees are needed such that any device attached to the iommu_domain
supports dirty tracking.
The idea is to handle the case where IOMMUs in the system are asymmetric
feature-wise and thus the capability may not be supported for all devices.
The enforcement is done by adding a flag into HWPT_ALLOC namely:
IOMMUFD_HWPT_ALLOC_ENFORCE_DIRTY
.. Passed in the HWPT_ALLOC ioctl() flags. The enforcement is done by creating
an iommu_domain via domain_alloc_user() and validating the requested flags
against what the device IOMMU advertises as supported (and failing accordingly).
Advertising the new IOMMU domain feature flag requires that the individual
iommu driver capability is supported when a future device attachment
happens.
Link: https://lore.kernel.org/kvm/20220721142421.GB4609@nvidia.com/
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
drivers/iommu/iommufd/hw_pagetable.c | 8 ++++++--
include/uapi/linux/iommufd.h | 3 +++
2 files changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index 26a8a818ffa3..32e259245314 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -83,7 +83,9 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
lockdep_assert_held(&ioas->mutex);
- if ((flags & IOMMU_HWPT_ALLOC_NEST_PARENT) && !ops->domain_alloc_user)
+ if ((flags & (IOMMU_HWPT_ALLOC_NEST_PARENT|
+ IOMMU_HWPT_ALLOC_ENFORCE_DIRTY)) &&
+ !ops->domain_alloc_user)
return ERR_PTR(-EOPNOTSUPP);
hwpt = iommufd_object_alloc(ictx, hwpt, IOMMUFD_OBJ_HW_PAGETABLE);
@@ -157,7 +159,9 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
struct iommufd_ioas *ioas;
int rc;
- if (cmd->flags & ~IOMMU_HWPT_ALLOC_NEST_PARENT || cmd->__reserved)
+ if ((cmd->flags &
+ ~(IOMMU_HWPT_ALLOC_NEST_PARENT|IOMMU_HWPT_ALLOC_ENFORCE_DIRTY)) ||
+ cmd->__reserved)
return -EOPNOTSUPP;
idev = iommufd_get_device(ucmd, cmd->dev_id);
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 4a7c5c8fdbb4..cd94a9d8ce66 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -352,9 +352,12 @@ struct iommu_vfio_ioas {
* @IOMMU_HWPT_ALLOC_NEST_PARENT: If set, allocate a domain which can serve
* as the parent domain in the nesting
* configuration.
+ * @IOMMU_HWPT_ALLOC_ENFORCE_DIRTY: Dirty tracking support for device IOMMU is
+ * enforced on device attachment
*/
enum iommufd_hwpt_alloc_flags {
IOMMU_HWPT_ALLOC_NEST_PARENT = 1 << 0,
+ IOMMU_HWPT_ALLOC_ENFORCE_DIRTY = 1 << 1,
};
/**
--
2.17.2
* Re: [PATCH v3 04/19] iommufd: Add a flag to enforce dirty tracking on attach
2023-09-23 1:24 ` [PATCH v3 04/19] iommufd: Add a flag to enforce dirty tracking on attach Joao Martins
@ 2023-10-13 15:52 ` Jason Gunthorpe
2023-10-13 16:14 ` Joao Martins
0 siblings, 1 reply; 140+ messages in thread
From: Jason Gunthorpe @ 2023-10-13 15:52 UTC (permalink / raw)
To: Joao Martins
Cc: iommu, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu, Yi Liu,
Yi Y Sun, Nicolin Chen, Joerg Roedel, Suravee Suthikulpanit,
Will Deacon, Robin Murphy, Alex Williamson, kvm
On Sat, Sep 23, 2023 at 02:24:56AM +0100, Joao Martins wrote:
> Throughout IOMMU domain lifetime that wants to use dirty tracking, some
> guarantees are needed such that any device attached to the iommu_domain
> supports dirty tracking.
>
> The idea is to handle a case where IOMMU in the system are assymetric
> feature-wise and thus the capability may not be supported for all devices.
> The enforcement is done by adding a flag into HWPT_ALLOC namely:
>
> IOMMUFD_HWPT_ALLOC_ENFORCE_DIRTY
>
> .. Passed in HWPT_ALLOC ioctl() flags. The enforcement is done by creating
> a iommu_domain via domain_alloc_user() and validating the requested flags
> with what the device IOMMU supports (and failing accordingly) advertised).
> Advertising the new IOMMU domain feature flag requires that the individual
> iommu driver capability is supported when a future device attachment
> happens.
>
> Link: https://lore.kernel.org/kvm/20220721142421.GB4609@nvidia.com/
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
> drivers/iommu/iommufd/hw_pagetable.c | 8 ++++++--
> include/uapi/linux/iommufd.h | 3 +++
> 2 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
> index 26a8a818ffa3..32e259245314 100644
> --- a/drivers/iommu/iommufd/hw_pagetable.c
> +++ b/drivers/iommu/iommufd/hw_pagetable.c
> @@ -83,7 +83,9 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
>
> lockdep_assert_held(&ioas->mutex);
>
> - if ((flags & IOMMU_HWPT_ALLOC_NEST_PARENT) && !ops->domain_alloc_user)
> + if ((flags & (IOMMU_HWPT_ALLOC_NEST_PARENT|
> + IOMMU_HWPT_ALLOC_ENFORCE_DIRTY)) &&
> + !ops->domain_alloc_user)
> return ERR_PTR(-EOPNOTSUPP);
This seems strange; why are we testing flags here? Shouldn't this just
be
if (flags && !ops->domain_alloc_user)
return ERR_PTR(-EOPNOTSUPP);
?
> hwpt = iommufd_object_alloc(ictx, hwpt, IOMMUFD_OBJ_HW_PAGETABLE);
> @@ -157,7 +159,9 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
> struct iommufd_ioas *ioas;
> int rc;
>
> - if (cmd->flags & ~IOMMU_HWPT_ALLOC_NEST_PARENT || cmd->__reserved)
> + if ((cmd->flags &
> + ~(IOMMU_HWPT_ALLOC_NEST_PARENT|IOMMU_HWPT_ALLOC_ENFORCE_DIRTY)) ||
> + cmd->__reserved)
> return -EOPNOTSUPP;
Please checkpatch your stuff, and even better feed the patches to
clang-format and do most of what it says.
Otherwise seems like the right thing to do
Jason
* Re: [PATCH v3 04/19] iommufd: Add a flag to enforce dirty tracking on attach
2023-10-13 15:52 ` Jason Gunthorpe
@ 2023-10-13 16:14 ` Joao Martins
2023-10-13 16:16 ` Jason Gunthorpe
0 siblings, 1 reply; 140+ messages in thread
From: Joao Martins @ 2023-10-13 16:14 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: iommu, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu, Yi Liu,
Yi Y Sun, Nicolin Chen, Joerg Roedel, Suravee Suthikulpanit,
Will Deacon, Robin Murphy, Alex Williamson, kvm
On 13/10/2023 16:52, Jason Gunthorpe wrote:
> On Sat, Sep 23, 2023 at 02:24:56AM +0100, Joao Martins wrote:
>> Throughout IOMMU domain lifetime that wants to use dirty tracking, some
>> guarantees are needed such that any device attached to the iommu_domain
>> supports dirty tracking.
>>
>> The idea is to handle a case where IOMMU in the system are assymetric
>> feature-wise and thus the capability may not be supported for all devices.
>> The enforcement is done by adding a flag into HWPT_ALLOC namely:
>>
>> IOMMUFD_HWPT_ALLOC_ENFORCE_DIRTY
>>
>> .. Passed in HWPT_ALLOC ioctl() flags. The enforcement is done by creating
>> a iommu_domain via domain_alloc_user() and validating the requested flags
>> with what the device IOMMU supports (and failing accordingly) advertised).
>> Advertising the new IOMMU domain feature flag requires that the individual
>> iommu driver capability is supported when a future device attachment
>> happens.
>>
>> Link: https://lore.kernel.org/kvm/20220721142421.GB4609@nvidia.com/
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>> drivers/iommu/iommufd/hw_pagetable.c | 8 ++++++--
>> include/uapi/linux/iommufd.h | 3 +++
>> 2 files changed, 9 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
>> index 26a8a818ffa3..32e259245314 100644
>> --- a/drivers/iommu/iommufd/hw_pagetable.c
>> +++ b/drivers/iommu/iommufd/hw_pagetable.c
>> @@ -83,7 +83,9 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
>>
>> lockdep_assert_held(&ioas->mutex);
>>
>> - if ((flags & IOMMU_HWPT_ALLOC_NEST_PARENT) && !ops->domain_alloc_user)
>> + if ((flags & (IOMMU_HWPT_ALLOC_NEST_PARENT|
>> + IOMMU_HWPT_ALLOC_ENFORCE_DIRTY)) &&
>> + !ops->domain_alloc_user)
>> return ERR_PTR(-EOPNOTSUPP);
>
> This seems strange, why are we testing flags here? shouldn't this just
> be
>
> if (flags && !ops->domain_alloc_user)
> return ERR_PTR(-EOPNOTSUPP);
>
> ?
Yeah makes sense, let me switch to that.
It achieves the same without going into the weeds of missing a flag update. And any
flag essentially requires alloc_user regardless.
>
>> hwpt = iommufd_object_alloc(ictx, hwpt, IOMMUFD_OBJ_HW_PAGETABLE);
>> @@ -157,7 +159,9 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
>> struct iommufd_ioas *ioas;
>> int rc;
>>
>> - if (cmd->flags & ~IOMMU_HWPT_ALLOC_NEST_PARENT || cmd->__reserved)
>> + if ((cmd->flags &
>> + ~(IOMMU_HWPT_ALLOC_NEST_PARENT|IOMMU_HWPT_ALLOC_ENFORCE_DIRTY)) ||
>> + cmd->__reserved)
>> return -EOPNOTSUPP;
>
> Please checkpatch your stuff,
I always do this, and there were no issues reported on this patch.
Sometimes there's one or another warning I ignore, albeit very rarely (e.g. going
one character past the 80-column limit when fixing it makes the code uglier).
> and even better feed the patches to
> clang-format and do most of what it says.
>
This one I don't do (but I wasn't aware either that it's a thing)
> Otherwise seems like the right thing to do
>
> Jason
* Re: [PATCH v3 04/19] iommufd: Add a flag to enforce dirty tracking on attach
2023-10-13 16:14 ` Joao Martins
@ 2023-10-13 16:16 ` Jason Gunthorpe
2023-10-13 16:29 ` Joao Martins
0 siblings, 1 reply; 140+ messages in thread
From: Jason Gunthorpe @ 2023-10-13 16:16 UTC (permalink / raw)
To: Joao Martins
Cc: iommu, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu, Yi Liu,
Yi Y Sun, Nicolin Chen, Joerg Roedel, Suravee Suthikulpanit,
Will Deacon, Robin Murphy, Alex Williamson, kvm
On Fri, Oct 13, 2023 at 05:14:26PM +0100, Joao Martins wrote:
> >> hwpt = iommufd_object_alloc(ictx, hwpt, IOMMUFD_OBJ_HW_PAGETABLE);
> >> @@ -157,7 +159,9 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
> >> struct iommufd_ioas *ioas;
> >> int rc;
> >>
> >> - if (cmd->flags & ~IOMMU_HWPT_ALLOC_NEST_PARENT || cmd->__reserved)
> >> + if ((cmd->flags &
> >> + ~(IOMMU_HWPT_ALLOC_NEST_PARENT|IOMMU_HWPT_ALLOC_ENFORCE_DIRTY)) ||
> >> + cmd->__reserved)
> >> return -EOPNOTSUPP;
> >
> > Please checkpatch your stuff,
>
> I always do this, and there was no issues reported on this patch.
Really? The missing spaces around ' | ' are not kernel style..
Jason
* Re: [PATCH v3 04/19] iommufd: Add a flag to enforce dirty tracking on attach
2023-10-13 16:16 ` Jason Gunthorpe
@ 2023-10-13 16:29 ` Joao Martins
0 siblings, 0 replies; 140+ messages in thread
From: Joao Martins @ 2023-10-13 16:29 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: iommu, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu, Yi Liu,
Yi Y Sun, Nicolin Chen, Joerg Roedel, Suravee Suthikulpanit,
Will Deacon, Robin Murphy, Alex Williamson, kvm
On 13/10/2023 17:16, Jason Gunthorpe wrote:
> On Fri, Oct 13, 2023 at 05:14:26PM +0100, Joao Martins wrote:
>>>> hwpt = iommufd_object_alloc(ictx, hwpt, IOMMUFD_OBJ_HW_PAGETABLE);
>>>> @@ -157,7 +159,9 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
>>>> struct iommufd_ioas *ioas;
>>>> int rc;
>>>>
>>>> - if (cmd->flags & ~IOMMU_HWPT_ALLOC_NEST_PARENT || cmd->__reserved)
>>>> + if ((cmd->flags &
>>>> + ~(IOMMU_HWPT_ALLOC_NEST_PARENT|IOMMU_HWPT_ALLOC_ENFORCE_DIRTY)) ||
>>>> + cmd->__reserved)
>>>> return -EOPNOTSUPP;
>>>
>>> Please checkpatch your stuff,
>>
>> I always do this, and there was no issues reported on this patch.
>
> Really? The missing spaces around ' | ' are not kernel style..
>
I just ran it and it doesn't really complain.
But btw, I am only not putting spaces around a single '|'? I do put them around
'||', and that is quite common in kernel code?
* [PATCH v3 05/19] iommufd/selftest: Expand mock_domain with dev_flags
2023-09-23 1:24 [PATCH v3 00/19] IOMMUFD Dirty Tracking Joao Martins
` (3 preceding siblings ...)
2023-09-23 1:24 ` [PATCH v3 04/19] iommufd: Add a flag to enforce dirty tracking on attach Joao Martins
@ 2023-09-23 1:24 ` Joao Martins
2023-10-13 16:02 ` Jason Gunthorpe
2023-09-23 1:24 ` [PATCH v3 06/19] iommufd/selftest: Test IOMMU_HWPT_ALLOC_ENFORCE_DIRTY Joao Martins
` (15 subsequent siblings)
20 siblings, 1 reply; 140+ messages in thread
From: Joao Martins @ 2023-09-23 1:24 UTC (permalink / raw)
To: iommu
Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel,
Suravee Suthikulpanit, Will Deacon, Robin Murphy, Alex Williamson,
kvm, Joao Martins
Expand the mock_domain test to be able to manipulate the device
capabilities. This allows testing with a mockdev without dirty
tracking support advertised, and thus makes sure the enforce_dirty
test does what is expected.
To avoid breaking the IOMMUFD_TEST UABI, replicate the mock_domain
struct and thus add an input dev_flags at the end.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
drivers/iommu/iommufd/iommufd_test.h | 12 ++++++++
drivers/iommu/iommufd/selftest.c | 11 +++++--
tools/testing/selftests/iommu/iommufd_utils.h | 30 +++++++++++++++++++
3 files changed, 51 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h
index 3f3644375bf1..9817edcd8968 100644
--- a/drivers/iommu/iommufd/iommufd_test.h
+++ b/drivers/iommu/iommufd/iommufd_test.h
@@ -19,6 +19,7 @@ enum {
IOMMU_TEST_OP_SET_TEMP_MEMORY_LIMIT,
IOMMU_TEST_OP_MOCK_DOMAIN_REPLACE,
IOMMU_TEST_OP_ACCESS_REPLACE_IOAS,
+ IOMMU_TEST_OP_MOCK_DOMAIN_FLAGS,
};
enum {
@@ -40,6 +41,10 @@ enum {
MOCK_FLAGS_ACCESS_CREATE_NEEDS_PIN_PAGES = 1 << 0,
};
+enum {
+ MOCK_FLAGS_DEVICE_NO_DIRTY = 1 << 0,
+};
+
struct iommu_test_cmd {
__u32 size;
__u32 op;
@@ -56,6 +61,13 @@ struct iommu_test_cmd {
/* out_idev_id is the standard iommufd_bind object */
__u32 out_idev_id;
} mock_domain;
+ struct {
+ __u32 out_stdev_id;
+ __u32 out_hwpt_id;
+ __u32 out_idev_id;
+ /* Expand mock_domain to set mock device flags */
+ __u32 dev_flags;
+ } mock_domain_flags;
struct {
__u32 pt_id;
} mock_domain_replace;
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index b54cbfb1862b..966a98c0935e 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -96,6 +96,7 @@ enum selftest_obj_type {
struct mock_dev {
struct device dev;
+ unsigned long flags;
};
struct selftest_obj {
@@ -378,7 +379,7 @@ static void mock_dev_release(struct device *dev)
kfree(mdev);
}
-static struct mock_dev *mock_dev_create(void)
+static struct mock_dev *mock_dev_create(unsigned long dev_flags)
{
struct mock_dev *mdev;
int rc;
@@ -388,6 +389,7 @@ static struct mock_dev *mock_dev_create(void)
return ERR_PTR(-ENOMEM);
device_initialize(&mdev->dev);
+ mdev->flags = dev_flags;
mdev->dev.release = mock_dev_release;
mdev->dev.bus = &iommufd_mock_bus_type.bus;
@@ -423,6 +425,7 @@ static int iommufd_test_mock_domain(struct iommufd_ucmd *ucmd,
struct iommufd_device *idev;
struct selftest_obj *sobj;
u32 pt_id = cmd->id;
+ u32 dev_flags = 0;
u32 idev_id;
int rc;
@@ -433,7 +436,10 @@ static int iommufd_test_mock_domain(struct iommufd_ucmd *ucmd,
sobj->idev.ictx = ucmd->ictx;
sobj->type = TYPE_IDEV;
- sobj->idev.mock_dev = mock_dev_create();
+ if (cmd->op == IOMMU_TEST_OP_MOCK_DOMAIN_FLAGS)
+ dev_flags = cmd->mock_domain_flags.dev_flags;
+
+ sobj->idev.mock_dev = mock_dev_create(dev_flags);
if (IS_ERR(sobj->idev.mock_dev)) {
rc = PTR_ERR(sobj->idev.mock_dev);
goto out_sobj;
@@ -1016,6 +1022,7 @@ int iommufd_test(struct iommufd_ucmd *ucmd)
cmd->add_reserved.start,
cmd->add_reserved.length);
case IOMMU_TEST_OP_MOCK_DOMAIN:
+ case IOMMU_TEST_OP_MOCK_DOMAIN_FLAGS:
return iommufd_test_mock_domain(ucmd, cmd);
case IOMMU_TEST_OP_MOCK_DOMAIN_REPLACE:
return iommufd_test_mock_domain_replace(
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index be4970a84977..8e84d2592f2d 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -74,6 +74,36 @@ static int _test_cmd_mock_domain(int fd, unsigned int ioas_id, __u32 *stdev_id,
EXPECT_ERRNO(_errno, _test_cmd_mock_domain(self->fd, ioas_id, \
stdev_id, hwpt_id, NULL))
+static int _test_cmd_mock_domain_flags(int fd, unsigned int ioas_id,
+ __u32 stdev_flags,
+ __u32 *stdev_id, __u32 *hwpt_id,
+ __u32 *idev_id)
+{
+ struct iommu_test_cmd cmd = {
+ .size = sizeof(cmd),
+ .op = IOMMU_TEST_OP_MOCK_DOMAIN_FLAGS,
+ .id = ioas_id,
+ .mock_domain_flags = { .dev_flags = stdev_flags },
+ };
+ int ret;
+
+ ret = ioctl(fd, IOMMU_TEST_CMD, &cmd);
+ if (ret)
+ return ret;
+ if (stdev_id)
+ *stdev_id = cmd.mock_domain_flags.out_stdev_id;
+ assert(cmd.id != 0);
+ if (hwpt_id)
+ *hwpt_id = cmd.mock_domain_flags.out_hwpt_id;
+ if (idev_id)
+ *idev_id = cmd.mock_domain_flags.out_idev_id;
+ return 0;
+}
+#define test_err_mock_domain_flags(_errno, ioas_id, flags, stdev_id, hwpt_id) \
+ EXPECT_ERRNO(_errno, _test_cmd_mock_domain_flags(self->fd, ioas_id, \
+ flags, stdev_id, \
+ hwpt_id, NULL))
+
static int _test_cmd_mock_domain_replace(int fd, __u32 stdev_id, __u32 pt_id,
__u32 *hwpt_id)
{
--
2.17.2
* Re: [PATCH v3 05/19] iommufd/selftest: Expand mock_domain with dev_flags
2023-09-23 1:24 ` [PATCH v3 05/19] iommufd/selftest: Expand mock_domain with dev_flags Joao Martins
@ 2023-10-13 16:02 ` Jason Gunthorpe
2023-10-13 16:21 ` Joao Martins
0 siblings, 1 reply; 140+ messages in thread
From: Jason Gunthorpe @ 2023-10-13 16:02 UTC (permalink / raw)
To: Joao Martins
Cc: iommu, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu, Yi Liu,
Yi Y Sun, Nicolin Chen, Joerg Roedel, Suravee Suthikulpanit,
Will Deacon, Robin Murphy, Alex Williamson, kvm
On Sat, Sep 23, 2023 at 02:24:57AM +0100, Joao Martins wrote:
> @@ -56,6 +61,13 @@ struct iommu_test_cmd {
> /* out_idev_id is the standard iommufd_bind object */
> __u32 out_idev_id;
> } mock_domain;
> + struct {
> + __u32 out_stdev_id;
> + __u32 out_hwpt_id;
> + __u32 out_idev_id;
> + /* Expand mock_domain to set mock device flags */
> + __u32 dev_flags;
> + } mock_domain_flags;
I wonder if this is really needed? Did this change make the struct
bigger?
Jason
* Re: [PATCH v3 05/19] iommufd/selftest: Expand mock_domain with dev_flags
2023-10-13 16:02 ` Jason Gunthorpe
@ 2023-10-13 16:21 ` Joao Martins
0 siblings, 0 replies; 140+ messages in thread
From: Joao Martins @ 2023-10-13 16:21 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: iommu, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu, Yi Liu,
Yi Y Sun, Nicolin Chen, Joerg Roedel, Suravee Suthikulpanit,
Will Deacon, Robin Murphy, Alex Williamson, kvm
On 13/10/2023 17:02, Jason Gunthorpe wrote:
> On Sat, Sep 23, 2023 at 02:24:57AM +0100, Joao Martins wrote:
>
>> @@ -56,6 +61,13 @@ struct iommu_test_cmd {
>> /* out_idev_id is the standard iommufd_bind object */
>> __u32 out_idev_id;
>> } mock_domain;
>> + struct {
>> + __u32 out_stdev_id;
>> + __u32 out_hwpt_id;
>> + __u32 out_idev_id;
>> + /* Expand mock_domain to set mock device flags */
>> + __u32 dev_flags;
>> + } mock_domain_flags;
>
> I wonder if this is really needed? Did this change make the struct
> bigger?
>
This was addressing your earlier comment that I could be breaking ABI by
adding dev_flags at the end of the mock_domain anon struct. A separate
struct figured as the simplest and least ugly addition to the existing
mock_domain flow.
I didn't explicitly check the size of the struct, but just looking at it there
are certainly bigger members than the mock_domain_flags I added, e.g.
check_map with 3 u64, or access_pages.
* [PATCH v3 06/19] iommufd/selftest: Test IOMMU_HWPT_ALLOC_ENFORCE_DIRTY
2023-09-23 1:24 [PATCH v3 00/19] IOMMUFD Dirty Tracking Joao Martins
` (4 preceding siblings ...)
2023-09-23 1:24 ` [PATCH v3 05/19] iommufd/selftest: Expand mock_domain with dev_flags Joao Martins
@ 2023-09-23 1:24 ` Joao Martins
2023-09-23 1:24 ` [PATCH v3 07/19] iommufd: Dirty tracking data support Joao Martins
` (14 subsequent siblings)
20 siblings, 0 replies; 140+ messages in thread
From: Joao Martins @ 2023-09-23 1:24 UTC (permalink / raw)
To: iommu
Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel,
Suravee Suthikulpanit, Will Deacon, Robin Murphy, Alex Williamson,
kvm, Joao Martins
In order to selftest the iommu domain dirty-enforcing path, implement the
necessary mock_domain support and add a new dev_flags to test that
hwpt_alloc/attach_device fails as expected.
Expand the existing mock_domain fixture with an enforce_dirty test that
exercises hwpt_alloc and device attachment.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
drivers/iommu/iommufd/selftest.c | 36 ++++++++++++++
tools/testing/selftests/iommu/iommufd.c | 49 +++++++++++++++++++
tools/testing/selftests/iommu/iommufd_utils.h | 3 ++
3 files changed, 88 insertions(+)
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index 966a98c0935e..4cf5a2b859e7 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -119,6 +119,12 @@ static void mock_domain_blocking_free(struct iommu_domain *domain)
static int mock_domain_nop_attach(struct iommu_domain *domain,
struct device *dev)
{
+ struct mock_dev *mdev = container_of(dev, struct mock_dev, dev);
+
+ if (domain->dirty_ops &&
+ (mdev->flags & MOCK_FLAGS_DEVICE_NO_DIRTY))
+ return -EINVAL;
+
return 0;
}
@@ -147,6 +153,26 @@ static void *mock_domain_hw_info(struct device *dev, u32 *length, u32 *type)
return info;
}
+static int mock_domain_set_dirty_tracking(struct iommu_domain *domain,
+ bool enable)
+{
+ return 0;
+}
+
+static int mock_domain_read_and_clear_dirty(struct iommu_domain *domain,
+ unsigned long iova, size_t size,
+ unsigned long flags,
+ struct iommu_dirty_bitmap *dirty)
+{
+ return 0;
+}
+
+const struct iommu_dirty_ops dirty_ops = {
+ .set_dirty_tracking = mock_domain_set_dirty_tracking,
+ .read_and_clear_dirty = mock_domain_read_and_clear_dirty,
+};
+
+
static const struct iommu_ops mock_ops;
static struct iommu_domain *mock_domain_alloc(unsigned int iommu_domain_type)
@@ -174,9 +200,16 @@ static struct iommu_domain *mock_domain_alloc(unsigned int iommu_domain_type)
static struct iommu_domain *
mock_domain_alloc_user(struct device *dev, u32 flags)
{
+ struct mock_dev *mdev = container_of(dev, struct mock_dev, dev);
struct iommu_domain *domain;
+ if ((flags & IOMMU_HWPT_ALLOC_ENFORCE_DIRTY) &&
+ (mdev->flags & MOCK_FLAGS_DEVICE_NO_DIRTY))
+ return ERR_PTR(-EOPNOTSUPP);
+
domain = mock_domain_alloc(IOMMU_DOMAIN_UNMANAGED);
+ if (domain && !(mdev->flags & MOCK_FLAGS_DEVICE_NO_DIRTY))
+ domain->dirty_ops = &dirty_ops;
if (!domain)
domain = ERR_PTR(-ENOMEM);
return domain;
@@ -384,6 +417,9 @@ static struct mock_dev *mock_dev_create(unsigned long dev_flags)
struct mock_dev *mdev;
int rc;
+ if (dev_flags & ~(MOCK_FLAGS_DEVICE_NO_DIRTY))
+ return ERR_PTR(-EINVAL);
+
mdev = kzalloc(sizeof(*mdev), GFP_KERNEL);
if (!mdev)
return ERR_PTR(-ENOMEM);
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index 9c129e63d7c7..71ad12867da6 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -1430,6 +1430,55 @@ TEST_F(iommufd_mock_domain, alloc_hwpt)
}
}
+FIXTURE(iommufd_dirty_tracking)
+{
+ int fd;
+ uint32_t ioas_id;
+ uint32_t hwpt_id;
+ uint32_t stdev_id;
+ uint32_t idev_id;
+};
+
+FIXTURE_SETUP(iommufd_dirty_tracking)
+{
+ self->fd = open("/dev/iommu", O_RDWR);
+ ASSERT_NE(-1, self->fd);
+
+ test_ioctl_ioas_alloc(&self->ioas_id);
+ test_cmd_mock_domain(self->ioas_id, &self->stdev_id,
+ &self->hwpt_id, &self->idev_id);
+}
+
+FIXTURE_TEARDOWN(iommufd_dirty_tracking)
+{
+ teardown_iommufd(self->fd, _metadata);
+}
+
+TEST_F(iommufd_dirty_tracking, enforce_dirty)
+{
+ uint32_t ioas_id, stddev_id, idev_id;
+ uint32_t hwpt_id, _hwpt_id;
+ uint32_t dev_flags;
+
+ /* Regular case */
+ dev_flags = MOCK_FLAGS_DEVICE_NO_DIRTY;
+ test_cmd_hwpt_alloc(self->idev_id, self->ioas_id,
+ IOMMU_HWPT_ALLOC_ENFORCE_DIRTY, &hwpt_id);
+ test_cmd_mock_domain(hwpt_id, &stddev_id, NULL, NULL);
+ test_err_mock_domain_flags(EINVAL, hwpt_id, dev_flags,
+ &stddev_id, NULL);
+ test_ioctl_destroy(stddev_id);
+ test_ioctl_destroy(hwpt_id);
+
+ /* IOMMU device does not support dirty tracking */
+ test_ioctl_ioas_alloc(&ioas_id);
+ test_cmd_mock_domain_flags(ioas_id, dev_flags,
+ &stddev_id, &_hwpt_id, &idev_id);
+ test_err_hwpt_alloc(EOPNOTSUPP, idev_id, ioas_id,
+ IOMMU_HWPT_ALLOC_ENFORCE_DIRTY, &hwpt_id);
+ test_ioctl_destroy(stddev_id);
+}
+
/* VFIO compatibility IOCTLs */
TEST_F(iommufd, simple_ioctls)
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index 8e84d2592f2d..930edfe693c7 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -99,6 +99,9 @@ static int _test_cmd_mock_domain_flags(int fd, unsigned int ioas_id,
*idev_id = cmd.mock_domain_flags.out_idev_id;
return 0;
}
+#define test_cmd_mock_domain_flags(ioas_id, flags, stdev_id, hwpt_id, idev_id) \
+ ASSERT_EQ(0, _test_cmd_mock_domain_flags(self->fd, ioas_id, flags, \
+ stdev_id, hwpt_id, idev_id))
#define test_err_mock_domain_flags(_errno, ioas_id, flags, stdev_id, hwpt_id) \
EXPECT_ERRNO(_errno, _test_cmd_mock_domain_flags(self->fd, ioas_id, \
flags, stdev_id, \
--
2.17.2
* [PATCH v3 07/19] iommufd: Dirty tracking data support
2023-09-23 1:24 [PATCH v3 00/19] IOMMUFD Dirty Tracking Joao Martins
` (5 preceding siblings ...)
2023-09-23 1:24 ` [PATCH v3 06/19] iommufd/selftest: Test IOMMU_HWPT_ALLOC_ENFORCE_DIRTY Joao Martins
@ 2023-09-23 1:24 ` Joao Martins
2023-09-23 1:40 ` Joao Martins
2023-09-23 1:25 ` [PATCH v3 08/19] iommufd: Add IOMMU_HWPT_SET_DIRTY Joao Martins
` (13 subsequent siblings)
20 siblings, 1 reply; 140+ messages in thread
From: Joao Martins @ 2023-09-23 1:24 UTC (permalink / raw)
To: iommu
Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel,
Suravee Suthikulpanit, Will Deacon, Robin Murphy, Alex Williamson,
kvm, Joao Martins
Add an IO pagetable API, iopt_read_and_clear_dirty_data(), that reads the
dirty IOPTEs for a given IOVA range and then copies them back to the
userspace bitmap.
Underneath it uses the IOMMU domain kernel API, which reads the dirty
bits, atomically clears the IOPTE dirty bit, and flushes the IOTLB at the
end. The IOVA bitmap usage takes care of iterating over the bitmap's user
pages efficiently and without copies.
This is in preparation for a set dirty tracking ioctl which will clear
the dirty bits before enabling it.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
drivers/iommu/iommufd/io_pagetable.c | 70 +++++++++++++++++++++++++
drivers/iommu/iommufd/iommufd_private.h | 14 +++++
2 files changed, 84 insertions(+)
diff --git a/drivers/iommu/iommufd/io_pagetable.c b/drivers/iommu/iommufd/io_pagetable.c
index 3a598182b761..d70617447392 100644
--- a/drivers/iommu/iommufd/io_pagetable.c
+++ b/drivers/iommu/iommufd/io_pagetable.c
@@ -15,6 +15,7 @@
#include <linux/err.h>
#include <linux/slab.h>
#include <linux/errno.h>
+#include <uapi/linux/iommufd.h>
#include "io_pagetable.h"
#include "double_span.h"
@@ -412,6 +413,75 @@ int iopt_map_user_pages(struct iommufd_ctx *ictx, struct io_pagetable *iopt,
return 0;
}
+struct iova_bitmap_fn_arg {
+ struct iommu_domain *domain;
+ struct iommu_dirty_bitmap *dirty;
+};
+
+static int __iommu_read_and_clear_dirty(struct iova_bitmap *bitmap,
+ unsigned long iova, size_t length,
+ void *opaque)
+{
+ struct iova_bitmap_fn_arg *arg = opaque;
+ struct iommu_domain *domain = arg->domain;
+ struct iommu_dirty_bitmap *dirty = arg->dirty;
+ const struct iommu_dirty_ops *ops = domain->dirty_ops;
+
+ return ops->read_and_clear_dirty(domain, iova, length, 0, dirty);
+}
+
+static int iommu_read_and_clear_dirty(struct iommu_domain *domain,
+ unsigned long flags,
+ struct iommufd_dirty_data *bitmap)
+{
+ const struct iommu_dirty_ops *ops = domain->dirty_ops;
+ struct iommu_iotlb_gather gather;
+ struct iommu_dirty_bitmap dirty;
+ struct iova_bitmap_fn_arg arg;
+ struct iova_bitmap *iter;
+ int ret = 0;
+
+ if (!ops || !ops->read_and_clear_dirty)
+ return -EOPNOTSUPP;
+
+ iter = iova_bitmap_alloc(bitmap->iova, bitmap->length,
+ bitmap->page_size, bitmap->data);
+ if (IS_ERR(iter))
+ return -ENOMEM;
+
+ iommu_dirty_bitmap_init(&dirty, iter, &gather);
+
+ arg.domain = domain;
+ arg.dirty = &dirty;
+ iova_bitmap_for_each(iter, &arg, __iommu_read_and_clear_dirty);
+
+ iommu_iotlb_sync(domain, &gather);
+ iova_bitmap_free(iter);
+
+ return ret;
+}
+
+int iopt_read_and_clear_dirty_data(struct io_pagetable *iopt,
+ struct iommu_domain *domain,
+ unsigned long flags,
+ struct iommufd_dirty_data *bitmap)
+{
+ unsigned long last_iova, iova = bitmap->iova;
+ unsigned long length = bitmap->length;
+ int ret = -EOPNOTSUPP;
+
+ if ((iova & (iopt->iova_alignment - 1)))
+ return -EINVAL;
+
+ if (check_add_overflow(iova, length - 1, &last_iova))
+ return -EOVERFLOW;
+
+ down_read(&iopt->iova_rwsem);
+ ret = iommu_read_and_clear_dirty(domain, flags, bitmap);
+ up_read(&iopt->iova_rwsem);
+ return ret;
+}
+
int iopt_get_pages(struct io_pagetable *iopt, unsigned long iova,
unsigned long length, struct list_head *pages_list)
{
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 3064997a0181..84ec1df29074 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -8,6 +8,8 @@
#include <linux/xarray.h>
#include <linux/refcount.h>
#include <linux/uaccess.h>
+#include <linux/iommu.h>
+#include <linux/iova_bitmap.h>
struct iommu_domain;
struct iommu_group;
@@ -70,6 +72,18 @@ int iopt_unmap_iova(struct io_pagetable *iopt, unsigned long iova,
unsigned long length, unsigned long *unmapped);
int iopt_unmap_all(struct io_pagetable *iopt, unsigned long *unmapped);
+struct iommufd_dirty_data {
+ unsigned long iova;
+ unsigned long length;
+ unsigned long page_size;
+ unsigned long long *data;
+};
+
+int iopt_read_and_clear_dirty_data(struct io_pagetable *iopt,
+ struct iommu_domain *domain,
+ unsigned long flags,
+ struct iommufd_dirty_data *bitmap);
+
void iommufd_access_notify_unmap(struct io_pagetable *iopt, unsigned long iova,
unsigned long length);
int iopt_table_add_domain(struct io_pagetable *iopt,
--
2.17.2
* Re: [PATCH v3 07/19] iommufd: Dirty tracking data support
2023-09-23 1:24 ` [PATCH v3 07/19] iommufd: Dirty tracking data support Joao Martins
@ 2023-09-23 1:40 ` Joao Martins
2023-10-17 12:06 ` Joao Martins
0 siblings, 1 reply; 140+ messages in thread
From: Joao Martins @ 2023-09-23 1:40 UTC (permalink / raw)
To: iommu
Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel,
Suravee Suthikulpanit, Will Deacon, Robin Murphy, Alex Williamson,
kvm
On 23/09/2023 02:24, Joao Martins wrote:
> +int iopt_read_and_clear_dirty_data(struct io_pagetable *iopt,
> + struct iommu_domain *domain,
> + unsigned long flags,
> + struct iommufd_dirty_data *bitmap)
> +{
> + unsigned long last_iova, iova = bitmap->iova;
> + unsigned long length = bitmap->length;
> + int ret = -EOPNOTSUPP;
> +
> + if ((iova & (iopt->iova_alignment - 1)))
> + return -EINVAL;
> +
> + if (check_add_overflow(iova, length - 1, &last_iova))
> + return -EOVERFLOW;
> +
> + down_read(&iopt->iova_rwsem);
> + ret = iommu_read_and_clear_dirty(domain, flags, bitmap);
> + up_read(&iopt->iova_rwsem);
> + return ret;
> +}
I need to call out a mistake I made, noticed while submitting. I should
walk over the iopt_areas here (or in iommu_read_and_clear_dirty()) to check
area::pages, so this is something I have to fix for the next version. I did
that for clear_dirty but not for this function, hence it is half-baked. There
are a lot of other changes nonetheless, so I didn't want to break what I had
there. Apologies for the distraction.
Joao
* Re: [PATCH v3 07/19] iommufd: Dirty tracking data support
2023-09-23 1:40 ` Joao Martins
@ 2023-10-17 12:06 ` Joao Martins
2023-10-17 15:29 ` Jason Gunthorpe
0 siblings, 1 reply; 140+ messages in thread
From: Joao Martins @ 2023-10-17 12:06 UTC (permalink / raw)
To: iommu
Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel,
Suravee Suthikulpanit, Will Deacon, Robin Murphy, Alex Williamson,
kvm
On 23/09/2023 02:40, Joao Martins wrote:
> On 23/09/2023 02:24, Joao Martins wrote:
>> +int iopt_read_and_clear_dirty_data(struct io_pagetable *iopt,
>> + struct iommu_domain *domain,
>> + unsigned long flags,
>> + struct iommufd_dirty_data *bitmap)
>> +{
>> + unsigned long last_iova, iova = bitmap->iova;
>> + unsigned long length = bitmap->length;
>> + int ret = -EOPNOTSUPP;
>> +
>> + if ((iova & (iopt->iova_alignment - 1)))
>> + return -EINVAL;
>> +
>> + if (check_add_overflow(iova, length - 1, &last_iova))
>> + return -EOVERFLOW;
>> +
>> + down_read(&iopt->iova_rwsem);
>> + ret = iommu_read_and_clear_dirty(domain, flags, bitmap);
>> + up_read(&iopt->iova_rwsem);
>> + return ret;
>> +}
>
> I need to call out that a mistake I made, noticed while submitting. I should be
> walk over iopt_areas here (or in iommu_read_and_clear_dirty()) to check
> area::pages. So this is a comment I have to fix for next version.
Below is how I fixed it.
Essentially the thinking is that the user passes either a mapped IOVA area
*or* a subset of a mapped IOVA area. This should also allow multiple threads
to read dirties from a huge IOVA area split into different chunks (in case it
gets split into the lowest level).
diff --git a/drivers/iommu/iommufd/io_pagetable.c
b/drivers/iommu/iommufd/io_pagetable.c
index 5a35885aef04..991c57458725 100644
--- a/drivers/iommu/iommufd/io_pagetable.c
+++ b/drivers/iommu/iommufd/io_pagetable.c
@@ -473,7 +473,9 @@ int iopt_read_and_clear_dirty_data(struct io_pagetable *iopt,
{
unsigned long last_iova, iova = bitmap->iova;
unsigned long length = bitmap->length;
- int ret = -EOPNOTSUPP;
+ struct iopt_area *area;
+ bool found = false;
+ int ret = -EINVAL;
if ((iova & (iopt->iova_alignment - 1)))
return -EINVAL;
@@ -482,7 +484,22 @@ int iopt_read_and_clear_dirty_data(struct io_pagetable *iopt,
return -EOVERFLOW;
down_read(&iopt->iova_rwsem);
- ret = iommu_read_and_clear_dirty(domain, flags, bitmap);
+
+ /* Find the portion of IOVA space belonging to area */
+ while ((area = iopt_area_iter_first(iopt, iova, last_iova))) {
+ unsigned long area_last = iopt_area_last_iova(area);
+ unsigned long area_first = iopt_area_iova(area);
+
+ if (!area->pages)
+ continue;
+
+ found = (iova >= area_first && last_iova <= area_last);
+ if (found)
+ break;
+ }
+
+ if (found)
+ ret = iommu_read_and_clear_dirty(domain, flags, bitmap);
up_read(&iopt->iova_rwsem);
return ret;
* Re: [PATCH v3 07/19] iommufd: Dirty tracking data support
2023-10-17 12:06 ` Joao Martins
@ 2023-10-17 15:29 ` Jason Gunthorpe
2023-10-17 15:51 ` Joao Martins
0 siblings, 1 reply; 140+ messages in thread
From: Jason Gunthorpe @ 2023-10-17 15:29 UTC (permalink / raw)
To: Joao Martins
Cc: iommu, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu, Yi Liu,
Yi Y Sun, Nicolin Chen, Joerg Roedel, Suravee Suthikulpanit,
Will Deacon, Robin Murphy, Alex Williamson, kvm
On Tue, Oct 17, 2023 at 01:06:12PM +0100, Joao Martins wrote:
> On 23/09/2023 02:40, Joao Martins wrote:
> > On 23/09/2023 02:24, Joao Martins wrote:
> >> +int iopt_read_and_clear_dirty_data(struct io_pagetable *iopt,
> >> + struct iommu_domain *domain,
> >> + unsigned long flags,
> >> + struct iommufd_dirty_data *bitmap)
> >> +{
> >> + unsigned long last_iova, iova = bitmap->iova;
> >> + unsigned long length = bitmap->length;
> >> + int ret = -EOPNOTSUPP;
> >> +
> >> + if ((iova & (iopt->iova_alignment - 1)))
> >> + return -EINVAL;
> >> +
> >> + if (check_add_overflow(iova, length - 1, &last_iova))
> >> + return -EOVERFLOW;
> >> +
> >> + down_read(&iopt->iova_rwsem);
> >> + ret = iommu_read_and_clear_dirty(domain, flags, bitmap);
> >> + up_read(&iopt->iova_rwsem);
> >> + return ret;
> >> +}
> >
> > I need to call out that a mistake I made, noticed while submitting. I should be
> > walk over iopt_areas here (or in iommu_read_and_clear_dirty()) to check
> > area::pages. So this is a comment I have to fix for next version.
>
> Below is how I fixed it.
>
> Essentially the thinking being that the user passes either an mapped IOVA area
> it mapped *or* a subset of a mapped IOVA area. This should also allow the
> possibility of having multiple threads read dirties from huge IOVA area splitted
> in different chunks (in the case it gets splitted into lowest level).
What happens if the iommu_read_and_clear_dirty is done on unmapped
PTEs? It fails?
Jason
* Re: [PATCH v3 07/19] iommufd: Dirty tracking data support
2023-10-17 15:29 ` Jason Gunthorpe
@ 2023-10-17 15:51 ` Joao Martins
2023-10-17 16:01 ` Jason Gunthorpe
0 siblings, 1 reply; 140+ messages in thread
From: Joao Martins @ 2023-10-17 15:51 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: iommu, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu, Yi Liu,
Yi Y Sun, Nicolin Chen, Joerg Roedel, Suravee Suthikulpanit,
Will Deacon, Robin Murphy, Alex Williamson, kvm
On 17/10/2023 16:29, Jason Gunthorpe wrote:
> On Tue, Oct 17, 2023 at 01:06:12PM +0100, Joao Martins wrote:
>> On 23/09/2023 02:40, Joao Martins wrote:
>>> On 23/09/2023 02:24, Joao Martins wrote:
>>>> +int iopt_read_and_clear_dirty_data(struct io_pagetable *iopt,
>>>> + struct iommu_domain *domain,
>>>> + unsigned long flags,
>>>> + struct iommufd_dirty_data *bitmap)
>>>> +{
>>>> + unsigned long last_iova, iova = bitmap->iova;
>>>> + unsigned long length = bitmap->length;
>>>> + int ret = -EOPNOTSUPP;
>>>> +
>>>> + if ((iova & (iopt->iova_alignment - 1)))
>>>> + return -EINVAL;
>>>> +
>>>> + if (check_add_overflow(iova, length - 1, &last_iova))
>>>> + return -EOVERFLOW;
>>>> +
>>>> + down_read(&iopt->iova_rwsem);
>>>> + ret = iommu_read_and_clear_dirty(domain, flags, bitmap);
>>>> + up_read(&iopt->iova_rwsem);
>>>> + return ret;
>>>> +}
>>>
>>> I need to call out that a mistake I made, noticed while submitting. I should be
>>> walk over iopt_areas here (or in iommu_read_and_clear_dirty()) to check
>>> area::pages. So this is a comment I have to fix for next version.
>>
>> Below is how I fixed it.
>>
>> Essentially the thinking being that the user passes either an mapped IOVA area
>> it mapped *or* a subset of a mapped IOVA area. This should also allow the
>> possibility of having multiple threads read dirties from huge IOVA area splitted
>> in different chunks (in the case it gets splitted into lowest level).
>
> What happens if the iommu_read_and_clear_dirty is done on unmapped
> PTEs? It fails?
If there's no IOPTE, or the IOPTE is non-present, the walk just continues to
the next base page (or level-0 IOVA range). This holds for both drivers in
this series.
Joao
* Re: [PATCH v3 07/19] iommufd: Dirty tracking data support
2023-10-17 15:51 ` Joao Martins
@ 2023-10-17 16:01 ` Jason Gunthorpe
2023-10-17 16:51 ` Joao Martins
0 siblings, 1 reply; 140+ messages in thread
From: Jason Gunthorpe @ 2023-10-17 16:01 UTC (permalink / raw)
To: Joao Martins
Cc: iommu, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu, Yi Liu,
Yi Y Sun, Nicolin Chen, Joerg Roedel, Suravee Suthikulpanit,
Will Deacon, Robin Murphy, Alex Williamson, kvm
On Tue, Oct 17, 2023 at 04:51:25PM +0100, Joao Martins wrote:
> On 17/10/2023 16:29, Jason Gunthorpe wrote:
> > On Tue, Oct 17, 2023 at 01:06:12PM +0100, Joao Martins wrote:
> >> On 23/09/2023 02:40, Joao Martins wrote:
> >>> On 23/09/2023 02:24, Joao Martins wrote:
> >>>> +int iopt_read_and_clear_dirty_data(struct io_pagetable *iopt,
> >>>> + struct iommu_domain *domain,
> >>>> + unsigned long flags,
> >>>> + struct iommufd_dirty_data *bitmap)
> >>>> +{
> >>>> + unsigned long last_iova, iova = bitmap->iova;
> >>>> + unsigned long length = bitmap->length;
> >>>> + int ret = -EOPNOTSUPP;
> >>>> +
> >>>> + if ((iova & (iopt->iova_alignment - 1)))
> >>>> + return -EINVAL;
> >>>> +
> >>>> + if (check_add_overflow(iova, length - 1, &last_iova))
> >>>> + return -EOVERFLOW;
> >>>> +
> >>>> + down_read(&iopt->iova_rwsem);
> >>>> + ret = iommu_read_and_clear_dirty(domain, flags, bitmap);
> >>>> + up_read(&iopt->iova_rwsem);
> >>>> + return ret;
> >>>> +}
> >>>
> >>> I need to call out that a mistake I made, noticed while submitting. I should be
> >>> walk over iopt_areas here (or in iommu_read_and_clear_dirty()) to check
> >>> area::pages. So this is a comment I have to fix for next version.
> >>
> >> Below is how I fixed it.
> >>
> >> Essentially the thinking being that the user passes either an mapped IOVA area
> >> it mapped *or* a subset of a mapped IOVA area. This should also allow the
> >> possibility of having multiple threads read dirties from huge IOVA area splitted
> >> in different chunks (in the case it gets splitted into lowest level).
> >
> > What happens if the iommu_read_and_clear_dirty is done on unmapped
> > PTEs? It fails?
>
> If there's no IOPTE or the IOPTE is non-present, it keeps walking to the next
> base page (or level-0 IOVA range). For both drivers in this series.
Hum, so this check doesn't seem quite right then as it is really an
input validation that the iova is within the tree. It should be able
to span contiguous areas.
Write it with the intersection logic:
for (area = iopt_area_iter_first(iopt, iova, iova_last); area;
area = iopt_area_iter_next(area, iova, iova_last)) {
if (!area->pages)
// fail
if (cur_iova < area_first)
// fail
if (last_iova <= area_last)
// success, do iommu_read_and_clear_dirty()
cur_iova = area_last + 1;
}
// else fail if not success
Jason
* Re: [PATCH v3 07/19] iommufd: Dirty tracking data support
2023-10-17 16:01 ` Jason Gunthorpe
@ 2023-10-17 16:51 ` Joao Martins
2023-10-17 17:13 ` Jason Gunthorpe
0 siblings, 1 reply; 140+ messages in thread
From: Joao Martins @ 2023-10-17 16:51 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: iommu, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu, Yi Liu,
Yi Y Sun, Nicolin Chen, Joerg Roedel, Suravee Suthikulpanit,
Will Deacon, Robin Murphy, Alex Williamson, kvm
On 17/10/2023 17:01, Jason Gunthorpe wrote:
> On Tue, Oct 17, 2023 at 04:51:25PM +0100, Joao Martins wrote:
>> On 17/10/2023 16:29, Jason Gunthorpe wrote:
>>> On Tue, Oct 17, 2023 at 01:06:12PM +0100, Joao Martins wrote:
>>>> On 23/09/2023 02:40, Joao Martins wrote:
>>>>> On 23/09/2023 02:24, Joao Martins wrote:
>>>>>> +int iopt_read_and_clear_dirty_data(struct io_pagetable *iopt,
>>>>>> + struct iommu_domain *domain,
>>>>>> + unsigned long flags,
>>>>>> + struct iommufd_dirty_data *bitmap)
>>>>>> +{
>>>>>> + unsigned long last_iova, iova = bitmap->iova;
>>>>>> + unsigned long length = bitmap->length;
>>>>>> + int ret = -EOPNOTSUPP;
>>>>>> +
>>>>>> + if ((iova & (iopt->iova_alignment - 1)))
>>>>>> + return -EINVAL;
>>>>>> +
>>>>>> + if (check_add_overflow(iova, length - 1, &last_iova))
>>>>>> + return -EOVERFLOW;
>>>>>> +
>>>>>> + down_read(&iopt->iova_rwsem);
>>>>>> + ret = iommu_read_and_clear_dirty(domain, flags, bitmap);
>>>>>> + up_read(&iopt->iova_rwsem);
>>>>>> + return ret;
>>>>>> +}
>>>>>
>>>>> I need to call out that a mistake I made, noticed while submitting. I should be
>>>>> walk over iopt_areas here (or in iommu_read_and_clear_dirty()) to check
>>>>> area::pages. So this is a comment I have to fix for next version.
>>>>
>>>> Below is how I fixed it.
>>>>
>>>> Essentially the thinking being that the user passes either an mapped IOVA area
>>>> it mapped *or* a subset of a mapped IOVA area. This should also allow the
>>>> possibility of having multiple threads read dirties from huge IOVA area splitted
>>>> in different chunks (in the case it gets splitted into lowest level).
>>>
>>> What happens if the iommu_read_and_clear_dirty is done on unmapped
>>> PTEs? It fails?
>>
>> If there's no IOPTE or the IOPTE is non-present, it keeps walking to the next
>> base page (or level-0 IOVA range). For both drivers in this series.
>
> Hum, so this check doesn't seem quite right then as it is really an
> input validation that the iova is within the tree. It should be able
> to span contiguous areas.
>
> Write it with the intersection logic:
>
> for (area = iopt_area_iter_first(iopt, iova, iova_last); area;
> area = iopt_area_iter_next(area, iova, iova_last)) {
> if (!area->pages)
> // fail
>
> if (cur_iova < area_first)
> // fail
>
> if (last_iova <= area_last)
> // success, do iommu_read_and_clear_dirty()
>
> cur_iova = area_last + 1;
> }
>
> // else fail if not success
>
Perhaps that could be rewritten as e.g.
ret = -EINVAL;
iopt_for_each_contig_area(&iter, area, iopt, iova, last_iova) {
// do iommu_read_and_clear_dirty();
}
// else fail.
Though OTOH, the places you marked as failures are simply skipped instead.
* Re: [PATCH v3 07/19] iommufd: Dirty tracking data support
2023-10-17 16:51 ` Joao Martins
@ 2023-10-17 17:13 ` Jason Gunthorpe
2023-10-17 17:30 ` Joao Martins
0 siblings, 1 reply; 140+ messages in thread
From: Jason Gunthorpe @ 2023-10-17 17:13 UTC (permalink / raw)
To: Joao Martins
Cc: iommu, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu, Yi Liu,
Yi Y Sun, Nicolin Chen, Joerg Roedel, Suravee Suthikulpanit,
Will Deacon, Robin Murphy, Alex Williamson, kvm
On Tue, Oct 17, 2023 at 05:51:49PM +0100, Joao Martins wrote:
> Perhaps that could be rewritten as e.g.
>
> ret = -EINVAL;
> iopt_for_each_contig_area(&iter, area, iopt, iova, last_iova) {
> // do iommu_read_and_clear_dirty();
> }
>
> // else fail.
>
> Though OTOH, the places you wrote as to fail are skipped instead.
Yeah, if consolidating the areas isn't important (it probably isn't)
then this is the better API
It does check all the same things: iopt_area_contig_done() will fail
Jason
* Re: [PATCH v3 07/19] iommufd: Dirty tracking data support
2023-10-17 17:13 ` Jason Gunthorpe
@ 2023-10-17 17:30 ` Joao Martins
2023-10-17 18:14 ` Joao Martins
0 siblings, 1 reply; 140+ messages in thread
From: Joao Martins @ 2023-10-17 17:30 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: iommu, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu, Yi Liu,
Yi Y Sun, Nicolin Chen, Joerg Roedel, Suravee Suthikulpanit,
Will Deacon, Robin Murphy, Alex Williamson, kvm
On 17/10/2023 18:13, Jason Gunthorpe wrote:
> On Tue, Oct 17, 2023 at 05:51:49PM +0100, Joao Martins wrote:
>
>> Perhaps that could be rewritten as e.g.
>>
>> ret = -EINVAL;
>> iopt_for_each_contig_area(&iter, area, iopt, iova, last_iova) {
>> // do iommu_read_and_clear_dirty();
>> }
>>
>> // else fail.
>>
>> Though OTOH, the places you wrote as to fail are skipped instead.
>
> Yeah, if consolidating the areas isn't important (it probably isn't)
> then this is the better API
>
Doing it in a single iommu_read_and_clear_dirty() saves me from manipulating
the bitmap address in an atypical way. The first bit of each u8 corresponds to
the IOVA we initialize the bitmap with, so if I call it multiple times within
a single IOVA range (once per individual area, as I think you suggested) then
I need to align down the iova/length to the minimum granularity of the bitmap,
which is a u8 (32k).
Joao
* Re: [PATCH v3 07/19] iommufd: Dirty tracking data support
2023-10-17 17:30 ` Joao Martins
@ 2023-10-17 18:14 ` Joao Martins
0 siblings, 0 replies; 140+ messages in thread
From: Joao Martins @ 2023-10-17 18:14 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: iommu, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu, Yi Liu,
Yi Y Sun, Nicolin Chen, Joerg Roedel, Suravee Suthikulpanit,
Will Deacon, Robin Murphy, Alex Williamson, kvm
On 17/10/2023 18:30, Joao Martins wrote:
> On 17/10/2023 18:13, Jason Gunthorpe wrote:
>> On Tue, Oct 17, 2023 at 05:51:49PM +0100, Joao Martins wrote:
>>
>>> Perhaps that could be rewritten as e.g.
>>>
>>> ret = -EINVAL;
>>> iopt_for_each_contig_area(&iter, area, iopt, iova, last_iova) {
>>> // do iommu_read_and_clear_dirty();
>>> }
>>>
>>> // else fail.
>>>
>>> Though OTOH, the places you wrote as to fail are skipped instead.
>>
>> Yeah, if consolidating the areas isn't important (it probably isn't)
>> then this is the better API
>>
>
> Doing it in a single iommu_read_and_clear_dirty() saves me from manipulating the
> bitmap address in an atypical way. Considering that the first bit in each u8 is
> the iova we initialize the bitmap, so if I call it in multiple times in a single
> IOVA range (in each individual area as I think you suggested) then I need to
> align down the iova-length to the minimum granularity of the bitmap, which is an
> u8 (32k).
Or I can do the reverse: keep iterating the bitmap as it does right now, but
iterate over the individual iopt areas from within the iova-bitmap iterator,
and avoid this. That should be a lot cleaner:
diff --git a/drivers/iommu/iommufd/io_pagetable.c
b/drivers/iommu/iommufd/io_pagetable.c
index 991c57458725..179afc6b74f2 100644
--- a/drivers/iommu/iommufd/io_pagetable.c
+++ b/drivers/iommu/iommufd/io_pagetable.c
@@ -415,6 +415,7 @@ int iopt_map_user_pages(struct iommufd_ctx *ictx, struct io_pagetable *iopt,
struct iova_bitmap_fn_arg {
unsigned long flags;
+ struct io_pagetable *iopt;
struct iommu_domain *domain;
struct iommu_dirty_bitmap *dirty;
};
@@ -423,16 +424,34 @@ static int __iommu_read_and_clear_dirty(struct iova_bitmap *bitmap,
unsigned long iova, size_t length,
void *opaque)
{
+ struct iopt_area *area;
+ struct iopt_area_contig_iter iter;
struct iova_bitmap_fn_arg *arg = opaque;
struct iommu_domain *domain = arg->domain;
struct iommu_dirty_bitmap *dirty = arg->dirty;
const struct iommu_dirty_ops *ops = domain->dirty_ops;
unsigned long flags = arg->flags;
+ unsigned long last_iova = iova + length - 1;
+ int ret = -EINVAL;
- return ops->read_and_clear_dirty(domain, iova, length, flags, dirty);
+ iopt_for_each_contig_area(&iter, area, arg->iopt, iova, last_iova) {
+ unsigned long last = min(last_iova, iopt_area_last_iova(area));
+
+ ret = ops->read_and_clear_dirty(domain, iter.cur_iova,
+ last - iter.cur_iova + 1,
+ flags, dirty);
+ if (ret)
+ break;
+ }
+
+ if (!iopt_area_contig_done(&iter))
+ ret = -EINVAL;
+
+ return ret;
}
static int iommu_read_and_clear_dirty(struct iommu_domain *domain,
+ struct io_pagetable *iopt,
unsigned long flags,
struct iommu_hwpt_get_dirty_iova *bitmap)
{
@@ -453,6 +472,7 @@ static int iommu_read_and_clear_dirty(struct iommu_domain *domain,
iommu_dirty_bitmap_init(&dirty, iter, &gather);
+ arg.iopt = iopt;
arg.flags = flags;
arg.domain = domain;
arg.dirty = &dirty;
^ permalink raw reply related [flat|nested] 140+ messages in thread
* [PATCH v3 08/19] iommufd: Add IOMMU_HWPT_SET_DIRTY
2023-09-23 1:24 [PATCH v3 00/19] IOMMUFD Dirty Tracking Joao Martins
` (6 preceding siblings ...)
2023-09-23 1:24 ` [PATCH v3 07/19] iommufd: Dirty tracking data support Joao Martins
@ 2023-09-23 1:25 ` Joao Martins
2023-10-13 16:13 ` Jason Gunthorpe
2023-09-23 1:25 ` [PATCH v3 09/19] iommufd/selftest: Test IOMMU_HWPT_SET_DIRTY Joao Martins
` (12 subsequent siblings)
20 siblings, 1 reply; 140+ messages in thread
From: Joao Martins @ 2023-09-23 1:25 UTC (permalink / raw)
To: iommu
Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel,
Suravee Suthikulpanit, Will Deacon, Robin Murphy, Alex Williamson,
kvm, Joao Martins
Every IOMMU driver should be able to implement the needed iommu domain ops
to control dirty tracking.
Connect a hw_pagetable to the IOMMU core dirty tracking ops, specifically
the ability to enable/disable dirty tracking on an IOMMU domain
(hw_pagetable id). To that end add an io_pagetable kernel API to toggle
dirty tracking:
* iopt_set_dirty_tracking(iopt, [domain], state)
The intended caller of this is the hw_pagetable object that is created.
Internally it will ensure the leftover dirty state is cleared /right
before/ dirty tracking starts. This is also useful for iommu drivers
which may decide that dirty tracking is always enabled at boot, without
wanting to toggle it dynamically via the corresponding iommu domain op.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
drivers/iommu/iommufd/hw_pagetable.c | 21 ++++++++++
drivers/iommu/iommufd/io_pagetable.c | 56 +++++++++++++++++++++++++
drivers/iommu/iommufd/iommufd_private.h | 11 +++++
drivers/iommu/iommufd/main.c | 3 ++
include/uapi/linux/iommufd.h | 27 ++++++++++++
5 files changed, 118 insertions(+)
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index 32e259245314..22354b0ba554 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -198,3 +198,24 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
iommufd_put_object(&idev->obj);
return rc;
}
+
+int iommufd_hwpt_set_dirty(struct iommufd_ucmd *ucmd)
+{
+ struct iommu_hwpt_set_dirty *cmd = ucmd->cmd;
+ struct iommufd_hw_pagetable *hwpt;
+ struct iommufd_ioas *ioas;
+ int rc = -EOPNOTSUPP;
+ bool enable;
+
+ hwpt = iommufd_get_hwpt(ucmd, cmd->hwpt_id);
+ if (IS_ERR(hwpt))
+ return PTR_ERR(hwpt);
+
+ ioas = hwpt->ioas;
+ enable = cmd->flags & IOMMU_DIRTY_TRACKING_ENABLED;
+
+ rc = iopt_set_dirty_tracking(&ioas->iopt, hwpt->domain, enable);
+
+ iommufd_put_object(&hwpt->obj);
+ return rc;
+}
diff --git a/drivers/iommu/iommufd/io_pagetable.c b/drivers/iommu/iommufd/io_pagetable.c
index d70617447392..b9e58601d1d4 100644
--- a/drivers/iommu/iommufd/io_pagetable.c
+++ b/drivers/iommu/iommufd/io_pagetable.c
@@ -479,6 +479,62 @@ int iopt_read_and_clear_dirty_data(struct io_pagetable *iopt,
down_read(&iopt->iova_rwsem);
ret = iommu_read_and_clear_dirty(domain, flags, bitmap);
up_read(&iopt->iova_rwsem);
+
+ return ret;
+}
+
+static int iopt_clear_dirty_data(struct io_pagetable *iopt,
+ struct iommu_domain *domain)
+{
+ const struct iommu_dirty_ops *ops = domain->dirty_ops;
+ struct iommu_iotlb_gather gather;
+ struct iommu_dirty_bitmap dirty;
+ struct iopt_area *area;
+ int ret = 0;
+
+ lockdep_assert_held_read(&iopt->iova_rwsem);
+
+ iommu_dirty_bitmap_init(&dirty, NULL, &gather);
+
+ for (area = iopt_area_iter_first(iopt, 0, ULONG_MAX); area;
+ area = iopt_area_iter_next(area, 0, ULONG_MAX)) {
+ if (!area->pages)
+ continue;
+
+ ret = ops->read_and_clear_dirty(domain,
+ iopt_area_iova(area),
+ iopt_area_length(area), 0,
+ &dirty);
+ if (ret)
+ break;
+ }
+
+ iommu_iotlb_sync(domain, &gather);
+ return ret;
+}
+
+int iopt_set_dirty_tracking(struct io_pagetable *iopt,
+ struct iommu_domain *domain, bool enable)
+{
+ const struct iommu_dirty_ops *ops = domain->dirty_ops;
+ int ret = 0;
+
+ if (!ops)
+ return -EOPNOTSUPP;
+
+ down_read(&iopt->iova_rwsem);
+
+ /* Clear dirty bits from PTEs to ensure a clean snapshot */
+ if (enable) {
+ ret = iopt_clear_dirty_data(iopt, domain);
+ if (ret)
+ goto out_unlock;
+ }
+
+ ret = ops->set_dirty_tracking(domain, enable);
+
+out_unlock:
+ up_read(&iopt->iova_rwsem);
return ret;
}
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 84ec1df29074..1101a1914513 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -10,6 +10,7 @@
#include <linux/uaccess.h>
#include <linux/iommu.h>
#include <linux/iova_bitmap.h>
+#include <uapi/linux/iommufd.h>
struct iommu_domain;
struct iommu_group;
@@ -83,6 +84,8 @@ int iopt_read_and_clear_dirty_data(struct io_pagetable *iopt,
struct iommu_domain *domain,
unsigned long flags,
struct iommufd_dirty_data *bitmap);
+int iopt_set_dirty_tracking(struct io_pagetable *iopt,
+ struct iommu_domain *domain, bool enable);
void iommufd_access_notify_unmap(struct io_pagetable *iopt, unsigned long iova,
unsigned long length);
@@ -254,6 +257,14 @@ struct iommufd_hw_pagetable {
struct list_head hwpt_item;
};
+static inline struct iommufd_hw_pagetable *iommufd_get_hwpt(
+ struct iommufd_ucmd *ucmd, u32 id)
+{
+ return container_of(iommufd_get_object(ucmd->ictx, id,
+ IOMMUFD_OBJ_HW_PAGETABLE),
+ struct iommufd_hw_pagetable, obj);
+}
+int iommufd_hwpt_set_dirty(struct iommufd_ucmd *ucmd);
struct iommufd_hw_pagetable *
iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
struct iommufd_device *idev, u32 flags,
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index e71523cbd0de..ec0c34086af3 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -315,6 +315,7 @@ union ucmd_buffer {
struct iommu_ioas_unmap unmap;
struct iommu_option option;
struct iommu_vfio_ioas vfio_ioas;
+ struct iommu_hwpt_set_dirty set_dirty;
#ifdef CONFIG_IOMMUFD_TEST
struct iommu_test_cmd test;
#endif
@@ -358,6 +359,8 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = {
val64),
IOCTL_OP(IOMMU_VFIO_IOAS, iommufd_vfio_ioas, struct iommu_vfio_ioas,
__reserved),
+ IOCTL_OP(IOMMU_HWPT_SET_DIRTY, iommufd_hwpt_set_dirty,
+ struct iommu_hwpt_set_dirty, __reserved),
#ifdef CONFIG_IOMMUFD_TEST
IOCTL_OP(IOMMU_TEST_CMD, iommufd_test, struct iommu_test_cmd, last),
#endif
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index cd94a9d8ce66..37079e72d243 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -47,6 +47,7 @@ enum {
IOMMUFD_CMD_VFIO_IOAS,
IOMMUFD_CMD_HWPT_ALLOC,
IOMMUFD_CMD_GET_HW_INFO,
+ IOMMUFD_CMD_HWPT_SET_DIRTY,
};
/**
@@ -454,4 +455,30 @@ struct iommu_hw_info {
__u32 __reserved;
};
#define IOMMU_GET_HW_INFO _IO(IOMMUFD_TYPE, IOMMUFD_CMD_GET_HW_INFO)
+
+/*
+ * enum iommufd_hwpt_set_dirty_flags - Flags for steering dirty tracking
+ * @IOMMU_DIRTY_TRACKING_DISABLED: Disables dirty tracking
+ * @IOMMU_DIRTY_TRACKING_ENABLED: Enables dirty tracking
+ */
+enum iommufd_hwpt_set_dirty_flags {
+ IOMMU_DIRTY_TRACKING_DISABLED = 0,
+ IOMMU_DIRTY_TRACKING_ENABLED = 1,
+};
+
+/**
+ * struct iommu_hwpt_set_dirty - ioctl(IOMMU_HWPT_SET_DIRTY)
+ * @size: sizeof(struct iommu_hwpt_set_dirty)
+ * @flags: Flags to control dirty tracking status.
+ * @hwpt_id: HW pagetable ID that represents the IOMMU domain.
+ *
+ * Toggle dirty tracking on an HW pagetable.
+ */
+struct iommu_hwpt_set_dirty {
+ __u32 size;
+ __u32 flags;
+ __u32 hwpt_id;
+ __u32 __reserved;
+};
+#define IOMMU_HWPT_SET_DIRTY _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_SET_DIRTY)
#endif
--
2.17.2
^ permalink raw reply related [flat|nested] 140+ messages in thread
* Re: [PATCH v3 08/19] iommufd: Add IOMMU_HWPT_SET_DIRTY
2023-09-23 1:25 ` [PATCH v3 08/19] iommufd: Add IOMMU_HWPT_SET_DIRTY Joao Martins
@ 2023-10-13 16:13 ` Jason Gunthorpe
0 siblings, 0 replies; 140+ messages in thread
From: Jason Gunthorpe @ 2023-10-13 16:13 UTC (permalink / raw)
To: Joao Martins
Cc: iommu, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu, Yi Liu,
Yi Y Sun, Nicolin Chen, Joerg Roedel, Suravee Suthikulpanit,
Will Deacon, Robin Murphy, Alex Williamson, kvm
On Sat, Sep 23, 2023 at 02:25:00AM +0100, Joao Martins wrote:
> diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
> index 32e259245314..22354b0ba554 100644
> --- a/drivers/iommu/iommufd/hw_pagetable.c
> +++ b/drivers/iommu/iommufd/hw_pagetable.c
> @@ -198,3 +198,24 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
> iommufd_put_object(&idev->obj);
> return rc;
> }
> +
> +int iommufd_hwpt_set_dirty(struct iommufd_ucmd *ucmd)
> +{
> + struct iommu_hwpt_set_dirty *cmd = ucmd->cmd;
> + struct iommufd_hw_pagetable *hwpt;
> + struct iommufd_ioas *ioas;
> + int rc = -EOPNOTSUPP;
Default is never used?
> + bool enable;
> +
> + hwpt = iommufd_get_hwpt(ucmd, cmd->hwpt_id);
> + if (IS_ERR(hwpt))
> + return PTR_ERR(hwpt);
> +
> + ioas = hwpt->ioas;
> + enable = cmd->flags & IOMMU_DIRTY_TRACKING_ENABLED;
Check that incoming flags are not invalid
if (cmd->flags & ~IOMMU_DIRTY_TRACKING_ENABLED)
return -EOPNOTSUPP
> diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
> index e71523cbd0de..ec0c34086af3 100644
> --- a/drivers/iommu/iommufd/main.c
> +++ b/drivers/iommu/iommufd/main.c
> @@ -315,6 +315,7 @@ union ucmd_buffer {
> struct iommu_ioas_unmap unmap;
> struct iommu_option option;
> struct iommu_vfio_ioas vfio_ioas;
> + struct iommu_hwpt_set_dirty set_dirty;
> #ifdef CONFIG_IOMMUFD_TEST
> struct iommu_test_cmd test;
> #endif
> @@ -358,6 +359,8 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = {
> val64),
> IOCTL_OP(IOMMU_VFIO_IOAS, iommufd_vfio_ioas, struct iommu_vfio_ioas,
> __reserved),
> + IOCTL_OP(IOMMU_HWPT_SET_DIRTY, iommufd_hwpt_set_dirty,
> + struct iommu_hwpt_set_dirty, __reserved),
> #ifdef CONFIG_IOMMUFD_TEST
> IOCTL_OP(IOMMU_TEST_CMD, iommufd_test, struct iommu_test_cmd, last),
> #endif
These two lists of things are sorted
> +
> +/*
/** ?
> + * enum iommufd_set_dirty_flags - Flags for steering dirty tracking
> + * @IOMMU_DIRTY_TRACKING_DISABLED: Disables dirty tracking
> + * @IOMMU_DIRTY_TRACKING_ENABLED: Enables dirty tracking
> + */
> +enum iommufd_hwpt_set_dirty_flags {
> + IOMMU_DIRTY_TRACKING_DISABLED = 0,
> + IOMMU_DIRTY_TRACKING_ENABLED = 1,
> +};
Probably get rid of disabled and call it _ENABLE so it is actually a
flag
Jason
^ permalink raw reply [flat|nested] 140+ messages in thread
* [PATCH v3 09/19] iommufd/selftest: Test IOMMU_HWPT_SET_DIRTY
2023-09-23 1:24 [PATCH v3 00/19] IOMMUFD Dirty Tracking Joao Martins
` (7 preceding siblings ...)
2023-09-23 1:25 ` [PATCH v3 08/19] iommufd: Add IOMMU_HWPT_SET_DIRTY Joao Martins
@ 2023-09-23 1:25 ` Joao Martins
2023-09-23 1:25 ` [PATCH v3 10/19] iommufd: Add IOMMU_HWPT_GET_DIRTY_IOVA Joao Martins
` (11 subsequent siblings)
20 siblings, 0 replies; 140+ messages in thread
From: Joao Martins @ 2023-09-23 1:25 UTC (permalink / raw)
To: iommu
Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel,
Suravee Suthikulpanit, Will Deacon, Robin Murphy, Alex Williamson,
kvm, Joao Martins
Change mock_domain to support dirty tracking and add tests to exercise
the new SET_DIRTY API in the iommufd_dirty_tracking selftest fixture.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
drivers/iommu/iommufd/selftest.c | 17 +++++++++++++++
tools/testing/selftests/iommu/iommufd.c | 15 +++++++++++++
tools/testing/selftests/iommu/iommufd_utils.h | 21 +++++++++++++++++++
3 files changed, 53 insertions(+)
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index 4cf5a2b859e7..507ada06d5ad 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -24,6 +24,7 @@ static struct platform_device *selftest_iommu_dev;
size_t iommufd_test_memory_limit = 65536;
enum {
+ MOCK_DIRTY_TRACK = 1,
MOCK_IO_PAGE_SIZE = PAGE_SIZE / 2,
/*
@@ -86,6 +87,7 @@ void iommufd_test_syz_conv_iova_id(struct iommufd_ucmd *ucmd,
}
struct mock_iommu_domain {
+ unsigned long flags;
struct iommu_domain domain;
struct xarray pfns;
};
@@ -156,6 +158,21 @@ static void *mock_domain_hw_info(struct device *dev, u32 *length, u32 *type)
static int mock_domain_set_dirty_tracking(struct iommu_domain *domain,
bool enable)
{
+ struct mock_iommu_domain *mock =
+ container_of(domain, struct mock_iommu_domain, domain);
+ unsigned long flags = mock->flags;
+
+ if (enable && !domain->dirty_ops)
+ return -EINVAL;
+
+ /* No change? */
+ if (!(enable ^ !!(flags & MOCK_DIRTY_TRACK)))
+ return 0;
+
+ flags = (enable ?
+ flags | MOCK_DIRTY_TRACK : flags & ~MOCK_DIRTY_TRACK);
+
+ mock->flags = flags;
return 0;
}
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index 71ad12867da6..dba173d1dd02 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -1479,6 +1479,21 @@ TEST_F(iommufd_dirty_tracking, enforce_dirty)
test_ioctl_destroy(stddev_id);
}
+TEST_F(iommufd_dirty_tracking, set_dirty)
+{
+ uint32_t stddev_id;
+ uint32_t hwpt_id;
+
+ test_cmd_hwpt_alloc(self->idev_id, self->ioas_id,
+ IOMMU_HWPT_ALLOC_ENFORCE_DIRTY, &hwpt_id);
+ test_cmd_mock_domain(hwpt_id, &stddev_id, NULL, NULL);
+ test_cmd_set_dirty(hwpt_id, true);
+ test_cmd_set_dirty(hwpt_id, false);
+
+ test_ioctl_destroy(stddev_id);
+ test_ioctl_destroy(hwpt_id);
+}
+
/* VFIO compatibility IOCTLs */
TEST_F(iommufd, simple_ioctls)
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index 930edfe693c7..1626e2efbfb1 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -177,9 +177,30 @@ static int _test_cmd_access_replace_ioas(int fd, __u32 access_id,
return ret;
return 0;
}
+
+
#define test_cmd_access_replace_ioas(access_id, ioas_id) \
ASSERT_EQ(0, _test_cmd_access_replace_ioas(self->fd, access_id, ioas_id))
+static int _test_cmd_set_dirty(int fd, __u32 hwpt_id, bool enabled)
+{
+ struct iommu_hwpt_set_dirty cmd = {
+ .size = sizeof(cmd),
+ .flags = enabled ? IOMMU_DIRTY_TRACKING_ENABLED :
+ IOMMU_DIRTY_TRACKING_DISABLED,
+ .hwpt_id = hwpt_id,
+ };
+ int ret;
+
+ ret = ioctl(fd, IOMMU_HWPT_SET_DIRTY, &cmd);
+ if (ret)
+ return -errno;
+ return 0;
+}
+
+#define test_cmd_set_dirty(hwpt_id, enabled) \
+ ASSERT_EQ(0, _test_cmd_set_dirty(self->fd, hwpt_id, enabled))
+
static int _test_cmd_create_access(int fd, unsigned int ioas_id,
__u32 *access_id, unsigned int flags)
{
--
2.17.2
^ permalink raw reply related [flat|nested] 140+ messages in thread
* [PATCH v3 10/19] iommufd: Add IOMMU_HWPT_GET_DIRTY_IOVA
2023-09-23 1:24 [PATCH v3 00/19] IOMMUFD Dirty Tracking Joao Martins
` (8 preceding siblings ...)
2023-09-23 1:25 ` [PATCH v3 09/19] iommufd/selftest: Test IOMMU_HWPT_SET_DIRTY Joao Martins
@ 2023-09-23 1:25 ` Joao Martins
2023-10-13 16:22 ` Jason Gunthorpe
2023-09-23 1:25 ` [PATCH v3 11/19] iommufd/selftest: Test IOMMU_HWPT_GET_DIRTY_IOVA Joao Martins
` (10 subsequent siblings)
20 siblings, 1 reply; 140+ messages in thread
From: Joao Martins @ 2023-09-23 1:25 UTC (permalink / raw)
To: iommu
Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel,
Suravee Suthikulpanit, Will Deacon, Robin Murphy, Alex Williamson,
kvm, Joao Martins
Connect a hw_pagetable to the IOMMU core dirty tracking
read_and_clear_dirty iommu domain op. It exposes all of the functionality
for the UAPI to read the dirtied IOVAs while clearing the Dirty bits from
the PTEs.
In doing so, the previously internal iommufd_dirty_data structure is moved
over as the UAPI intermediate structure for representing iommufd dirty
bitmaps.
Contrary to a past incarnation of a similar interface in VFIO, the IOVA
range to be scanned is tied to the bitmap size; thus the application needs
to pass an appropriately sized bitmap address, taking into account the iova
range being passed *and* the page size ... as opposed to allowing
bitmap-iova != iova.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
drivers/iommu/iommufd/hw_pagetable.c | 55 +++++++++++++++++++++++++
drivers/iommu/iommufd/iommufd_private.h | 11 ++---
drivers/iommu/iommufd/main.c | 3 ++
include/uapi/linux/iommufd.h | 36 ++++++++++++++++
4 files changed, 98 insertions(+), 7 deletions(-)
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index 22354b0ba554..a5712992bb4b 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -219,3 +219,58 @@ int iommufd_hwpt_set_dirty(struct iommufd_ucmd *ucmd)
iommufd_put_object(&hwpt->obj);
return rc;
}
+
+int iommufd_check_iova_range(struct iommufd_ioas *ioas,
+ struct iommufd_dirty_data *bitmap)
+{
+ unsigned long pgshift, npages;
+ size_t iommu_pgsize;
+ int rc = -EINVAL;
+
+ pgshift = __ffs(bitmap->page_size);
+ npages = bitmap->length >> pgshift;
+
+ if (!npages || (npages > ULONG_MAX))
+ return rc;
+
+ iommu_pgsize = 1 << __ffs(ioas->iopt.iova_alignment);
+
+ /* allow only smallest supported pgsize */
+ if (bitmap->page_size != iommu_pgsize)
+ return rc;
+
+ if (bitmap->iova & (iommu_pgsize - 1))
+ return rc;
+
+ if (!bitmap->length || bitmap->length & (iommu_pgsize - 1))
+ return rc;
+
+ return 0;
+}
+
+int iommufd_hwpt_get_dirty_iova(struct iommufd_ucmd *ucmd)
+{
+ struct iommu_hwpt_get_dirty_iova *cmd = ucmd->cmd;
+ struct iommufd_hw_pagetable *hwpt;
+ struct iommufd_ioas *ioas;
+ int rc = -EOPNOTSUPP;
+
+ if ((cmd->flags || cmd->__reserved))
+ return -EOPNOTSUPP;
+
+ hwpt = iommufd_get_hwpt(ucmd, cmd->hwpt_id);
+ if (IS_ERR(hwpt))
+ return PTR_ERR(hwpt);
+
+ ioas = hwpt->ioas;
+ rc = iommufd_check_iova_range(ioas, &cmd->bitmap);
+ if (rc)
+ goto out_put;
+
+ rc = iopt_read_and_clear_dirty_data(&ioas->iopt, hwpt->domain,
+ cmd->flags, &cmd->bitmap);
+
+out_put:
+ iommufd_put_object(&hwpt->obj);
+ return rc;
+}
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 1101a1914513..608bb6eae64b 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -73,13 +73,6 @@ int iopt_unmap_iova(struct io_pagetable *iopt, unsigned long iova,
unsigned long length, unsigned long *unmapped);
int iopt_unmap_all(struct io_pagetable *iopt, unsigned long *unmapped);
-struct iommufd_dirty_data {
- unsigned long iova;
- unsigned long length;
- unsigned long page_size;
- unsigned long long *data;
-};
-
int iopt_read_and_clear_dirty_data(struct io_pagetable *iopt,
struct iommu_domain *domain,
unsigned long flags,
@@ -239,6 +232,8 @@ int iommufd_option_rlimit_mode(struct iommu_option *cmd,
struct iommufd_ctx *ictx);
int iommufd_vfio_ioas(struct iommufd_ucmd *ucmd);
+int iommufd_check_iova_range(struct iommufd_ioas *ioas,
+ struct iommufd_dirty_data *bitmap);
/*
* A HW pagetable is called an iommu_domain inside the kernel. This user object
@@ -265,6 +260,8 @@ static inline struct iommufd_hw_pagetable *iommufd_get_hwpt(
struct iommufd_hw_pagetable, obj);
}
int iommufd_hwpt_set_dirty(struct iommufd_ucmd *ucmd);
+int iommufd_hwpt_get_dirty_iova(struct iommufd_ucmd *ucmd);
+
struct iommufd_hw_pagetable *
iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
struct iommufd_device *idev, u32 flags,
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index ec0c34086af3..17e356ffdf31 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -316,6 +316,7 @@ union ucmd_buffer {
struct iommu_option option;
struct iommu_vfio_ioas vfio_ioas;
struct iommu_hwpt_set_dirty set_dirty;
+ struct iommu_hwpt_get_dirty_iova get_dirty_iova;
#ifdef CONFIG_IOMMUFD_TEST
struct iommu_test_cmd test;
#endif
@@ -361,6 +362,8 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = {
__reserved),
IOCTL_OP(IOMMU_HWPT_SET_DIRTY, iommufd_hwpt_set_dirty,
struct iommu_hwpt_set_dirty, __reserved),
+ IOCTL_OP(IOMMU_HWPT_GET_DIRTY_IOVA, iommufd_hwpt_get_dirty_iova,
+ struct iommu_hwpt_get_dirty_iova, bitmap.data),
#ifdef CONFIG_IOMMUFD_TEST
IOCTL_OP(IOMMU_TEST_CMD, iommufd_test, struct iommu_test_cmd, last),
#endif
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 37079e72d243..b35b7d0c4be0 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -48,6 +48,7 @@ enum {
IOMMUFD_CMD_HWPT_ALLOC,
IOMMUFD_CMD_GET_HW_INFO,
IOMMUFD_CMD_HWPT_SET_DIRTY,
+ IOMMUFD_CMD_HWPT_GET_DIRTY_IOVA,
};
/**
@@ -481,4 +482,39 @@ struct iommu_hwpt_set_dirty {
__u32 __reserved;
};
#define IOMMU_HWPT_SET_DIRTY _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_SET_DIRTY)
+
+/**
+ * struct iommufd_dirty_data - Dirty IOVA tracking bitmap
+ * @iova: base IOVA of the bitmap
+ * @length: IOVA size
+ * @page_size: page size granularity of each bit in the bitmap
+ * @data: bitmap where to set the dirty bits. Each bit in the bitmap
+ * represents one page_size-sized chunk of IOVA, offset from @iova.
+ * Checking whether a given IOVA is dirty:
+ *
+ * data[(iova / page_size) / 64] & (1ULL << ((iova / page_size) % 64))
+ */
+struct iommufd_dirty_data {
+ __aligned_u64 iova;
+ __aligned_u64 length;
+ __aligned_u64 page_size;
+ __aligned_u64 *data;
+};
+
+/**
+ * struct iommu_hwpt_get_dirty_iova - ioctl(IOMMU_HWPT_GET_DIRTY_IOVA)
+ * @size: sizeof(struct iommu_hwpt_get_dirty_iova)
+ * @hwpt_id: HW pagetable ID that represents the IOMMU domain.
+ * @flags: Flags to control dirty tracking status.
+ * @bitmap: Bitmap of the range of IOVA to read out
+ */
+struct iommu_hwpt_get_dirty_iova {
+ __u32 size;
+ __u32 hwpt_id;
+ __u32 flags;
+ __u32 __reserved;
+ struct iommufd_dirty_data bitmap;
+};
+#define IOMMU_HWPT_GET_DIRTY_IOVA _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_GET_DIRTY_IOVA)
+
#endif
--
2.17.2
^ permalink raw reply related [flat|nested] 140+ messages in thread
* Re: [PATCH v3 10/19] iommufd: Add IOMMU_HWPT_GET_DIRTY_IOVA
2023-09-23 1:25 ` [PATCH v3 10/19] iommufd: Add IOMMU_HWPT_GET_DIRTY_IOVA Joao Martins
@ 2023-10-13 16:22 ` Jason Gunthorpe
2023-10-13 16:58 ` Joao Martins
0 siblings, 1 reply; 140+ messages in thread
From: Jason Gunthorpe @ 2023-10-13 16:22 UTC (permalink / raw)
To: Joao Martins
Cc: iommu, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu, Yi Liu,
Yi Y Sun, Nicolin Chen, Joerg Roedel, Suravee Suthikulpanit,
Will Deacon, Robin Murphy, Alex Williamson, kvm
On Sat, Sep 23, 2023 at 02:25:02AM +0100, Joao Martins wrote:
> +int iommufd_check_iova_range(struct iommufd_ioas *ioas,
> + struct iommufd_dirty_data *bitmap)
> +{
> + unsigned long pgshift, npages;
> + size_t iommu_pgsize;
> + int rc = -EINVAL;
> +
> + pgshift = __ffs(bitmap->page_size);
> + npages = bitmap->length >> pgshift;
> +
> + if (!npages || (npages > ULONG_MAX))
> + return rc;
> +
> + iommu_pgsize = 1 << __ffs(ioas->iopt.iova_alignment);
iova_alignment is not a bitmask, it is the alignment itself, so is
redundant.
> + /* allow only smallest supported pgsize */
> + if (bitmap->page_size != iommu_pgsize)
> + return rc;
!= is smallest?
Why are we restricting this anyhow? I thought the iova bitmap stuff
did all the adaptation automatically?
I can sort of see restricting the start/stop iova
> + if (bitmap->iova & (iommu_pgsize - 1))
> + return rc;
> +
> + if (!bitmap->length || bitmap->length & (iommu_pgsize - 1))
> + return rc;
> +
> + return 0;
> +}
> --- a/drivers/iommu/iommufd/main.c
> +++ b/drivers/iommu/iommufd/main.c
> @@ -316,6 +316,7 @@ union ucmd_buffer {
> struct iommu_option option;
> struct iommu_vfio_ioas vfio_ioas;
> struct iommu_hwpt_set_dirty set_dirty;
> + struct iommu_hwpt_get_dirty_iova get_dirty_iova;
> #ifdef CONFIG_IOMMUFD_TEST
> struct iommu_test_cmd test;
> #endif
> @@ -361,6 +362,8 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = {
> __reserved),
> IOCTL_OP(IOMMU_HWPT_SET_DIRTY, iommufd_hwpt_set_dirty,
> struct iommu_hwpt_set_dirty, __reserved),
> + IOCTL_OP(IOMMU_HWPT_GET_DIRTY_IOVA, iommufd_hwpt_get_dirty_iova,
> + struct iommu_hwpt_get_dirty_iova, bitmap.data),
Also keep sorted
> #ifdef CONFIG_IOMMUFD_TEST
> IOCTL_OP(IOMMU_TEST_CMD, iommufd_test, struct iommu_test_cmd, last),
> #endif
> diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
> index 37079e72d243..b35b7d0c4be0 100644
> --- a/include/uapi/linux/iommufd.h
> +++ b/include/uapi/linux/iommufd.h
> @@ -48,6 +48,7 @@ enum {
> IOMMUFD_CMD_HWPT_ALLOC,
> IOMMUFD_CMD_GET_HW_INFO,
> IOMMUFD_CMD_HWPT_SET_DIRTY,
> + IOMMUFD_CMD_HWPT_GET_DIRTY_IOVA,
> };
>
> /**
> @@ -481,4 +482,39 @@ struct iommu_hwpt_set_dirty {
> __u32 __reserved;
> };
> #define IOMMU_HWPT_SET_DIRTY _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_SET_DIRTY)
> +
> +/**
> + * struct iommufd_dirty_bitmap - Dirty IOVA tracking bitmap
> + * @iova: base IOVA of the bitmap
> + * @length: IOVA size
> + * @page_size: page size granularity of each bit in the bitmap
> + * @data: bitmap where to set the dirty bits. The bitmap bits each
> + * represent a page_size which you deviate from an arbitrary iova.
> + * Checking a given IOVA is dirty:
> + *
> + * data[(iova / page_size) / 64] & (1ULL << (iova % 64))
> + */
> +struct iommufd_dirty_data {
> + __aligned_u64 iova;
> + __aligned_u64 length;
> + __aligned_u64 page_size;
> + __aligned_u64 *data;
> +};
Is there a reason to add this struct? Does something else use it?
> +/**
> + * struct iommu_hwpt_get_dirty_iova - ioctl(IOMMU_HWPT_GET_DIRTY_IOVA)
> + * @size: sizeof(struct iommu_hwpt_get_dirty_iova)
> + * @hwpt_id: HW pagetable ID that represents the IOMMU domain.
> + * @flags: Flags to control dirty tracking status.
> + * @bitmap: Bitmap of the range of IOVA to read out
> + */
> +struct iommu_hwpt_get_dirty_iova {
> + __u32 size;
> + __u32 hwpt_id;
> + __u32 flags;
> + __u32 __reserved;
> + struct iommufd_dirty_data bitmap;
vs inlining here?
I see you are passing it around the internal API, but that could
easily pass the whole command too
Jason
^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v3 10/19] iommufd: Add IOMMU_HWPT_GET_DIRTY_IOVA
2023-10-13 16:22 ` Jason Gunthorpe
@ 2023-10-13 16:58 ` Joao Martins
2023-10-13 17:03 ` Jason Gunthorpe
0 siblings, 1 reply; 140+ messages in thread
From: Joao Martins @ 2023-10-13 16:58 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: iommu, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu, Yi Liu,
Yi Y Sun, Nicolin Chen, Joerg Roedel, Suravee Suthikulpanit,
Will Deacon, Robin Murphy, Alex Williamson, kvm
On 13/10/2023 17:22, Jason Gunthorpe wrote:
> On Sat, Sep 23, 2023 at 02:25:02AM +0100, Joao Martins wrote:
>
>> +int iommufd_check_iova_range(struct iommufd_ioas *ioas,
>> + struct iommufd_dirty_data *bitmap)
>> +{
>> + unsigned long pgshift, npages;
>> + size_t iommu_pgsize;
>> + int rc = -EINVAL;
>> +
>> + pgshift = __ffs(bitmap->page_size);
>> + npages = bitmap->length >> pgshift;
>> +
>> + if (!npages || (npages > ULONG_MAX))
>> + return rc;
>> +
>> + iommu_pgsize = 1 << __ffs(ioas->iopt.iova_alignment);
>
> iova_alignment is not a bitmask, it is the alignment itself, so is
> redundant.
>
Yes, let me remove it
>> + /* allow only smallest supported pgsize */
>> + if (bitmap->page_size != iommu_pgsize)
>> + return rc;
>
> != is smallest?
>
> Why are we restricting this anyhow? I thought the iova bitmap stuff
> did all the adaptation automatically?
>
yes, it does
> I can sort of see restricting the start/stop iova
>
There's no fundamental reason to restrict it; I am probably just too obsessed
with making the most granular tracking, but I shouldn't prevent the user from
tracking at some other page granularity
>
>> + if (bitmap->iova & (iommu_pgsize - 1))
>> + return rc;
>> +
>> + if (!bitmap->length || bitmap->length & (iommu_pgsize - 1))
>> + return rc;
>> +
>> + return 0;
>> +}
>
>> --- a/drivers/iommu/iommufd/main.c
>> +++ b/drivers/iommu/iommufd/main.c
>> @@ -316,6 +316,7 @@ union ucmd_buffer {
>> struct iommu_option option;
>> struct iommu_vfio_ioas vfio_ioas;
>> struct iommu_hwpt_set_dirty set_dirty;
>> + struct iommu_hwpt_get_dirty_iova get_dirty_iova;
>> #ifdef CONFIG_IOMMUFD_TEST
>> struct iommu_test_cmd test;
>> #endif
>> @@ -361,6 +362,8 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = {
>> __reserved),
>> IOCTL_OP(IOMMU_HWPT_SET_DIRTY, iommufd_hwpt_set_dirty,
>> struct iommu_hwpt_set_dirty, __reserved),
>> + IOCTL_OP(IOMMU_HWPT_GET_DIRTY_IOVA, iommufd_hwpt_get_dirty_iova,
>> + struct iommu_hwpt_get_dirty_iova, bitmap.data),
>
> Also keep sorted
>
OK
>> #ifdef CONFIG_IOMMUFD_TEST
>> IOCTL_OP(IOMMU_TEST_CMD, iommufd_test, struct iommu_test_cmd, last),
>> #endif
>> diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
>> index 37079e72d243..b35b7d0c4be0 100644
>> --- a/include/uapi/linux/iommufd.h
>> +++ b/include/uapi/linux/iommufd.h
>> @@ -48,6 +48,7 @@ enum {
>> IOMMUFD_CMD_HWPT_ALLOC,
>> IOMMUFD_CMD_GET_HW_INFO,
>> IOMMUFD_CMD_HWPT_SET_DIRTY,
>> + IOMMUFD_CMD_HWPT_GET_DIRTY_IOVA,
>> };
>>
>> /**
>> @@ -481,4 +482,39 @@ struct iommu_hwpt_set_dirty {
>> __u32 __reserved;
>> };
>> #define IOMMU_HWPT_SET_DIRTY _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_SET_DIRTY)
>> +
>> +/**
>> + * struct iommufd_dirty_bitmap - Dirty IOVA tracking bitmap
>> + * @iova: base IOVA of the bitmap
>> + * @length: IOVA size
>> + * @page_size: page size granularity of each bit in the bitmap
>> + * @data: bitmap where to set the dirty bits. The bitmap bits each
>> + * represent a page_size which you deviate from an arbitrary iova.
>> + * Checking a given IOVA is dirty:
>> + *
>> + * data[(iova / page_size) / 64] & (1ULL << (iova % 64))
>> + */
>> +struct iommufd_dirty_data {
>> + __aligned_u64 iova;
>> + __aligned_u64 length;
>> + __aligned_u64 page_size;
>> + __aligned_u64 *data;
>> +};
>
> Is there a reason to add this struct? Does something else use it?
>
I was just reducing how much data I really need to pass around, so I
consolidated all that into a struct representing the bitmap data, considering (...)
>> +/**
>> + * struct iommu_hwpt_get_dirty_iova - ioctl(IOMMU_HWPT_GET_DIRTY_IOVA)
>> + * @size: sizeof(struct iommu_hwpt_get_dirty_iova)
>> + * @hwpt_id: HW pagetable ID that represents the IOMMU domain.
>> + * @flags: Flags to control dirty tracking status.
>> + * @bitmap: Bitmap of the range of IOVA to read out
>> + */
>> +struct iommu_hwpt_get_dirty_iova {
>> + __u32 size;
>> + __u32 hwpt_id;
>> + __u32 flags;
>> + __u32 __reserved;
>> + struct iommufd_dirty_data bitmap;
>
> vs inlining here?
>
> I see you are passing it around the internal API, but that could
> easily pass the whole command too
I use it for the read_and_clear_dirty_data (and its input validation). Kinda
weird to do:
iommu_read_and_clear(domain, flags, cmd)
considering none of those functions pass command data around. If you prefer
passing the whole command then I can go with it;
^ permalink raw reply [flat|nested] 140+ messages in thread* Re: [PATCH v3 10/19] iommufd: Add IOMMU_HWPT_GET_DIRTY_IOVA
2023-10-13 16:58 ` Joao Martins
@ 2023-10-13 17:03 ` Jason Gunthorpe
0 siblings, 0 replies; 140+ messages in thread
From: Jason Gunthorpe @ 2023-10-13 17:03 UTC (permalink / raw)
To: Joao Martins
Cc: iommu, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu, Yi Liu,
Yi Y Sun, Nicolin Chen, Joerg Roedel, Suravee Suthikulpanit,
Will Deacon, Robin Murphy, Alex Williamson, kvm
On Fri, Oct 13, 2023 at 05:58:43PM +0100, Joao Martins wrote:
> > I can sort of see restricting the start/stop iova
> There's no fundamental reason to restrict it; I am probably just too obsessed
> with making the most granular tracking, but I shouldn't restrict the user to
> track at some other page granularity
I would not restrict it, it makes the ABI less compatible if you restrict
it.
> > I see you are passing it around the internal API, but that could
> > easily pass the whole command too
>
> I use it for the read_and_clear_dirty_data (and its input validation). Kinda
> weird to do:
>
> iommu_read_and_clear(domain, flags, cmd)
>
> considering none of those functions pass command data around. If you prefer
> passing the whole command then I can go with it;
Compared to adding some weirdness to the uapi header, I would prefer
weirdness in the internal code
Jason
^ permalink raw reply [flat|nested] 140+ messages in thread
* [PATCH v3 11/19] iommufd/selftest: Test IOMMU_HWPT_GET_DIRTY_IOVA
2023-09-23 1:24 [PATCH v3 00/19] IOMMUFD Dirty Tracking Joao Martins
` (9 preceding siblings ...)
2023-09-23 1:25 ` [PATCH v3 10/19] iommufd: Add IOMMU_HWPT_GET_DIRTY_IOVA Joao Martins
@ 2023-09-23 1:25 ` Joao Martins
2023-09-23 1:25 ` [PATCH v3 12/19] iommufd: Add capabilities to IOMMU_GET_HW_INFO Joao Martins
` (9 subsequent siblings)
20 siblings, 0 replies; 140+ messages in thread
From: Joao Martins @ 2023-09-23 1:25 UTC (permalink / raw)
To: iommu
Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel,
Suravee Suthikulpanit, Will Deacon, Robin Murphy, Alex Williamson,
kvm, Joao Martins
Add a new test ioctl for simulating dirty IOVAs in the mock domain, and
implement the mock iommu domain ops that support dirty tracking.
The selftest exercises the usual main workflow of:
1) Setting dirty tracking on the iommu domain
2) Reading and clearing dirty IOPTEs
Different fixtures will test different IOVA range sizes that exercise
corner cases of the bitmaps.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
drivers/iommu/iommufd/iommufd_test.h | 9 ++
drivers/iommu/iommufd/selftest.c | 88 +++++++++++++-
tools/testing/selftests/iommu/iommufd.c | 99 ++++++++++++++++
tools/testing/selftests/iommu/iommufd_utils.h | 111 ++++++++++++++++++
4 files changed, 304 insertions(+), 3 deletions(-)
diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h
index 9817edcd8968..1f2e93d3d4e8 100644
--- a/drivers/iommu/iommufd/iommufd_test.h
+++ b/drivers/iommu/iommufd/iommufd_test.h
@@ -20,6 +20,7 @@ enum {
IOMMU_TEST_OP_MOCK_DOMAIN_REPLACE,
IOMMU_TEST_OP_ACCESS_REPLACE_IOAS,
IOMMU_TEST_OP_MOCK_DOMAIN_FLAGS,
+ IOMMU_TEST_OP_DIRTY,
};
enum {
@@ -107,6 +108,14 @@ struct iommu_test_cmd {
struct {
__u32 ioas_id;
} access_replace_ioas;
+ struct {
+ __u32 flags;
+ __aligned_u64 iova;
+ __aligned_u64 length;
+ __aligned_u64 page_size;
+ __aligned_u64 uptr;
+ __aligned_u64 out_nr_dirty;
+ } dirty;
};
__u32 last;
};
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index 507ada06d5ad..d8fb7328f93c 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -37,6 +37,7 @@ enum {
_MOCK_PFN_START = MOCK_PFN_MASK + 1,
MOCK_PFN_START_IOVA = _MOCK_PFN_START,
MOCK_PFN_LAST_IOVA = _MOCK_PFN_START,
+ MOCK_PFN_DIRTY_IOVA = _MOCK_PFN_START << 1,
};
/*
@@ -181,6 +182,31 @@ static int mock_domain_read_and_clear_dirty(struct iommu_domain *domain,
unsigned long flags,
struct iommu_dirty_bitmap *dirty)
{
+ struct mock_iommu_domain *mock =
+ container_of(domain, struct mock_iommu_domain, domain);
+ unsigned long i, max = size / MOCK_IO_PAGE_SIZE;
+ void *ent, *old;
+
+ if (!(mock->flags & MOCK_DIRTY_TRACK) && dirty->bitmap)
+ return -EINVAL;
+
+ for (i = 0; i < max; i++) {
+ unsigned long cur = iova + i * MOCK_IO_PAGE_SIZE;
+
+ ent = xa_load(&mock->pfns, cur / MOCK_IO_PAGE_SIZE);
+ if (ent &&
+ (xa_to_value(ent) & MOCK_PFN_DIRTY_IOVA)) {
+ unsigned long val;
+
+ /* Clear dirty */
+ val = xa_to_value(ent) & ~MOCK_PFN_DIRTY_IOVA;
+ old = xa_store(&mock->pfns, cur / MOCK_IO_PAGE_SIZE,
+ xa_mk_value(val), GFP_KERNEL);
+ WARN_ON_ONCE(ent != old);
+ iommu_dirty_bitmap_record(dirty, cur, MOCK_IO_PAGE_SIZE);
+ }
+ }
+
return 0;
}
@@ -309,7 +335,7 @@ static size_t mock_domain_unmap_pages(struct iommu_domain *domain,
for (cur = 0; cur != pgsize; cur += MOCK_IO_PAGE_SIZE) {
ent = xa_erase(&mock->pfns, iova / MOCK_IO_PAGE_SIZE);
- WARN_ON(!ent);
+
/*
* iommufd generates unmaps that must be a strict
* superset of the map's performend So every starting
@@ -319,12 +345,12 @@ static size_t mock_domain_unmap_pages(struct iommu_domain *domain,
* passed to map_pages
*/
if (first) {
- WARN_ON(!(xa_to_value(ent) &
+ WARN_ON(ent && !(xa_to_value(ent) &
MOCK_PFN_START_IOVA));
first = false;
}
if (pgcount == 1 && cur + MOCK_IO_PAGE_SIZE == pgsize)
- WARN_ON(!(xa_to_value(ent) &
+ WARN_ON(ent && !(xa_to_value(ent) &
MOCK_PFN_LAST_IOVA));
iova += MOCK_IO_PAGE_SIZE;
@@ -1052,6 +1078,56 @@ static_assert((unsigned int)MOCK_ACCESS_RW_WRITE == IOMMUFD_ACCESS_RW_WRITE);
static_assert((unsigned int)MOCK_ACCESS_RW_SLOW_PATH ==
__IOMMUFD_ACCESS_RW_SLOW_PATH);
+static int iommufd_test_dirty(struct iommufd_ucmd *ucmd,
+ unsigned int mockpt_id, unsigned long iova,
+ size_t length, unsigned long page_size,
+ void __user *uptr, u32 flags)
+{
+ unsigned long i, max = length / page_size;
+ struct iommu_test_cmd *cmd = ucmd->cmd;
+ struct iommufd_hw_pagetable *hwpt;
+ struct mock_iommu_domain *mock;
+ int rc, count = 0;
+
+ if (iova % page_size || length % page_size ||
+ (uintptr_t)uptr % page_size)
+ return -EINVAL;
+
+ hwpt = get_md_pagetable(ucmd, mockpt_id, &mock);
+ if (IS_ERR(hwpt))
+ return PTR_ERR(hwpt);
+
+ if (!(mock->flags & MOCK_DIRTY_TRACK)) {
+ rc = -EINVAL;
+ goto out_put;
+ }
+
+ for (i = 0; i < max; i++) {
+ unsigned long cur = iova + i * page_size;
+ void *ent, *old;
+
+ if (!test_bit(i, (unsigned long *) uptr))
+ continue;
+
+ ent = xa_load(&mock->pfns, cur / page_size);
+ if (ent) {
+ unsigned long val;
+
+ val = xa_to_value(ent) | MOCK_PFN_DIRTY_IOVA;
+ old = xa_store(&mock->pfns, cur / page_size,
+ xa_mk_value(val), GFP_KERNEL);
+ WARN_ON_ONCE(ent != old);
+ count++;
+ }
+ }
+
+ cmd->dirty.out_nr_dirty = count;
+ rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
+out_put:
+ iommufd_put_object(&hwpt->obj);
+ return rc;
+}
+
void iommufd_selftest_destroy(struct iommufd_object *obj)
{
struct selftest_obj *sobj = container_of(obj, struct selftest_obj, obj);
@@ -1117,6 +1193,12 @@ int iommufd_test(struct iommufd_ucmd *ucmd)
return -EINVAL;
iommufd_test_memory_limit = cmd->memory_limit.limit;
return 0;
+ case IOMMU_TEST_OP_DIRTY:
+ return iommufd_test_dirty(
+ ucmd, cmd->id, cmd->dirty.iova,
+ cmd->dirty.length, cmd->dirty.page_size,
+ u64_to_user_ptr(cmd->dirty.uptr),
+ cmd->dirty.flags);
default:
return -EOPNOTSUPP;
}
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index dba173d1dd02..6eba8e880a55 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -12,6 +12,7 @@
static unsigned long HUGEPAGE_SIZE;
#define MOCK_PAGE_SIZE (PAGE_SIZE / 2)
+#define BITS_PER_BYTE 8
static unsigned long get_huge_page_size(void)
{
@@ -1437,13 +1438,47 @@ FIXTURE(iommufd_dirty_tracking)
uint32_t hwpt_id;
uint32_t stdev_id;
uint32_t idev_id;
+ unsigned long page_size;
+ unsigned long bitmap_size;
+ void *bitmap;
+ void *buffer;
+};
+
+FIXTURE_VARIANT(iommufd_dirty_tracking)
+{
+ unsigned long buffer_size;
};
FIXTURE_SETUP(iommufd_dirty_tracking)
{
+ void *vrc;
+ int rc;
+
self->fd = open("/dev/iommu", O_RDWR);
ASSERT_NE(-1, self->fd);
+ rc = posix_memalign(&self->buffer, HUGEPAGE_SIZE, variant->buffer_size);
+ if (rc || !self->buffer) {
+ SKIP(return, "Skipping buffer_size=%lu due to errno=%d",
+ variant->buffer_size, rc);
+ }
+
+ assert((uintptr_t)self->buffer % HUGEPAGE_SIZE == 0);
+ vrc = mmap(self->buffer, variant->buffer_size, PROT_READ | PROT_WRITE,
+ MAP_SHARED | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
+ assert(vrc == self->buffer);
+
+ self->page_size = MOCK_PAGE_SIZE;
+ self->bitmap_size = variant->buffer_size /
+ self->page_size / BITS_PER_BYTE;
+
+ /* Provision with an extra (MOCK_PAGE_SIZE) for the unaligned case */
+ rc = posix_memalign(&self->bitmap, PAGE_SIZE,
+ self->bitmap_size + MOCK_PAGE_SIZE);
+ assert(!rc);
+ assert(self->bitmap);
+ assert((uintptr_t)self->bitmap % PAGE_SIZE == 0);
+
test_ioctl_ioas_alloc(&self->ioas_id);
test_cmd_mock_domain(self->ioas_id, &self->stdev_id,
&self->hwpt_id, &self->idev_id);
@@ -1451,9 +1486,41 @@ FIXTURE_SETUP(iommufd_dirty_tracking)
FIXTURE_TEARDOWN(iommufd_dirty_tracking)
{
+ munmap(self->buffer, variant->buffer_size);
+ munmap(self->bitmap, self->bitmap_size);
teardown_iommufd(self->fd, _metadata);
}
+FIXTURE_VARIANT_ADD(iommufd_dirty_tracking, domain_dirty128k)
+{
+ /* one u32 index bitmap */
+ .buffer_size = 128UL * 1024UL,
+};
+
+FIXTURE_VARIANT_ADD(iommufd_dirty_tracking, domain_dirty256k)
+{
+ /* one u64 index bitmap */
+ .buffer_size = 256UL * 1024UL,
+};
+
+FIXTURE_VARIANT_ADD(iommufd_dirty_tracking, domain_dirty640k)
+{
+ /* two u64 index and trailing end bitmap */
+ .buffer_size = 640UL * 1024UL,
+};
+
+FIXTURE_VARIANT_ADD(iommufd_dirty_tracking, domain_dirty128M)
+{
+ /* 4K bitmap (128M IOVA range) */
+ .buffer_size = 128UL * 1024UL * 1024UL,
+};
+
+FIXTURE_VARIANT_ADD(iommufd_dirty_tracking, domain_dirty256M)
+{
+ /* 8K bitmap (256M IOVA range) */
+ .buffer_size = 256UL * 1024UL * 1024UL,
+};
+
TEST_F(iommufd_dirty_tracking, enforce_dirty)
{
uint32_t ioas_id, stddev_id, idev_id;
@@ -1494,6 +1561,38 @@ TEST_F(iommufd_dirty_tracking, set_dirty)
test_ioctl_destroy(hwpt_id);
}
+TEST_F(iommufd_dirty_tracking, get_dirty_iova)
+{
+ uint32_t stddev_id;
+ uint32_t hwpt_id;
+ uint32_t ioas_id;
+
+ test_ioctl_ioas_alloc(&ioas_id);
+ test_ioctl_ioas_map_fixed_id(ioas_id, self->buffer,
+ variant->buffer_size,
+ MOCK_APERTURE_START);
+
+ test_cmd_hwpt_alloc(self->idev_id, ioas_id,
+ IOMMU_HWPT_ALLOC_ENFORCE_DIRTY, &hwpt_id);
+ test_cmd_mock_domain(hwpt_id, &stddev_id, NULL, NULL);
+
+ test_cmd_set_dirty(hwpt_id, true);
+
+ test_mock_dirty_bitmaps(hwpt_id, variant->buffer_size,
+ MOCK_APERTURE_START,
+ self->page_size, self->bitmap,
+ self->bitmap_size, _metadata);
+
+ /* PAGE_SIZE unaligned bitmap */
+ test_mock_dirty_bitmaps(hwpt_id, variant->buffer_size,
+ MOCK_APERTURE_START,
+ self->page_size, self->bitmap + MOCK_PAGE_SIZE,
+ self->bitmap_size, _metadata);
+
+ test_ioctl_destroy(stddev_id);
+ test_ioctl_destroy(hwpt_id);
+}
+
/* VFIO compatibility IOCTLs */
TEST_F(iommufd, simple_ioctls)
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index 1626e2efbfb1..8c0c1bc91128 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -9,6 +9,8 @@
#include <sys/ioctl.h>
#include <stdint.h>
#include <assert.h>
+#include <linux/bitmap.h>
+#include <linux/bitops.h>
#include "../kselftest_harness.h"
#include "../../../../drivers/iommu/iommufd/iommufd_test.h"
@@ -201,6 +203,104 @@ static int _test_cmd_set_dirty(int fd, __u32 hwpt_id, bool enabled)
#define test_cmd_set_dirty(hwpt_id, enabled) \
ASSERT_EQ(0, _test_cmd_set_dirty(self->fd, hwpt_id, enabled))
+static int _test_cmd_get_dirty_iova(int fd, __u32 hwpt_id, size_t length,
+ __u64 iova, size_t page_size, __u64 *bitmap)
+{
+ struct iommu_hwpt_get_dirty_iova cmd = {
+ .size = sizeof(cmd),
+ .hwpt_id = hwpt_id,
+ .bitmap = {
+ .iova = iova,
+ .length = length,
+ .page_size = page_size,
+ .data = bitmap,
+ }
+ };
+ int ret;
+
+ ret = ioctl(fd, IOMMU_HWPT_GET_DIRTY_IOVA, &cmd);
+ if (ret)
+ return ret;
+ return 0;
+}
+
+#define test_cmd_get_dirty_iova(fd, hwpt_id, length, iova, page_size, bitmap) \
+ ASSERT_EQ(0, _test_cmd_get_dirty_iova(fd, hwpt_id, length, \
+ iova, page_size, bitmap))
+
+static int _test_cmd_mock_domain_set_dirty(int fd, __u32 hwpt_id, size_t length,
+ __u64 iova, size_t page_size,
+ __u64 *bitmap, __u64 *dirty)
+{
+ struct iommu_test_cmd cmd = {
+ .size = sizeof(cmd),
+ .op = IOMMU_TEST_OP_DIRTY,
+ .id = hwpt_id,
+ .dirty = {
+ .iova = iova,
+ .length = length,
+ .page_size = page_size,
+ .uptr = (uintptr_t) bitmap,
+ }
+ };
+ int ret;
+
+ ret = ioctl(fd, _IOMMU_TEST_CMD(IOMMU_TEST_OP_DIRTY), &cmd);
+ if (ret)
+ return -ret;
+ if (dirty)
+ *dirty = cmd.dirty.out_nr_dirty;
+ return 0;
+}
+
+#define test_cmd_mock_domain_set_dirty(fd, hwpt_id, length, iova, page_size, bitmap, nr) \
+ ASSERT_EQ(0, _test_cmd_mock_domain_set_dirty(fd, hwpt_id, \
+ length, iova, \
+ page_size, bitmap, \
+ nr))
+
+static int _test_mock_dirty_bitmaps(int fd, __u32 hwpt_id, size_t length,
+ __u64 iova, size_t page_size,
+ __u64 *bitmap, __u64 bitmap_size,
+ struct __test_metadata *_metadata)
+{
+ unsigned long i, count, nbits = bitmap_size * BITS_PER_BYTE;
+ unsigned long nr = nbits / 2;
+ __u64 out_dirty = 0;
+
+ /* Mark all even bits as dirty in the mock domain */
+ for (count = 0, i = 0; i < nbits; count += !(i%2), i++)
+ if (!(i % 2))
+ __set_bit(i, (unsigned long *) bitmap);
+ ASSERT_EQ(nr, count);
+
+ test_cmd_mock_domain_set_dirty(fd, hwpt_id, length, iova, page_size,
+ bitmap, &out_dirty);
+ ASSERT_EQ(nr, out_dirty);
+
+ /* Expect all even bits as dirty in the user bitmap */
+ memset(bitmap, 0, bitmap_size);
+ test_cmd_get_dirty_iova(fd, hwpt_id, length, iova, page_size, bitmap);
+ for (count = 0, i = 0; i < nbits; count += !(i%2), i++)
+ ASSERT_EQ(!(i % 2), test_bit(i, (unsigned long *) bitmap));
+ ASSERT_EQ(count, out_dirty);
+
+ memset(bitmap, 0, bitmap_size);
+ test_cmd_get_dirty_iova(fd, hwpt_id, length, iova, page_size, bitmap);
+
+	/* It was read already -- expect all zeroes */
+ for (i = 0; i < nbits; i++)
+ ASSERT_EQ(0, test_bit(i, (unsigned long *) bitmap));
+
+ return 0;
+}
+#define test_mock_dirty_bitmaps(hwpt_id, length, iova, page_size, bitmap, \
+ bitmap_size, _metadata) \
+ ASSERT_EQ(0, _test_mock_dirty_bitmaps(self->fd, hwpt_id, \
+ length, iova, \
+ page_size, bitmap, \
+ bitmap_size, _metadata))
+
static int _test_cmd_create_access(int fd, unsigned int ioas_id,
__u32 *access_id, unsigned int flags)
{
@@ -325,6 +425,17 @@ static int _test_ioctl_ioas_map(int fd, unsigned int ioas_id, void *buffer,
IOMMU_IOAS_MAP_READABLE)); \
})
+#define test_ioctl_ioas_map_fixed_id(ioas_id, buffer, length, iova) \
+ ({ \
+ __u64 __iova = iova; \
+ ASSERT_EQ(0, _test_ioctl_ioas_map( \
+ self->fd, ioas_id, buffer, length, \
+ &__iova, \
+ IOMMU_IOAS_MAP_FIXED_IOVA | \
+ IOMMU_IOAS_MAP_WRITEABLE | \
+ IOMMU_IOAS_MAP_READABLE)); \
+ })
+
#define test_err_ioctl_ioas_map_fixed(_errno, buffer, length, iova) \
({ \
__u64 __iova = iova; \
--
2.17.2
^ permalink raw reply related [flat|nested] 140+ messages in thread* [PATCH v3 12/19] iommufd: Add capabilities to IOMMU_GET_HW_INFO
2023-09-23 1:24 [PATCH v3 00/19] IOMMUFD Dirty Tracking Joao Martins
` (10 preceding siblings ...)
2023-09-23 1:25 ` [PATCH v3 11/19] iommufd/selftest: Test IOMMU_HWPT_GET_DIRTY_IOVA Joao Martins
@ 2023-09-23 1:25 ` Joao Martins
2023-09-23 1:25 ` [PATCH v3 13/19] iommufd/selftest: Test out_capabilities in IOMMU_GET_HW_INFO Joao Martins
` (8 subsequent siblings)
20 siblings, 0 replies; 140+ messages in thread
From: Joao Martins @ 2023-09-23 1:25 UTC (permalink / raw)
To: iommu
Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel,
Suravee Suthikulpanit, Will Deacon, Robin Murphy, Alex Williamson,
kvm, Joao Martins
Extend IOMMUFD_CMD_GET_HW_INFO op to query generic iommu capabilities
for a given device.
Capabilities are IOMMU agnostic and use the device_iommu_capable() API,
passing one of the IOMMU_CAP_*. Enumerate IOMMU_CAP_DIRTY for now in the
out_capabilities field returned back to userspace.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
drivers/iommu/iommufd/device.c | 4 ++++
include/uapi/linux/iommufd.h | 11 +++++++++++
2 files changed, 15 insertions(+)
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index e88fa73a45e6..71ee22dc1a85 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -1185,6 +1185,10 @@ int iommufd_get_hw_info(struct iommufd_ucmd *ucmd)
*/
cmd->data_len = data_len;
+ cmd->out_capabilities = 0;
+ if (device_iommu_capable(idev->dev, IOMMU_CAP_DIRTY))
+ cmd->out_capabilities |= IOMMU_HW_CAP_DIRTY_TRACKING;
+
rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
out_free:
kfree(data);
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index b35b7d0c4be0..34703683eb8e 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -419,6 +419,14 @@ enum iommu_hw_info_type {
IOMMU_HW_INFO_TYPE_INTEL_VTD,
};
+/**
+ * enum iommufd_hw_info_capabilities
+ * @IOMMU_CAP_DIRTY_TRACKING: IOMMU hardware support for dirty tracking
+ */
+enum iommufd_hw_capabilities {
+ IOMMU_HW_CAP_DIRTY_TRACKING = 1 << 0,
+};
+
/**
* struct iommu_hw_info - ioctl(IOMMU_GET_HW_INFO)
* @size: sizeof(struct iommu_hw_info)
@@ -430,6 +438,8 @@ enum iommu_hw_info_type {
* the iommu type specific hardware information data
* @out_data_type: Output the iommu hardware info type as defined in the enum
* iommu_hw_info_type.
+ * @out_capabilities: Output the iommu capability info type as defined in the
+ * enum iommu_hw_capabilities.
* @__reserved: Must be 0
*
* Query an iommu type specific hardware information data from an iommu behind
@@ -454,6 +464,7 @@ struct iommu_hw_info {
__aligned_u64 data_uptr;
__u32 out_data_type;
__u32 __reserved;
+ __aligned_u64 out_capabilities;
};
#define IOMMU_GET_HW_INFO _IO(IOMMUFD_TYPE, IOMMUFD_CMD_GET_HW_INFO)
--
2.17.2
^ permalink raw reply related [flat|nested] 140+ messages in thread* [PATCH v3 13/19] iommufd/selftest: Test out_capabilities in IOMMU_GET_HW_INFO
2023-09-23 1:24 [PATCH v3 00/19] IOMMUFD Dirty Tracking Joao Martins
` (11 preceding siblings ...)
2023-09-23 1:25 ` [PATCH v3 12/19] iommufd: Add capabilities to IOMMU_GET_HW_INFO Joao Martins
@ 2023-09-23 1:25 ` Joao Martins
2023-09-23 1:25 ` [PATCH v3 14/19] iommufd: Add a flag to skip clearing of IOPTE dirty Joao Martins
` (7 subsequent siblings)
20 siblings, 0 replies; 140+ messages in thread
From: Joao Martins @ 2023-09-23 1:25 UTC (permalink / raw)
To: iommu
Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel,
Suravee Suthikulpanit, Will Deacon, Robin Murphy, Alex Williamson,
kvm, Joao Martins
Enumerate the capabilities from the mock device and test whether it
advertises them as expected. Include it as part of the iommufd_dirty_tracking
fixture.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
drivers/iommu/iommufd/selftest.c | 13 ++++++++++++-
tools/testing/selftests/iommu/iommufd.c | 17 +++++++++++++++++
.../testing/selftests/iommu/iommufd_fail_nth.c | 2 +-
tools/testing/selftests/iommu/iommufd_utils.h | 14 +++++++++++---
4 files changed, 41 insertions(+), 5 deletions(-)
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index d8fb7328f93c..6a2c82eed19e 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -375,7 +375,18 @@ static phys_addr_t mock_domain_iova_to_phys(struct iommu_domain *domain,
static bool mock_domain_capable(struct device *dev, enum iommu_cap cap)
{
- return cap == IOMMU_CAP_CACHE_COHERENCY;
+ struct mock_dev *mdev = container_of(dev, struct mock_dev, dev);
+
+ switch (cap) {
+ case IOMMU_CAP_CACHE_COHERENCY:
+ return true;
+ case IOMMU_CAP_DIRTY:
+ return !(mdev->flags & MOCK_FLAGS_DEVICE_NO_DIRTY);
+ default:
+ break;
+ }
+
+ return false;
}
static void mock_domain_set_plaform_dma_ops(struct device *dev)
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index 6eba8e880a55..1005282ced56 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -1561,6 +1561,23 @@ TEST_F(iommufd_dirty_tracking, set_dirty)
test_ioctl_destroy(hwpt_id);
}
+TEST_F(iommufd_dirty_tracking, device_dirty_capability)
+{
+ uint32_t caps = 0;
+ uint32_t stddev_id;
+ uint32_t hwpt_id;
+
+ test_cmd_hwpt_alloc(self->idev_id, self->ioas_id, 0, &hwpt_id);
+ test_cmd_mock_domain(hwpt_id, &stddev_id, NULL, NULL);
+ test_cmd_get_hw_capabilities(self->idev_id, caps,
+ IOMMU_HW_CAP_DIRTY_TRACKING);
+ ASSERT_EQ(IOMMU_HW_CAP_DIRTY_TRACKING,
+ caps & IOMMU_HW_CAP_DIRTY_TRACKING);
+
+ test_ioctl_destroy(stddev_id);
+ test_ioctl_destroy(hwpt_id);
+}
+
TEST_F(iommufd_dirty_tracking, get_dirty_iova)
{
uint32_t stddev_id;
diff --git a/tools/testing/selftests/iommu/iommufd_fail_nth.c b/tools/testing/selftests/iommu/iommufd_fail_nth.c
index 3d7838506bfe..1fcd69cb0e41 100644
--- a/tools/testing/selftests/iommu/iommufd_fail_nth.c
+++ b/tools/testing/selftests/iommu/iommufd_fail_nth.c
@@ -612,7 +612,7 @@ TEST_FAIL_NTH(basic_fail_nth, device)
&idev_id))
return -1;
- if (_test_cmd_get_hw_info(self->fd, idev_id, &info, sizeof(info)))
+ if (_test_cmd_get_hw_info(self->fd, idev_id, &info, sizeof(info), NULL))
return -1;
if (_test_cmd_hwpt_alloc(self->fd, idev_id, ioas_id, 0, &hwpt_id))
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index 8c0c1bc91128..0b83cf200e9f 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -525,7 +525,8 @@ static void teardown_iommufd(int fd, struct __test_metadata *_metadata)
/* @data can be NULL */
static int _test_cmd_get_hw_info(int fd, __u32 device_id,
- void *data, size_t data_len)
+ void *data, size_t data_len,
+ uint32_t *capabilities)
{
struct iommu_test_hw_info *info = (struct iommu_test_hw_info *)data;
struct iommu_hw_info cmd = {
@@ -533,6 +534,7 @@ static int _test_cmd_get_hw_info(int fd, __u32 device_id,
.dev_id = device_id,
.data_len = data_len,
.data_uptr = (uint64_t)data,
+ .out_capabilities = 0,
};
int ret;
@@ -569,14 +571,20 @@ static int _test_cmd_get_hw_info(int fd, __u32 device_id,
assert(!info->flags);
}
+ if (capabilities)
+ *capabilities = cmd.out_capabilities;
+
return 0;
}
#define test_cmd_get_hw_info(device_id, data, data_len) \
ASSERT_EQ(0, _test_cmd_get_hw_info(self->fd, device_id, \
- data, data_len))
+ data, data_len, NULL))
#define test_err_get_hw_info(_errno, device_id, data, data_len) \
EXPECT_ERRNO(_errno, \
_test_cmd_get_hw_info(self->fd, device_id, \
- data, data_len))
+ data, data_len, NULL))
+
+#define test_cmd_get_hw_capabilities(device_id, caps, mask) \
+ ASSERT_EQ(0, _test_cmd_get_hw_info(self->fd, device_id, NULL, 0, &caps))
--
2.17.2
^ permalink raw reply related [flat|nested] 140+ messages in thread* [PATCH v3 14/19] iommufd: Add a flag to skip clearing of IOPTE dirty
2023-09-23 1:24 [PATCH v3 00/19] IOMMUFD Dirty Tracking Joao Martins
` (12 preceding siblings ...)
2023-09-23 1:25 ` [PATCH v3 13/19] iommufd/selftest: Test out_capabilities in IOMMU_GET_HW_INFO Joao Martins
@ 2023-09-23 1:25 ` Joao Martins
2023-09-23 1:25 ` [PATCH v3 15/19] iommufd/selftest: Test IOMMU_GET_DIRTY_IOVA_NO_CLEAR flag Joao Martins
` (6 subsequent siblings)
20 siblings, 0 replies; 140+ messages in thread
From: Joao Martins @ 2023-09-23 1:25 UTC (permalink / raw)
To: iommu
Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel,
Suravee Suthikulpanit, Will Deacon, Robin Murphy, Alex Williamson,
kvm, Joao Martins
VFIO has an operation where it unmaps an IOVA while returning a bitmap
with the dirty data. In reality the operation doesn't quite query the IO
pagetables for whether the PTE was dirty or not. Instead it marks as dirty
anything that was mapped, and does so in one syscall.
In IOMMUFD the equivalent is done in two operations by querying with
GET_DIRTY_IOVA followed by UNMAP_IOVA. However, this would incur two TLB
flushes given that after clearing dirty bits IOMMU implementations require
invalidating their IOTLB, plus another invalidation needed for the UNMAP.
To allow dirty bits to be queried faster, add a flag
(IOMMU_GET_DIRTY_IOVA_NO_CLEAR) that requests not to clear the dirty bits
from the PTE (but just read them), under the expectation that the next
operation is the unmap. An alternative is to unmap and just perpetually
mark as dirty, as that's the same behaviour as today. So here equivalent
functionality can be provided with unmap alone, and if real dirty info is
required it will amortize the cost while querying.
There's still a race against DMA where in theory the unmap of the IOVA
(when the guest invalidates the IOTLB via the emulated iommu) would race
against the VF performing DMA on the same IOVA. As discussed in [0], we
accept resolving this race by throwing away the DMA: it doesn't matter
whether it hit physical DRAM or not, as the VM can't tell if we threw it
away because the DMA was blocked or because we failed to copy the DRAM.
[0] https://lore.kernel.org/linux-iommu/20220502185239.GR8364@nvidia.com/
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
drivers/iommu/iommufd/hw_pagetable.c | 3 ++-
drivers/iommu/iommufd/io_pagetable.c | 9 +++++++--
include/uapi/linux/iommufd.h | 12 ++++++++++++
3 files changed, 21 insertions(+), 3 deletions(-)
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index a5712992bb4b..386cf0e61b4e 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -255,7 +255,8 @@ int iommufd_hwpt_get_dirty_iova(struct iommufd_ucmd *ucmd)
struct iommufd_ioas *ioas;
int rc = -EOPNOTSUPP;
- if ((cmd->flags || cmd->__reserved))
+ if ((cmd->flags & ~(IOMMU_GET_DIRTY_IOVA_NO_CLEAR)) ||
+ cmd->__reserved)
return -EOPNOTSUPP;
hwpt = iommufd_get_hwpt(ucmd, cmd->hwpt_id);
diff --git a/drivers/iommu/iommufd/io_pagetable.c b/drivers/iommu/iommufd/io_pagetable.c
index b9e58601d1d4..e22c17da877c 100644
--- a/drivers/iommu/iommufd/io_pagetable.c
+++ b/drivers/iommu/iommufd/io_pagetable.c
@@ -414,6 +414,7 @@ int iopt_map_user_pages(struct iommufd_ctx *ictx, struct io_pagetable *iopt,
}
struct iova_bitmap_fn_arg {
+ unsigned long flags;
struct iommu_domain *domain;
struct iommu_dirty_bitmap *dirty;
};
@@ -426,8 +427,9 @@ static int __iommu_read_and_clear_dirty(struct iova_bitmap *bitmap,
struct iommu_domain *domain = arg->domain;
struct iommu_dirty_bitmap *dirty = arg->dirty;
const struct iommu_dirty_ops *ops = domain->dirty_ops;
+ unsigned long flags = arg->flags;
- return ops->read_and_clear_dirty(domain, iova, length, 0, dirty);
+ return ops->read_and_clear_dirty(domain, iova, length, flags, dirty);
}
static int iommu_read_and_clear_dirty(struct iommu_domain *domain,
@@ -451,11 +453,14 @@ static int iommu_read_and_clear_dirty(struct iommu_domain *domain,
iommu_dirty_bitmap_init(&dirty, iter, &gather);
+ arg.flags = flags;
arg.domain = domain;
arg.dirty = &dirty;
iova_bitmap_for_each(iter, &arg, __iommu_read_and_clear_dirty);
- iommu_iotlb_sync(domain, &gather);
+ if (!(flags & IOMMU_DIRTY_NO_CLEAR))
+ iommu_iotlb_sync(domain, &gather);
+
iova_bitmap_free(iter);
return ret;
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 34703683eb8e..796ebc7b60ac 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -512,6 +512,18 @@ struct iommufd_dirty_data {
__aligned_u64 *data;
};
+/**
+ * enum iommufd_get_dirty_iova_flags - Flags for getting dirty bits
+ * @IOMMU_GET_DIRTY_IOVA_NO_CLEAR: Just read the PTEs without clearing any dirty
+ * bits metadata. This flag can be passed in the
+ * expectation where the next operation is
+ * an unmap of the same IOVA range.
+ *
+ */
+enum iommufd_hwpt_get_dirty_iova_flags {
+ IOMMU_GET_DIRTY_IOVA_NO_CLEAR = 1,
+};
+
/**
* struct iommu_hwpt_get_dirty_iova - ioctl(IOMMU_HWPT_GET_DIRTY_IOVA)
* @size: sizeof(struct iommu_hwpt_get_dirty_iova)
--
2.17.2
^ permalink raw reply related [flat|nested] 140+ messages in thread* [PATCH v3 15/19] iommufd/selftest: Test IOMMU_GET_DIRTY_IOVA_NO_CLEAR flag
2023-09-23 1:24 [PATCH v3 00/19] IOMMUFD Dirty Tracking Joao Martins
` (13 preceding siblings ...)
2023-09-23 1:25 ` [PATCH v3 14/19] iommufd: Add a flag to skip clearing of IOPTE dirty Joao Martins
@ 2023-09-23 1:25 ` Joao Martins
2023-09-23 1:25 ` [PATCH v3 16/19] iommu/amd: Add domain_alloc_user based domain allocation Joao Martins
` (5 subsequent siblings)
20 siblings, 0 replies; 140+ messages in thread
From: Joao Martins @ 2023-09-23 1:25 UTC (permalink / raw)
To: iommu
Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel,
Suravee Suthikulpanit, Will Deacon, Robin Murphy, Alex Williamson,
kvm, Joao Martins
Change test_mock_dirty_bitmaps() to take a flag argument specifying the
flag under test. The test does the same thing as the regular GET_DIRTY_IOVA
test, except that it checks whether the dirtied bits are fetched all the
same a second time, as opposed to observing them cleared.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
drivers/iommu/iommufd/selftest.c | 15 ++++---
tools/testing/selftests/iommu/iommufd.c | 40 ++++++++++++++++++-
tools/testing/selftests/iommu/iommufd_utils.h | 26 +++++++-----
3 files changed, 64 insertions(+), 17 deletions(-)
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index 6a2c82eed19e..7222e37962b4 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -196,13 +196,16 @@ static int mock_domain_read_and_clear_dirty(struct iommu_domain *domain,
ent = xa_load(&mock->pfns, cur / MOCK_IO_PAGE_SIZE);
if (ent &&
(xa_to_value(ent) & MOCK_PFN_DIRTY_IOVA)) {
- unsigned long val;
-
/* Clear dirty */
- val = xa_to_value(ent) & ~MOCK_PFN_DIRTY_IOVA;
- old = xa_store(&mock->pfns, cur / MOCK_IO_PAGE_SIZE,
- xa_mk_value(val), GFP_KERNEL);
- WARN_ON_ONCE(ent != old);
+ if (!(flags & IOMMU_GET_DIRTY_IOVA_NO_CLEAR)) {
+ unsigned long val;
+
+ val = xa_to_value(ent) & ~MOCK_PFN_DIRTY_IOVA;
+ old = xa_store(&mock->pfns,
+ cur / MOCK_IO_PAGE_SIZE,
+ xa_mk_value(val), GFP_KERNEL);
+ WARN_ON_ONCE(ent != old);
+ }
iommu_dirty_bitmap_record(dirty, cur, MOCK_IO_PAGE_SIZE);
}
}
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index 1005282ced56..24211efc1a88 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -1598,13 +1598,49 @@ TEST_F(iommufd_dirty_tracking, get_dirty_iova)
test_mock_dirty_bitmaps(hwpt_id, variant->buffer_size,
MOCK_APERTURE_START,
self->page_size, self->bitmap,
- self->bitmap_size, _metadata);
+ self->bitmap_size, 0, _metadata);
/* PAGE_SIZE unaligned bitmap */
test_mock_dirty_bitmaps(hwpt_id, variant->buffer_size,
MOCK_APERTURE_START,
self->page_size, self->bitmap + MOCK_PAGE_SIZE,
- self->bitmap_size, _metadata);
+ self->bitmap_size, 0, _metadata);
+
+ test_ioctl_destroy(stddev_id);
+ test_ioctl_destroy(hwpt_id);
+}
+
+TEST_F(iommufd_dirty_tracking, get_dirty_iova_no_clear)
+{
+ uint32_t stddev_id;
+ uint32_t hwpt_id;
+ uint32_t ioas_id;
+
+ test_ioctl_ioas_alloc(&ioas_id);
+ test_ioctl_ioas_map_fixed_id(ioas_id, self->buffer,
+ variant->buffer_size,
+ MOCK_APERTURE_START);
+
+ test_cmd_hwpt_alloc(self->idev_id, ioas_id,
+ IOMMU_HWPT_ALLOC_ENFORCE_DIRTY, &hwpt_id);
+ test_cmd_mock_domain(hwpt_id, &stddev_id, NULL, NULL);
+
+ test_cmd_set_dirty(hwpt_id, true);
+
+ test_mock_dirty_bitmaps(hwpt_id, variant->buffer_size,
+ MOCK_APERTURE_START,
+ self->page_size, self->bitmap,
+ self->bitmap_size,
+ IOMMU_GET_DIRTY_IOVA_NO_CLEAR,
+ _metadata);
+
+ /* Unaligned bitmap */
+ test_mock_dirty_bitmaps(hwpt_id, variant->buffer_size,
+ MOCK_APERTURE_START,
+ self->page_size, self->bitmap + MOCK_PAGE_SIZE,
+ self->bitmap_size,
+ IOMMU_GET_DIRTY_IOVA_NO_CLEAR,
+ _metadata);
test_ioctl_destroy(stddev_id);
test_ioctl_destroy(hwpt_id);
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index 0b83cf200e9f..e65994fbe91a 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -204,11 +204,13 @@ static int _test_cmd_set_dirty(int fd, __u32 hwpt_id, bool enabled)
ASSERT_EQ(0, _test_cmd_set_dirty(self->fd, hwpt_id, enabled))
static int _test_cmd_get_dirty_iova(int fd, __u32 hwpt_id, size_t length,
- __u64 iova, size_t page_size, __u64 *bitmap)
+ __u64 iova, size_t page_size, __u64 *bitmap,
+ __u32 flags)
{
struct iommu_hwpt_get_dirty_iova cmd = {
.size = sizeof(cmd),
.hwpt_id = hwpt_id,
+ .flags = flags,
.bitmap = {
.iova = iova,
.length = length,
@@ -224,9 +226,10 @@ static int _test_cmd_get_dirty_iova(int fd, __u32 hwpt_id, size_t length,
return 0;
}
-#define test_cmd_get_dirty_iova(fd, hwpt_id, length, iova, page_size, bitmap) \
+#define test_cmd_get_dirty_iova(fd, hwpt_id, length, iova, page_size, bitmap, \
+ flags) \
ASSERT_EQ(0, _test_cmd_get_dirty_iova(fd, hwpt_id, length, \
- iova, page_size, bitmap))
+ iova, page_size, bitmap, flags))
static int _test_cmd_mock_domain_set_dirty(int fd, __u32 hwpt_id, size_t length,
__u64 iova, size_t page_size,
@@ -262,6 +265,7 @@ static int _test_cmd_mock_domain_set_dirty(int fd, __u32 hwpt_id, size_t length,
static int _test_mock_dirty_bitmaps(int fd, __u32 hwpt_id, size_t length,
__u64 iova, size_t page_size,
__u64 *bitmap, __u64 bitmap_size,
+ __u32 flags,
struct __test_metadata *_metadata)
{
unsigned long i, count, nbits = bitmap_size * BITS_PER_BYTE;
@@ -280,26 +284,30 @@ static int _test_mock_dirty_bitmaps(int fd, __u32 hwpt_id, size_t length,
/* Expect all even bits as dirty in the user bitmap */
memset(bitmap, 0, bitmap_size);
- test_cmd_get_dirty_iova(fd, hwpt_id, length, iova, page_size, bitmap);
+ test_cmd_get_dirty_iova(fd, hwpt_id, length, iova,
+ page_size, bitmap, flags);
for (count = 0, i = 0; i < nbits; count += !(i%2), i++)
ASSERT_EQ(!(i % 2), test_bit(i, (unsigned long *) bitmap));
ASSERT_EQ(count, out_dirty);
memset(bitmap, 0, bitmap_size);
- test_cmd_get_dirty_iova(fd, hwpt_id, length, iova, page_size, bitmap);
+ test_cmd_get_dirty_iova(fd, hwpt_id, length, iova,
+ page_size, bitmap, flags);
/* It as read already -- expect all zeroes */
- for (i = 0; i < nbits; i++)
- ASSERT_EQ(0, test_bit(i, (unsigned long *) bitmap));
+ for (i = 0; i < nbits; i++) {
+ ASSERT_EQ(!(i % 2) && (flags & IOMMU_GET_DIRTY_IOVA_NO_CLEAR),
+ test_bit(i, (unsigned long *) bitmap));
+ }
return 0;
}
#define test_mock_dirty_bitmaps(hwpt_id, length, iova, page_size, bitmap, \
- bitmap_size, _metadata) \
+ bitmap_size, flags, _metadata) \
ASSERT_EQ(0, _test_mock_dirty_bitmaps(self->fd, hwpt_id, \
length, iova, \
page_size, bitmap, \
- bitmap_size, _metadata))
+ bitmap_size, flags, _metadata))
static int _test_cmd_create_access(int fd, unsigned int ioas_id,
__u32 *access_id, unsigned int flags)
--
2.17.2
^ permalink raw reply related [flat|nested] 140+ messages in thread
* [PATCH v3 16/19] iommu/amd: Add domain_alloc_user based domain allocation
2023-09-23 1:24 [PATCH v3 00/19] IOMMUFD Dirty Tracking Joao Martins
` (14 preceding siblings ...)
2023-09-23 1:25 ` [PATCH v3 15/19] iommufd/selftest: Test IOMMU_GET_DIRTY_IOVA_NO_CLEAR flag Joao Martins
@ 2023-09-23 1:25 ` Joao Martins
2023-10-17 2:00 ` Suthikulpanit, Suravee
2023-09-23 1:25 ` [PATCH v3 17/19] iommu/amd: Access/Dirty bit support in IOPTEs Joao Martins
` (4 subsequent siblings)
20 siblings, 1 reply; 140+ messages in thread
From: Joao Martins @ 2023-09-23 1:25 UTC (permalink / raw)
To: iommu
Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel,
Suravee Suthikulpanit, Will Deacon, Robin Murphy, Alex Williamson,
kvm, Joao Martins
Add the domain_alloc_user op implementation. To that end, refactor
amd_iommu_domain_alloc() to receive a dev pointer and flags, while
renaming it to .. such that it becomes a common function shared with
the domain_alloc_user() implementation. The sole difference for
domain_alloc_user() is that it also initializes the other fields that
iommu_domain_alloc() does, so one function returns the iommu domain
correctly initialized.
This is in preparation to add dirty enforcement on AMD implementation
of domain_alloc_user.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
drivers/iommu/amd/iommu.c | 46 ++++++++++++++++++++++++++++++++++++---
1 file changed, 43 insertions(+), 3 deletions(-)
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 95bd7c25ba6f..af36c627022f 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -37,6 +37,7 @@
#include <asm/iommu.h>
#include <asm/gart.h>
#include <asm/dma.h>
+#include <uapi/linux/iommufd.h>
#include "amd_iommu.h"
#include "../dma-iommu.h"
@@ -2155,7 +2156,10 @@ static inline u64 dma_max_address(void)
return ((1ULL << PM_LEVEL_SHIFT(amd_iommu_gpt_level)) - 1);
}
-static struct iommu_domain *amd_iommu_domain_alloc(unsigned type)
+static struct iommu_domain *do_iommu_domain_alloc(unsigned int type,
+ struct amd_iommu *iommu,
+ struct device *dev,
+ u32 flags)
{
struct protection_domain *domain;
@@ -2164,19 +2168,54 @@ static struct iommu_domain *amd_iommu_domain_alloc(unsigned type)
* default to use IOMMU_DOMAIN_DMA[_FQ].
*/
if (amd_iommu_snp_en && (type == IOMMU_DOMAIN_IDENTITY))
- return NULL;
+ return ERR_PTR(-EINVAL);
domain = protection_domain_alloc(type);
if (!domain)
- return NULL;
+ return ERR_PTR(-ENOMEM);
domain->domain.geometry.aperture_start = 0;
domain->domain.geometry.aperture_end = dma_max_address();
domain->domain.geometry.force_aperture = true;
+ if (dev) {
+ domain->domain.type = type;
+ domain->domain.pgsize_bitmap =
+ iommu->iommu.ops->pgsize_bitmap;
+ domain->domain.ops =
+ iommu->iommu.ops->default_domain_ops;
+ }
+
return &domain->domain;
}
+static struct iommu_domain *amd_iommu_domain_alloc(unsigned type)
+{
+ struct iommu_domain *domain;
+
+ domain = do_iommu_domain_alloc(type, NULL, NULL, 0);
+ if (IS_ERR(domain))
+ return NULL;
+
+ return domain;
+}
+
+static struct iommu_domain *amd_iommu_domain_alloc_user(struct device *dev,
+ u32 flags)
+{
+ unsigned int type = IOMMU_DOMAIN_UNMANAGED;
+ struct amd_iommu *iommu;
+
+ iommu = rlookup_amd_iommu(dev);
+ if (!iommu)
+ return ERR_PTR(-ENODEV);
+
+ if (flags & IOMMU_HWPT_ALLOC_NEST_PARENT)
+ return ERR_PTR(-EOPNOTSUPP);
+
+ return do_iommu_domain_alloc(type, iommu, dev, flags);
+}
+
static void amd_iommu_domain_free(struct iommu_domain *dom)
{
struct protection_domain *domain;
@@ -2464,6 +2503,7 @@ static bool amd_iommu_enforce_cache_coherency(struct iommu_domain *domain)
const struct iommu_ops amd_iommu_ops = {
.capable = amd_iommu_capable,
.domain_alloc = amd_iommu_domain_alloc,
+ .domain_alloc_user = amd_iommu_domain_alloc_user,
.probe_device = amd_iommu_probe_device,
.release_device = amd_iommu_release_device,
.probe_finalize = amd_iommu_probe_finalize,
--
2.17.2
^ permalink raw reply related [flat|nested] 140+ messages in thread
* Re: [PATCH v3 16/19] iommu/amd: Add domain_alloc_user based domain allocation
2023-09-23 1:25 ` [PATCH v3 16/19] iommu/amd: Add domain_alloc_user based domain allocation Joao Martins
@ 2023-10-17 2:00 ` Suthikulpanit, Suravee
2023-10-17 9:07 ` Joao Martins
0 siblings, 1 reply; 140+ messages in thread
From: Suthikulpanit, Suravee @ 2023-10-17 2:00 UTC (permalink / raw)
To: Joao Martins, iommu
Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel, Will Deacon,
Robin Murphy, Alex Williamson, kvm
Hi Joao,
On 9/23/2023 8:25 AM, Joao Martins wrote:
> Add the domain_alloc_user op implementation. To that end, refactor
> amd_iommu_domain_alloc() to receive a dev pointer and flags, while
> renaming it to .. such that it becomes a common function shared with
> domain_alloc_user() implementation. The sole difference with
> domain_alloc_user() is that we initialize also other fields that
> iommu_domain_alloc() does. It lets it return the iommu domain
> correctly initialized in one function.
>
> This is in preparation to add dirty enforcement on AMD implementation
> of domain_alloc_user.
>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
> drivers/iommu/amd/iommu.c | 46 ++++++++++++++++++++++++++++++++++++---
> 1 file changed, 43 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index 95bd7c25ba6f..af36c627022f 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -37,6 +37,7 @@
> #include <asm/iommu.h>
> #include <asm/gart.h>
> #include <asm/dma.h>
> +#include <uapi/linux/iommufd.h>
>
> #include "amd_iommu.h"
> #include "../dma-iommu.h"
> @@ -2155,7 +2156,10 @@ static inline u64 dma_max_address(void)
> return ((1ULL << PM_LEVEL_SHIFT(amd_iommu_gpt_level)) - 1);
> }
>
> -static struct iommu_domain *amd_iommu_domain_alloc(unsigned type)
> +static struct iommu_domain *do_iommu_domain_alloc(unsigned int type,
> + struct amd_iommu *iommu,
> + struct device *dev,
> + u32 flags)
Instead of passing in the struct amd_iommu here, what if we just derive
it in the do_iommu_domain_alloc() as needed? This way, we don't need to
... (see below)
> {
> struct protection_domain *domain;
>
> @@ -2164,19 +2168,54 @@ static struct iommu_domain *amd_iommu_domain_alloc(unsigned type)
> * default to use IOMMU_DOMAIN_DMA[_FQ].
> */
> if (amd_iommu_snp_en && (type == IOMMU_DOMAIN_IDENTITY))
> - return NULL;
> + return ERR_PTR(-EINVAL);
>
> domain = protection_domain_alloc(type);
> if (!domain)
> - return NULL;
> + return ERR_PTR(-ENOMEM);
>
> domain->domain.geometry.aperture_start = 0;
> domain->domain.geometry.aperture_end = dma_max_address();
> domain->domain.geometry.force_aperture = true;
>
> + if (dev) {
> + domain->domain.type = type;
> + domain->domain.pgsize_bitmap =
> + iommu->iommu.ops->pgsize_bitmap;
> + domain->domain.ops =
> + iommu->iommu.ops->default_domain_ops;
> + }
> +
> return &domain->domain;
> }
>
> +static struct iommu_domain *amd_iommu_domain_alloc(unsigned type)
> +{
> + struct iommu_domain *domain;
> +
> + domain = do_iommu_domain_alloc(type, NULL, NULL, 0);
... pass iommu = NULL here unnecessarily.
> + if (IS_ERR(domain))
> + return NULL;
> +
> + return domain;
> +}
> +
> +static struct iommu_domain *amd_iommu_domain_alloc_user(struct device *dev,
> + u32 flags)
> +{
> + unsigned int type = IOMMU_DOMAIN_UNMANAGED;
> + struct amd_iommu *iommu;
> +
> + iommu = rlookup_amd_iommu(dev);
> + if (!iommu)
> + return ERR_PTR(-ENODEV);
We should not need to derive this here.
Other than this part.
Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Thanks,
Suravee
^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v3 16/19] iommu/amd: Add domain_alloc_user based domain allocation
2023-10-17 2:00 ` Suthikulpanit, Suravee
@ 2023-10-17 9:07 ` Joao Martins
2023-10-17 13:10 ` Jason Gunthorpe
0 siblings, 1 reply; 140+ messages in thread
From: Joao Martins @ 2023-10-17 9:07 UTC (permalink / raw)
To: Suthikulpanit, Suravee, iommu
Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel, Will Deacon,
Robin Murphy, Alex Williamson, kvm
On 17/10/2023 03:00, Suthikulpanit, Suravee wrote:
> Hi Joao,
>
> On 9/23/2023 8:25 AM, Joao Martins wrote:
>> Add the domain_alloc_user op implementation. To that end, refactor
>> amd_iommu_domain_alloc() to receive a dev pointer and flags, while
>> renaming it to .. such that it becomes a common function shared with
>> domain_alloc_user() implementation. The sole difference with
>> domain_alloc_user() is that we initialize also other fields that
>> iommu_domain_alloc() does. It lets it return the iommu domain
>> correctly initialized in one function.
>>
>> This is in preparation to add dirty enforcement on AMD implementation
>> of domain_alloc_user.
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>> drivers/iommu/amd/iommu.c | 46 ++++++++++++++++++++++++++++++++++++---
>> 1 file changed, 43 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
>> index 95bd7c25ba6f..af36c627022f 100644
>> --- a/drivers/iommu/amd/iommu.c
>> +++ b/drivers/iommu/amd/iommu.c
>> @@ -37,6 +37,7 @@
>> #include <asm/iommu.h>
>> #include <asm/gart.h>
>> #include <asm/dma.h>
>> +#include <uapi/linux/iommufd.h>
>> #include "amd_iommu.h"
>> #include "../dma-iommu.h"
>> @@ -2155,7 +2156,10 @@ static inline u64 dma_max_address(void)
>> return ((1ULL << PM_LEVEL_SHIFT(amd_iommu_gpt_level)) - 1);
>> }
>> -static struct iommu_domain *amd_iommu_domain_alloc(unsigned type)
>> +static struct iommu_domain *do_iommu_domain_alloc(unsigned int type,
>> + struct amd_iommu *iommu,
>> + struct device *dev,
>> + u32 flags)
>
> Instead of passing in the struct amd_iommu here, what if we just derive it in
> the do_iommu_domain_alloc() as needed? This way, we don't need to ... (see below)
>
Hmm, you mean to derive amd_iommu from the dev pointer. Yeah, sounds good.
>> {
>> struct protection_domain *domain;
>> @@ -2164,19 +2168,54 @@ static struct iommu_domain
>> *amd_iommu_domain_alloc(unsigned type)
>> * default to use IOMMU_DOMAIN_DMA[_FQ].
>> */
>> if (amd_iommu_snp_en && (type == IOMMU_DOMAIN_IDENTITY))
>> - return NULL;
>> + return ERR_PTR(-EINVAL);
>> domain = protection_domain_alloc(type);
>> if (!domain)
>> - return NULL;
>> + return ERR_PTR(-ENOMEM);
>> domain->domain.geometry.aperture_start = 0;
>> domain->domain.geometry.aperture_end = dma_max_address();
>> domain->domain.geometry.force_aperture = true;
>> + if (dev) {
>> + domain->domain.type = type;
>> + domain->domain.pgsize_bitmap =
>> + iommu->iommu.ops->pgsize_bitmap;
>> + domain->domain.ops =
>> + iommu->iommu.ops->default_domain_ops;
>> + }
>> +
>> return &domain->domain;
>> }
>> +static struct iommu_domain *amd_iommu_domain_alloc(unsigned type)
>> +{
>> + struct iommu_domain *domain;
>> +
>> + domain = do_iommu_domain_alloc(type, NULL, NULL, 0);
>
> ... pass iommu = NULL here unnecessarily.
>
OK
>> + if (IS_ERR(domain))
>> + return NULL;
>> +
>> + return domain;
>> +}
>> +
>> +static struct iommu_domain *amd_iommu_domain_alloc_user(struct device *dev,
>> + u32 flags)
>> +{
>> + unsigned int type = IOMMU_DOMAIN_UNMANAGED;
>> + struct amd_iommu *iommu;
>> +
>> + iommu = rlookup_amd_iommu(dev);
>> + if (!iommu)
>> + return ERR_PTR(-ENODEV);
>
> We should not need to derive this here.
>
> Other than this part.
>
OK, will do.
> Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
>
Thanks!
Here's the diff
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index af36c627022f..cfc7d2992aa6 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2157,11 +2157,17 @@ static inline u64 dma_max_address(void)
}
static struct iommu_domain *do_iommu_domain_alloc(unsigned int type,
- struct amd_iommu *iommu,
struct device *dev,
u32 flags)
{
struct protection_domain *domain;
+ struct amd_iommu *iommu = NULL;
+
+ if (dev) {
+ iommu = rlookup_amd_iommu(dev);
+ if (!iommu)
+ return ERR_PTR(-ENODEV);
+ }
/*
* Since DTE[Mode]=0 is prohibited on SNP-enabled system,
@@ -2178,7 +2184,7 @@ static struct iommu_domain *do_iommu_domain_alloc(unsigned int type,
domain->domain.geometry.aperture_end = dma_max_address();
domain->domain.geometry.force_aperture = true;
- if (dev) {
+ if (iommu) {
domain->domain.type = type;
domain->domain.pgsize_bitmap =
iommu->iommu.ops->pgsize_bitmap;
@@ -2193,7 +2199,7 @@ static struct iommu_domain *amd_iommu_domain_alloc(unsigned type)
{
struct iommu_domain *domain;
- domain = do_iommu_domain_alloc(type, NULL, NULL, 0);
+ domain = do_iommu_domain_alloc(type, NULL, 0);
if (IS_ERR(domain))
return NULL;
@@ -2204,16 +2210,11 @@ static struct iommu_domain *amd_iommu_domain_alloc_user(struct device *dev,
u32 flags)
{
unsigned int type = IOMMU_DOMAIN_UNMANAGED;
- struct amd_iommu *iommu;
-
- iommu = rlookup_amd_iommu(dev);
- if (!iommu)
- return ERR_PTR(-ENODEV);
if (flags & IOMMU_HWPT_ALLOC_NEST_PARENT)
return ERR_PTR(-EOPNOTSUPP);
- return do_iommu_domain_alloc(type, iommu, dev, flags);
+ return do_iommu_domain_alloc(type, dev, flags);
}
^ permalink raw reply related [flat|nested] 140+ messages in thread
* Re: [PATCH v3 16/19] iommu/amd: Add domain_alloc_user based domain allocation
2023-10-17 9:07 ` Joao Martins
@ 2023-10-17 13:10 ` Jason Gunthorpe
2023-10-17 14:14 ` Joao Martins
0 siblings, 1 reply; 140+ messages in thread
From: Jason Gunthorpe @ 2023-10-17 13:10 UTC (permalink / raw)
To: Joao Martins
Cc: Suthikulpanit, Suravee, iommu, Kevin Tian,
Shameerali Kolothum Thodi, Lu Baolu, Yi Liu, Yi Y Sun,
Nicolin Chen, Joerg Roedel, Will Deacon, Robin Murphy,
Alex Williamson, kvm
On Tue, Oct 17, 2023 at 10:07:11AM +0100, Joao Martins wrote:
>
> static struct iommu_domain *do_iommu_domain_alloc(unsigned int type,
> - struct amd_iommu *iommu,
> struct device *dev,
> u32 flags)
> {
> struct protection_domain *domain;
> + struct amd_iommu *iommu = NULL;
> +
> + if (dev) {
> + iommu = rlookup_amd_iommu(dev);
> + if (!iommu)
This really shouldn't be rlookup_amd_iommu, didn't the series fixing
this get merged?
Jason
^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v3 16/19] iommu/amd: Add domain_alloc_user based domain allocation
2023-10-17 13:10 ` Jason Gunthorpe
@ 2023-10-17 14:14 ` Joao Martins
2023-10-17 14:37 ` Joao Martins
0 siblings, 1 reply; 140+ messages in thread
From: Joao Martins @ 2023-10-17 14:14 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Suthikulpanit, Suravee, iommu, Kevin Tian,
Shameerali Kolothum Thodi, Lu Baolu, Yi Liu, Yi Y Sun,
Nicolin Chen, Joerg Roedel, Will Deacon, Robin Murphy,
Alex Williamson, kvm
On 17/10/2023 14:10, Jason Gunthorpe wrote:
> On Tue, Oct 17, 2023 at 10:07:11AM +0100, Joao Martins wrote:
>>
>> static struct iommu_domain *do_iommu_domain_alloc(unsigned int type,
>> - struct amd_iommu *iommu,
>> struct device *dev,
>> u32 flags)
>> {
>> struct protection_domain *domain;
>> + struct amd_iommu *iommu = NULL;
>> +
>> + if (dev) {
>> + iommu = rlookup_amd_iommu(dev);
>> + if (!iommu)
>
> This really shouldn't be rlookup_amd_iommu, didn't the series fixing
> this get merged?
From the latest linux-next, it's still there.
^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v3 16/19] iommu/amd: Add domain_alloc_user based domain allocation
2023-10-17 14:14 ` Joao Martins
@ 2023-10-17 14:37 ` Joao Martins
2023-10-17 15:32 ` Jason Gunthorpe
2023-10-18 8:29 ` Vasant Hegde
0 siblings, 2 replies; 140+ messages in thread
From: Joao Martins @ 2023-10-17 14:37 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Suthikulpanit, Suravee, iommu, Kevin Tian,
Shameerali Kolothum Thodi, Lu Baolu, Yi Liu, Yi Y Sun,
Nicolin Chen, Joerg Roedel, Will Deacon, Robin Murphy,
Alex Williamson, kvm
On 17/10/2023 15:14, Joao Martins wrote:
> On 17/10/2023 14:10, Jason Gunthorpe wrote:
>> On Tue, Oct 17, 2023 at 10:07:11AM +0100, Joao Martins wrote:
>>>
>>> static struct iommu_domain *do_iommu_domain_alloc(unsigned int type,
>>> - struct amd_iommu *iommu,
>>> struct device *dev,
>>> u32 flags)
>>> {
>>> struct protection_domain *domain;
>>> + struct amd_iommu *iommu = NULL;
>>> +
>>> + if (dev) {
>>> + iommu = rlookup_amd_iommu(dev);
>>> + if (!iommu)
>>
>> This really shouldn't be rlookup_amd_iommu, didn't the series fixing
>> this get merged?
>
> From the latest linux-next, it's still there.
>
I'm assuming you refer to this new helper:
https://lore.kernel.org/linux-iommu/20231013151652.6008-3-vasant.hegde@amd.com/
But it's part 3 out of a 4-part multi-series; and only the first part has been
merged.
Joao
^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v3 16/19] iommu/amd: Add domain_alloc_user based domain allocation
2023-10-17 14:37 ` Joao Martins
@ 2023-10-17 15:32 ` Jason Gunthorpe
2023-10-18 8:29 ` Vasant Hegde
1 sibling, 0 replies; 140+ messages in thread
From: Jason Gunthorpe @ 2023-10-17 15:32 UTC (permalink / raw)
To: Joao Martins
Cc: Suthikulpanit, Suravee, iommu, Kevin Tian,
Shameerali Kolothum Thodi, Lu Baolu, Yi Liu, Yi Y Sun,
Nicolin Chen, Joerg Roedel, Will Deacon, Robin Murphy,
Alex Williamson, kvm
On Tue, Oct 17, 2023 at 03:37:47PM +0100, Joao Martins wrote:
> On 17/10/2023 15:14, Joao Martins wrote:
> > On 17/10/2023 14:10, Jason Gunthorpe wrote:
> >> On Tue, Oct 17, 2023 at 10:07:11AM +0100, Joao Martins wrote:
> >>>
> >>> static struct iommu_domain *do_iommu_domain_alloc(unsigned int type,
> >>> - struct amd_iommu *iommu,
> >>> struct device *dev,
> >>> u32 flags)
> >>> {
> >>> struct protection_domain *domain;
> >>> + struct amd_iommu *iommu = NULL;
> >>> +
> >>> + if (dev) {
> >>> + iommu = rlookup_amd_iommu(dev);
> >>> + if (!iommu)
> >>
> >> This really shouldn't be rlookup_amd_iommu, didn't the series fixing
> >> this get merged?
> >
> > From the latest linux-next, it's still there.
> >
> I'm assuming you refer to this new helper:
>
> https://lore.kernel.org/linux-iommu/20231013151652.6008-3-vasant.hegde@amd.com/
>
> But it's part 3 out of a 4-part multi-series; and only the first part has been
> merged.
Okay, then nothing to do here :\
Jason
^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v3 16/19] iommu/amd: Add domain_alloc_user based domain allocation
2023-10-17 14:37 ` Joao Martins
2023-10-17 15:32 ` Jason Gunthorpe
@ 2023-10-18 8:29 ` Vasant Hegde
1 sibling, 0 replies; 140+ messages in thread
From: Vasant Hegde @ 2023-10-18 8:29 UTC (permalink / raw)
To: Joao Martins, Jason Gunthorpe
Cc: Suthikulpanit, Suravee, iommu, Kevin Tian,
Shameerali Kolothum Thodi, Lu Baolu, Yi Liu, Yi Y Sun,
Nicolin Chen, Joerg Roedel, Will Deacon, Robin Murphy,
Alex Williamson, kvm
On 10/17/2023 8:07 PM, Joao Martins wrote:
> On 17/10/2023 15:14, Joao Martins wrote:
>> On 17/10/2023 14:10, Jason Gunthorpe wrote:
>>> On Tue, Oct 17, 2023 at 10:07:11AM +0100, Joao Martins wrote:
>>>>
>>>> static struct iommu_domain *do_iommu_domain_alloc(unsigned int type,
>>>> - struct amd_iommu *iommu,
>>>> struct device *dev,
>>>> u32 flags)
>>>> {
>>>> struct protection_domain *domain;
>>>> + struct amd_iommu *iommu = NULL;
>>>> +
>>>> + if (dev) {
>>>> + iommu = rlookup_amd_iommu(dev);
>>>> + if (!iommu)
>>>
>>> This really shouldn't be rlookup_amd_iommu, didn't the series fixing
>>> this get merged?
>>
>> From the latest linux-next, it's still there.
>>
> I'm assuming you refer to this new helper:
>
> https://lore.kernel.org/linux-iommu/20231013151652.6008-3-vasant.hegde@amd.com/
>
> But it's part 3 out of a 4-part multi-series; and only the first part has been
> merged.
That's correct. Only the first part of the series is merged so far. For
now you can use rlookup_amd_iommu(). I can fix it later.
-Vasant
^ permalink raw reply [flat|nested] 140+ messages in thread
* [PATCH v3 17/19] iommu/amd: Access/Dirty bit support in IOPTEs
2023-09-23 1:24 [PATCH v3 00/19] IOMMUFD Dirty Tracking Joao Martins
` (15 preceding siblings ...)
2023-09-23 1:25 ` [PATCH v3 16/19] iommu/amd: Add domain_alloc_user based domain allocation Joao Martins
@ 2023-09-23 1:25 ` Joao Martins
2023-10-04 17:01 ` Joao Martins
2023-10-17 8:18 ` Suthikulpanit, Suravee
2023-09-23 1:25 ` [PATCH v3 18/19] iommu/amd: Print access/dirty bits if supported Joao Martins
` (3 subsequent siblings)
20 siblings, 2 replies; 140+ messages in thread
From: Joao Martins @ 2023-09-23 1:25 UTC (permalink / raw)
To: iommu
Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel,
Suravee Suthikulpanit, Will Deacon, Robin Murphy, Alex Williamson,
kvm, Joao Martins
IOMMU advertises Access/Dirty bits if the extended feature register
reports it. Relevant AMD IOMMU SDM ref[0]
"1.3.8 Enhanced Support for Access and Dirty Bits"
To enable it, set the DTE flags in bits 7 and 8 to enable access, or
access+dirty. With that, the IOMMU starts marking the D and A flags on
every Memory Request or ATS translation request. It is up to the VMM to
steer whether to enable dirty tracking or not, rather than the IOMMU
doing so unconditionally. Relevant AMD IOMMU SDM ref [0], "Table 7.
Device Table Entry (DTE) Field Definitions", particularly the "HAD"
entry.
Toggling it on and off is relatively simple: set the two bits in the
DTE and flush the device's DTE cache.
To read out what has been dirtied, use the existing AMD io-pgtable
support, walking the pagetables over each IOVA with fetch_pte(). The
IOTLB flushing is left to the caller (much like unmap), and
iommu_dirty_bitmap_record() is what adds the page ranges to invalidate.
This allows the caller to batch the flush over a big span of IOVA
space, without the IOMMU driver having to decide when to flush.
Worthwhile sections from AMD IOMMU SDM:
"2.2.3.1 Host Access Support"
"2.2.3.2 Host Dirty Support"
For details on how the IOMMU hardware updates the dirty bit, and what
it expects from its subsequent clearing by the CPU, see:
"2.2.7.4 Updating Accessed and Dirty Bits in the Guest Address Tables"
"2.2.7.5 Clearing Accessed and Dirty Bits"
Quoting the SDM:
"The setting of accessed and dirty status bits in the page tables is
visible to both the CPU and the peripheral when sharing guest page
tables. The IOMMU interlocked operations to update A and D bits must be
64-bit operations and naturally aligned on a 64-bit boundary"
.. and the IOMMU update sequence for the Dirty bit essentially states:
1. Decodes the read and write intent from the memory access.
2. If P=0 in the page descriptor, fail the access.
3. Compare the A & D bits in the descriptor with the read and write
intent in the request.
4. If the A or D bits need to be updated in the descriptor:
* Start atomic operation.
* Read the descriptor as a 64-bit access.
* If the descriptor no longer appears to require an update, release the
atomic lock with
no further action and continue to step 5.
* Calculate the new A & D bits.
* Write the descriptor as a 64-bit access.
* End atomic operation.
5. Continue to the next stage of translation or to the memory access.
Access/Dirty bits readout also needs to consider the non-default
page sizes (aka replicated PTEs, as mentioned by the manual), as AMD
supports all powers-of-two page sizes (except 512G).
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
drivers/iommu/amd/amd_iommu_types.h | 12 ++++
drivers/iommu/amd/io_pgtable.c | 84 +++++++++++++++++++++++++
drivers/iommu/amd/iommu.c | 98 +++++++++++++++++++++++++++++
3 files changed, 194 insertions(+)
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 7dc30c2b56b3..dec4e5c2b66b 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -97,7 +97,9 @@
#define FEATURE_GATS_MASK (3ULL)
#define FEATURE_GAM_VAPIC BIT_ULL(21)
#define FEATURE_GIOSUP BIT_ULL(48)
+#define FEATURE_HASUP BIT_ULL(49)
#define FEATURE_EPHSUP BIT_ULL(50)
+#define FEATURE_HDSUP BIT_ULL(52)
#define FEATURE_SNP BIT_ULL(63)
#define FEATURE_PASID_SHIFT 32
@@ -212,6 +214,7 @@
/* macros and definitions for device table entries */
#define DEV_ENTRY_VALID 0x00
#define DEV_ENTRY_TRANSLATION 0x01
+#define DEV_ENTRY_HAD 0x07
#define DEV_ENTRY_PPR 0x34
#define DEV_ENTRY_IR 0x3d
#define DEV_ENTRY_IW 0x3e
@@ -370,10 +373,16 @@
#define PTE_LEVEL_PAGE_SIZE(level) \
(1ULL << (12 + (9 * (level))))
+/*
+ * The IOPTE dirty bit
+ */
+#define IOMMU_PTE_HD_BIT (6)
+
/*
* Bit value definition for I/O PTE fields
*/
#define IOMMU_PTE_PR BIT_ULL(0)
+#define IOMMU_PTE_HD BIT_ULL(IOMMU_PTE_HD_BIT)
#define IOMMU_PTE_U BIT_ULL(59)
#define IOMMU_PTE_FC BIT_ULL(60)
#define IOMMU_PTE_IR BIT_ULL(61)
@@ -384,6 +393,7 @@
*/
#define DTE_FLAG_V BIT_ULL(0)
#define DTE_FLAG_TV BIT_ULL(1)
+#define DTE_FLAG_HAD (3ULL << 7)
#define DTE_FLAG_GIOV BIT_ULL(54)
#define DTE_FLAG_GV BIT_ULL(55)
#define DTE_GLX_SHIFT (56)
@@ -413,6 +423,7 @@
#define IOMMU_PAGE_MASK (((1ULL << 52) - 1) & ~0xfffULL)
#define IOMMU_PTE_PRESENT(pte) ((pte) & IOMMU_PTE_PR)
+#define IOMMU_PTE_DIRTY(pte) ((pte) & IOMMU_PTE_HD)
#define IOMMU_PTE_PAGE(pte) (iommu_phys_to_virt((pte) & IOMMU_PAGE_MASK))
#define IOMMU_PTE_MODE(pte) (((pte) >> 9) & 0x07)
@@ -563,6 +574,7 @@ struct protection_domain {
int nid; /* Node ID */
u64 *gcr3_tbl; /* Guest CR3 table */
unsigned long flags; /* flags to find out type of domain */
+ bool dirty_tracking; /* dirty tracking is enabled in the domain */
unsigned dev_cnt; /* devices assigned to this domain */
unsigned dev_iommu[MAX_IOMMUS]; /* per-IOMMU reference count */
};
diff --git a/drivers/iommu/amd/io_pgtable.c b/drivers/iommu/amd/io_pgtable.c
index 2892aa1b4dc1..099ccb04f52f 100644
--- a/drivers/iommu/amd/io_pgtable.c
+++ b/drivers/iommu/amd/io_pgtable.c
@@ -486,6 +486,89 @@ static phys_addr_t iommu_v1_iova_to_phys(struct io_pgtable_ops *ops, unsigned lo
return (__pte & ~offset_mask) | (iova & offset_mask);
}
+static bool pte_test_dirty(u64 *ptep, unsigned long size)
+{
+ bool dirty = false;
+ int i, count;
+
+ /*
+ * 2.2.3.2 Host Dirty Support
+ * When a non-default page size is used , software must OR the
+ * Dirty bits in all of the replicated host PTEs used to map
+ * the page. The IOMMU does not guarantee the Dirty bits are
+ * set in all of the replicated PTEs. Any portion of the page
+ * may have been written even if the Dirty bit is set in only
+ * one of the replicated PTEs.
+ */
+ count = PAGE_SIZE_PTE_COUNT(size);
+ for (i = 0; i < count; i++) {
+ if (test_bit(IOMMU_PTE_HD_BIT, (unsigned long *) &ptep[i])) {
+ dirty = true;
+ break;
+ }
+ }
+
+ return dirty;
+}
+
+static bool pte_test_and_clear_dirty(u64 *ptep, unsigned long size)
+{
+ bool dirty = false;
+ int i, count;
+
+ /*
+ * 2.2.3.2 Host Dirty Support
+ * When a non-default page size is used , software must OR the
+ * Dirty bits in all of the replicated host PTEs used to map
+ * the page. The IOMMU does not guarantee the Dirty bits are
+ * set in all of the replicated PTEs. Any portion of the page
+ * may have been written even if the Dirty bit is set in only
+ * one of the replicated PTEs.
+ */
+ count = PAGE_SIZE_PTE_COUNT(size);
+ for (i = 0; i < count; i++)
+ if (test_and_clear_bit(IOMMU_PTE_HD_BIT,
+ (unsigned long *) &ptep[i]))
+ dirty = true;
+
+ return dirty;
+}
+
+static int iommu_v1_read_and_clear_dirty(struct io_pgtable_ops *ops,
+ unsigned long iova, size_t size,
+ unsigned long flags,
+ struct iommu_dirty_bitmap *dirty)
+{
+ struct amd_io_pgtable *pgtable = io_pgtable_ops_to_data(ops);
+ unsigned long end = iova + size - 1;
+
+ do {
+ unsigned long pgsize = 0;
+ u64 *ptep, pte;
+
+ ptep = fetch_pte(pgtable, iova, &pgsize);
+ if (ptep)
+ pte = READ_ONCE(*ptep);
+ if (!ptep || !IOMMU_PTE_PRESENT(pte)) {
+ pgsize = pgsize ?: PTE_LEVEL_PAGE_SIZE(0);
+ iova += pgsize;
+ continue;
+ }
+
+ /*
+ * Mark the whole IOVA range as dirty even if only one of
+ * the replicated PTEs were marked dirty.
+ */
+ if (((flags & IOMMU_DIRTY_NO_CLEAR) &&
+ pte_test_dirty(ptep, pgsize)) ||
+ pte_test_and_clear_dirty(ptep, pgsize))
+ iommu_dirty_bitmap_record(dirty, iova, pgsize);
+ iova += pgsize;
+ } while (iova < end);
+
+ return 0;
+}
+
/*
* ----------------------------------------------------
*/
@@ -527,6 +610,7 @@ static struct io_pgtable *v1_alloc_pgtable(struct io_pgtable_cfg *cfg, void *coo
pgtable->iop.ops.map_pages = iommu_v1_map_pages;
pgtable->iop.ops.unmap_pages = iommu_v1_unmap_pages;
pgtable->iop.ops.iova_to_phys = iommu_v1_iova_to_phys;
+ pgtable->iop.ops.read_and_clear_dirty = iommu_v1_read_and_clear_dirty;
return &pgtable->iop;
}
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index af36c627022f..31b333cc6fe1 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -66,6 +66,7 @@ LIST_HEAD(hpet_map);
LIST_HEAD(acpihid_map);
const struct iommu_ops amd_iommu_ops;
+const struct iommu_dirty_ops amd_dirty_ops;
static ATOMIC_NOTIFIER_HEAD(ppr_notifier);
int amd_iommu_max_glx_val = -1;
@@ -1611,6 +1612,9 @@ static void set_dte_entry(struct amd_iommu *iommu, u16 devid,
pte_root |= 1ULL << DEV_ENTRY_PPR;
}
+ if (domain->dirty_tracking)
+ pte_root |= DTE_FLAG_HAD;
+
if (domain->flags & PD_IOMMUV2_MASK) {
u64 gcr3 = iommu_virt_to_phys(domain->gcr3_tbl);
u64 glx = domain->glx;
@@ -2156,11 +2160,17 @@ static inline u64 dma_max_address(void)
return ((1ULL << PM_LEVEL_SHIFT(amd_iommu_gpt_level)) - 1);
}
+static bool amd_iommu_hd_support(struct amd_iommu *iommu)
+{
+ return iommu && (iommu->features & FEATURE_HDSUP);
+}
+
static struct iommu_domain *do_iommu_domain_alloc(unsigned int type,
struct amd_iommu *iommu,
struct device *dev,
u32 flags)
{
+ bool enforce_dirty = flags & IOMMU_HWPT_ALLOC_ENFORCE_DIRTY;
struct protection_domain *domain;
/*
@@ -2170,6 +2180,9 @@ static struct iommu_domain *do_iommu_domain_alloc(unsigned int type,
if (amd_iommu_snp_en && (type == IOMMU_DOMAIN_IDENTITY))
return ERR_PTR(-EINVAL);
+ if (enforce_dirty && !amd_iommu_hd_support(iommu))
+ return ERR_PTR(-EOPNOTSUPP);
+
domain = protection_domain_alloc(type);
if (!domain)
return ERR_PTR(-ENOMEM);
@@ -2184,6 +2197,9 @@ static struct iommu_domain *do_iommu_domain_alloc(unsigned int type,
iommu->iommu.ops->pgsize_bitmap;
domain->domain.ops =
iommu->iommu.ops->default_domain_ops;
+
+ if (enforce_dirty)
+ domain->domain.dirty_ops = &amd_dirty_ops;
}
return &domain->domain;
@@ -2252,6 +2268,9 @@ static int amd_iommu_attach_device(struct iommu_domain *dom,
return 0;
dev_data->defer_attach = false;
+ if (dom->dirty_ops && iommu &&
+ !(iommu->features & FEATURE_HDSUP))
+ return -EINVAL;
if (dev_data->domain)
detach_device(dev);
@@ -2371,6 +2390,11 @@ static bool amd_iommu_capable(struct device *dev, enum iommu_cap cap)
return true;
case IOMMU_CAP_DEFERRED_FLUSH:
return true;
+ case IOMMU_CAP_DIRTY: {
+ struct amd_iommu *iommu = rlookup_amd_iommu(dev);
+
+ return amd_iommu_hd_support(iommu);
+ }
default:
break;
}
@@ -2378,6 +2402,75 @@ static bool amd_iommu_capable(struct device *dev, enum iommu_cap cap)
return false;
}
+static int amd_iommu_set_dirty_tracking(struct iommu_domain *domain,
+ bool enable)
+{
+ struct protection_domain *pdomain = to_pdomain(domain);
+ struct dev_table_entry *dev_table;
+ struct iommu_dev_data *dev_data;
+ struct amd_iommu *iommu;
+ unsigned long flags;
+ u64 pte_root;
+
+ spin_lock_irqsave(&pdomain->lock, flags);
+ if (!(pdomain->dirty_tracking ^ enable)) {
+ spin_unlock_irqrestore(&pdomain->lock, flags);
+ return 0;
+ }
+
+ list_for_each_entry(dev_data, &pdomain->dev_list, list) {
+ iommu = rlookup_amd_iommu(dev_data->dev);
+ if (!iommu)
+ continue;
+
+ dev_table = get_dev_table(iommu);
+ pte_root = dev_table[dev_data->devid].data[0];
+
+ pte_root = (enable ?
+ pte_root | DTE_FLAG_HAD : pte_root & ~DTE_FLAG_HAD);
+
+ /* Flush device DTE */
+ dev_table[dev_data->devid].data[0] = pte_root;
+ device_flush_dte(dev_data);
+ }
+
+ /* Flush IOTLB to mark IOPTE dirty on the next translation(s) */
+ amd_iommu_domain_flush_tlb_pde(pdomain);
+ amd_iommu_domain_flush_complete(pdomain);
+ pdomain->dirty_tracking = enable;
+ spin_unlock_irqrestore(&pdomain->lock, flags);
+
+ return 0;
+}
+
+static int amd_iommu_read_and_clear_dirty(struct iommu_domain *domain,
+ unsigned long iova, size_t size,
+ unsigned long flags,
+ struct iommu_dirty_bitmap *dirty)
+{
+ struct protection_domain *pdomain = to_pdomain(domain);
+ struct io_pgtable_ops *ops = &pdomain->iop.iop.ops;
+ unsigned long lflags;
+ int ret;
+
+ if (!ops || !ops->read_and_clear_dirty)
+ return -EOPNOTSUPP;
+
+ spin_lock_irqsave(&pdomain->lock, lflags);
+ if (!pdomain->dirty_tracking && dirty->bitmap) {
+ spin_unlock_irqrestore(&pdomain->lock, lflags);
+ return -EINVAL;
+ }
+ spin_unlock_irqrestore(&pdomain->lock, lflags);
+
+ rcu_read_lock();
+ ret = ops->read_and_clear_dirty(ops, iova, size, flags, dirty);
+ rcu_read_unlock();
+
+ return ret;
+}
+
+
static void amd_iommu_get_resv_regions(struct device *dev,
struct list_head *head)
{
@@ -2500,6 +2593,11 @@ static bool amd_iommu_enforce_cache_coherency(struct iommu_domain *domain)
return true;
}
+const struct iommu_dirty_ops amd_dirty_ops = {
+ .set_dirty_tracking = amd_iommu_set_dirty_tracking,
+ .read_and_clear_dirty = amd_iommu_read_and_clear_dirty,
+};
+
const struct iommu_ops amd_iommu_ops = {
.capable = amd_iommu_capable,
.domain_alloc = amd_iommu_domain_alloc,
--
2.17.2
^ permalink raw reply related [flat|nested] 140+ messages in thread

* Re: [PATCH v3 17/19] iommu/amd: Access/Dirty bit support in IOPTEs
2023-09-23 1:25 ` [PATCH v3 17/19] iommu/amd: Access/Dirty bit support in IOPTEs Joao Martins
@ 2023-10-04 17:01 ` Joao Martins
2023-10-17 8:18 ` Suthikulpanit, Suravee
1 sibling, 0 replies; 140+ messages in thread
From: Joao Martins @ 2023-10-04 17:01 UTC (permalink / raw)
To: iommu
Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel,
Suravee Suthikulpanit, Will Deacon, Robin Murphy, Alex Williamson,
kvm
On 23/09/2023 02:25, Joao Martins wrote:
> +static int amd_iommu_read_and_clear_dirty(struct iommu_domain *domain,
> + unsigned long iova, size_t size,
> + unsigned long flags,
> + struct iommu_dirty_bitmap *dirty)
> +{
> + struct protection_domain *pdomain = to_pdomain(domain);
> + struct io_pgtable_ops *ops = &pdomain->iop.iop.ops;
> + unsigned long lflags;
> + int ret;
> +
> + if (!ops || !ops->read_and_clear_dirty)
> + return -EOPNOTSUPP;
> +
> + spin_lock_irqsave(&pdomain->lock, lflags);
> + if (!pdomain->dirty_tracking && dirty->bitmap) {
> + spin_unlock_irqrestore(&pdomain->lock, lflags);
> + return -EINVAL;
> + }
> + spin_unlock_irqrestore(&pdomain->lock, lflags);
> +
> + rcu_read_lock();
> + ret = ops->read_and_clear_dirty(ops, iova, size, flags, dirty);
> + rcu_read_unlock();
> +
> + return ret;
> +}
> +
These rcu_read_{unlock,lock} are spurious given discussion on RFCv2 and is also
removed for v4. I did remove from core code, but not the driver implementations
still had these sprinkled here.
Likewise, for Intel IOMMU implementation as well.
> +
> static void amd_iommu_get_resv_regions(struct device *dev,
> struct list_head *head)
> {
> @@ -2500,6 +2593,11 @@ static bool amd_iommu_enforce_cache_coherency(struct iommu_domain *domain)
> return true;
> }
>
> +const struct iommu_dirty_ops amd_dirty_ops = {
> + .set_dirty_tracking = amd_iommu_set_dirty_tracking,
> + .read_and_clear_dirty = amd_iommu_read_and_clear_dirty,
> +};
> +
> const struct iommu_ops amd_iommu_ops = {
> .capable = amd_iommu_capable,
> .domain_alloc = amd_iommu_domain_alloc,
^ permalink raw reply [flat|nested] 140+ messages in thread

* Re: [PATCH v3 17/19] iommu/amd: Access/Dirty bit support in IOPTEs
2023-09-23 1:25 ` [PATCH v3 17/19] iommu/amd: Access/Dirty bit support in IOPTEs Joao Martins
2023-10-04 17:01 ` Joao Martins
@ 2023-10-17 8:18 ` Suthikulpanit, Suravee
2023-10-17 9:54 ` Joao Martins
1 sibling, 1 reply; 140+ messages in thread
From: Suthikulpanit, Suravee @ 2023-10-17 8:18 UTC (permalink / raw)
To: Joao Martins, iommu
Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel, Will Deacon,
Robin Murphy, Alex Williamson, kvm
Hi Joao,
On 9/23/2023 8:25 AM, Joao Martins wrote:
> ...
> diff --git a/drivers/iommu/amd/io_pgtable.c b/drivers/iommu/amd/io_pgtable.c
> index 2892aa1b4dc1..099ccb04f52f 100644
> --- a/drivers/iommu/amd/io_pgtable.c
> +++ b/drivers/iommu/amd/io_pgtable.c
> @@ -486,6 +486,89 @@ static phys_addr_t iommu_v1_iova_to_phys(struct io_pgtable_ops *ops, unsigned lo
> return (__pte & ~offset_mask) | (iova & offset_mask);
> }
>
> +static bool pte_test_dirty(u64 *ptep, unsigned long size)
> +{
> + bool dirty = false;
> + int i, count;
> +
> + /*
> + * 2.2.3.2 Host Dirty Support
> + * When a non-default page size is used , software must OR the
> + * Dirty bits in all of the replicated host PTEs used to map
> + * the page. The IOMMU does not guarantee the Dirty bits are
> + * set in all of the replicated PTEs. Any portion of the page
> + * may have been written even if the Dirty bit is set in only
> + * one of the replicated PTEs.
> + */
> + count = PAGE_SIZE_PTE_COUNT(size);
> + for (i = 0; i < count; i++) {
> + if (test_bit(IOMMU_PTE_HD_BIT, (unsigned long *) &ptep[i])) {
> + dirty = true;
> + break;
> + }
> + }
> +
> + return dirty;
> +}
> +
> +static bool pte_test_and_clear_dirty(u64 *ptep, unsigned long size)
> +{
> + bool dirty = false;
> + int i, count;
> +
> + /*
> + * 2.2.3.2 Host Dirty Support
> + * When a non-default page size is used , software must OR the
> + * Dirty bits in all of the replicated host PTEs used to map
> + * the page. The IOMMU does not guarantee the Dirty bits are
> + * set in all of the replicated PTEs. Any portion of the page
> + * may have been written even if the Dirty bit is set in only
> + * one of the replicated PTEs.
> + */
> + count = PAGE_SIZE_PTE_COUNT(size);
> + for (i = 0; i < count; i++)
> + if (test_and_clear_bit(IOMMU_PTE_HD_BIT,
> + (unsigned long *) &ptep[i]))
> + dirty = true;
> +
> + return dirty;
> +}
Can we consolidate the two functions above where we can pass the flag
and check if IOMMU_DIRTY_NO_CLEAR is set?
> +
> +static int iommu_v1_read_and_clear_dirty(struct io_pgtable_ops *ops,
> + unsigned long iova, size_t size,
> + unsigned long flags,
> + struct iommu_dirty_bitmap *dirty)
> +{
> + struct amd_io_pgtable *pgtable = io_pgtable_ops_to_data(ops);
> + unsigned long end = iova + size - 1;
> +
> + do {
> + unsigned long pgsize = 0;
> + u64 *ptep, pte;
> +
> + ptep = fetch_pte(pgtable, iova, &pgsize);
> + if (ptep)
> + pte = READ_ONCE(*ptep);
> + if (!ptep || !IOMMU_PTE_PRESENT(pte)) {
> + pgsize = pgsize ?: PTE_LEVEL_PAGE_SIZE(0);
> + iova += pgsize;
> + continue;
> + }
> +
> + /*
> + * Mark the whole IOVA range as dirty even if only one of
> + * the replicated PTEs were marked dirty.
> + */
> + if (((flags & IOMMU_DIRTY_NO_CLEAR) &&
> + pte_test_dirty(ptep, pgsize)) ||
> + pte_test_and_clear_dirty(ptep, pgsize))
> + iommu_dirty_bitmap_record(dirty, iova, pgsize);
> + iova += pgsize;
> + } while (iova < end);
> +
> + return 0;
> +}
> +
> /*
> * ----------------------------------------------------
> */
> @@ -527,6 +610,7 @@ static struct io_pgtable *v1_alloc_pgtable(struct io_pgtable_cfg *cfg, void *coo
> pgtable->iop.ops.map_pages = iommu_v1_map_pages;
> pgtable->iop.ops.unmap_pages = iommu_v1_unmap_pages;
> pgtable->iop.ops.iova_to_phys = iommu_v1_iova_to_phys;
> + pgtable->iop.ops.read_and_clear_dirty = iommu_v1_read_and_clear_dirty;
>
> return &pgtable->iop;
> }
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index af36c627022f..31b333cc6fe1 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> ....
> @@ -2156,11 +2160,17 @@ static inline u64 dma_max_address(void)
> return ((1ULL << PM_LEVEL_SHIFT(amd_iommu_gpt_level)) - 1);
> }
>
> +static bool amd_iommu_hd_support(struct amd_iommu *iommu)
> +{
> + return iommu && (iommu->features & FEATURE_HDSUP);
> +}
> +
You can use the newly introduced check_feature(u64 mask) to check the HD
support.
(See
https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git/commit/?h=next&id=7b7563a93437ef945c829538da28f0095f1603ec)
> ...
> @@ -2252,6 +2268,9 @@ static int amd_iommu_attach_device(struct iommu_domain *dom,
> return 0;
>
> dev_data->defer_attach = false;
> + if (dom->dirty_ops && iommu &&
> + !(iommu->features & FEATURE_HDSUP))
if (dom->dirty_ops && !check_feature(FEATURE_HDSUP))
> + return -EINVAL;
>
> if (dev_data->domain)
> detach_device(dev);
> @@ -2371,6 +2390,11 @@ static bool amd_iommu_capable(struct device *dev, enum iommu_cap cap)
> return true;
> case IOMMU_CAP_DEFERRED_FLUSH:
> return true;
> + case IOMMU_CAP_DIRTY: {
> + struct amd_iommu *iommu = rlookup_amd_iommu(dev);
> +
> + return amd_iommu_hd_support(iommu);
return check_feature(FEATURE_HDSUP);
Thanks,
Suravee
^ permalink raw reply [flat|nested] 140+ messages in thread

* Re: [PATCH v3 17/19] iommu/amd: Access/Dirty bit support in IOPTEs
2023-10-17 8:18 ` Suthikulpanit, Suravee
@ 2023-10-17 9:54 ` Joao Martins
2023-10-17 18:32 ` Joao Martins
` (2 more replies)
0 siblings, 3 replies; 140+ messages in thread
From: Joao Martins @ 2023-10-17 9:54 UTC (permalink / raw)
To: Suthikulpanit, Suravee, iommu
Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel, Will Deacon,
Robin Murphy, Alex Williamson, kvm
On 17/10/2023 09:18, Suthikulpanit, Suravee wrote:
> Hi Joao,
>
> On 9/23/2023 8:25 AM, Joao Martins wrote:
>> ...
>> diff --git a/drivers/iommu/amd/io_pgtable.c b/drivers/iommu/amd/io_pgtable.c
>> index 2892aa1b4dc1..099ccb04f52f 100644
>> --- a/drivers/iommu/amd/io_pgtable.c
>> +++ b/drivers/iommu/amd/io_pgtable.c
>> @@ -486,6 +486,89 @@ static phys_addr_t iommu_v1_iova_to_phys(struct
>> io_pgtable_ops *ops, unsigned lo
>> return (__pte & ~offset_mask) | (iova & offset_mask);
>> }
>> +static bool pte_test_dirty(u64 *ptep, unsigned long size)
>> +{
>> + bool dirty = false;
>> + int i, count;
>> +
>> + /*
>> + * 2.2.3.2 Host Dirty Support
>> + * When a non-default page size is used , software must OR the
>> + * Dirty bits in all of the replicated host PTEs used to map
>> + * the page. The IOMMU does not guarantee the Dirty bits are
>> + * set in all of the replicated PTEs. Any portion of the page
>> + * may have been written even if the Dirty bit is set in only
>> + * one of the replicated PTEs.
>> + */
>> + count = PAGE_SIZE_PTE_COUNT(size);
>> + for (i = 0; i < count; i++) {
>> + if (test_bit(IOMMU_PTE_HD_BIT, (unsigned long *) &ptep[i])) {
>> + dirty = true;
>> + break;
>> + }
>> + }
>> +
>> + return dirty;
>> +}
>> +
>> +static bool pte_test_and_clear_dirty(u64 *ptep, unsigned long size)
>> +{
>> + bool dirty = false;
>> + int i, count;
>> +
>> + /*
>> + * 2.2.3.2 Host Dirty Support
>> + * When a non-default page size is used , software must OR the
>> + * Dirty bits in all of the replicated host PTEs used to map
>> + * the page. The IOMMU does not guarantee the Dirty bits are
>> + * set in all of the replicated PTEs. Any portion of the page
>> + * may have been written even if the Dirty bit is set in only
>> + * one of the replicated PTEs.
>> + */
>> + count = PAGE_SIZE_PTE_COUNT(size);
>> + for (i = 0; i < count; i++)
>> + if (test_and_clear_bit(IOMMU_PTE_HD_BIT,
>> + (unsigned long *) &ptep[i]))
>> + dirty = true;
>> +
>> + return dirty;
>> +}
>
> Can we consolidate the two functions above where we can pass the flag and check
> if IOMMU_DIRTY_NO_CLEAR is set?
>
I guess so yes -- it was initially to have an efficient tight loop to check all
replicated PTEs, but I think I found a way to merge everything e.g.
diff --git a/drivers/iommu/amd/io_pgtable.c b/drivers/iommu/amd/io_pgtable.c
index 099ccb04f52f..953f867b4943 100644
--- a/drivers/iommu/amd/io_pgtable.c
+++ b/drivers/iommu/amd/io_pgtable.c
@@ -486,8 +486,10 @@ static phys_addr_t iommu_v1_iova_to_phys(struct
io_pgtable_ops *ops, unsigned lo
return (__pte & ~offset_mask) | (iova & offset_mask);
}
-static bool pte_test_dirty(u64 *ptep, unsigned long size)
+static bool pte_test_and_clear_dirty(u64 *ptep, unsigned long size,
+ unsigned long flags)
{
+ bool test_only = flags & IOMMU_DIRTY_NO_CLEAR;
bool dirty = false;
int i, count;
@@ -501,35 +503,20 @@ static bool pte_test_dirty(u64 *ptep, unsigned long size)
* one of the replicated PTEs.
*/
count = PAGE_SIZE_PTE_COUNT(size);
- for (i = 0; i < count; i++) {
- if (test_bit(IOMMU_PTE_HD_BIT, (unsigned long *) &ptep[i])) {
+ for (i = 0; i < count && test_only; i++) {
+ if (test_bit(IOMMU_PTE_HD_BIT,
+ (unsigned long *) &ptep[i])) {
dirty = true;
break;
}
}
- return dirty;
-}
-
-static bool pte_test_and_clear_dirty(u64 *ptep, unsigned long size)
-{
- bool dirty = false;
- int i, count;
-
- /*
- * 2.2.3.2 Host Dirty Support
- * When a non-default page size is used , software must OR the
- * Dirty bits in all of the replicated host PTEs used to map
- * the page. The IOMMU does not guarantee the Dirty bits are
- * set in all of the replicated PTEs. Any portion of the page
- * may have been written even if the Dirty bit is set in only
- * one of the replicated PTEs.
- */
- count = PAGE_SIZE_PTE_COUNT(size);
- for (i = 0; i < count; i++)
+ for (i = 0; i < count && !test_only; i++) {
if (test_and_clear_bit(IOMMU_PTE_HD_BIT,
- (unsigned long *) &ptep[i]))
+ (unsigned long *) &ptep[i])) {
dirty = true;
+ }
+ }
return dirty;
}
@@ -559,9 +546,7 @@ static int iommu_v1_read_and_clear_dirty(struct
io_pgtable_ops *ops,
* Mark the whole IOVA range as dirty even if only one of
* the replicated PTEs were marked dirty.
*/
- if (((flags & IOMMU_DIRTY_NO_CLEAR) &&
- pte_test_dirty(ptep, pgsize)) ||
- pte_test_and_clear_dirty(ptep, pgsize))
+ if (pte_test_and_clear_dirty(ptep, pgsize, flags))
iommu_dirty_bitmap_record(dirty, iova, pgsize);
iova += pgsize;
} while (iova < end);
>> +
>> +static int iommu_v1_read_and_clear_dirty(struct io_pgtable_ops *ops,
>> + unsigned long iova, size_t size,
>> + unsigned long flags,
>> + struct iommu_dirty_bitmap *dirty)
>> +{
>> + struct amd_io_pgtable *pgtable = io_pgtable_ops_to_data(ops);
>> + unsigned long end = iova + size - 1;
>> +
>> + do {
>> + unsigned long pgsize = 0;
>> + u64 *ptep, pte;
>> +
>> + ptep = fetch_pte(pgtable, iova, &pgsize);
>> + if (ptep)
>> + pte = READ_ONCE(*ptep);
>> + if (!ptep || !IOMMU_PTE_PRESENT(pte)) {
>> + pgsize = pgsize ?: PTE_LEVEL_PAGE_SIZE(0);
>> + iova += pgsize;
>> + continue;
>> + }
>> +
>> + /*
>> + * Mark the whole IOVA range as dirty even if only one of
>> + * the replicated PTEs were marked dirty.
>> + */
>> + if (((flags & IOMMU_DIRTY_NO_CLEAR) &&
>> + pte_test_dirty(ptep, pgsize)) ||
>> + pte_test_and_clear_dirty(ptep, pgsize))
>> + iommu_dirty_bitmap_record(dirty, iova, pgsize);
>> + iova += pgsize;
>> + } while (iova < end);
>> +
Your earlier point made me discover that the test-only case might end up clearing
the PTE unnecessarily. But I have addressed it in the previous comment
>> + return 0;
>> +}
>> +
>> /*
>> * ----------------------------------------------------
>> */
>> @@ -527,6 +610,7 @@ static struct io_pgtable *v1_alloc_pgtable(struct
>> io_pgtable_cfg *cfg, void *coo
>> pgtable->iop.ops.map_pages = iommu_v1_map_pages;
>> pgtable->iop.ops.unmap_pages = iommu_v1_unmap_pages;
>> pgtable->iop.ops.iova_to_phys = iommu_v1_iova_to_phys;
>> + pgtable->iop.ops.read_and_clear_dirty = iommu_v1_read_and_clear_dirty;
>> return &pgtable->iop;
>> }
>> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
>> index af36c627022f..31b333cc6fe1 100644
>> --- a/drivers/iommu/amd/iommu.c
>> +++ b/drivers/iommu/amd/iommu.c
>> ....
>> @@ -2156,11 +2160,17 @@ static inline u64 dma_max_address(void)
>> return ((1ULL << PM_LEVEL_SHIFT(amd_iommu_gpt_level)) - 1);
>> }
>> +static bool amd_iommu_hd_support(struct amd_iommu *iommu)
>> +{
>> + return iommu && (iommu->features & FEATURE_HDSUP);
>> +}
>> +
>
> You can use the newly introduced check_feature(u64 mask) to check the HD support.
>
It appears that check_feature() is logically equivalent to
check_feature_on_all_iommus(), whereas this check is a per-device/per-IOMMU
check, to support the possibility of different IOMMUs having different
features. Being per-IOMMU would allow firmware to not advertise certain IOMMU
features on some devices while still supporting them on others. I understand
this is not a thing on x86, but the UAPI supports it. Having said that, do you
still want me to switch to check_feature()?
I think the iommufd tree's next branch is still on v6.6-rc2, so I am not sure I
can really use check_feature() yet without leading Jason's individual branch
into compile errors. This all eventually gets merged into linux-next daily, but
my impression is that an individual maintainer's next should be compilable?
Worst case I submit a follow-up post-merge cleanup to switch to check_feature()?
[I can't use check_feature_on_all_iommus() as that's removed by the commit below]
> (See
> https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git/commit/?h=next&id=7b7563a93437ef945c829538da28f0095f1603ec)
>
>> ...
>> @@ -2252,6 +2268,9 @@ static int amd_iommu_attach_device(struct iommu_domain
>> *dom,
>> return 0;
>> dev_data->defer_attach = false;
>> + if (dom->dirty_ops && iommu &&
>> + !(iommu->features & FEATURE_HDSUP))
>
> if (dom->dirty_ops && !check_feature(FEATURE_HDSUP))
>
OK -- will switch depending on above paragraph
>> + return -EINVAL;
>> if (dev_data->domain)
>> detach_device(dev);
>> @@ -2371,6 +2390,11 @@ static bool amd_iommu_capable(struct device *dev, enum
>> iommu_cap cap)
>> return true;
>> case IOMMU_CAP_DEFERRED_FLUSH:
>> return true;
>> + case IOMMU_CAP_DIRTY: {
>> + struct amd_iommu *iommu = rlookup_amd_iommu(dev);
>> +
>> + return amd_iommu_hd_support(iommu);
>
> return check_feature(FEATURE_HDSUP);
>
Likewise
^ permalink raw reply related [flat|nested] 140+ messages in thread* Re: [PATCH v3 17/19] iommu/amd: Access/Dirty bit support in IOPTEs
2023-10-17 9:54 ` Joao Martins
@ 2023-10-17 18:32 ` Joao Martins
2023-10-17 18:49 ` Jason Gunthorpe
2023-10-18 11:46 ` Suthikulpanit, Suravee
2023-10-18 13:04 ` Suthikulpanit, Suravee
2 siblings, 1 reply; 140+ messages in thread
From: Joao Martins @ 2023-10-17 18:32 UTC (permalink / raw)
To: Suthikulpanit, Suravee, Jason Gunthorpe
Cc: Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu, Yi Liu, Yi Y Sun,
Nicolin Chen, Joerg Roedel, Will Deacon, Robin Murphy,
Alex Williamson, kvm, iommu
Hey Suravee,
On 17/10/2023 10:54, Joao Martins wrote:
> On 17/10/2023 09:18, Suthikulpanit, Suravee wrote:
>> On 9/23/2023 8:25 AM, Joao Martins wrote:
>>> + return 0;
>>> +}
>>> +
>>> /*
>>> * ----------------------------------------------------
>>> */
>>> @@ -527,6 +610,7 @@ static struct io_pgtable *v1_alloc_pgtable(struct
>>> io_pgtable_cfg *cfg, void *coo
>>> pgtable->iop.ops.map_pages = iommu_v1_map_pages;
>>> pgtable->iop.ops.unmap_pages = iommu_v1_unmap_pages;
>>> pgtable->iop.ops.iova_to_phys = iommu_v1_iova_to_phys;
>>> + pgtable->iop.ops.read_and_clear_dirty = iommu_v1_read_and_clear_dirty;
>>> return &pgtable->iop;
>>> }
>>> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
>>> index af36c627022f..31b333cc6fe1 100644
>>> --- a/drivers/iommu/amd/iommu.c
>>> +++ b/drivers/iommu/amd/iommu.c
>>> ....
>>> @@ -2156,11 +2160,17 @@ static inline u64 dma_max_address(void)
>>> return ((1ULL << PM_LEVEL_SHIFT(amd_iommu_gpt_level)) - 1);
>>> }
>>> +static bool amd_iommu_hd_support(struct amd_iommu *iommu)
>>> +{
>>> + return iommu && (iommu->features & FEATURE_HDSUP);
>>> +}
>>> +
>>
>> You can use the newly introduced check_feature(u64 mask) to check the HD support.
>>
>
> It appears that check_feature() is logically equivalent to
> check_feature_on_all_iommus(), whereas this check is a per-device/per-IOMMU
> check, to support the possibility of different IOMMUs having different
> features. Being per-IOMMU would allow firmware to not advertise certain IOMMU
> features on some devices while still supporting them on others. I understand
> this is not a thing on x86, but the UAPI supports it. Having said that, do you
> still want me to switch to check_feature()?
>
Do you have a strong preference here between current code and switching to
global check via check_feature() ?
> I think the iommufd tree's next branch is still on v6.6-rc2, so I am not sure
> I can really use check_feature() yet without leading Jason's individual branch
> into compile errors. This all eventually gets merged into linux-next daily,
> but my impression is that an individual maintainer's next should be
> compilable? Worst case I submit a follow-up post-merge cleanup to switch to
> check_feature()? [I can't use check_feature_on_all_iommus() as that's removed
> by the commit below]
>
Jason, how do we usually handle this cross trees? check_feature() doesn't exist
in your tree, but it does in Joerg's tree; meanwhile
check_feature_on_all_iommus() gets renamed to check_feature(). Should I need to
go with it, do I rebase against linux-next? I have been assuming that your tree
must compile; or worst-case different maintainer pull each other's trees.
Alternatively: I can check the counter directly to replicate the amd_iommu_efr
check under the current helper I made (amd_iommu_hd_support) and then change it
after the fact... That should lead to less dependencies?
>> (See
>> https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git/commit/?h=next&id=7b7563a93437ef945c829538da28f0095f1603ec)
>>
>>> ...
>>> @@ -2252,6 +2268,9 @@ static int amd_iommu_attach_device(struct iommu_domain
>>> *dom,
>>> return 0;
>>> dev_data->defer_attach = false;
>>> + if (dom->dirty_ops && iommu &&
>>> + !(iommu->features & FEATURE_HDSUP))
>>
>> if (dom->dirty_ops && !check_feature(FEATURE_HDSUP))
>>
> OK -- will switch depending on above paragraph
>
>>> + return -EINVAL;
>>> if (dev_data->domain)
>>> detach_device(dev);
>>> @@ -2371,6 +2390,11 @@ static bool amd_iommu_capable(struct device *dev, enum
>>> iommu_cap cap)
>>> return true;
>>> case IOMMU_CAP_DEFERRED_FLUSH:
>>> return true;
>>> + case IOMMU_CAP_DIRTY: {
>>> + struct amd_iommu *iommu = rlookup_amd_iommu(dev);
>>> +
>>> + return amd_iommu_hd_support(iommu);
>>
>> return check_feature(FEATURE_HDSUP);
>>
> Likewise
^ permalink raw reply [flat|nested] 140+ messages in thread* Re: [PATCH v3 17/19] iommu/amd: Access/Dirty bit support in IOPTEs
2023-10-17 18:32 ` Joao Martins
@ 2023-10-17 18:49 ` Jason Gunthorpe
2023-10-17 19:03 ` Joao Martins
0 siblings, 1 reply; 140+ messages in thread
From: Jason Gunthorpe @ 2023-10-17 18:49 UTC (permalink / raw)
To: Joao Martins
Cc: Suthikulpanit, Suravee, Kevin Tian, Shameerali Kolothum Thodi,
Lu Baolu, Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel,
Will Deacon, Robin Murphy, Alex Williamson, kvm, iommu
On Tue, Oct 17, 2023 at 07:32:31PM +0100, Joao Martins wrote:
> Jason, how do we usually handle this cross trees? check_feature() doesn't exist
> in your tree, but it does in Joerg's tree; meanwhile
> check_feature_on_all_iommus() gets renamed to check_feature(). Should I need to
> go with it, do I rebase against linux-next? I have been assuming that your tree
> must compile; or worst-case different maintainer pull each other's trees.
We didn't make any special preparation to speed this, so I would wait
till next cycle to take the AMD patches
Thus we should look at the vt-d patches if this is to go in this
cycle.
> Alternatively: I can check the counter directly to replicate the amd_iommu_efr
> check under the current helper I made (amd_iommu_hd_support) and then change it
> after the fact... That should lead to less dependencies?
Or this
We are fast running out of time though :)
Jason
^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v3 17/19] iommu/amd: Access/Dirty bit support in IOPTEs
2023-10-17 18:49 ` Jason Gunthorpe
@ 2023-10-17 19:03 ` Joao Martins
2023-10-17 22:04 ` Joao Martins
2023-10-18 20:40 ` Joao Martins
0 siblings, 2 replies; 140+ messages in thread
From: Joao Martins @ 2023-10-17 19:03 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Suthikulpanit, Suravee, Kevin Tian, Shameerali Kolothum Thodi,
Lu Baolu, Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel,
Will Deacon, Robin Murphy, Alex Williamson, kvm, iommu
On 17/10/2023 19:49, Jason Gunthorpe wrote:
> On Tue, Oct 17, 2023 at 07:32:31PM +0100, Joao Martins wrote:
>
>> Jason, how do we usually handle this cross trees? check_feature() doesn't exist
>> in your tree, but it does in Joerg's tree; meanwhile
>> check_feature_on_all_iommus() gets renamed to check_feature(). Should I need to
>> go with it, do I rebase against linux-next? I have been assuming that your tree
>> must compile; or worst-case different maintainer pull each other's trees.
>
> We didn't make any special preparation to speed this, so I would wait
> till next cycle to take the AMD patches
>
> Thus we should look at the vt-d patches if this is to go in this
> cycle.
>
>> Alternatively: I can check the counter directly to replicate the amd_iommu_efr
>> check under the current helper I made (amd_iommu_hd_support) and then change it
>> after the fact... That should lead to less dependencies?
>
> Or this
>
I think I'll go with this (once Suravee responds)
> We are fast running out of time though :)
Yeah, I know :( I am trying to get this out tomorrow
Still trying to get the AMD patches too, as that's the hardware I have been
testing (and has more mass for external people to play around) and I also have a
higher degree of confidence there.
Joao
^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v3 17/19] iommu/amd: Access/Dirty bit support in IOPTEs
2023-10-17 19:03 ` Joao Martins
@ 2023-10-17 22:04 ` Joao Martins
2023-10-18 11:47 ` Suthikulpanit, Suravee
2023-10-18 20:40 ` Joao Martins
1 sibling, 1 reply; 140+ messages in thread
From: Joao Martins @ 2023-10-17 22:04 UTC (permalink / raw)
To: Jason Gunthorpe, Suthikulpanit, Suravee
Cc: Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu, Yi Liu, Yi Y Sun,
Nicolin Chen, Joerg Roedel, Will Deacon, Robin Murphy,
Alex Williamson, kvm, iommu
On 17/10/2023 20:03, Joao Martins wrote:
> On 17/10/2023 19:49, Jason Gunthorpe wrote:
>> On Tue, Oct 17, 2023 at 07:32:31PM +0100, Joao Martins wrote:
>>
>>> Jason, how do we usually handle this cross trees? check_feature() doesn't exist
>>> in your tree, but it does in Joerg's tree; meanwhile
>>> check_feature_on_all_iommus() gets renamed to check_feature(). Should I need to
>>> go with it, do I rebase against linux-next? I have been assuming that your tree
>>> must compile; or worst-case different maintainer pull each other's trees.
>>
>> We didn't make any special preparation to speed this, so I would wait
>> till next cycle to take the AMD patches
>>
>> Thus we should look at the vt-d patches if this is to go in this
>> cycle.
>>
>>> Alternatively: I can check the counter directly to replicate the amd_iommu_efr
>>> check under the current helper I made (amd_iommu_hd_support) and then change it
>>> after the fact... That should lead to less dependencies?
>>
>> Or this
>>
> I think I'll go with this (once Suravee responds)
>
Or just keep current code -- which is valid -- at this point and doesn't involve
replicating anything
>> We are fast running out of time though :)
>
> Yeah, I know :( I am trying to get this out tomorrow
>
> Still trying to get the AMD patches too, as that's the hardware I have been
> testing (and has more mass for external people to play around) and I also have a
> higher degree of confidence there.
>
> Joao
^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v3 17/19] iommu/amd: Access/Dirty bit support in IOPTEs
2023-10-17 22:04 ` Joao Martins
@ 2023-10-18 11:47 ` Suthikulpanit, Suravee
0 siblings, 0 replies; 140+ messages in thread
From: Suthikulpanit, Suravee @ 2023-10-18 11:47 UTC (permalink / raw)
To: Joao Martins, Jason Gunthorpe
Cc: Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu, Yi Liu, Yi Y Sun,
Nicolin Chen, Joerg Roedel, Will Deacon, Robin Murphy,
Alex Williamson, kvm, iommu
Hi Joao,
On 10/18/2023 5:04 AM, Joao Martins wrote:
> On 17/10/2023 20:03, Joao Martins wrote:
>> On 17/10/2023 19:49, Jason Gunthorpe wrote:
>>> On Tue, Oct 17, 2023 at 07:32:31PM +0100, Joao Martins wrote:
>>>
>>>> Jason, how do we usually handle this across trees? check_feature() doesn't exist
>>>> in your tree, but it does in Joerg's tree; meanwhile
>>>> check_feature_on_all_iommus() gets renamed to check_feature(). If I need to
>>>> go with it, do I rebase against linux-next? I have been assuming that your tree
>>>> must compile; or, worst case, different maintainers pull each other's trees.
>>>
>>> We didn't make any special preparation to speed this, so I would wait
>>> till next cycle to take the AMD patches
>>>
>>> Thus we should look at the vt-d patches if this is to go in this
>>> cycle.
>>>
>>>> Alternatively: I can check the counter directly to replicate the amd_iommu_efr
>>>> check under the current helper I made (amd_iommu_hd_support) and then change it
>>>> after the fact... That should lead to fewer dependencies?
>>>
>>> Or this
>>>
>> I think I'll go with this (once Suravee responds)
>>
>
> Or just keep the current code -- which is valid -- at this point; it doesn't
> involve replicating anything.
We could keep the code for now. It should not break anything
functionally. Then, once the check_feature() stuff is in place, we can
propose a follow-up change to keep things consistent :)
Thanks,
Suravee
* Re: [PATCH v3 17/19] iommu/amd: Access/Dirty bit support in IOPTEs
2023-10-17 19:03 ` Joao Martins
2023-10-17 22:04 ` Joao Martins
@ 2023-10-18 20:40 ` Joao Martins
1 sibling, 0 replies; 140+ messages in thread
From: Joao Martins @ 2023-10-18 20:40 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Suthikulpanit, Suravee, Kevin Tian, Shameerali Kolothum Thodi,
Lu Baolu, Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel,
Will Deacon, Robin Murphy, Alex Williamson, kvm, iommu
On 17/10/2023 20:03, Joao Martins wrote:
> On 17/10/2023 19:49, Jason Gunthorpe wrote:
>> On Tue, Oct 17, 2023 at 07:32:31PM +0100, Joao Martins wrote:
>> We are fast running out of time though :)
>
> Yeah, I know :( I am trying to get this out tomorrow
Finally got done; I think I didn't miss anything.
For what it's worth: we could also wait until -rc1 if desired, should it need
longer simmering in linux-next, or in case something is less perfect than it
should be.
Thanks for all the prompt comments!
* Re: [PATCH v3 17/19] iommu/amd: Access/Dirty bit support in IOPTEs
2023-10-17 9:54 ` Joao Martins
2023-10-17 18:32 ` Joao Martins
@ 2023-10-18 11:46 ` Suthikulpanit, Suravee
2023-10-18 13:04 ` Suthikulpanit, Suravee
2 siblings, 0 replies; 140+ messages in thread
From: Suthikulpanit, Suravee @ 2023-10-18 11:46 UTC (permalink / raw)
To: Joao Martins, iommu
Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel, Will Deacon,
Robin Murphy, Alex Williamson, kvm
Hi Joao,
On 10/17/2023 4:54 PM, Joao Martins wrote:
> On 17/10/2023 09:18, Suthikulpanit, Suravee wrote:
>> Hi Joao,
>>
>> On 9/23/2023 8:25 AM, Joao Martins wrote:
>>> ...
>>> diff --git a/drivers/iommu/amd/io_pgtable.c b/drivers/iommu/amd/io_pgtable.c
>>> index 2892aa1b4dc1..099ccb04f52f 100644
>>> --- a/drivers/iommu/amd/io_pgtable.c
>>> +++ b/drivers/iommu/amd/io_pgtable.c
>>> @@ -486,6 +486,89 @@ static phys_addr_t iommu_v1_iova_to_phys(struct
>>> io_pgtable_ops *ops, unsigned lo
>>> return (__pte & ~offset_mask) | (iova & offset_mask);
>>> }
>>> +static bool pte_test_dirty(u64 *ptep, unsigned long size)
>>> +{
>>> + bool dirty = false;
>>> + int i, count;
>>> +
>>> + /*
>>> + * 2.2.3.2 Host Dirty Support
>>> + * When a non-default page size is used , software must OR the
>>> + * Dirty bits in all of the replicated host PTEs used to map
>>> + * the page. The IOMMU does not guarantee the Dirty bits are
>>> + * set in all of the replicated PTEs. Any portion of the page
>>> + * may have been written even if the Dirty bit is set in only
>>> + * one of the replicated PTEs.
>>> + */
>>> + count = PAGE_SIZE_PTE_COUNT(size);
>>> + for (i = 0; i < count; i++) {
>>> + if (test_bit(IOMMU_PTE_HD_BIT, (unsigned long *) &ptep[i])) {
>>> + dirty = true;
>>> + break;
>>> + }
>>> + }
>>> +
>>> + return dirty;
>>> +}
>>> +
>>> +static bool pte_test_and_clear_dirty(u64 *ptep, unsigned long size)
>>> +{
>>> + bool dirty = false;
>>> + int i, count;
>>> +
>>> + /*
>>> + * 2.2.3.2 Host Dirty Support
>>> + * When a non-default page size is used , software must OR the
>>> + * Dirty bits in all of the replicated host PTEs used to map
>>> + * the page. The IOMMU does not guarantee the Dirty bits are
>>> + * set in all of the replicated PTEs. Any portion of the page
>>> + * may have been written even if the Dirty bit is set in only
>>> + * one of the replicated PTEs.
>>> + */
>>> + count = PAGE_SIZE_PTE_COUNT(size);
>>> + for (i = 0; i < count; i++)
>>> + if (test_and_clear_bit(IOMMU_PTE_HD_BIT,
>>> + (unsigned long *) &ptep[i]))
>>> + dirty = true;
>>> +
>>> + return dirty;
>>> +}
>>
>> Can we consolidate the two functions above where we can pass the flag and check
>> if IOMMU_DIRTY_NO_CLEAR is set?
>>
> I guess so, yes -- it was initially to have an efficient tight loop to check all
> replicated PTEs, but I think I found a way to merge everything, e.g.:
>
> diff --git a/drivers/iommu/amd/io_pgtable.c b/drivers/iommu/amd/io_pgtable.c
> index 099ccb04f52f..953f867b4943 100644
> --- a/drivers/iommu/amd/io_pgtable.c
> +++ b/drivers/iommu/amd/io_pgtable.c
> @@ -486,8 +486,10 @@ static phys_addr_t iommu_v1_iova_to_phys(struct
> io_pgtable_ops *ops, unsigned lo
> return (__pte & ~offset_mask) | (iova & offset_mask);
> }
>
> -static bool pte_test_dirty(u64 *ptep, unsigned long size)
> +static bool pte_test_and_clear_dirty(u64 *ptep, unsigned long size,
> + unsigned long flags)
> {
> + bool test_only = flags & IOMMU_DIRTY_NO_CLEAR;
> bool dirty = false;
> int i, count;
>
> @@ -501,35 +503,20 @@ static bool pte_test_dirty(u64 *ptep, unsigned long size)
> * one of the replicated PTEs.
> */
> count = PAGE_SIZE_PTE_COUNT(size);
> - for (i = 0; i < count; i++) {
> - if (test_bit(IOMMU_PTE_HD_BIT, (unsigned long *) &ptep[i])) {
> + for (i = 0; i < count && test_only; i++) {
> + if (test_bit(IOMMU_PTE_HD_BIT,
> + (unsigned long *) &ptep[i])) {
> dirty = true;
> break;
> }
> }
>
> - return dirty;
> -}
> -
> -static bool pte_test_and_clear_dirty(u64 *ptep, unsigned long size)
> -{
> - bool dirty = false;
> - int i, count;
> -
> - /*
> - * 2.2.3.2 Host Dirty Support
> - * When a non-default page size is used , software must OR the
> - * Dirty bits in all of the replicated host PTEs used to map
> - * the page. The IOMMU does not guarantee the Dirty bits are
> - * set in all of the replicated PTEs. Any portion of the page
> - * may have been written even if the Dirty bit is set in only
> - * one of the replicated PTEs.
> - */
> - count = PAGE_SIZE_PTE_COUNT(size);
> - for (i = 0; i < count; i++)
> + for (i = 0; i < count && !test_only; i++) {
> if (test_and_clear_bit(IOMMU_PTE_HD_BIT,
> - (unsigned long *) &ptep[i]))
> + (unsigned long *) &ptep[i])) {
> dirty = true;
> + }
> + }
>
> return dirty;
> }
> @@ -559,9 +546,7 @@ static int iommu_v1_read_and_clear_dirty(struct
> io_pgtable_ops *ops,
> * Mark the whole IOVA range as dirty even if only one of
> * the replicated PTEs were marked dirty.
> */
> - if (((flags & IOMMU_DIRTY_NO_CLEAR) &&
> - pte_test_dirty(ptep, pgsize)) ||
> - pte_test_and_clear_dirty(ptep, pgsize))
> + if (pte_test_and_clear_dirty(ptep, pgsize, flags))
> iommu_dirty_bitmap_record(dirty, iova, pgsize);
> iova += pgsize;
> } while (iova < end);
>
>>> +
>>> +static int iommu_v1_read_and_clear_dirty(struct io_pgtable_ops *ops,
>>> + unsigned long iova, size_t size,
>>> + unsigned long flags,
>>> + struct iommu_dirty_bitmap *dirty)
>>> +{
>>> + struct amd_io_pgtable *pgtable = io_pgtable_ops_to_data(ops);
>>> + unsigned long end = iova + size - 1;
>>> +
>>> + do {
>>> + unsigned long pgsize = 0;
>>> + u64 *ptep, pte;
>>> +
>>> + ptep = fetch_pte(pgtable, iova, &pgsize);
>>> + if (ptep)
>>> + pte = READ_ONCE(*ptep);
>>> + if (!ptep || !IOMMU_PTE_PRESENT(pte)) {
>>> + pgsize = pgsize ?: PTE_LEVEL_PAGE_SIZE(0);
>>> + iova += pgsize;
>>> + continue;
>>> + }
>>> +
>>> + /*
>>> + * Mark the whole IOVA range as dirty even if only one of
>>> + * the replicated PTEs were marked dirty.
>>> + */
>>> + if (((flags & IOMMU_DIRTY_NO_CLEAR) &&
>>> + pte_test_dirty(ptep, pgsize)) ||
>>> + pte_test_and_clear_dirty(ptep, pgsize))
>>> + iommu_dirty_bitmap_record(dirty, iova, pgsize);
>>> + iova += pgsize;
>>> + } while (iova < end);
>>> +
>
> Your earlier point made me discover that the test-only case might end up clearing
> the PTE unnecessarily. But I have addressed it in the previous comment.
Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Thanks,
Suravee
* Re: [PATCH v3 17/19] iommu/amd: Access/Dirty bit support in IOPTEs
2023-10-17 9:54 ` Joao Martins
2023-10-17 18:32 ` Joao Martins
2023-10-18 11:46 ` Suthikulpanit, Suravee
@ 2023-10-18 13:04 ` Suthikulpanit, Suravee
2023-10-18 13:17 ` Joao Martins
2023-10-18 15:50 ` Jason Gunthorpe
2 siblings, 2 replies; 140+ messages in thread
From: Suthikulpanit, Suravee @ 2023-10-18 13:04 UTC (permalink / raw)
To: Joao Martins, iommu
Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel, Will Deacon,
Robin Murphy, Alex Williamson, kvm
Joao,
On 10/17/2023 4:54 PM, Joao Martins wrote:
>>> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
>>> index af36c627022f..31b333cc6fe1 100644
>>> --- a/drivers/iommu/amd/iommu.c
>>> +++ b/drivers/iommu/amd/iommu.c
>>> ....
>>> @@ -2156,11 +2160,17 @@ static inline u64 dma_max_address(void)
>>> return ((1ULL << PM_LEVEL_SHIFT(amd_iommu_gpt_level)) - 1);
>>> }
>>> +static bool amd_iommu_hd_support(struct amd_iommu *iommu)
>>> +{
>>> + return iommu && (iommu->features & FEATURE_HDSUP);
>>> +}
>>> +
>> You can use the newly introduced check_feature(u64 mask) to check the HD support.
>>
> It appears that check_feature() is logically equivalent to
> check_feature_on_all_iommus(); whereas this check is a per-device/per-IOMMU
> check, meant to support the potentially heterogeneous nature of different
> IOMMUs with different features. Being per-IOMMU would allow firmware to not
> advertise certain IOMMU features on some devices while still supporting them
> on others. I understand this is not a thing on x86, but the UAPI supports it.
> Having said that, do you still want me to switch to check_feature()?
So far, AMD does not have systems w/ multiple IOMMUs that have
different EFR/EFR2. However, the AMD IOMMU spec does not enforce the
EFR/EFR2 of all IOMMU instances to be the same. There are certain
features which require consistent support across all IOMMUs. That's why
we introduced the system-wide amd_iommu_efr / amd_iommu_efr2 to simplify
the feature-checking logic in the driver.
For EFR[HDSup], let's consider a VM with two VFIO pass-through devices
(dev_A and dev_B). Each device is on different IOMMU instance (IOMMU_A,
IOMMU_B), where only IOMMU_A has EFR[HDSUP]=1.
If we call do_iommu_domain_alloc(type, dev_A,
IOMMU_HWPT_ALLOC_ENFORCE_DIRTY), this should return a domain w/
dirty_ops set. Then, if we attach dev_B to the same domain, the
following check should return -EINVAL.
@@ -2252,6 +2268,9 @@ static int amd_iommu_attach_device(struct
iommu_domain *dom,
return 0;
dev_data->defer_attach = false;
+ if (dom->dirty_ops && iommu &&
+ !(iommu->features & FEATURE_HDSUP))
+ return -EINVAL;
which means dev_A and dev_B cannot be in the same VFIO domain.
In this case, since we can prevent devices on IOMMUs w/ different
EFR[HDSUP] bits from sharing the same domain, it should be safe to support
dirty tracking on such a system, and it makes sense to just check the
per-IOMMU EFR value (i.e. iommu->features). If we decide to keep this,
we should probably put a comment in the code to describe this.
Thanks,
Suravee
* Re: [PATCH v3 17/19] iommu/amd: Access/Dirty bit support in IOPTEs
2023-10-18 13:04 ` Suthikulpanit, Suravee
@ 2023-10-18 13:17 ` Joao Martins
2023-10-18 13:31 ` Joao Martins
2023-10-18 15:50 ` Jason Gunthorpe
1 sibling, 1 reply; 140+ messages in thread
From: Joao Martins @ 2023-10-18 13:17 UTC (permalink / raw)
To: Suthikulpanit, Suravee, iommu
Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel, Will Deacon,
Robin Murphy, Alex Williamson, kvm
On 18/10/2023 14:04, Suthikulpanit, Suravee wrote:
> Joao,
>
> On 10/17/2023 4:54 PM, Joao Martins wrote:
>>>> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
>>>> index af36c627022f..31b333cc6fe1 100644
>>>> --- a/drivers/iommu/amd/iommu.c
>>>> +++ b/drivers/iommu/amd/iommu.c
>>>> ....
>>>> @@ -2156,11 +2160,17 @@ static inline u64 dma_max_address(void)
>>>> return ((1ULL << PM_LEVEL_SHIFT(amd_iommu_gpt_level)) - 1);
>>>> }
>>>> +static bool amd_iommu_hd_support(struct amd_iommu *iommu)
>>>> +{
>>>> + return iommu && (iommu->features & FEATURE_HDSUP);
>>>> +}
>>>> +
>>> You can use the newly introduced check_feature(u64 mask) to check the HD
>>> support.
>>>
>> It appears that check_feature() is logically equivalent to
>> check_feature_on_all_iommus(); whereas this check is a per-device/per-IOMMU
>> check, meant to support the potentially heterogeneous nature of different
>> IOMMUs with different features. Being per-IOMMU would allow firmware to not
>> advertise certain IOMMU features on some devices while still supporting them
>> on others. I understand this is not a thing on x86, but the UAPI supports it.
>> Having said that, do you still want me to switch to check_feature()?
>
> So far, AMD does not have systems w/ multiple IOMMUs that have different
> EFR/EFR2. However, the AMD IOMMU spec does not enforce the EFR/EFR2 of all IOMMU
> instances to be the same. There are certain features which require consistent
> support across all IOMMUs. That's why we introduced the system-wide
> amd_iommu_efr / amd_iommu_efr2 to simplify the feature-checking logic in the driver.
>
Ack
> For EFR[HDSup], let's consider a VM with two VFIO pass-through devices (dev_A
> and dev_B). Each device is on different IOMMU instance (IOMMU_A, IOMMU_B), where
> only IOMMU_A has EFR[HDSUP]=1.
>
> If we call do_iommu_domain_alloc(type, dev_A, IOMMU_HWPT_ALLOC_ENFORCE_DIRTY),
> this should return a domain w/ dirty_ops set. Then, if we attach dev_B to the
> same domain, the following check should return -EINVAL.
>
True; this is what it is in this patch.
> @@ -2252,6 +2268,9 @@ static int amd_iommu_attach_device(struct iommu_domain *dom,
> return 0;
>
> dev_data->defer_attach = false;
> + if (dom->dirty_ops && iommu &&
> + !(iommu->features & FEATURE_HDSUP))
> + return -EINVAL;
>
> which means dev_A and dev_B cannot be in the same VFIO domain.
>
Correct; and that's by design.
> In this case, since we can prevent devices on IOMMUs w/ different EFR[HDSUP] bits
> from sharing the same domain, it should be safe to support dirty tracking on such
> a system, and it makes sense to just check the per-IOMMU EFR value (i.e.
> iommu->features). If we decide to keep this, we should probably put a comment in
> the code to describe this.
OK, I'll leave a comment, but note that this isn't an odd case: we are supposed
to enforce dirty tracking on the IOMMU domain, and then nack any non-supporting
IOMMUs such that only devices behind dirty-tracking-capable IOMMUs are present.
A well-behaved userspace app will also check accordingly whether the capability
is there in the IOMMU.
Btw, I understand that this is not the case for current AMD systems where
everything is homogeneous. We can simplify this with check_feature() afterwards,
but the fact that the spec doesn't prevent it makes me wonder too :)
^ permalink raw reply [flat|nested] 140+ messages in thread* Re: [PATCH v3 17/19] iommu/amd: Access/Dirty bit support in IOPTEs
2023-10-18 13:17 ` Joao Martins
@ 2023-10-18 13:31 ` Joao Martins
0 siblings, 0 replies; 140+ messages in thread
From: Joao Martins @ 2023-10-18 13:31 UTC (permalink / raw)
To: Suthikulpanit, Suravee, iommu
Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel, Will Deacon,
Robin Murphy, Alex Williamson, kvm
On 18/10/2023 14:17, Joao Martins wrote:
> On 18/10/2023 14:04, Suthikulpanit, Suravee wrote:
>> On 10/17/2023 4:54 PM, Joao Martins wrote:
>> @@ -2252,6 +2268,9 @@ static int amd_iommu_attach_device(struct iommu_domain *dom,
>> return 0;
>>
>> dev_data->defer_attach = false;
>> + if (dom->dirty_ops && iommu &&
>> + !(iommu->features & FEATURE_HDSUP))
>> + return -EINVAL;
>>
>> which means dev_A and dev_B cannot be in the same VFIO domain.
>>
> Correct; and that's by design.
>
>> In this case, since we can prevent devices on IOMMUs w/ different EFR[HDSUP] bits
>> from sharing the same domain, it should be safe to support dirty tracking on such
>> a system, and it makes sense to just check the per-IOMMU EFR value (i.e.
>> iommu->features). If we decide to keep this, we should probably put a comment in
>> the code to describe this.
>
> OK, I'll leave a comment, but note that this isn't an odd case: we are supposed
> to enforce dirty tracking on the IOMMU domain, and then nack any non-supporting
> IOMMUs such that only devices behind dirty-tracking-capable IOMMUs are present.
> A well-behaved userspace app will also check accordingly whether the capability
> is there in the IOMMU.
>
FWIW, I added this comment:
@@ -2254,6 +2270,13 @@ static int amd_iommu_attach_device(struct iommu_domain *dom,
dev_data->defer_attach = false;
+ /*
+ * Restrict to devices with compatible IOMMU hardware support
+ * when enforcement of dirty tracking is enabled.
+ */
> Btw, I understand that this is not the case for current AMD systems where
> everything is homogeneous. We can simplify this with check_feature() afterwards,
> but the fact that the spec doesn't prevent it makes me wonder too :)
* Re: [PATCH v3 17/19] iommu/amd: Access/Dirty bit support in IOPTEs
2023-10-18 13:04 ` Suthikulpanit, Suravee
2023-10-18 13:17 ` Joao Martins
@ 2023-10-18 15:50 ` Jason Gunthorpe
1 sibling, 0 replies; 140+ messages in thread
From: Jason Gunthorpe @ 2023-10-18 15:50 UTC (permalink / raw)
To: Suthikulpanit, Suravee
Cc: Joao Martins, iommu, Kevin Tian, Shameerali Kolothum Thodi,
Lu Baolu, Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel,
Will Deacon, Robin Murphy, Alex Williamson, kvm
On Wed, Oct 18, 2023 at 08:04:07PM +0700, Suthikulpanit, Suravee wrote:
> > It appears that check_feature() is logically equivalent to
> > check_feature_on_all_iommus(); whereas this check is a per-device/per-IOMMU
> > check, meant to support the potentially heterogeneous nature of different
> > IOMMUs with different features. Being per-IOMMU would allow firmware to not
> > advertise certain IOMMU features on some devices while still supporting them
> > on others. I understand this is not a thing on x86, but the UAPI supports it.
> > Having said that, do you still want me to switch to check_feature()?
>
> So far, AMD does not have systems w/ multiple IOMMUs that have different
> EFR/EFR2. However, the AMD IOMMU spec does not enforce the EFR/EFR2 of all
> IOMMU instances to be the same. There are certain features which require
> consistent support across all IOMMUs. That's why we introduced the
> system-wide amd_iommu_efr / amd_iommu_efr2 to simplify the feature-checking
> logic in the driver.
I've argued this seems like a shortcut. The general design of the
iommu subsystem has everything be per-iommu and the only cross
instance sharing should be with the domain.
I understood AMD had some global things where it needed to interact
with the other arch code that had to be global.
However, at least for domain centric features this is how things
should be coded in drivers - store the data in the domain and reject
attach to incompatible iommus. Try to minimize the use of globals.
> @@ -2252,6 +2268,9 @@ static int amd_iommu_attach_device(struct iommu_domain
> *dom,
> return 0;
>
> dev_data->defer_attach = false;
> + if (dom->dirty_ops && iommu &&
> + !(iommu->features & FEATURE_HDSUP))
> + return -EINVAL;
>
> which means dev_A and dev_B cannot be in the same VFIO domain.
Which is correct, the HW cannot support dirty tracking.
The VMM world can handle this: it knows that the domain is
incompatible, so it can choose to use another way to do dirty tracking
and associate a non-dirty-tracking domain with this device. Or decide to
give up.
Other platforms will require this code anyhow, as they don't have the
guarantee of uniformity.
Jason
* [PATCH v3 18/19] iommu/amd: Print access/dirty bits if supported
2023-09-23 1:24 [PATCH v3 00/19] IOMMUFD Dirty Tracking Joao Martins
` (16 preceding siblings ...)
2023-09-23 1:25 ` [PATCH v3 17/19] iommu/amd: Access/Dirty bit support in IOPTEs Joao Martins
@ 2023-09-23 1:25 ` Joao Martins
2023-10-17 3:48 ` Suthikulpanit, Suravee
2023-10-18 8:32 ` Vasant Hegde
2023-09-23 1:25 ` [PATCH v3 19/19] iommu/intel: Access/Dirty bit support for SL domains Joao Martins
` (2 subsequent siblings)
20 siblings, 2 replies; 140+ messages in thread
From: Joao Martins @ 2023-09-23 1:25 UTC (permalink / raw)
To: iommu
Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel,
Suravee Suthikulpanit, Will Deacon, Robin Murphy, Alex Williamson,
kvm, Joao Martins
Print the feature, much like other kernel-supported features.
One can still probe its actual hw support via sysfs, regardless
of what the kernel does.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
drivers/iommu/amd/init.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 45efb7e5d725..b091a3d10819 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -2208,6 +2208,10 @@ static void print_iommu_info(void)
if (iommu->features & FEATURE_GAM_VAPIC)
pr_cont(" GA_vAPIC");
+ if (iommu->features & FEATURE_HASUP)
+ pr_cont(" HASup");
+ if (iommu->features & FEATURE_HDSUP)
+ pr_cont(" HDSup");
if (iommu->features & FEATURE_SNP)
pr_cont(" SNP");
--
2.17.2
* Re: [PATCH v3 18/19] iommu/amd: Print access/dirty bits if supported
2023-09-23 1:25 ` [PATCH v3 18/19] iommu/amd: Print access/dirty bits if supported Joao Martins
@ 2023-10-17 3:48 ` Suthikulpanit, Suravee
2023-10-17 9:07 ` Joao Martins
2023-10-18 8:32 ` Vasant Hegde
1 sibling, 1 reply; 140+ messages in thread
From: Suthikulpanit, Suravee @ 2023-10-17 3:48 UTC (permalink / raw)
To: Joao Martins, iommu
Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel, Will Deacon,
Robin Murphy, Alex Williamson, kvm
On 9/23/2023 8:25 AM, Joao Martins wrote:
> Print the feature, much like other kernel-supported features.
>
> One can still probe its actual hw support via sysfs, regardless
> of what the kernel does.
>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
> drivers/iommu/amd/init.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
> index 45efb7e5d725..b091a3d10819 100644
> --- a/drivers/iommu/amd/init.c
> +++ b/drivers/iommu/amd/init.c
> @@ -2208,6 +2208,10 @@ static void print_iommu_info(void)
>
> if (iommu->features & FEATURE_GAM_VAPIC)
> pr_cont(" GA_vAPIC");
> + if (iommu->features & FEATURE_HASUP)
> + pr_cont(" HASup");
> + if (iommu->features & FEATURE_HDSUP)
> + pr_cont(" HDSup");
>
> if (iommu->features & FEATURE_SNP)
> pr_cont(" SNP");
Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Thanks,
Suravee
* Re: [PATCH v3 18/19] iommu/amd: Print access/dirty bits if supported
2023-10-17 3:48 ` Suthikulpanit, Suravee
@ 2023-10-17 9:07 ` Joao Martins
0 siblings, 0 replies; 140+ messages in thread
From: Joao Martins @ 2023-10-17 9:07 UTC (permalink / raw)
To: Suthikulpanit, Suravee, iommu
Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel, Will Deacon,
Robin Murphy, Alex Williamson, kvm
On 17/10/2023 04:48, Suthikulpanit, Suravee wrote:
> On 9/23/2023 8:25 AM, Joao Martins wrote:
>> Print the feature, much like other kernel-supported features.
>>
>> One can still probe its actual hw support via sysfs, regardless
>> of what the kernel does.
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>> drivers/iommu/amd/init.c | 4 ++++
>> 1 file changed, 4 insertions(+)
>>
>> diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
>> index 45efb7e5d725..b091a3d10819 100644
>> --- a/drivers/iommu/amd/init.c
>> +++ b/drivers/iommu/amd/init.c
>> @@ -2208,6 +2208,10 @@ static void print_iommu_info(void)
>> if (iommu->features & FEATURE_GAM_VAPIC)
>> pr_cont(" GA_vAPIC");
>> + if (iommu->features & FEATURE_HASUP)
>> + pr_cont(" HASup");
>> + if (iommu->features & FEATURE_HDSUP)
>> + pr_cont(" HDSup");
>> if (iommu->features & FEATURE_SNP)
>> pr_cont(" SNP");
>
> Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
>
Thanks!
* Re: [PATCH v3 18/19] iommu/amd: Print access/dirty bits if supported
2023-09-23 1:25 ` [PATCH v3 18/19] iommu/amd: Print access/dirty bits if supported Joao Martins
2023-10-17 3:48 ` Suthikulpanit, Suravee
@ 2023-10-18 8:32 ` Vasant Hegde
2023-10-18 8:53 ` Joao Martins
1 sibling, 1 reply; 140+ messages in thread
From: Vasant Hegde @ 2023-10-18 8:32 UTC (permalink / raw)
To: Joao Martins, iommu
Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel,
Suravee Suthikulpanit, Will Deacon, Robin Murphy, Alex Williamson,
kvm
Joao,
On 9/23/2023 6:55 AM, Joao Martins wrote:
> Print the feature, much like other kernel-supported features.
>
> One can still probe its actual hw support via sysfs, regardless
> of what the kernel does.
>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
> drivers/iommu/amd/init.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
> index 45efb7e5d725..b091a3d10819 100644
> --- a/drivers/iommu/amd/init.c
> +++ b/drivers/iommu/amd/init.c
> @@ -2208,6 +2208,10 @@ static void print_iommu_info(void)
>
> if (iommu->features & FEATURE_GAM_VAPIC)
> pr_cont(" GA_vAPIC");
> + if (iommu->features & FEATURE_HASUP)
> + pr_cont(" HASup");
> + if (iommu->features & FEATURE_HDSUP)
> + pr_cont(" HDSup");
Note that this has a conflict with the iommu/next branch, but it should be fairly
straightforward to fix. Otherwise the patch looks good to me.
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
-Vasant
* Re: [PATCH v3 18/19] iommu/amd: Print access/dirty bits if supported
2023-10-18 8:32 ` Vasant Hegde
@ 2023-10-18 8:53 ` Joao Martins
2023-10-18 9:03 ` Vasant Hegde
0 siblings, 1 reply; 140+ messages in thread
From: Joao Martins @ 2023-10-18 8:53 UTC (permalink / raw)
To: Vasant Hegde, iommu
Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel,
Suravee Suthikulpanit, Will Deacon, Robin Murphy, Alex Williamson,
kvm
On 18/10/2023 09:32, Vasant Hegde wrote:
> Joao,
>
> On 9/23/2023 6:55 AM, Joao Martins wrote:
>> Print the feature, much like other kernel-supported features.
>>
>> One can still probe its actual hw support via sysfs, regardless
>> of what the kernel does.
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>> drivers/iommu/amd/init.c | 4 ++++
>> 1 file changed, 4 insertions(+)
>>
>> diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
>> index 45efb7e5d725..b091a3d10819 100644
>> --- a/drivers/iommu/amd/init.c
>> +++ b/drivers/iommu/amd/init.c
>> @@ -2208,6 +2208,10 @@ static void print_iommu_info(void)
>>
>> if (iommu->features & FEATURE_GAM_VAPIC)
>> pr_cont(" GA_vAPIC");
>> + if (iommu->features & FEATURE_HASUP)
>> + pr_cont(" HASup");
>> + if (iommu->features & FEATURE_HDSUP)
>> + pr_cont(" HDSup");
>
> Note that this has a conflict with the iommu/next branch, but it should be fairly
> straightforward to fix. Otherwise the patch looks good to me.
>
I guess it's this patch, thanks for reminding:
https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git/commit/drivers/iommu/amd/init.c?h=next&id=7b7563a93437ef945c829538da28f0095f1603ec
But then it's the same problem as the previous patch. The loop above is entirely
reworked, so the code above won't work, and the "iommu->features &" conditionals
need to be replaced with check_feature(FEATURE_HDSUP) and
check_feature(FEATURE_HASUP). And depending on the order of pull requests this
is problematic. The previous patch can get away with direct usage of
amd_iommu_efr, but this one sadly cannot.
I can skip this patch in particular for v4 and re-submit after -rc1 when
everything is aligned. It only affects the user experience of printing two
strings to the console. Real feature probing is not affected: users still have
the old sysfs interface, and these days IOMMUFD GET_HW_INFO, which
userspace/VMMs will rely on.
> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
>
Thanks!
* Re: [PATCH v3 18/19] iommu/amd: Print access/dirty bits if supported
2023-10-18 8:53 ` Joao Martins
@ 2023-10-18 9:03 ` Vasant Hegde
2023-10-18 9:05 ` Joao Martins
2023-10-18 15:52 ` Jason Gunthorpe
0 siblings, 2 replies; 140+ messages in thread
From: Vasant Hegde @ 2023-10-18 9:03 UTC (permalink / raw)
To: Joao Martins, iommu
Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel,
Suravee Suthikulpanit, Will Deacon, Robin Murphy, Alex Williamson,
kvm
Hi Joao,
On 10/18/2023 2:23 PM, Joao Martins wrote:
> On 18/10/2023 09:32, Vasant Hegde wrote:
>> Joao,
>>
>> On 9/23/2023 6:55 AM, Joao Martins wrote:
>>> Print the feature, much like other kernel-supported features.
>>>
>>> One can still probe its actual hw support via sysfs, regardless
>>> of what the kernel does.
>>>
>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>> ---
>>> drivers/iommu/amd/init.c | 4 ++++
>>> 1 file changed, 4 insertions(+)
>>>
>>> diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
>>> index 45efb7e5d725..b091a3d10819 100644
>>> --- a/drivers/iommu/amd/init.c
>>> +++ b/drivers/iommu/amd/init.c
>>> @@ -2208,6 +2208,10 @@ static void print_iommu_info(void)
>>>
>>> if (iommu->features & FEATURE_GAM_VAPIC)
>>> pr_cont(" GA_vAPIC");
>>> + if (iommu->features & FEATURE_HASUP)
>>> + pr_cont(" HASup");
>>> + if (iommu->features & FEATURE_HDSUP)
>>> + pr_cont(" HDSup");
>>
>> Note that this has a conflict with the iommu/next branch, but it should be fairly
>> straightforward to fix. Otherwise the patch looks good to me.
>>
> I guess it's this patch, thanks for reminding:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git/commit/drivers/iommu/amd/init.c?h=next&id=7b7563a93437ef945c829538da28f0095f1603ec
Right.
>
> But then it's the same problem as the previous patch. The loop above is entirely
> reworked, so the code above won't work, and the "iommu->features &" conditionals
> need to be replaced with check_feature(FEATURE_HDSUP) and
> check_feature(FEATURE_HASUP). And depending on the order of pull requests this
> is problematic. The previous patch can get away with direct usage of
> amd_iommu_efr, but this one sadly cannot.
>
> I can skip this patch in particular for v4 and re-submit after -rc1 when
> everything is aligned. It only affects the user experience of printing two
> strings to the console. Real feature probing is not affected: users still have
> the old sysfs interface, and these days IOMMUFD GET_HW_INFO, which
> userspace/VMMs will rely on.
IIUC this can be an independent patch and doesn't have a strict dependency on this
series itself. Maybe you can rebase it on top of iommu/next and post it separately?
-Vasant
^ permalink raw reply [flat|nested] 140+ messages in thread

* Re: [PATCH v3 18/19] iommu/amd: Print access/dirty bits if supported
2023-10-18 9:03 ` Vasant Hegde
@ 2023-10-18 9:05 ` Joao Martins
2023-10-18 15:52 ` Jason Gunthorpe
1 sibling, 0 replies; 140+ messages in thread
From: Joao Martins @ 2023-10-18 9:05 UTC (permalink / raw)
To: Vasant Hegde, iommu
Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel,
Suravee Suthikulpanit, Will Deacon, Robin Murphy, Alex Williamson,
kvm
On 18/10/2023 10:03, Vasant Hegde wrote:
> Hi Joao,
>
> On 10/18/2023 2:23 PM, Joao Martins wrote:
>> On 18/10/2023 09:32, Vasant Hegde wrote:
>>> Joao,
>>>
>>> On 9/23/2023 6:55 AM, Joao Martins wrote:
>>>> Print the feature, much like other kernel-supported features.
>>>>
>>>> One can still probe its actual hw support via sysfs, regardless
>>>> of what the kernel does.
>>>>
>>>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>>>> ---
>>>> drivers/iommu/amd/init.c | 4 ++++
>>>> 1 file changed, 4 insertions(+)
>>>>
>>>> diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
>>>> index 45efb7e5d725..b091a3d10819 100644
>>>> --- a/drivers/iommu/amd/init.c
>>>> +++ b/drivers/iommu/amd/init.c
>>>> @@ -2208,6 +2208,10 @@ static void print_iommu_info(void)
>>>>
>>>> if (iommu->features & FEATURE_GAM_VAPIC)
>>>> pr_cont(" GA_vAPIC");
>>>> + if (iommu->features & FEATURE_HASUP)
>>>> + pr_cont(" HASup");
>>>> + if (iommu->features & FEATURE_HDSUP)
>>>> + pr_cont(" HDSup");
>>>
>>> Note that this has a conflict with the iommu/next branch. But it should be fairly
>>> straightforward to fix. Otherwise the patch looks good to me.
>>>
>> I guess it's this patch, thanks for reminding:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git/commit/drivers/iommu/amd/init.c?h=next&id=7b7563a93437ef945c829538da28f0095f1603ec
>
> Right.
>
>>
>> But then it's the same problem as the previous patch. The loop above is entirely
>> reworked, so the code above won't work, and the "iommu->features &" conditionals
>> need to be replaced with a check_feature(FEATURE_HDSUP) and
>> check_feature(FEATURE_HASUP). And depending on the order of pull requests this
>> is problematic. The previous patch can get away with direct usage of
>> amd_iommu_efr, but this one sadly no.
>>
>> I can skip this patch in particular for v4 and re-submit after -rc1 when
>> everything is aligned. It only affects the user experience of printing two
>> strings on the console. Real feature probing is not affected; users still have
>> the old sysfs interface, and these days IOMMUFD GET_HW_INFO, which
>> userspace/VMM will rely on.
>
> IIUC this can be an independent patch and doesn't have a strict dependency on this
> series itself. Maybe you can rebase it on top of iommu/next and post it separately?
Yeap, pretty much; I've removed this from this series.
* Re: [PATCH v3 18/19] iommu/amd: Print access/dirty bits if supported
2023-10-18 9:03 ` Vasant Hegde
2023-10-18 9:05 ` Joao Martins
@ 2023-10-18 15:52 ` Jason Gunthorpe
2023-10-18 15:55 ` Joao Martins
1 sibling, 1 reply; 140+ messages in thread
From: Jason Gunthorpe @ 2023-10-18 15:52 UTC (permalink / raw)
To: Vasant Hegde
Cc: Joao Martins, iommu, Kevin Tian, Shameerali Kolothum Thodi,
Lu Baolu, Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel,
Suravee Suthikulpanit, Will Deacon, Robin Murphy, Alex Williamson,
kvm
On Wed, Oct 18, 2023 at 02:33:09PM +0530, Vasant Hegde wrote:
> IIUC this can be an independent patch and doesn't have a strict dependency on this
> series itself. Maybe you can rebase it on top of iommu/next and post it separately?
No, we can't merge it that way. If this is required the AMD parts have
to wait for rc1
Jason
* Re: [PATCH v3 18/19] iommu/amd: Print access/dirty bits if supported
2023-10-18 15:52 ` Jason Gunthorpe
@ 2023-10-18 15:55 ` Joao Martins
0 siblings, 0 replies; 140+ messages in thread
From: Joao Martins @ 2023-10-18 15:55 UTC (permalink / raw)
To: Jason Gunthorpe, Vasant Hegde
Cc: iommu, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu, Yi Liu,
Yi Y Sun, Nicolin Chen, Joerg Roedel, Suravee Suthikulpanit,
Will Deacon, Robin Murphy, Alex Williamson, kvm
On 18/10/2023 16:52, Jason Gunthorpe wrote:
> On Wed, Oct 18, 2023 at 02:33:09PM +0530, Vasant Hegde wrote:
>
>> IIUC this can be an independent patch and doesn't have a strict dependency on this
>> series itself. Maybe you can rebase it on top of iommu/next and post it separately?
>
> No, we can't merge it that way. If this is required the AMD parts have
> to wait for rc1
I've removed this specific patch from the series, and was aiming to post to -rc1
when the dependent parts are settled.
It really is just printing 6 letters in the kernel log. It's a tiny thing
really, and the rest of the series does not depend on this at all.
Joao
* [PATCH v3 19/19] iommu/intel: Access/Dirty bit support for SL domains
2023-09-23 1:24 [PATCH v3 00/19] IOMMUFD Dirty Tracking Joao Martins
` (17 preceding siblings ...)
2023-09-23 1:25 ` [PATCH v3 18/19] iommu/amd: Print access/dirty bits if supported Joao Martins
@ 2023-09-23 1:25 ` Joao Martins
2023-09-25 7:01 ` Baolu Lu
` (4 more replies)
2023-09-26 8:58 ` [PATCH v3 00/19] IOMMUFD Dirty Tracking Shameerali Kolothum Thodi
2023-10-13 16:29 ` Jason Gunthorpe
20 siblings, 5 replies; 140+ messages in thread
From: Joao Martins @ 2023-09-23 1:25 UTC (permalink / raw)
To: iommu
Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Lu Baolu,
Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel,
Suravee Suthikulpanit, Will Deacon, Robin Murphy, Alex Williamson,
kvm, Joao Martins
IOMMU advertises Access/Dirty bits for second-stage page table if the
extended capability DMAR register reports it (ECAP, mnemonic ECAP.SSADS).
The first stage table is compatible with CPU page table thus A/D bits are
implicitly supported. Relevant Intel IOMMU SDM ref for first stage table
"3.6.2 Accessed, Extended Accessed, and Dirty Flags" and second stage table
"3.7.2 Accessed and Dirty Flags".
First stage page tables are used by default, and their A/D bits are always
enabled, so setting dirty tracking needs no control bits and just returns 0.
To use SSADS, set
bit 9 (SSADE) in the scalable-mode PASID table entry and flush the IOTLB
via pasid_flush_caches() following the manual. Relevant SDM refs:
"3.7.2 Accessed and Dirty Flags"
"6.5.3.3 Guidance to Software for Invalidations,
Table 23. Guidance to Software for Invalidations"
The PTE dirty bit is located in bit 9 and is cached in the IOTLB, so flush the
IOTLB to make sure the IOMMU attempts to set the dirty bit again. Note that
iommu_dirty_bitmap_record() will add the IOVA to iotlb_gather and thus
the caller of the iommu op will flush the IOTLB. The relevant manual coverage
of the hardware translation is chapter 6, with special mention of:
"6.2.3.1 Scalable-Mode PASID-Table Entry Programming Considerations"
"6.2.4 IOTLB"
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
The IOPTE walker is still a bit inefficient. Making sure the UAPI/IOMMUFD is
solid and agreed upon.
---
drivers/iommu/intel/iommu.c | 94 +++++++++++++++++++++++++++++++++++++
drivers/iommu/intel/iommu.h | 15 ++++++
drivers/iommu/intel/pasid.c | 94 +++++++++++++++++++++++++++++++++++++
drivers/iommu/intel/pasid.h | 4 ++
4 files changed, 207 insertions(+)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 491bcde1ff96..7d5a8f5283a7 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -300,6 +300,7 @@ static int iommu_skip_te_disable;
#define IDENTMAP_AZALIA 4
const struct iommu_ops intel_iommu_ops;
+const struct iommu_dirty_ops intel_dirty_ops;
static bool translation_pre_enabled(struct intel_iommu *iommu)
{
@@ -4077,6 +4078,7 @@ static struct iommu_domain *intel_iommu_domain_alloc(unsigned type)
static struct iommu_domain *
intel_iommu_domain_alloc_user(struct device *dev, u32 flags)
{
+ bool enforce_dirty = (flags & IOMMU_HWPT_ALLOC_ENFORCE_DIRTY);
struct iommu_domain *domain;
struct intel_iommu *iommu;
@@ -4087,9 +4089,15 @@ intel_iommu_domain_alloc_user(struct device *dev, u32 flags)
if ((flags & IOMMU_HWPT_ALLOC_NEST_PARENT) && !ecap_nest(iommu->ecap))
return ERR_PTR(-EOPNOTSUPP);
+ if (enforce_dirty &&
+ !device_iommu_capable(dev, IOMMU_CAP_DIRTY))
+ return ERR_PTR(-EOPNOTSUPP);
+
domain = iommu_domain_alloc(dev->bus);
if (!domain)
domain = ERR_PTR(-ENOMEM);
+ if (domain && enforce_dirty)
+ domain->dirty_ops = &intel_dirty_ops;
return domain;
}
@@ -4367,6 +4375,9 @@ static bool intel_iommu_capable(struct device *dev, enum iommu_cap cap)
return dmar_platform_optin();
case IOMMU_CAP_ENFORCE_CACHE_COHERENCY:
return ecap_sc_support(info->iommu->ecap);
+ case IOMMU_CAP_DIRTY:
+ return sm_supported(info->iommu) &&
+ ecap_slads(info->iommu->ecap);
default:
return false;
}
@@ -4822,6 +4833,89 @@ static void *intel_iommu_hw_info(struct device *dev, u32 *length, u32 *type)
return vtd;
}
+static int intel_iommu_set_dirty_tracking(struct iommu_domain *domain,
+ bool enable)
+{
+ struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+ struct device_domain_info *info;
+ int ret = -EINVAL;
+
+ spin_lock(&dmar_domain->lock);
+ if (!(dmar_domain->dirty_tracking ^ enable) ||
+ list_empty(&dmar_domain->devices)) {
+ spin_unlock(&dmar_domain->lock);
+ return 0;
+ }
+
+ list_for_each_entry(info, &dmar_domain->devices, link) {
+ /* First-level page table always enables the dirty bit */
+ if (dmar_domain->use_first_level) {
+ ret = 0;
+ break;
+ }
+
+ ret = intel_pasid_setup_dirty_tracking(info->iommu, info->domain,
+ info->dev, IOMMU_NO_PASID,
+ enable);
+ if (ret)
+ break;
+
+ }
+
+ if (!ret)
+ dmar_domain->dirty_tracking = enable;
+ spin_unlock(&dmar_domain->lock);
+
+ return ret;
+}
+
+static int intel_iommu_read_and_clear_dirty(struct iommu_domain *domain,
+ unsigned long iova, size_t size,
+ unsigned long flags,
+ struct iommu_dirty_bitmap *dirty)
+{
+ struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+ unsigned long end = iova + size - 1;
+ unsigned long pgsize;
+ bool ad_enabled;
+
+ spin_lock(&dmar_domain->lock);
+ ad_enabled = dmar_domain->dirty_tracking;
+ spin_unlock(&dmar_domain->lock);
+
+ if (!ad_enabled && dirty->bitmap)
+ return -EINVAL;
+
+ rcu_read_lock();
+ do {
+ struct dma_pte *pte;
+ int lvl = 0;
+
+ pte = pfn_to_dma_pte(dmar_domain, iova >> VTD_PAGE_SHIFT, &lvl,
+ GFP_ATOMIC);
+ pgsize = level_size(lvl) << VTD_PAGE_SHIFT;
+ if (!pte || !dma_pte_present(pte)) {
+ iova += pgsize;
+ continue;
+ }
+
+ /* It is writable, set the bitmap */
+ if (((flags & IOMMU_DIRTY_NO_CLEAR) &&
+ dma_sl_pte_dirty(pte)) ||
+ dma_sl_pte_test_and_clear_dirty(pte))
+ iommu_dirty_bitmap_record(dirty, iova, pgsize);
+ iova += pgsize;
+ } while (iova < end);
+ rcu_read_unlock();
+
+ return 0;
+}
+
+const struct iommu_dirty_ops intel_dirty_ops = {
+ .set_dirty_tracking = intel_iommu_set_dirty_tracking,
+ .read_and_clear_dirty = intel_iommu_read_and_clear_dirty,
+};
+
const struct iommu_ops intel_iommu_ops = {
.capable = intel_iommu_capable,
.hw_info = intel_iommu_hw_info,
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index c18fb699c87a..bccd44db3316 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -48,6 +48,9 @@
#define DMA_FL_PTE_DIRTY BIT_ULL(6)
#define DMA_FL_PTE_XD BIT_ULL(63)
+#define DMA_SL_PTE_DIRTY_BIT 9
+#define DMA_SL_PTE_DIRTY BIT_ULL(DMA_SL_PTE_DIRTY_BIT)
+
#define ADDR_WIDTH_5LEVEL (57)
#define ADDR_WIDTH_4LEVEL (48)
@@ -592,6 +595,7 @@ struct dmar_domain {
* otherwise, goes through the second
* level.
*/
+ u8 dirty_tracking:1; /* Dirty tracking is enabled */
spinlock_t lock; /* Protect device tracking lists */
struct list_head devices; /* all devices' list */
@@ -781,6 +785,17 @@ static inline bool dma_pte_present(struct dma_pte *pte)
return (pte->val & 3) != 0;
}
+static inline bool dma_sl_pte_dirty(struct dma_pte *pte)
+{
+ return (pte->val & DMA_SL_PTE_DIRTY) != 0;
+}
+
+static inline bool dma_sl_pte_test_and_clear_dirty(struct dma_pte *pte)
+{
+ return test_and_clear_bit(DMA_SL_PTE_DIRTY_BIT,
+ (unsigned long *)&pte->val);
+}
+
static inline bool dma_pte_superpage(struct dma_pte *pte)
{
return (pte->val & DMA_PTE_LARGE_PAGE);
diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index 8f92b92f3d2a..03814942d59c 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -277,6 +277,11 @@ static inline void pasid_set_bits(u64 *ptr, u64 mask, u64 bits)
WRITE_ONCE(*ptr, (old & ~mask) | bits);
}
+static inline u64 pasid_get_bits(u64 *ptr)
+{
+ return READ_ONCE(*ptr);
+}
+
/*
* Setup the DID(Domain Identifier) field (Bit 64~79) of scalable mode
* PASID entry.
@@ -335,6 +340,36 @@ static inline void pasid_set_fault_enable(struct pasid_entry *pe)
pasid_set_bits(&pe->val[0], 1 << 1, 0);
}
+/*
+ * Enable second level A/D bits by setting the SLADE (Second Level
+ * Access Dirty Enable) field (Bit 9) of a scalable mode PASID
+ * entry.
+ */
+static inline void pasid_set_ssade(struct pasid_entry *pe)
+{
+ pasid_set_bits(&pe->val[0], 1 << 9, 1 << 9);
+}
+
+/*
+ * Disable second level A/D bits by clearing the SLADE (Second Level
+ * Access Dirty Enable) field (Bit 9) of a scalable mode PASID
+ * entry.
+ */
+static inline void pasid_clear_ssade(struct pasid_entry *pe)
+{
+ pasid_set_bits(&pe->val[0], 1 << 9, 0);
+}
+
+/*
+ * Check whether second level A/D bits are enabled, i.e. whether the
+ * SLADE (Second Level Access Dirty Enable) field (Bit 9) of a
+ * scalable mode PASID entry is set.
+ */
+static inline bool pasid_get_ssade(struct pasid_entry *pe)
+{
+ return pasid_get_bits(&pe->val[0]) & (1 << 9);
+}
+
/*
* Setup the WPE(Write Protect Enable) field (Bit 132) of a
* scalable mode PASID entry.
@@ -627,6 +662,8 @@ int intel_pasid_setup_second_level(struct intel_iommu *iommu,
pasid_set_translation_type(pte, PASID_ENTRY_PGTT_SL_ONLY);
pasid_set_fault_enable(pte);
pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap));
+ if (domain->dirty_tracking)
+ pasid_set_ssade(pte);
pasid_set_present(pte);
spin_unlock(&iommu->lock);
@@ -636,6 +673,63 @@ int intel_pasid_setup_second_level(struct intel_iommu *iommu,
return 0;
}
+/*
+ * Set up dirty tracking on a second-level-only translation type.
+ */
+int intel_pasid_setup_dirty_tracking(struct intel_iommu *iommu,
+ struct dmar_domain *domain,
+ struct device *dev, u32 pasid,
+ bool enabled)
+{
+ struct pasid_entry *pte;
+ u16 did, pgtt;
+
+ spin_lock(&iommu->lock);
+
+ did = domain_id_iommu(domain, iommu);
+ pte = intel_pasid_get_entry(dev, pasid);
+ if (!pte) {
+ spin_unlock(&iommu->lock);
+ dev_err(dev, "Failed to get pasid entry of PASID %d\n", pasid);
+ return -ENODEV;
+ }
+
+ pgtt = pasid_pte_get_pgtt(pte);
+
+ if (enabled)
+ pasid_set_ssade(pte);
+ else
+ pasid_clear_ssade(pte);
+ spin_unlock(&iommu->lock);
+
+ /*
+ * From VT-d spec table 25 "Guidance to Software for Invalidations":
+ *
+ * - PASID-selective-within-Domain PASID-cache invalidation
+ * If (PGTT=SS or Nested)
+ * - Domain-selective IOTLB invalidation
+ * Else
+ * - PASID-selective PASID-based IOTLB invalidation
+ * - If (pasid is RID_PASID)
+ * - Global Device-TLB invalidation to affected functions
+ * Else
+ * - PASID-based Device-TLB invalidation (with S=1 and
+ * Addr[63:12]=0x7FFFFFFF_FFFFF) to affected functions
+ */
+ pasid_cache_invalidation_with_pasid(iommu, did, pasid);
+
+ if (pgtt == PASID_ENTRY_PGTT_SL_ONLY || pgtt == PASID_ENTRY_PGTT_NESTED)
+ iommu->flush.flush_iotlb(iommu, did, 0, 0, DMA_TLB_DSI_FLUSH);
+ else
+ qi_flush_piotlb(iommu, did, pasid, 0, -1, 0);
+
+ /* Device IOTLB doesn't need to be flushed in caching mode. */
+ if (!cap_caching_mode(iommu->cap))
+ devtlb_invalidation_with_pasid(iommu, dev, pasid);
+
+ return 0;
+}
+
/*
* Set up the scalable mode pasid entry for passthrough translation type.
*/
diff --git a/drivers/iommu/intel/pasid.h b/drivers/iommu/intel/pasid.h
index 4e9e68c3c388..958050b093aa 100644
--- a/drivers/iommu/intel/pasid.h
+++ b/drivers/iommu/intel/pasid.h
@@ -106,6 +106,10 @@ int intel_pasid_setup_first_level(struct intel_iommu *iommu,
int intel_pasid_setup_second_level(struct intel_iommu *iommu,
struct dmar_domain *domain,
struct device *dev, u32 pasid);
+int intel_pasid_setup_dirty_tracking(struct intel_iommu *iommu,
+ struct dmar_domain *domain,
+ struct device *dev, u32 pasid,
+ bool enabled);
int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
struct dmar_domain *domain,
struct device *dev, u32 pasid);
--
2.17.2
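The read-and-clear walk in intel_iommu_read_and_clear_dirty() above can be modeled
in plain user-space C. This is an illustrative sketch under assumptions, not the
kernel code: a flat PTE array stands in for the pfn_to_dma_pte() page-table walk,
`SL_PTE_DIRTY` mirrors bit 9 of a second-level PTE, and `DIRTY_NO_CLEAR` models the
IOMMU_DIRTY_NO_CLEAR "report but keep" flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SL_PTE_PRESENT   (1ULL << 0)
#define SL_PTE_DIRTY_BIT 9
#define SL_PTE_DIRTY     (1ULL << SL_PTE_DIRTY_BIT)
#define DIRTY_NO_CLEAR   (1UL << 0)   /* models IOMMU_DIRTY_NO_CLEAR */

static bool sl_pte_test_and_clear_dirty(uint64_t *pte)
{
	bool dirty = (*pte & SL_PTE_DIRTY) != 0;

	/* The real driver uses test_and_clear_bit() for atomicity. */
	*pte &= ~SL_PTE_DIRTY;
	return dirty;
}

/*
 * Walk @npages PTEs; set bit i in @bitmap for each dirty, present page.
 * With DIRTY_NO_CLEAR the dirty bits are reported but left set, so a
 * later pass (or the hardware view) still sees them.
 */
static void read_and_clear_dirty(uint64_t *ptes, size_t npages,
				 unsigned long flags, uint64_t *bitmap)
{
	for (size_t i = 0; i < npages; i++) {
		if (!(ptes[i] & SL_PTE_PRESENT))
			continue;
		if ((flags & DIRTY_NO_CLEAR) ? (ptes[i] & SL_PTE_DIRTY) != 0
					     : sl_pte_test_and_clear_dirty(&ptes[i]))
			*bitmap |= 1ULL << i;
	}
}
```

A second pass with flags=0 reports nothing until the device writes again, which is
why the patch flushes the IOTLB after clearing: cached dirty state must not mask a
new hardware dirty-bit update.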
* Re: [PATCH v3 19/19] iommu/intel: Access/Dirty bit support for SL domains
2023-09-23 1:25 ` [PATCH v3 19/19] iommu/intel: Access/Dirty bit support for SL domains Joao Martins
@ 2023-09-25 7:01 ` Baolu Lu
2023-09-25 9:08 ` Joao Martins
2023-10-16 0:51 ` Baolu Lu
` (3 subsequent siblings)
4 siblings, 1 reply; 140+ messages in thread
From: Baolu Lu @ 2023-09-25 7:01 UTC (permalink / raw)
To: Joao Martins, iommu
Cc: baolu.lu, Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi,
Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel,
Suravee Suthikulpanit, Will Deacon, Robin Murphy, Alex Williamson,
kvm
On 9/23/23 9:25 AM, Joao Martins wrote:
> IOMMU advertises Access/Dirty bits for second-stage page table if the
> extended capability DMAR register reports it (ECAP, mnemonic ECAP.SSADS).
> The first stage table is compatible with CPU page table thus A/D bits are
> implicitly supported. Relevant Intel IOMMU SDM ref for first stage table
> "3.6.2 Accessed, Extended Accessed, and Dirty Flags" and second stage table
> "3.7.2 Accessed and Dirty Flags".
>
> First stage page table is enabled by default so it's allowed to set dirty
> tracking and no control bits needed, it just returns 0. To use SSADS, set
> bit 9 (SSADE) in the scalable-mode PASID table entry and flush the IOTLB
> via pasid_flush_caches() following the manual. Relevant SDM refs:
>
> "3.7.2 Accessed and Dirty Flags"
> "6.5.3.3 Guidance to Software for Invalidations,
> Table 23. Guidance to Software for Invalidations"
>
> PTE dirty bit is located in bit 9 and it's cached in the IOTLB so flush
> IOTLB to make sure IOMMU attempts to set the dirty bit again. Note that
> iommu_dirty_bitmap_record() will add the IOVA to iotlb_gather and thus
> the caller of the iommu op will flush the IOTLB. Relevant manuals over
> the hardware translation is chapter 6 with some special mention to:
>
> "6.2.3.1 Scalable-Mode PASID-Table Entry Programming Considerations"
> "6.2.4 IOTLB"
>
> Signed-off-by: Joao Martins<joao.m.martins@oracle.com>
> ---
> The IOPTE walker is still a bit inefficient. Making sure the UAPI/IOMMUFD is
> solid and agreed upon.
> ---
> drivers/iommu/intel/iommu.c | 94 +++++++++++++++++++++++++++++++++++++
> drivers/iommu/intel/iommu.h | 15 ++++++
> drivers/iommu/intel/pasid.c | 94 +++++++++++++++++++++++++++++++++++++
> drivers/iommu/intel/pasid.h | 4 ++
> 4 files changed, 207 insertions(+)
The code is probably incomplete. When attaching a domain to a device,
check the domain's dirty tracking capability against the device's
capabilities. If the domain's dirty tracking capability is set but the
device does not support it, the attach callback should return -EINVAL.
Best regards,
baolu
* Re: [PATCH v3 19/19] iommu/intel: Access/Dirty bit support for SL domains
2023-09-25 7:01 ` Baolu Lu
@ 2023-09-25 9:08 ` Joao Martins
2023-10-16 2:26 ` Baolu Lu
0 siblings, 1 reply; 140+ messages in thread
From: Joao Martins @ 2023-09-25 9:08 UTC (permalink / raw)
To: Baolu Lu, iommu
Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Yi Liu,
Yi Y Sun, Nicolin Chen, Joerg Roedel, Suravee Suthikulpanit,
Will Deacon, Robin Murphy, Alex Williamson, kvm
On 25/09/2023 08:01, Baolu Lu wrote:
> On 9/23/23 9:25 AM, Joao Martins wrote:
>> IOMMU advertises Access/Dirty bits for second-stage page table if the
>> extended capability DMAR register reports it (ECAP, mnemonic ECAP.SSADS).
>> The first stage table is compatible with CPU page table thus A/D bits are
>> implicitly supported. Relevant Intel IOMMU SDM ref for first stage table
>> "3.6.2 Accessed, Extended Accessed, and Dirty Flags" and second stage table
>> "3.7.2 Accessed and Dirty Flags".
>>
>> First stage page table is enabled by default so it's allowed to set dirty
>> tracking and no control bits needed, it just returns 0. To use SSADS, set
>> bit 9 (SSADE) in the scalable-mode PASID table entry and flush the IOTLB
>> via pasid_flush_caches() following the manual. Relevant SDM refs:
>>
>> "3.7.2 Accessed and Dirty Flags"
>> "6.5.3.3 Guidance to Software for Invalidations,
>> Table 23. Guidance to Software for Invalidations"
>>
>> PTE dirty bit is located in bit 9 and it's cached in the IOTLB so flush
>> IOTLB to make sure IOMMU attempts to set the dirty bit again. Note that
>> iommu_dirty_bitmap_record() will add the IOVA to iotlb_gather and thus
>> the caller of the iommu op will flush the IOTLB. Relevant manuals over
>> the hardware translation is chapter 6 with some special mention to:
>>
>> "6.2.3.1 Scalable-Mode PASID-Table Entry Programming Considerations"
>> "6.2.4 IOTLB"
>>
>> Signed-off-by: Joao Martins<joao.m.martins@oracle.com>
>> ---
>> The IOPTE walker is still a bit inefficient. Making sure the UAPI/IOMMUFD is
>> solid and agreed upon.
>> ---
>> drivers/iommu/intel/iommu.c | 94 +++++++++++++++++++++++++++++++++++++
>> drivers/iommu/intel/iommu.h | 15 ++++++
>> drivers/iommu/intel/pasid.c | 94 +++++++++++++++++++++++++++++++++++++
>> drivers/iommu/intel/pasid.h | 4 ++
>> 4 files changed, 207 insertions(+)
>
> The code is probably incomplete. When attaching a domain to a device,
> check the domain's dirty tracking capability against the device's
> capabilities. If the domain's dirty tracking capability is set but the
> device does not support it, the attach callback should return -EINVAL.
>
Yeap, I did that for AMD, but it seems in the mix of changes I may have deleted
and then forgot to include it here.
Here's what I added (together with consolidated cap checking):
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 7d5a8f5283a7..fabfe363f1f9 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4075,6 +4075,11 @@ static struct iommu_domain
*intel_iommu_domain_alloc(unsigned type)
return NULL;
}
+static bool intel_iommu_slads_supported(struct intel_iommu *iommu)
+{
+ return sm_supported(iommu) && ecap_slads(iommu->ecap);
+}
+
static struct iommu_domain *
intel_iommu_domain_alloc_user(struct device *dev, u32 flags)
{
@@ -4090,7 +4095,7 @@ intel_iommu_domain_alloc_user(struct device *dev, u32 flags)
return ERR_PTR(-EOPNOTSUPP);
if (enforce_dirty &&
- !device_iommu_capable(dev, IOMMU_CAP_DIRTY))
+ !intel_iommu_slads_supported(iommu))
return ERR_PTR(-EOPNOTSUPP);
domain = iommu_domain_alloc(dev->bus);
@@ -4121,6 +4126,9 @@ static int prepare_domain_attach_device(struct
iommu_domain *domain,
if (dmar_domain->force_snooping && !ecap_sc_support(iommu->ecap))
return -EINVAL;
+ if (domain->dirty_ops && !intel_iommu_slads_supported(iommu))
+ return -EINVAL;
+
/* check if this iommu agaw is sufficient for max mapped address */
addr_width = agaw_to_width(iommu->agaw);
if (addr_width > cap_mgaw(iommu->cap))
@@ -4376,8 +4384,7 @@ static bool intel_iommu_capable(struct device *dev, enum
iommu_cap cap)
case IOMMU_CAP_ENFORCE_CACHE_COHERENCY:
return ecap_sc_support(info->iommu->ecap);
case IOMMU_CAP_DIRTY:
- return sm_supported(info->iommu) &&
- ecap_slads(info->iommu->ecap);
+ return intel_iommu_slads_supported(info->iommu);
default:
return false;
}
* Re: [PATCH v3 19/19] iommu/intel: Access/Dirty bit support for SL domains
2023-09-25 9:08 ` Joao Martins
@ 2023-10-16 2:26 ` Baolu Lu
0 siblings, 0 replies; 140+ messages in thread
From: Baolu Lu @ 2023-10-16 2:26 UTC (permalink / raw)
To: Joao Martins, iommu
Cc: baolu.lu, Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi,
Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel,
Suravee Suthikulpanit, Will Deacon, Robin Murphy, Alex Williamson,
kvm
On 9/25/23 5:08 PM, Joao Martins wrote:
>
>
> On 25/09/2023 08:01, Baolu Lu wrote:
>> On 9/23/23 9:25 AM, Joao Martins wrote:
>>> IOMMU advertises Access/Dirty bits for second-stage page table if the
>>> extended capability DMAR register reports it (ECAP, mnemonic ECAP.SSADS).
>>> The first stage table is compatible with CPU page table thus A/D bits are
>>> implicitly supported. Relevant Intel IOMMU SDM ref for first stage table
>>> "3.6.2 Accessed, Extended Accessed, and Dirty Flags" and second stage table
>>> "3.7.2 Accessed and Dirty Flags".
>>>
>>> First stage page table is enabled by default so it's allowed to set dirty
>>> tracking and no control bits needed, it just returns 0. To use SSADS, set
>>> bit 9 (SSADE) in the scalable-mode PASID table entry and flush the IOTLB
>>> via pasid_flush_caches() following the manual. Relevant SDM refs:
>>>
>>> "3.7.2 Accessed and Dirty Flags"
>>> "6.5.3.3 Guidance to Software for Invalidations,
>>> Table 23. Guidance to Software for Invalidations"
>>>
>>> PTE dirty bit is located in bit 9 and it's cached in the IOTLB so flush
>>> IOTLB to make sure IOMMU attempts to set the dirty bit again. Note that
>>> iommu_dirty_bitmap_record() will add the IOVA to iotlb_gather and thus
>>> the caller of the iommu op will flush the IOTLB. Relevant manuals over
>>> the hardware translation is chapter 6 with some special mention to:
>>>
>>> "6.2.3.1 Scalable-Mode PASID-Table Entry Programming Considerations"
>>> "6.2.4 IOTLB"
>>>
>>> Signed-off-by: Joao Martins<joao.m.martins@oracle.com>
>>> ---
>>> The IOPTE walker is still a bit inefficient. Making sure the UAPI/IOMMUFD is
>>> solid and agreed upon.
>>> ---
>>> drivers/iommu/intel/iommu.c | 94 +++++++++++++++++++++++++++++++++++++
>>> drivers/iommu/intel/iommu.h | 15 ++++++
>>> drivers/iommu/intel/pasid.c | 94 +++++++++++++++++++++++++++++++++++++
>>> drivers/iommu/intel/pasid.h | 4 ++
>>> 4 files changed, 207 insertions(+)
>>
>> The code is probably incomplete. When attaching a domain to a device,
>> check the domain's dirty tracking capability against the device's
>> capabilities. If the domain's dirty tracking capability is set but the
>> device does not support it, the attach callback should return -EINVAL.
>>
> Yeap, I did that for AMD, but it seems in the mix of changes I may have deleted
> and then forgot to include it here.
>
> Here's what I added (together with consolidated cap checking):
>
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index 7d5a8f5283a7..fabfe363f1f9 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -4075,6 +4075,11 @@ static struct iommu_domain
> *intel_iommu_domain_alloc(unsigned type)
> return NULL;
> }
>
> +static bool intel_iommu_slads_supported(struct intel_iommu *iommu)
> +{
> + return sm_supported(iommu) && ecap_slads(iommu->ecap);
> +}
> +
> static struct iommu_domain *
> intel_iommu_domain_alloc_user(struct device *dev, u32 flags)
> {
> @@ -4090,7 +4095,7 @@ intel_iommu_domain_alloc_user(struct device *dev, u32 flags)
> return ERR_PTR(-EOPNOTSUPP);
>
> if (enforce_dirty &&
> - !device_iommu_capable(dev, IOMMU_CAP_DIRTY))
> + !intel_iommu_slads_supported(iommu))
> return ERR_PTR(-EOPNOTSUPP);
>
> domain = iommu_domain_alloc(dev->bus);
> @@ -4121,6 +4126,9 @@ static int prepare_domain_attach_device(struct
> iommu_domain *domain,
> if (dmar_domain->force_snooping && !ecap_sc_support(iommu->ecap))
> return -EINVAL;
>
> + if (domain->dirty_ops && !intel_iommu_slads_supported(iommu))
> + return -EINVAL;
> +
> /* check if this iommu agaw is sufficient for max mapped address */
> addr_width = agaw_to_width(iommu->agaw);
> if (addr_width > cap_mgaw(iommu->cap))
> @@ -4376,8 +4384,7 @@ static bool intel_iommu_capable(struct device *dev, enum
> iommu_cap cap)
> case IOMMU_CAP_ENFORCE_CACHE_COHERENCY:
> return ecap_sc_support(info->iommu->ecap);
> case IOMMU_CAP_DIRTY:
> - return sm_supported(info->iommu) &&
> - ecap_slads(info->iommu->ecap);
> + return intel_iommu_slads_supported(info->iommu);
> default:
> return false;
> }
Yes. Above additional change looks good to me.
Best regards,
baolu
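The attach-time compatibility rule agreed above can be sketched as a small
user-space model. This is illustrative only; `model_iommu` and `model_domain`
are hypothetical stand-ins for the driver's `intel_iommu` and `iommu_domain`
types, and -22 stands in for -EINVAL.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the checks discussed above. */
struct model_iommu {
	bool sm_supported;   /* scalable mode (ecap_smts) */
	bool ecap_slads;     /* second-level A/D support */
};

struct model_domain {
	bool has_dirty_ops;  /* domain was allocated with ENFORCE_DIRTY */
};

/* Mirrors the proposed slads_supported() helper/macro. */
static bool slads_supported(const struct model_iommu *iommu)
{
	return iommu->sm_supported && iommu->ecap_slads;
}

/*
 * Mirrors the added prepare_domain_attach_device() check: attaching a
 * dirty-tracking-enforcing domain to an IOMMU without SLADS must fail.
 */
static int prepare_attach(const struct model_domain *domain,
			  const struct model_iommu *iommu)
{
	if (domain->has_dirty_ops && !slads_supported(iommu))
		return -22; /* -EINVAL */
	return 0;
}
```

The point of the shared helper is that allocation (`IOMMU_CAP_DIRTY` probing)
and attach use exactly the same predicate, so a domain can never be attached to
hardware that cannot honor its dirty_ops.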
* Re: [PATCH v3 19/19] iommu/intel: Access/Dirty bit support for SL domains
2023-09-23 1:25 ` [PATCH v3 19/19] iommu/intel: Access/Dirty bit support for SL domains Joao Martins
2023-09-25 7:01 ` Baolu Lu
@ 2023-10-16 0:51 ` Baolu Lu
2023-10-16 10:42 ` Joao Martins
2023-10-16 1:37 ` Baolu Lu
` (2 subsequent siblings)
4 siblings, 1 reply; 140+ messages in thread
From: Baolu Lu @ 2023-10-16 0:51 UTC (permalink / raw)
To: Joao Martins, iommu
Cc: baolu.lu, Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi,
Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel,
Suravee Suthikulpanit, Will Deacon, Robin Murphy, Alex Williamson,
kvm
On 9/23/23 9:25 AM, Joao Martins wrote:
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -300,6 +300,7 @@ static int iommu_skip_te_disable;
> #define IDENTMAP_AZALIA 4
>
> const struct iommu_ops intel_iommu_ops;
> +const struct iommu_dirty_ops intel_dirty_ops;
>
> static bool translation_pre_enabled(struct intel_iommu *iommu)
> {
> @@ -4077,6 +4078,7 @@ static struct iommu_domain *intel_iommu_domain_alloc(unsigned type)
> static struct iommu_domain *
> intel_iommu_domain_alloc_user(struct device *dev, u32 flags)
> {
> + bool enforce_dirty = (flags & IOMMU_HWPT_ALLOC_ENFORCE_DIRTY);
> struct iommu_domain *domain;
> struct intel_iommu *iommu;
>
> @@ -4087,9 +4089,15 @@ intel_iommu_domain_alloc_user(struct device *dev, u32 flags)
> if ((flags & IOMMU_HWPT_ALLOC_NEST_PARENT) && !ecap_nest(iommu->ecap))
> return ERR_PTR(-EOPNOTSUPP);
>
> + if (enforce_dirty &&
> + !device_iommu_capable(dev, IOMMU_CAP_DIRTY))
> + return ERR_PTR(-EOPNOTSUPP);
> +
> domain = iommu_domain_alloc(dev->bus);
> if (!domain)
> domain = ERR_PTR(-ENOMEM);
> + if (domain && enforce_dirty)
@domain cannot be NULL here.
> + domain->dirty_ops = &intel_dirty_ops;
> return domain;
> }
The VT-d driver always uses second level for a user domain translation.
In order to avoid checks of "domain->use_first_level" in the callbacks,
how about checking it here and returning failure if first level is used for a
user domain?
[...]
domain = iommu_domain_alloc(dev->bus);
if (!domain)
return ERR_PTR(-ENOMEM);
if (enforce_dirty) {
if (to_dmar_domain(domain)->use_first_level) {
iommu_domain_free(domain);
return ERR_PTR(-EOPNOTSUPP);
}
domain->dirty_ops = &intel_dirty_ops;
}
return domain;
>
> @@ -4367,6 +4375,9 @@ static bool intel_iommu_capable(struct device *dev, enum iommu_cap cap)
> return dmar_platform_optin();
> case IOMMU_CAP_ENFORCE_CACHE_COHERENCY:
> return ecap_sc_support(info->iommu->ecap);
> + case IOMMU_CAP_DIRTY:
> + return sm_supported(info->iommu) &&
> + ecap_slads(info->iommu->ecap);
Above appears several times in this patch. Is it possible to define it
as a macro?
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index bccd44db3316..379e141bbb28 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -542,6 +542,8 @@ enum {
#define sm_supported(iommu) (intel_iommu_sm &&
ecap_smts((iommu)->ecap))
#define pasid_supported(iommu) (sm_supported(iommu) && \
ecap_pasid((iommu)->ecap))
+#define slads_supported(iommu) (sm_supported(iommu) && \
+ ecap_slads((iommu)->ecap))
> default:
> return false;
> }
Best regards,
baolu
* Re: [PATCH v3 19/19] iommu/intel: Access/Dirty bit support for SL domains
2023-10-16 0:51 ` Baolu Lu
@ 2023-10-16 10:42 ` Joao Martins
2023-10-16 12:41 ` Baolu Lu
0 siblings, 1 reply; 140+ messages in thread
From: Joao Martins @ 2023-10-16 10:42 UTC (permalink / raw)
To: Baolu Lu, iommu
Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Yi Liu,
Yi Y Sun, Nicolin Chen, Joerg Roedel, Suravee Suthikulpanit,
Will Deacon, Robin Murphy, Alex Williamson, kvm
On 16/10/2023 01:51, Baolu Lu wrote:
> On 9/23/23 9:25 AM, Joao Martins wrote:
>> --- a/drivers/iommu/intel/iommu.c
>> +++ b/drivers/iommu/intel/iommu.c
>> @@ -300,6 +300,7 @@ static int iommu_skip_te_disable;
>> #define IDENTMAP_AZALIA 4
>> const struct iommu_ops intel_iommu_ops;
>> +const struct iommu_dirty_ops intel_dirty_ops;
>> static bool translation_pre_enabled(struct intel_iommu *iommu)
>> {
>> @@ -4077,6 +4078,7 @@ static struct iommu_domain
>> *intel_iommu_domain_alloc(unsigned type)
>> static struct iommu_domain *
>> intel_iommu_domain_alloc_user(struct device *dev, u32 flags)
>> {
>> + bool enforce_dirty = (flags & IOMMU_HWPT_ALLOC_ENFORCE_DIRTY);
>> struct iommu_domain *domain;
>> struct intel_iommu *iommu;
>> @@ -4087,9 +4089,15 @@ intel_iommu_domain_alloc_user(struct device *dev, u32
>> flags)
>> if ((flags & IOMMU_HWPT_ALLOC_NEST_PARENT) && !ecap_nest(iommu->ecap))
>> return ERR_PTR(-EOPNOTSUPP);
>> + if (enforce_dirty &&
>> + !device_iommu_capable(dev, IOMMU_CAP_DIRTY))
>> + return ERR_PTR(-EOPNOTSUPP);
>> +
>> domain = iommu_domain_alloc(dev->bus);
>> if (!domain)
>> domain = ERR_PTR(-ENOMEM);
>> + if (domain && enforce_dirty)
>
> @domain cannot be NULL here.
>
True, it should be:
if (!IS_ERR(domain) && enforce_dirty)
>> + domain->dirty_ops = &intel_dirty_ops;
>> return domain;
>> }
>
> The VT-d driver always uses second level for a user domain translation.
> In order to avoid checks of "domain->use_first_level" in the callbacks,
> how about check it here and return failure if first level is used for
> user domain?
>
I was told by Yi Y Sun offlist to have the first_level checked, because dirty
bit in first stage page table is always enabled (and cannot be toggled on/off).
I can remove it again; initially RFC didn't have it as it was failing in similar
way to how you suggest here. Not sure how to proceed?
>
> [...]
> domain = iommu_domain_alloc(dev->bus);
> if (!domain)
> return ERR_PTR(-ENOMEM);
>
> if (enforce_dirty) {
> if (to_dmar_domain(domain)->use_first_level) {
> iommu_domain_free(domain);
> return ERR_PTR(-EOPNOTSUPP);
> }
> domain->dirty_ops = &intel_dirty_ops;
> }
>
> return domain;
>
Should I fail on first-level page table dirty enforcement, I will certainly adopt
the above (and remove the successful return on first_level in set_dirty_tracking)
>> @@ -4367,6 +4375,9 @@ static bool intel_iommu_capable(struct device *dev,
>> enum iommu_cap cap)
>> return dmar_platform_optin();
>> case IOMMU_CAP_ENFORCE_CACHE_COHERENCY:
>> return ecap_sc_support(info->iommu->ecap);
>> + case IOMMU_CAP_DIRTY:
>> + return sm_supported(info->iommu) &&
>> + ecap_slads(info->iommu->ecap);
>
> Above appears several times in this patch. Is it possible to define it
> as a macro?
>
Yeap, for sure much cleaner indeed.
> diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
> index bccd44db3316..379e141bbb28 100644
> --- a/drivers/iommu/intel/iommu.h
> +++ b/drivers/iommu/intel/iommu.h
> @@ -542,6 +542,8 @@ enum {
> #define sm_supported(iommu) (intel_iommu_sm && ecap_smts((iommu)->ecap))
> #define pasid_supported(iommu) (sm_supported(iommu) && \
> ecap_pasid((iommu)->ecap))
> +#define slads_supported(iommu) (sm_supported(iommu) && \
> + ecap_slads((iommu)->ecap))
>
Yeap.
>> default:
>> return false;
>> }
>
> Best regards,
> baolu
* Re: [PATCH v3 19/19] iommu/intel: Access/Dirty bit support for SL domains
2023-10-16 10:42 ` Joao Martins
@ 2023-10-16 12:41 ` Baolu Lu
0 siblings, 0 replies; 140+ messages in thread
From: Baolu Lu @ 2023-10-16 12:41 UTC (permalink / raw)
To: Joao Martins, iommu
Cc: baolu.lu, Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi,
Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel,
Suravee Suthikulpanit, Will Deacon, Robin Murphy, Alex Williamson,
kvm
On 2023/10/16 18:42, Joao Martins wrote:
>>> + domain->dirty_ops = &intel_dirty_ops;
>>> return domain;
>>> }
>> The VT-d driver always uses second level for a user domain translation.
>> In order to avoid checks of "domain->use_first_level" in the callbacks,
>> how about check it here and return failure if first level is used for
>> user domain?
>>
> I was told by Yi Y Sun offlist to have the first_level checked, because dirty
> bit in first stage page table is always enabled (and cannot be toggled on/off).
> I can remove it again; initially RFC didn't have it as it was failing in similar
> way to how you suggest here. Not sure how to proceed?
Yi was right. But we currently have no use case for dirty tracking in
the first-level page table, so let's start by supporting it only in the
second-level page table.
If we later identify a use case for dirty tracking in the first level
page table, we can then add the code with appropriate testing efforts.
Best regards,
baolu
* Re: [PATCH v3 19/19] iommu/intel: Access/Dirty bit support for SL domains
2023-09-23 1:25 ` [PATCH v3 19/19] iommu/intel: Access/Dirty bit support for SL domains Joao Martins
2023-09-25 7:01 ` Baolu Lu
2023-10-16 0:51 ` Baolu Lu
@ 2023-10-16 1:37 ` Baolu Lu
2023-10-16 10:57 ` Joao Martins
2023-10-16 2:07 ` Baolu Lu
2023-10-16 2:21 ` Baolu Lu
4 siblings, 1 reply; 140+ messages in thread
From: Baolu Lu @ 2023-10-16 1:37 UTC (permalink / raw)
To: Joao Martins, iommu
Cc: baolu.lu, Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi,
Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel,
Suravee Suthikulpanit, Will Deacon, Robin Murphy, Alex Williamson,
kvm
On 9/23/23 9:25 AM, Joao Martins wrote:
[...]
>
> +static int intel_iommu_set_dirty_tracking(struct iommu_domain *domain,
> + bool enable)
> +{
> + struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> + struct device_domain_info *info;
> + int ret = -EINVAL;
> +
> + spin_lock(&dmar_domain->lock);
> + if (!(dmar_domain->dirty_tracking ^ enable) ||
Just out of curiosity, can we simply write
dmar_domain->dirty_tracking == enable
instead? I am not sure whether the compiler will be happy with this.
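As an aside, for C99 bool operands the two spellings are equivalent, so the compiler is happy with either; a standalone userspace sketch (hypothetical helper names, not driver code) showing the equivalence:

```c
#include <stdbool.h>

/* Both helpers answer "is the domain already in the requested state?".
 * For bool operands, !(a ^ b) and a == b always agree; the == form
 * simply reads more directly. */
bool same_state_xor(bool cur, bool enable)
{
	return !(cur ^ enable);
}

bool same_state_eq(bool cur, bool enable)
{
	return cur == enable;
}
```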
> + list_empty(&dmar_domain->devices)) {
list_for_each_entry() is a no-op if the list is empty, so no need to check it.
> + spin_unlock(&dmar_domain->lock);
> + return 0;
> + }
> +
> + list_for_each_entry(info, &dmar_domain->devices, link) {
> + /* First-level page table always enables dirty bit*/
> + if (dmar_domain->use_first_level) {
Since we leave out domain->use_first_level in the user_domain_alloc
function, we no longer need to check it here.
> + ret = 0;
> + break;
> + }
> +
> + ret = intel_pasid_setup_dirty_tracking(info->iommu, info->domain,
> + info->dev, IOMMU_NO_PASID,
> + enable);
> + if (ret)
> + break;
We need to unwind to the previous status here. We cannot leave some
devices with status @enable while others do not.
> +
> + }
The VT-d driver also support attaching domain to a pasid of a device. We
also need to enable dirty tracking on those devices.
> +
> + if (!ret)
> + dmar_domain->dirty_tracking = enable;
> + spin_unlock(&dmar_domain->lock);
> +
> + return ret;
> +}
I have made some changes to the code based on my above comments. Please
let me know what you think.
static int intel_iommu_set_dirty_tracking(struct iommu_domain *domain,
bool enable)
{
struct dmar_domain *dmar_domain = to_dmar_domain(domain);
struct dev_pasid_info *dev_pasid;
struct device_domain_info *info;
int ret;
spin_lock(&dmar_domain->lock);
if (!(dmar_domain->dirty_tracking ^ enable))
goto out_unlock;
list_for_each_entry(info, &dmar_domain->devices, link) {
ret = intel_pasid_setup_dirty_tracking(info->iommu,
dmar_domain,
info->dev,
IOMMU_NO_PASID,
enable);
if (ret)
goto err_unwind;
}
list_for_each_entry(dev_pasid, &dmar_domain->dev_pasids,
link_domain) {
info = dev_iommu_priv_get(dev_pasid->dev);
ret = intel_pasid_setup_dirty_tracking(info->iommu,
dmar_domain,
info->dev,
dev_pasid->pasid,
enable);
if (ret)
goto err_unwind;
}
dmar_domain->dirty_tracking = enable;
out_unlock:
spin_unlock(&dmar_domain->lock);
return 0;
err_unwind:
list_for_each_entry(info, &dmar_domain->devices, link)
intel_pasid_setup_dirty_tracking(info->iommu,
dmar_domain, info->dev,
IOMMU_NO_PASID,
dmar_domain->dirty_tracking);
list_for_each_entry(dev_pasid, &dmar_domain->dev_pasids,
link_domain) {
info = dev_iommu_priv_get(dev_pasid->dev);
intel_pasid_setup_dirty_tracking(info->iommu, dmar_domain,
info->dev,
dev_pasid->pasid,
dmar_domain->dirty_tracking);
}
spin_unlock(&dmar_domain->lock);
return ret;
}
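Standing alone, the unwind pattern in the function above can be exercised in plain userspace C. A hypothetical model (dev_model, setup_dirty and set_dirty_tracking are made-up stand-ins for the driver's device list and intel_pasid_setup_dirty_tracking(), not the real API):

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified model: each device carries its dirty-tracking state and an
 * injected-failure flag so the error path can be exercised. */
struct dev_model {
	bool dirty;      /* per-device dirty-tracking state */
	bool fail_setup; /* inject a failure for this device */
};

static int setup_dirty(struct dev_model *d, bool enable)
{
	if (d->fail_setup && d->dirty != enable)
		return -1;
	d->dirty = enable;
	return 0;
}

int set_dirty_tracking(struct dev_model *devs, size_t n,
		       bool *domain_dirty, bool enable)
{
	size_t i;

	if (*domain_dirty == enable)
		return 0; /* same-state transition is a no-op */

	for (i = 0; i < n; i++)
		if (setup_dirty(&devs[i], enable))
			goto err_unwind;

	*domain_dirty = enable;
	return 0;

err_unwind:
	/* Roll the already-changed devices back so none is left with a
	 * state different from what the domain records. */
	while (i--)
		setup_dirty(&devs[i], *domain_dirty);
	return -1;
}
```

On a mid-list failure every earlier device is restored to the previous state, which is the property the err_unwind label provides in the proposal above.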
Best regards,
baolu
* Re: [PATCH v3 19/19] iommu/intel: Access/Dirty bit support for SL domains
2023-10-16 1:37 ` Baolu Lu
@ 2023-10-16 10:57 ` Joao Martins
2023-10-16 11:42 ` Jason Gunthorpe
0 siblings, 1 reply; 140+ messages in thread
From: Joao Martins @ 2023-10-16 10:57 UTC (permalink / raw)
To: Baolu Lu, iommu
Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Yi Liu,
Yi Y Sun, Nicolin Chen, Joerg Roedel, Suravee Suthikulpanit,
Will Deacon, Robin Murphy, Alex Williamson, kvm
On 16/10/2023 02:37, Baolu Lu wrote:
> On 9/23/23 9:25 AM, Joao Martins wrote:
> [...]
>> +static int intel_iommu_set_dirty_tracking(struct iommu_domain *domain,
>> + bool enable)
>> +{
>> + struct dmar_domain *dmar_domain = to_dmar_domain(domain);
>> + struct device_domain_info *info;
>> + int ret = -EINVAL;
>> +
>> + spin_lock(&dmar_domain->lock);
>> + if (!(dmar_domain->dirty_tracking ^ enable) ||
>
> Just out of curiosity, can we simply write
>
> dmar_domain->dirty_tracking == enable
>
> instead? I am not sure whether the compiler will be happy with this.
>
Part of the check above was just trying to avoid same-state transitions. The
form you wrote should work.
>> + list_empty(&dmar_domain->devices)) {
>
> list_for_each_entry() is a no-op if the list is empty, so no need to check it.
>
Right, this check is unnecessary.
>> + spin_unlock(&dmar_domain->lock);
>> + return 0;
>> + }
>> +
>> + list_for_each_entry(info, &dmar_domain->devices, link) {
>> + /* First-level page table always enables dirty bit*/
>> + if (dmar_domain->use_first_level) {
>
> Since we leave out domain->use_first_level in the user_domain_alloc
> function, we no longer need to check it here.
>
>> + ret = 0;
>> + break;
>> + }
>> +
>> + ret = intel_pasid_setup_dirty_tracking(info->iommu, info->domain,
>> + info->dev, IOMMU_NO_PASID,
>> + enable);
>> + if (ret)
>> + break;
>
> We need to unwind to the previous status here. We cannot leave some
> devices with status @enable while others do not.
>
Ugh, yes
>> +
>> + }
>
> The VT-d driver also support attaching domain to a pasid of a device. We
> also need to enable dirty tracking on those devices.
>
True. But to be honest, I thought we weren't quite there yet in PASID support
from IOMMUFD perspective; hence why I didn't aim at it. Or do I have the wrong
impression? From the code below, it clearly looks the driver does.
>> +
>> + if (!ret)
>> + dmar_domain->dirty_tracking = enable;
>> + spin_unlock(&dmar_domain->lock);
>> +
>> + return ret;
>> +}
>
> I have made some changes to the code based on my above comments. Please
> let me know what you think.
>
Looks great; thanks a lot for these changes!
> static int intel_iommu_set_dirty_tracking(struct iommu_domain *domain,
> bool enable)
> {
> struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> struct dev_pasid_info *dev_pasid;
> struct device_domain_info *info;
> int ret;
>
> spin_lock(&dmar_domain->lock);
> if (!(dmar_domain->dirty_tracking ^ enable))
> goto out_unlock;
>
> list_for_each_entry(info, &dmar_domain->devices, link) {
> ret = intel_pasid_setup_dirty_tracking(info->iommu, dmar_domain,
> info->dev, IOMMU_NO_PASID,
> enable);
> if (ret)
> goto err_unwind;
> }
>
> list_for_each_entry(dev_pasid, &dmar_domain->dev_pasids, link_domain) {
> info = dev_iommu_priv_get(dev_pasid->dev);
> ret = intel_pasid_setup_dirty_tracking(info->iommu, dmar_domain,
> info->dev, dev_pasid->pasid,
> enable);
> if (ret)
> goto err_unwind;
> }
>
> dmar_domain->dirty_tracking = enable;
> out_unlock:
> spin_unlock(&dmar_domain->lock);
>
> return 0;
>
> err_unwind:
> list_for_each_entry(info, &dmar_domain->devices, link)
> intel_pasid_setup_dirty_tracking(info->iommu, dmar_domain,
> info->dev,
> IOMMU_NO_PASID,
> dmar_domain->dirty_tracking);
> list_for_each_entry(dev_pasid, &dmar_domain->dev_pasids, link_domain) {
> info = dev_iommu_priv_get(dev_pasid->dev);
> intel_pasid_setup_dirty_tracking(info->iommu, dmar_domain,
> info->dev, dev_pasid->pasid,
> dmar_domain->dirty_tracking);
> }
> spin_unlock(&dmar_domain->lock);
>
> return ret;
> }
>
> Best regards,
> baolu
^ permalink raw reply [flat|nested] 140+ messages in thread* Re: [PATCH v3 19/19] iommu/intel: Access/Dirty bit support for SL domains
2023-10-16 10:57 ` Joao Martins
@ 2023-10-16 11:42 ` Jason Gunthorpe
2023-10-16 12:58 ` Baolu Lu
0 siblings, 1 reply; 140+ messages in thread
From: Jason Gunthorpe @ 2023-10-16 11:42 UTC (permalink / raw)
To: Joao Martins
Cc: Baolu Lu, iommu, Kevin Tian, Shameerali Kolothum Thodi, Yi Liu,
Yi Y Sun, Nicolin Chen, Joerg Roedel, Suravee Suthikulpanit,
Will Deacon, Robin Murphy, Alex Williamson, kvm
On Mon, Oct 16, 2023 at 11:57:34AM +0100, Joao Martins wrote:
> True. But to be honest, I thought we weren't quite there yet in PASID support
> from IOMMUFD perspective; hence why I didn't aim at it. Or do I have the wrong
> impression? From the code below, it clearly looks the driver does.
I think we should plan that this series will go before the PASID
series
Jason
* Re: [PATCH v3 19/19] iommu/intel: Access/Dirty bit support for SL domains
2023-10-16 11:42 ` Jason Gunthorpe
@ 2023-10-16 12:58 ` Baolu Lu
2023-10-16 12:59 ` Jason Gunthorpe
0 siblings, 1 reply; 140+ messages in thread
From: Baolu Lu @ 2023-10-16 12:58 UTC (permalink / raw)
To: Jason Gunthorpe, Joao Martins
Cc: baolu.lu, iommu, Kevin Tian, Shameerali Kolothum Thodi, Yi Liu,
Yi Y Sun, Nicolin Chen, Joerg Roedel, Suravee Suthikulpanit,
Will Deacon, Robin Murphy, Alex Williamson, kvm
On 2023/10/16 19:42, Jason Gunthorpe wrote:
> On Mon, Oct 16, 2023 at 11:57:34AM +0100, Joao Martins wrote:
>
>> True. But to be honest, I thought we weren't quite there yet in PASID support
>> from IOMMUFD perspective; hence why I didn't aim at it. Or do I have the wrong
>> impression? From the code below, it clearly looks the driver does.
> I think we should plan that this series will go before the PASID
> series
I know that PASID support in IOMMUFD is not yet available, but the VT-d
driver already supports attaching a domain to a PASID, as required by
the idxd driver for kernel DMA with PASID. Therefore, from the driver's
perspective, dirty tracking should also be enabled for PASIDs.
However, I am also fine with deferring this code until PASID support is
added to IOMMUFD. In that case, it's better to add a comment to remind
people to revisit this issue later.
Best regards,
baolu
* Re: [PATCH v3 19/19] iommu/intel: Access/Dirty bit support for SL domains
2023-10-16 12:58 ` Baolu Lu
@ 2023-10-16 12:59 ` Jason Gunthorpe
2023-10-16 13:01 ` Baolu Lu
0 siblings, 1 reply; 140+ messages in thread
From: Jason Gunthorpe @ 2023-10-16 12:59 UTC (permalink / raw)
To: Baolu Lu
Cc: Joao Martins, iommu, Kevin Tian, Shameerali Kolothum Thodi,
Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel,
Suravee Suthikulpanit, Will Deacon, Robin Murphy, Alex Williamson,
kvm
On Mon, Oct 16, 2023 at 08:58:42PM +0800, Baolu Lu wrote:
> On 2023/10/16 19:42, Jason Gunthorpe wrote:
> > On Mon, Oct 16, 2023 at 11:57:34AM +0100, Joao Martins wrote:
> >
> > > True. But to be honest, I thought we weren't quite there yet in PASID support
> > > from IOMMUFD perspective; hence why I didn't aim at it. Or do I have the wrong
> > > impression? From the code below, it clearly looks the driver does.
> > I think we should plan that this series will go before the PASID
> > series
>
> I know that PASID support in IOMMUFD is not yet available, but the VT-d
> driver already supports attaching a domain to a PASID, as required by
> the idxd driver for kernel DMA with PASID. Therefore, from the driver's
> perspective, dirty tracking should also be enabled for PASIDs.
As long as the driver refuses to attach a dirty track enabled domain
to PASID it would be fine for now.
Jason
* Re: [PATCH v3 19/19] iommu/intel: Access/Dirty bit support for SL domains
2023-10-16 12:59 ` Jason Gunthorpe
@ 2023-10-16 13:01 ` Baolu Lu
2023-10-17 10:51 ` Joao Martins
0 siblings, 1 reply; 140+ messages in thread
From: Baolu Lu @ 2023-10-16 13:01 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: baolu.lu, Joao Martins, iommu, Kevin Tian,
Shameerali Kolothum Thodi, Yi Liu, Yi Y Sun, Nicolin Chen,
Joerg Roedel, Suravee Suthikulpanit, Will Deacon, Robin Murphy,
Alex Williamson, kvm
On 2023/10/16 20:59, Jason Gunthorpe wrote:
> On Mon, Oct 16, 2023 at 08:58:42PM +0800, Baolu Lu wrote:
>> On 2023/10/16 19:42, Jason Gunthorpe wrote:
>>> On Mon, Oct 16, 2023 at 11:57:34AM +0100, Joao Martins wrote:
>>>
>>>> True. But to be honest, I thought we weren't quite there yet in PASID support
>>>> from IOMMUFD perspective; hence why I didn't aim at it. Or do I have the wrong
>>>> impression? From the code below, it clearly looks the driver does.
>>> I think we should plan that this series will go before the PASID
>>> series
>> I know that PASID support in IOMMUFD is not yet available, but the VT-d
>> driver already supports attaching a domain to a PASID, as required by
>> the idxd driver for kernel DMA with PASID. Therefore, from the driver's
>> perspective, dirty tracking should also be enabled for PASIDs.
> As long as the driver refuses to attach a dirty track enabled domain
> to PASID it would be fine for now.
Yes. This works.
Best regards,
baolu
* Re: [PATCH v3 19/19] iommu/intel: Access/Dirty bit support for SL domains
2023-10-16 13:01 ` Baolu Lu
@ 2023-10-17 10:51 ` Joao Martins
2023-10-17 12:41 ` Baolu Lu
0 siblings, 1 reply; 140+ messages in thread
From: Joao Martins @ 2023-10-17 10:51 UTC (permalink / raw)
To: Baolu Lu, Jason Gunthorpe
Cc: iommu, Kevin Tian, Shameerali Kolothum Thodi, Yi Liu, Yi Y Sun,
Nicolin Chen, Joerg Roedel, Suravee Suthikulpanit, Will Deacon,
Robin Murphy, Alex Williamson, kvm
On 16/10/2023 14:01, Baolu Lu wrote:
> On 2023/10/16 20:59, Jason Gunthorpe wrote:
>> On Mon, Oct 16, 2023 at 08:58:42PM +0800, Baolu Lu wrote:
>>> On 2023/10/16 19:42, Jason Gunthorpe wrote:
>>>> On Mon, Oct 16, 2023 at 11:57:34AM +0100, Joao Martins wrote:
>>>>
>>>>> True. But to be honest, I thought we weren't quite there yet in PASID support
>>>>> from IOMMUFD perspective; hence why I didn't aim at it. Or do I have the wrong
>>>>> impression? From the code below, it clearly looks the driver does.
>>>> I think we should plan that this series will go before the PASID
>>>> series
>>> I know that PASID support in IOMMUFD is not yet available, but the VT-d
>>> driver already supports attaching a domain to a PASID, as required by
>>> the idxd driver for kernel DMA with PASID. Therefore, from the driver's
>>> perspective, dirty tracking should also be enabled for PASIDs.
>> As long as the driver refuses to attach a dirty track enabled domain
>> to PASID it would be fine for now.
>
> Yes. This works.
Baolu Lu, I am blocking PASID attachment this way; let me know if this matches
how would you have the driver refuse dirty tracking on PASID.
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 66b0e1d5a98c..d33df02f0f2d 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4123,7 +4123,7 @@ static void intel_iommu_domain_free(struct iommu_domain
*domain)
}
static int prepare_domain_attach_device(struct iommu_domain *domain,
- struct device *dev)
+ struct device *dev, ioasid_t pasid)
{
struct dmar_domain *dmar_domain = to_dmar_domain(domain);
struct intel_iommu *iommu;
@@ -4136,7 +4136,8 @@ static int prepare_domain_attach_device(struct
iommu_domain *domain,
if (dmar_domain->force_snooping && !ecap_sc_support(iommu->ecap))
return -EINVAL;
- if (domain->dirty_ops && !slads_supported(iommu))
+ if (domain->dirty_ops &&
+ (!slads_supported(iommu) || pasid != IOMMU_NO_PASID))
return -EINVAL;
/* check if this iommu agaw is sufficient for max mapped address */
@@ -4174,7 +4175,7 @@ static int intel_iommu_attach_device(struct iommu_domain
*domain,
if (info->domain)
device_block_translation(dev);
- ret = prepare_domain_attach_device(domain, dev);
+ ret = prepare_domain_attach_device(domain, dev, IOMMU_NO_PASID);
if (ret)
return ret;
@@ -4795,7 +4796,7 @@ static int intel_iommu_set_dev_pasid(struct iommu_domain
*domain,
if (context_copied(iommu, info->bus, info->devfn))
return -EBUSY;
- ret = prepare_domain_attach_device(domain, dev);
+ ret = prepare_domain_attach_device(domain, dev, pasid);
if (ret)
return ret;
* Re: [PATCH v3 19/19] iommu/intel: Access/Dirty bit support for SL domains
2023-10-17 10:51 ` Joao Martins
@ 2023-10-17 12:41 ` Baolu Lu
2023-10-17 14:16 ` Joao Martins
0 siblings, 1 reply; 140+ messages in thread
From: Baolu Lu @ 2023-10-17 12:41 UTC (permalink / raw)
To: Joao Martins, Jason Gunthorpe
Cc: baolu.lu, iommu, Kevin Tian, Shameerali Kolothum Thodi, Yi Liu,
Yi Y Sun, Nicolin Chen, Joerg Roedel, Suravee Suthikulpanit,
Will Deacon, Robin Murphy, Alex Williamson, kvm
On 2023/10/17 18:51, Joao Martins wrote:
> On 16/10/2023 14:01, Baolu Lu wrote:
>> On 2023/10/16 20:59, Jason Gunthorpe wrote:
>>> On Mon, Oct 16, 2023 at 08:58:42PM +0800, Baolu Lu wrote:
>>>> On 2023/10/16 19:42, Jason Gunthorpe wrote:
>>>>> On Mon, Oct 16, 2023 at 11:57:34AM +0100, Joao Martins wrote:
>>>>>
>>>>>> True. But to be honest, I thought we weren't quite there yet in PASID support
>>>>>> from IOMMUFD perspective; hence why I didn't aim at it. Or do I have the wrong
>>>>>> impression? From the code below, it clearly looks the driver does.
>>>>> I think we should plan that this series will go before the PASID
>>>>> series
>>>> I know that PASID support in IOMMUFD is not yet available, but the VT-d
>>>> driver already supports attaching a domain to a PASID, as required by
>>>> the idxd driver for kernel DMA with PASID. Therefore, from the driver's
>>>> perspective, dirty tracking should also be enabled for PASIDs.
>>> As long as the driver refuses to attach a dirty track enabled domain
>>> to PASID it would be fine for now.
>> Yes. This works.
> Baolu Lu, I am blocking PASID attachment this way; let me know if this matches
> how would you have the driver refuse dirty tracking on PASID.
Joao, how about blocking pasid attachment in intel_iommu_set_dev_pasid()
directly?
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index c86ba5a3e75c..392b6ca9ce90 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4783,6 +4783,9 @@ static int intel_iommu_set_dev_pasid(struct
iommu_domain *domain,
if (context_copied(iommu, info->bus, info->devfn))
return -EBUSY;
+ if (domain->dirty_ops)
+ return -EOPNOTSUPP;
+
ret = prepare_domain_attach_device(domain, dev);
if (ret)
return ret;
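For illustration, the refusal above can be modeled in standalone userspace C. A hypothetical sketch (domain_model and set_dev_pasid() are made-up stand-ins, not the driver's real types; later in the thread the error code is switched to -EINVAL):

```c
#include <stddef.h>

#define EOPNOTSUPP 95

/* Minimal stand-in for a domain that may carry dirty-tracking ops. */
struct dirty_ops { int unused; };

struct domain_model {
	const struct dirty_ops *dirty_ops; /* non-NULL when dirty tracking is enforced */
};

int set_dev_pasid(const struct domain_model *domain)
{
	/* A dirty-tracking-enforced domain is refused on the PASID path. */
	if (domain->dirty_ops)
		return -EOPNOTSUPP;

	return 0; /* remainder of the attach path elided */
}
```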
Best regards,
baolu
* Re: [PATCH v3 19/19] iommu/intel: Access/Dirty bit support for SL domains
2023-10-17 12:41 ` Baolu Lu
@ 2023-10-17 14:16 ` Joao Martins
2023-10-17 14:25 ` Joao Martins
0 siblings, 1 reply; 140+ messages in thread
From: Joao Martins @ 2023-10-17 14:16 UTC (permalink / raw)
To: Baolu Lu, Jason Gunthorpe
Cc: iommu, Kevin Tian, Shameerali Kolothum Thodi, Yi Liu, Yi Y Sun,
Nicolin Chen, Joerg Roedel, Suravee Suthikulpanit, Will Deacon,
Robin Murphy, Alex Williamson, kvm
On 17/10/2023 13:41, Baolu Lu wrote:
> On 2023/10/17 18:51, Joao Martins wrote:
>> On 16/10/2023 14:01, Baolu Lu wrote:
>>> On 2023/10/16 20:59, Jason Gunthorpe wrote:
>>>> On Mon, Oct 16, 2023 at 08:58:42PM +0800, Baolu Lu wrote:
>>>>> On 2023/10/16 19:42, Jason Gunthorpe wrote:
>>>>>> On Mon, Oct 16, 2023 at 11:57:34AM +0100, Joao Martins wrote:
>>>>>>
>>>>>>> True. But to be honest, I thought we weren't quite there yet in PASID
>>>>>>> support
>>>>>>> from IOMMUFD perspective; hence why I didn't aim at it. Or do I have the
>>>>>>> wrong
>>>>>>> impression? From the code below, it clearly looks the driver does.
>>>>>> I think we should plan that this series will go before the PASID
>>>>>> series
>>>>> I know that PASID support in IOMMUFD is not yet available, but the VT-d
>>>>> driver already supports attaching a domain to a PASID, as required by
>>>>> the idxd driver for kernel DMA with PASID. Therefore, from the driver's
>>>>> perspective, dirty tracking should also be enabled for PASIDs.
>>>> As long as the driver refuses to attach a dirty track enabled domain
>>>> to PASID it would be fine for now.
>>> Yes. This works.
>> Baolu Lu, I am blocking PASID attachment this way; let me know if this matches
>> how would you have the driver refuse dirty tracking on PASID.
>
> Joao, how about blocking pasid attachment in intel_iommu_set_dev_pasid()
> directly?
>
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index c86ba5a3e75c..392b6ca9ce90 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -4783,6 +4783,9 @@ static int intel_iommu_set_dev_pasid(struct iommu_domain
> *domain,
> if (context_copied(iommu, info->bus, info->devfn))
> return -EBUSY;
>
> + if (domain->dirty_ops)
> + return -EOPNOTSUPP;
> +
> ret = prepare_domain_attach_device(domain, dev);
> if (ret)
> return ret;
I was trying to centralize all the checks, but I can place it here if you prefer
this way.
* Re: [PATCH v3 19/19] iommu/intel: Access/Dirty bit support for SL domains
2023-10-17 14:16 ` Joao Martins
@ 2023-10-17 14:25 ` Joao Martins
2023-10-18 2:06 ` Baolu Lu
0 siblings, 1 reply; 140+ messages in thread
From: Joao Martins @ 2023-10-17 14:25 UTC (permalink / raw)
To: Baolu Lu
Cc: iommu, Kevin Tian, Shameerali Kolothum Thodi, Yi Liu, Yi Y Sun,
Nicolin Chen, Joerg Roedel, Suravee Suthikulpanit, Will Deacon,
Robin Murphy, Alex Williamson, kvm, Jason Gunthorpe
On 17/10/2023 15:16, Joao Martins wrote:
> On 17/10/2023 13:41, Baolu Lu wrote:
>> On 2023/10/17 18:51, Joao Martins wrote:
>>> On 16/10/2023 14:01, Baolu Lu wrote:
>>>> On 2023/10/16 20:59, Jason Gunthorpe wrote:
>>>>> On Mon, Oct 16, 2023 at 08:58:42PM +0800, Baolu Lu wrote:
>>>>>> On 2023/10/16 19:42, Jason Gunthorpe wrote:
>>>>>>> On Mon, Oct 16, 2023 at 11:57:34AM +0100, Joao Martins wrote:
>>>>>>>
>>>>>>>> True. But to be honest, I thought we weren't quite there yet in PASID
>>>>>>>> support
>>>>>>>> from IOMMUFD perspective; hence why I didn't aim at it. Or do I have the
>>>>>>>> wrong
>>>>>>>> impression? From the code below, it clearly looks the driver does.
>>>>>>> I think we should plan that this series will go before the PASID
>>>>>>> series
>>>>>> I know that PASID support in IOMMUFD is not yet available, but the VT-d
>>>>>> driver already supports attaching a domain to a PASID, as required by
>>>>>> the idxd driver for kernel DMA with PASID. Therefore, from the driver's
>>>>>> perspective, dirty tracking should also be enabled for PASIDs.
>>>>> As long as the driver refuses to attach a dirty track enabled domain
>>>>> to PASID it would be fine for now.
>>>> Yes. This works.
>>> Baolu Lu, I am blocking PASID attachment this way; let me know if this matches
>>> how would you have the driver refuse dirty tracking on PASID.
>>
>> Joao, how about blocking pasid attachment in intel_iommu_set_dev_pasid()
>> directly?
>>
>> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
>> index c86ba5a3e75c..392b6ca9ce90 100644
>> --- a/drivers/iommu/intel/iommu.c
>> +++ b/drivers/iommu/intel/iommu.c
>> @@ -4783,6 +4783,9 @@ static int intel_iommu_set_dev_pasid(struct iommu_domain
>> *domain,
>> if (context_copied(iommu, info->bus, info->devfn))
>> return -EBUSY;
>>
>> + if (domain->dirty_ops)
>> + return -EOPNOTSUPP;
>> +
>> ret = prepare_domain_attach_device(domain, dev);
>> if (ret)
>> return ret;
>
> I was trying to centralize all the checks, but I can place it here if you prefer
> this way.
>
Minor change, I'm changing error code to -EINVAL to align with non-PASID case.
* Re: [PATCH v3 19/19] iommu/intel: Access/Dirty bit support for SL domains
2023-10-17 14:25 ` Joao Martins
@ 2023-10-18 2:06 ` Baolu Lu
0 siblings, 0 replies; 140+ messages in thread
From: Baolu Lu @ 2023-10-18 2:06 UTC (permalink / raw)
To: Joao Martins
Cc: baolu.lu, iommu, Kevin Tian, Shameerali Kolothum Thodi, Yi Liu,
Yi Y Sun, Nicolin Chen, Joerg Roedel, Suravee Suthikulpanit,
Will Deacon, Robin Murphy, Alex Williamson, kvm, Jason Gunthorpe
On 10/17/23 10:25 PM, Joao Martins wrote:
> On 17/10/2023 15:16, Joao Martins wrote:
>> On 17/10/2023 13:41, Baolu Lu wrote:
>>> On 2023/10/17 18:51, Joao Martins wrote:
>>>> On 16/10/2023 14:01, Baolu Lu wrote:
>>>>> On 2023/10/16 20:59, Jason Gunthorpe wrote:
>>>>>> On Mon, Oct 16, 2023 at 08:58:42PM +0800, Baolu Lu wrote:
>>>>>>> On 2023/10/16 19:42, Jason Gunthorpe wrote:
>>>>>>>> On Mon, Oct 16, 2023 at 11:57:34AM +0100, Joao Martins wrote:
>>>>>>>>
>>>>>>>>> True. But to be honest, I thought we weren't quite there yet in PASID
>>>>>>>>> support
>>>>>>>>> from IOMMUFD perspective; hence why I didn't aim at it. Or do I have the
>>>>>>>>> wrong
>>>>>>>>> impression? From the code below, it clearly looks the driver does.
>>>>>>>> I think we should plan that this series will go before the PASID
>>>>>>>> series
>>>>>>> I know that PASID support in IOMMUFD is not yet available, but the VT-d
>>>>>>> driver already supports attaching a domain to a PASID, as required by
>>>>>>> the idxd driver for kernel DMA with PASID. Therefore, from the driver's
>>>>>>> perspective, dirty tracking should also be enabled for PASIDs.
>>>>>> As long as the driver refuses to attach a dirty track enabled domain
>>>>>> to PASID it would be fine for now.
>>>>> Yes. This works.
>>>> Baolu Lu, I am blocking PASID attachment this way; let me know if this matches
>>>> how would you have the driver refuse dirty tracking on PASID.
>>>
>>> Joao, how about blocking pasid attachment in intel_iommu_set_dev_pasid()
>>> directly?
>>>
>>> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
>>> index c86ba5a3e75c..392b6ca9ce90 100644
>>> --- a/drivers/iommu/intel/iommu.c
>>> +++ b/drivers/iommu/intel/iommu.c
>>> @@ -4783,6 +4783,9 @@ static int intel_iommu_set_dev_pasid(struct iommu_domain
>>> *domain,
>>> if (context_copied(iommu, info->bus, info->devfn))
>>> return -EBUSY;
>>>
>>> + if (domain->dirty_ops)
>>> + return -EOPNOTSUPP;
>>> +
>>> ret = prepare_domain_attach_device(domain, dev);
>>> if (ret)
>>> return ret;
>>
>> I was trying to centralize all the checks, but I can place it here if you prefer
>> this way.
We will soon remove this check when pasid is supported in iommufd. So
less code change is better for future work.
>>
> Minor change, I'm changing error code to -EINVAL to align with non-PASID case.
Yes. Makes sense. -EINVAL means "not compatible"; the caller can retry.
Best regards,
baolu
* Re: [PATCH v3 19/19] iommu/intel: Access/Dirty bit support for SL domains
2023-09-23 1:25 ` [PATCH v3 19/19] iommu/intel: Access/Dirty bit support for SL domains Joao Martins
` (2 preceding siblings ...)
2023-10-16 1:37 ` Baolu Lu
@ 2023-10-16 2:07 ` Baolu Lu
2023-10-16 11:26 ` Joao Martins
2023-10-16 2:21 ` Baolu Lu
4 siblings, 1 reply; 140+ messages in thread
From: Baolu Lu @ 2023-10-16 2:07 UTC (permalink / raw)
To: Joao Martins, iommu
Cc: baolu.lu, Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi,
Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel,
Suravee Suthikulpanit, Will Deacon, Robin Murphy, Alex Williamson,
kvm
On 9/23/23 9:25 AM, Joao Martins wrote:
[...]
> +static int intel_iommu_read_and_clear_dirty(struct iommu_domain *domain,
> + unsigned long iova, size_t size,
> + unsigned long flags,
> + struct iommu_dirty_bitmap *dirty)
> +{
> + struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> + unsigned long end = iova + size - 1;
> + unsigned long pgsize;
> + bool ad_enabled;
> +
> + spin_lock(&dmar_domain->lock);
> + ad_enabled = dmar_domain->dirty_tracking;
> + spin_unlock(&dmar_domain->lock);
The spin lock is to protect the RID and PASID device tracking list. No
need to use it here.
> +
> + if (!ad_enabled && dirty->bitmap)
> + return -EINVAL;
I don't understand the above check of "dirty->bitmap". Isn't it always
invalid to call this if dirty tracking is not enabled on the domain?
The iommu_dirty_bitmap is defined in iommu core. The iommu driver has no
need to understand it and check its member anyway.
Or did I overlook anything?
> +
> + rcu_read_lock();
Do we really need an rcu lock here? This operation is protected by
iopt->iova_rwsem. Is it reasonable to remove it? If not, how about putting
some comments around it?
> + do {
> + struct dma_pte *pte;
> + int lvl = 0;
> +
> + pte = pfn_to_dma_pte(dmar_domain, iova >> VTD_PAGE_SHIFT, &lvl,
> + GFP_ATOMIC);
> + pgsize = level_size(lvl) << VTD_PAGE_SHIFT;
> + if (!pte || !dma_pte_present(pte)) {
> + iova += pgsize;
> + continue;
> + }
> +
> + /* It is writable, set the bitmap */
> + if (((flags & IOMMU_DIRTY_NO_CLEAR) &&
> + dma_sl_pte_dirty(pte)) ||
> + dma_sl_pte_test_and_clear_dirty(pte))
> + iommu_dirty_bitmap_record(dirty, iova, pgsize);
> + iova += pgsize;
> + } while (iova < end);
> + rcu_read_unlock();
> +
> + return 0;
> +}
Best regards,
baolu
^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v3 19/19] iommu/intel: Access/Dirty bit support for SL domains
2023-10-16 2:07 ` Baolu Lu
@ 2023-10-16 11:26 ` Joao Martins
2023-10-16 16:00 ` Joao Martins
0 siblings, 1 reply; 140+ messages in thread
From: Joao Martins @ 2023-10-16 11:26 UTC (permalink / raw)
To: Baolu Lu, iommu
Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Yi Liu,
Yi Y Sun, Nicolin Chen, Joerg Roedel, Suravee Suthikulpanit,
Will Deacon, Robin Murphy, Alex Williamson, kvm
On 16/10/2023 03:07, Baolu Lu wrote:
> On 9/23/23 9:25 AM, Joao Martins wrote:
> [...]
>> +static int intel_iommu_read_and_clear_dirty(struct iommu_domain *domain,
>> + unsigned long iova, size_t size,
>> + unsigned long flags,
>> + struct iommu_dirty_bitmap *dirty)
>> +{
>> + struct dmar_domain *dmar_domain = to_dmar_domain(domain);
>> + unsigned long end = iova + size - 1;
>> + unsigned long pgsize;
>> + bool ad_enabled;
>> +
>> + spin_lock(&dmar_domain->lock);
>> + ad_enabled = dmar_domain->dirty_tracking;
>> + spin_unlock(&dmar_domain->lock);
>
> The spin lock is to protect the RID and PASID device tracking list. No
> need to use it here.
>
OK
>> +
>> + if (!ad_enabled && dirty->bitmap)
>> + return -EINVAL;
>
> I don't understand above check of "dirty->bitmap". Isn't it always
> invalid to call this if dirty tracking is not enabled on the domain?
>
It is spurious (...)
> The iommu_dirty_bitmap is defined in iommu core. The iommu driver has no
> need to understand it and check its member anyway.
>
(...) The iommu driver has no need to understand it. iommu_dirty_bitmap_record()
already makes those checks in case there's no iova_bitmap to set bits to.
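As a loose illustration of that guard (hypothetical, simplified userspace types for the sketch — not the actual kernel structures), the record helper can simply be a no-op when no iova_bitmap is attached:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types, for illustration only. */
struct iova_bitmap;                      /* opaque here */

struct iommu_dirty_bitmap {
	struct iova_bitmap *bitmap;      /* NULL => nothing to record into */
	unsigned long recorded;          /* bytes recorded, demo bookkeeping */
};

/* Mirrors the behavior described above: recording silently does nothing
 * when no iova_bitmap is attached, so drivers need not check ->bitmap. */
static void dirty_bitmap_record(struct iommu_dirty_bitmap *dirty,
				unsigned long iova, unsigned long size)
{
	(void)iova;
	if (!dirty || !dirty->bitmap)
		return;
	dirty->recorded += size;         /* real code sets bits in the bitmap */
}
```

With this shape, the caller can always invoke the record helper and rely on the NULL check rather than open-coding it in each driver.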
> Or, I overlooked anything?
>
>> +
>> + rcu_read_lock();
>
> Do we really need a rcu lock here? This operation is protected by
> iopt->iova_rwsem. Is it reasonable to remove it? If not, how about put
> some comments around it?
>
As I had mentioned in an earlier comment, this was not meant to be here.
>> + do {
>> + struct dma_pte *pte;
>> + int lvl = 0;
>> +
>> + pte = pfn_to_dma_pte(dmar_domain, iova >> VTD_PAGE_SHIFT, &lvl,
>> + GFP_ATOMIC);
>> + pgsize = level_size(lvl) << VTD_PAGE_SHIFT;
>> + if (!pte || !dma_pte_present(pte)) {
>> + iova += pgsize;
>> + continue;
>> + }
>> +
>> + /* It is writable, set the bitmap */
>> + if (((flags & IOMMU_DIRTY_NO_CLEAR) &&
>> + dma_sl_pte_dirty(pte)) ||
>> + dma_sl_pte_test_and_clear_dirty(pte))
>> + iommu_dirty_bitmap_record(dirty, iova, pgsize);
>> + iova += pgsize;
>> + } while (iova < end);
>> + rcu_read_unlock();
>> +
>> + return 0;
>> +}
>
> Best regards,
> baolu
^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v3 19/19] iommu/intel: Access/Dirty bit support for SL domains
2023-10-16 11:26 ` Joao Martins
@ 2023-10-16 16:00 ` Joao Martins
2023-10-17 2:08 ` Baolu Lu
0 siblings, 1 reply; 140+ messages in thread
From: Joao Martins @ 2023-10-16 16:00 UTC (permalink / raw)
To: Baolu Lu, iommu
Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Yi Liu,
Yi Y Sun, Nicolin Chen, Joerg Roedel, Suravee Suthikulpanit,
Will Deacon, Robin Murphy, Alex Williamson, kvm
On 16/10/2023 12:26, Joao Martins wrote:
> On 16/10/2023 03:07, Baolu Lu wrote:
>> On 9/23/23 9:25 AM, Joao Martins wrote:
>>> +
>>> + if (!ad_enabled && dirty->bitmap)
>>> + return -EINVAL;
>>
>> I don't understand above check of "dirty->bitmap". Isn't it always
>> invalid to call this if dirty tracking is not enabled on the domain?
>>
> It is spurious (...)
>
I take this back;
>> The iommu_dirty_bitmap is defined in iommu core. The iommu driver has no
>> need to understand it and check its member anyway.
>>
>
> (...) The iommu driver has no need to understand it. iommu_dirty_bitmap_record()
> already makes those checks in case there's no iova_bitmap to set bits to.
>
This is all true, but the reason I am checking iommu_dirty_bitmap::bitmap is to
essentially not record anything in the iova bitmap and just clear the dirty bits
from the IOPTEs, all while dirty tracking is technically disabled. This is done
internally only when starting dirty tracking, to ensure that we clean up
all dirty bits before we enable dirty tracking and thus get a consistent snapshot,
as opposed to inheriting dirties from the past.
Some alternative ways to do this: 1) via the iommu_dirty_bitmap structure, where
I add one field which, if true, lets the iommufd core call into the iommu
driver in a "clear IOPTE" manner, or 2) via the ::flags ... the thing is that
the ::flags values are UAPI, so it feels weird to use those flags for internal purposes.
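The clear-before-enable sequence described above can be sketched as follows (a hypothetical userspace model of IOPTE dirty flags, not the driver code; `read_and_clear` and `start_dirty_tracking` are invented names for the demo):

```c
#include <assert.h>
#include <stdbool.h>

#define NPTES 8
static bool pte_dirty[NPTES];            /* stand-in for IOPTE dirty bits */
static bool tracking_enabled;

/* Walk all PTEs; clear each dirty bit and, only when record is set,
 * count it as if it were reported via iommu_dirty_bitmap_record(). */
static int read_and_clear(bool record)
{
	int i, n = 0;

	for (i = 0; i < NPTES; i++) {
		if (pte_dirty[i]) {
			pte_dirty[i] = false;
			if (record)
				n++;
		}
	}
	return n;
}

static void start_dirty_tracking(void)
{
	/* Flush stale dirties first so the first snapshot is consistent. */
	read_and_clear(false);
	tracking_enabled = true;
}
```

The point is the ordering: the clearing pass runs with recording disabled, so the first real read-out after enabling never sees dirties from a previous cycle.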
Joao
^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v3 19/19] iommu/intel: Access/Dirty bit support for SL domains
2023-10-16 16:00 ` Joao Martins
@ 2023-10-17 2:08 ` Baolu Lu
2023-10-17 11:22 ` Joao Martins
0 siblings, 1 reply; 140+ messages in thread
From: Baolu Lu @ 2023-10-17 2:08 UTC (permalink / raw)
To: Joao Martins, iommu
Cc: baolu.lu, Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi,
Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel,
Suravee Suthikulpanit, Will Deacon, Robin Murphy, Alex Williamson,
kvm
On 10/17/23 12:00 AM, Joao Martins wrote:
>>> The iommu_dirty_bitmap is defined in iommu core. The iommu driver has no
>>> need to understand it and check its member anyway.
>>>
>> (...) The iommu driver has no need to understand it. iommu_dirty_bitmap_record()
>> already makes those checks in case there's no iova_bitmap to set bits to.
>>
> This is all true but the reason I am checking iommu_dirty_bitmap::bitmap is to
> essentially not record anything in the iova bitmap and just clear the dirty bits
> from the IOPTEs, all when dirty tracking is technically disabled. This is done
> internally only when starting dirty tracking, and thus to ensure that we cleanup
> all dirty bits before we enable dirty tracking to have a consistent snapshot as
> opposed to inheriting dirties from the past.
It's okay since it serves a functional purpose. Can you please add some
comments around the code to explain the rationale.
Best regards,
baolu
^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v3 19/19] iommu/intel: Access/Dirty bit support for SL domains
2023-10-17 2:08 ` Baolu Lu
@ 2023-10-17 11:22 ` Joao Martins
2023-10-17 12:49 ` Baolu Lu
2023-10-17 13:10 ` Jason Gunthorpe
0 siblings, 2 replies; 140+ messages in thread
From: Joao Martins @ 2023-10-17 11:22 UTC (permalink / raw)
To: Baolu Lu, iommu
Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Yi Liu,
Yi Y Sun, Nicolin Chen, Joerg Roedel, Suravee Suthikulpanit,
Will Deacon, Robin Murphy, Alex Williamson, kvm
On 17/10/2023 03:08, Baolu Lu wrote:
> On 10/17/23 12:00 AM, Joao Martins wrote:
>>>> The iommu_dirty_bitmap is defined in iommu core. The iommu driver has no
>>>> need to understand it and check its member anyway.
>>>>
>>> (...) The iommu driver has no need to understand it. iommu_dirty_bitmap_record()
>>> already makes those checks in case there's no iova_bitmap to set bits to.
>>>
>> This is all true but the reason I am checking iommu_dirty_bitmap::bitmap is to
>> essentially not record anything in the iova bitmap and just clear the dirty bits
>> from the IOPTEs, all when dirty tracking is technically disabled. This is done
>> internally only when starting dirty tracking, and thus to ensure that we cleanup
>> all dirty bits before we enable dirty tracking to have a consistent snapshot as
>> opposed to inheriting dirties from the past.
>
> It's okay since it serves a functional purpose. Can you please add some
> comments around the code to explain the rationale.
>
I added this comment below:
+ /*
+ * IOMMUFD core calls into a dirty tracking disabled domain without an
+ * IOVA bitmap set in order to clean dirty bits in all PTEs that might
+ * have occurred when we stopped dirty tracking. This ensures that we
+ * never inherit dirtied bits from a previous cycle.
+ */
Also fixed an issue where I could theoretically clear the bit with
IOMMU_DIRTY_NO_CLEAR. Essentially, I passed the read_and_clear_dirty flags down and
let dma_sl_pte_test_and_clear_dirty() either test or test-and-clear, similar to AMD:
@@ -781,6 +788,16 @@ static inline bool dma_pte_present(struct dma_pte *pte)
return (pte->val & 3) != 0;
}
+static inline bool dma_sl_pte_test_and_clear_dirty(struct dma_pte *pte,
+ unsigned long flags)
+{
+ if (flags & IOMMU_DIRTY_NO_CLEAR)
+ return (pte->val & DMA_SL_PTE_DIRTY) != 0;
+
+ return test_and_clear_bit(DMA_SL_PTE_DIRTY_BIT,
+ (unsigned long *)&pte->val);
+}
+
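A loose userspace model of that flag handling (assumed bit position and flag value for the demo, and a plain store instead of the kernel's atomic test_and_clear_bit()):

```c
#include <assert.h>
#include <stdbool.h>

#define DMA_SL_PTE_DIRTY_BIT 9                  /* assumed bit position */
#define DMA_SL_PTE_DIRTY     (1ULL << DMA_SL_PTE_DIRTY_BIT)
#define IOMMU_DIRTY_NO_CLEAR (1UL << 0)         /* assumed flag value */

struct dma_pte { unsigned long long val; };

/* Report the dirty bit; clear it only when IOMMU_DIRTY_NO_CLEAR is absent. */
static bool pte_test_and_maybe_clear_dirty(struct dma_pte *pte,
					   unsigned long flags)
{
	bool was_dirty = (pte->val & DMA_SL_PTE_DIRTY) != 0;

	if (!(flags & IOMMU_DIRTY_NO_CLEAR))
		pte->val &= ~DMA_SL_PTE_DIRTY;  /* kernel: test_and_clear_bit() */
	return was_dirty;
}
```

This captures the semantic difference: with IOMMU_DIRTY_NO_CLEAR the read is purely observational, otherwise the dirty state is consumed.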
Anyhow, see below the full diff compared to this patch. Some things in the
tree differ from what was submitted in this patch.
diff --git a/drivers/iommu/intel/Kconfig b/drivers/iommu/intel/Kconfig
index 2e56bd79f589..dedb8ae3cba8 100644
--- a/drivers/iommu/intel/Kconfig
+++ b/drivers/iommu/intel/Kconfig
@@ -15,6 +15,7 @@ config INTEL_IOMMU
select DMA_OPS
select IOMMU_API
select IOMMU_IOVA
+ select IOMMUFD_DRIVER
select NEED_DMA_MAP_STATE
select DMAR_TABLE
select SWIOTLB
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index fabfe363f1f9..0e3f532f3bca 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4075,11 +4075,6 @@ static struct iommu_domain
*intel_iommu_domain_alloc(unsigned type)
return NULL;
}
-static bool intel_iommu_slads_supported(struct intel_iommu *iommu)
-{
- return sm_supported(iommu) && ecap_slads(iommu->ecap);
-}
-
static struct iommu_domain *
intel_iommu_domain_alloc_user(struct device *dev, u32 flags)
{
@@ -4087,6 +4082,10 @@ intel_iommu_domain_alloc_user(struct device *dev, u32 flags)
struct iommu_domain *domain;
struct intel_iommu *iommu;
+ if (flags & (~(IOMMU_HWPT_ALLOC_NEST_PARENT|
+ IOMMU_HWPT_ALLOC_ENFORCE_DIRTY)))
+ return ERR_PTR(-EOPNOTSUPP);
+
iommu = device_to_iommu(dev, NULL, NULL);
if (!iommu)
return ERR_PTR(-ENODEV);
@@ -4094,15 +4093,26 @@ intel_iommu_domain_alloc_user(struct device *dev, u32 flags)
if ((flags & IOMMU_HWPT_ALLOC_NEST_PARENT) && !ecap_nest(iommu->ecap))
return ERR_PTR(-EOPNOTSUPP);
- if (enforce_dirty &&
- !intel_iommu_slads_supported(iommu))
+ if (enforce_dirty && !slads_supported(iommu))
return ERR_PTR(-EOPNOTSUPP);
+ /*
+ * domain_alloc_user op needs to fully initialize a domain
+ * before return, so uses iommu_domain_alloc() here for
+ * simple.
+ */
domain = iommu_domain_alloc(dev->bus);
if (!domain)
domain = ERR_PTR(-ENOMEM);
- if (domain && enforce_dirty)
+
+ if (!IS_ERR(domain) && enforce_dirty) {
+ if (to_dmar_domain(domain)->use_first_level) {
+ iommu_domain_free(domain);
+ return ERR_PTR(-EOPNOTSUPP);
+ }
domain->dirty_ops = &intel_dirty_ops;
+ }
+
return domain;
}
@@ -4113,7 +4123,7 @@ static void intel_iommu_domain_free(struct iommu_domain
*domain)
}
static int prepare_domain_attach_device(struct iommu_domain *domain,
- struct device *dev)
+ struct device *dev, ioasid_t pasid)
{
struct dmar_domain *dmar_domain = to_dmar_domain(domain);
struct intel_iommu *iommu;
@@ -4126,7 +4136,8 @@ static int prepare_domain_attach_device(struct
iommu_domain *domain,
if (dmar_domain->force_snooping && !ecap_sc_support(iommu->ecap))
return -EINVAL;
- if (domain->dirty_ops && !intel_iommu_slads_supported(iommu))
+ if (domain->dirty_ops &&
+ (!slads_supported(iommu) || pasid != IOMMU_NO_PASID))
return -EINVAL;
/* check if this iommu agaw is sufficient for max mapped address */
@@ -4164,7 +4175,7 @@ static int intel_iommu_attach_device(struct iommu_domain
*domain,
if (info->domain)
device_block_translation(dev);
- ret = prepare_domain_attach_device(domain, dev);
+ ret = prepare_domain_attach_device(domain, dev, IOMMU_NO_PASID);
if (ret)
return ret;
@@ -4384,7 +4395,7 @@ static bool intel_iommu_capable(struct device *dev, enum
iommu_cap cap)
case IOMMU_CAP_ENFORCE_CACHE_COHERENCY:
return ecap_sc_support(info->iommu->ecap);
case IOMMU_CAP_DIRTY:
- return intel_iommu_slads_supported(info->iommu);
+ return slads_supported(info->iommu);
default:
return false;
}
@@ -4785,7 +4796,7 @@ static int intel_iommu_set_dev_pasid(struct iommu_domain
*domain,
if (context_copied(iommu, info->bus, info->devfn))
return -EBUSY;
- ret = prepare_domain_attach_device(domain, dev);
+ ret = prepare_domain_attach_device(domain, dev, pasid);
if (ret)
return ret;
@@ -4848,31 +4859,31 @@ static int intel_iommu_set_dirty_tracking(struct
iommu_domain *domain,
int ret = -EINVAL;
spin_lock(&dmar_domain->lock);
- if (!(dmar_domain->dirty_tracking ^ enable) ||
- list_empty(&dmar_domain->devices)) {
- spin_unlock(&dmar_domain->lock);
- return 0;
- }
+ if (dmar_domain->dirty_tracking == enable)
+ goto out_unlock;
list_for_each_entry(info, &dmar_domain->devices, link) {
- /* First-level page table always enables dirty bit*/
- if (dmar_domain->use_first_level) {
- ret = 0;
- break;
- }
-
ret = intel_pasid_setup_dirty_tracking(info->iommu, info->domain,
info->dev, IOMMU_NO_PASID,
enable);
if (ret)
- break;
+ goto err_unwind;
}
if (!ret)
dmar_domain->dirty_tracking = enable;
+out_unlock:
spin_unlock(&dmar_domain->lock);
+ return 0;
+
+err_unwind:
+ list_for_each_entry(info, &dmar_domain->devices, link)
+ intel_pasid_setup_dirty_tracking(info->iommu, dmar_domain,
+ info->dev, IOMMU_NO_PASID,
+ dmar_domain->dirty_tracking);
+ spin_unlock(&dmar_domain->lock);
return ret;
}
@@ -4886,14 +4897,16 @@ static int intel_iommu_read_and_clear_dirty(struct
iommu_domain *domain,
unsigned long pgsize;
bool ad_enabled;
- spin_lock(&dmar_domain->lock);
+ /*
+ * IOMMUFD core calls into a dirty tracking disabled domain without an
+ * IOVA bitmap set in order to clean dirty bits in all PTEs that might
+ * have occurred when we stopped dirty tracking. This ensures that we
+ * never inherit dirtied bits from a previous cycle.
+ */
ad_enabled = dmar_domain->dirty_tracking;
- spin_unlock(&dmar_domain->lock);
-
if (!ad_enabled && dirty->bitmap)
return -EINVAL;
- rcu_read_lock();
do {
struct dma_pte *pte;
int lvl = 0;
@@ -4906,14 +4919,10 @@ static int intel_iommu_read_and_clear_dirty(struct
iommu_domain *domain,
continue;
}
- /* It is writable, set the bitmap */
- if (((flags & IOMMU_DIRTY_NO_CLEAR) &&
- dma_sl_pte_dirty(pte)) ||
- dma_sl_pte_test_and_clear_dirty(pte))
+ if (dma_sl_pte_test_and_clear_dirty(pte, flags))
iommu_dirty_bitmap_record(dirty, iova, pgsize);
iova += pgsize;
} while (iova < end);
- rcu_read_unlock();
return 0;
}
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index bccd44db3316..0b390d9e669b 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -542,6 +542,9 @@ enum {
#define sm_supported(iommu) (intel_iommu_sm && ecap_smts((iommu)->ecap))
#define pasid_supported(iommu) (sm_supported(iommu) && \
ecap_pasid((iommu)->ecap))
+#define slads_supported(iommu) (sm_supported(iommu) && \
+ ecap_slads((iommu)->ecap))
+
struct pasid_entry;
struct pasid_state_entry;
@@ -785,13 +788,12 @@ static inline bool dma_pte_present(struct dma_pte *pte)
return (pte->val & 3) != 0;
}
-static inline bool dma_sl_pte_dirty(struct dma_pte *pte)
+static inline bool dma_sl_pte_test_and_clear_dirty(struct dma_pte *pte,
+ unsigned long flags)
{
- return (pte->val & DMA_SL_PTE_DIRTY) != 0;
-}
+ if (flags & IOMMU_DIRTY_NO_CLEAR)
+ return (pte->val & DMA_SL_PTE_DIRTY) != 0;
-static inline bool dma_sl_pte_test_and_clear_dirty(struct dma_pte *pte)
-{
return test_and_clear_bit(DMA_SL_PTE_DIRTY_BIT,
(unsigned long *)&pte->val);
}
diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index 03814942d59c..785384a59d55 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -686,15 +686,29 @@ int intel_pasid_setup_dirty_tracking(struct intel_iommu
*iommu,
spin_lock(&iommu->lock);
- did = domain_id_iommu(domain, iommu);
pte = intel_pasid_get_entry(dev, pasid);
if (!pte) {
spin_unlock(&iommu->lock);
- dev_err(dev, "Failed to get pasid entry of PASID %d\n", pasid);
+ dev_err_ratelimited(dev,
+ "Failed to get pasid entry of PASID %d\n",
+ pasid);
return -ENODEV;
}
+ did = domain_id_iommu(domain, iommu);
pgtt = pasid_pte_get_pgtt(pte);
+ if (pgtt != PASID_ENTRY_PGTT_SL_ONLY && pgtt != PASID_ENTRY_PGTT_NESTED) {
+ spin_unlock(&iommu->lock);
+ dev_err_ratelimited(dev,
+ "Dirty tracking not supported on translation type %d\n",
+ pgtt);
+ return -EOPNOTSUPP;
+ }
+
+ if (pasid_get_ssade(pte) == enabled) {
+ spin_unlock(&iommu->lock);
+ return 0;
+ }
if (enabled)
pasid_set_ssade(pte);
@@ -702,6 +716,9 @@ int intel_pasid_setup_dirty_tracking(struct intel_iommu *iommu,
pasid_clear_ssade(pte);
spin_unlock(&iommu->lock);
+ if (!ecap_coherent(iommu->ecap))
+ clflush_cache_range(pte, sizeof(*pte));
+
/*
* From VT-d spec table 25 "Guidance to Software for Invalidations":
*
@@ -720,8 +737,6 @@ int intel_pasid_setup_dirty_tracking(struct intel_iommu *iommu,
if (pgtt == PASID_ENTRY_PGTT_SL_ONLY || pgtt == PASID_ENTRY_PGTT_NESTED)
iommu->flush.flush_iotlb(iommu, did, 0, 0, DMA_TLB_DSI_FLUSH);
- else
- qi_flush_piotlb(iommu, did, pasid, 0, -1, 0);
/* Device IOTLB doesn't need to be flushed in caching mode. */
if (!cap_caching_mode(iommu->cap))
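The err_unwind pattern in the set_dirty_tracking hunk above — enable tracking on each device in the domain's list, and on failure walk the list again restoring the previous state — can be modeled like this (hypothetical device array and injected failure, purely for the demo):

```c
#include <assert.h>
#include <stdbool.h>

#define NDEVS 4
static bool dev_enabled[NDEVS];         /* stand-in for per-device PASID state */
static int fail_at = -1;                /* inject a failure for testing */

static int setup_dirty_tracking(int dev, bool enable)
{
	if (enable && dev == fail_at)
		return -1;
	dev_enabled[dev] = enable;
	return 0;
}

/* On partial failure, re-apply the old state to every device so the
 * domain never ends up with a mix of enabled and disabled devices. */
static int set_dirty_tracking(bool *tracking, bool enable)
{
	int i;

	if (*tracking == enable)
		return 0;
	for (i = 0; i < NDEVS; i++) {
		if (setup_dirty_tracking(i, enable))
			goto err_unwind;
	}
	*tracking = enable;
	return 0;

err_unwind:
	for (i = 0; i < NDEVS; i++)
		setup_dirty_tracking(i, *tracking);
	return -1;
}
```

The unwind loop deliberately revisits the whole list: re-applying the previous state to already-consistent devices is harmless and keeps the rollback simple.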
^ permalink raw reply related [flat|nested] 140+ messages in thread
* Re: [PATCH v3 19/19] iommu/intel: Access/Dirty bit support for SL domains
2023-10-17 11:22 ` Joao Martins
@ 2023-10-17 12:49 ` Baolu Lu
2023-10-17 14:19 ` Joao Martins
2023-10-17 13:10 ` Jason Gunthorpe
1 sibling, 1 reply; 140+ messages in thread
From: Baolu Lu @ 2023-10-17 12:49 UTC (permalink / raw)
To: Joao Martins, iommu
Cc: baolu.lu, Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi,
Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel,
Suravee Suthikulpanit, Will Deacon, Robin Murphy, Alex Williamson,
kvm
On 2023/10/17 19:22, Joao Martins wrote:
> On 17/10/2023 03:08, Baolu Lu wrote:
>> On 10/17/23 12:00 AM, Joao Martins wrote:
>>>>> The iommu_dirty_bitmap is defined in iommu core. The iommu driver has no
>>>>> need to understand it and check its member anyway.
>>>>>
>>>> (...) The iommu driver has no need to understand it. iommu_dirty_bitmap_record()
>>>> already makes those checks in case there's no iova_bitmap to set bits to.
>>>>
>>> This is all true but the reason I am checking iommu_dirty_bitmap::bitmap is to
>>> essentially not record anything in the iova bitmap and just clear the dirty bits
>>> from the IOPTEs, all when dirty tracking is technically disabled. This is done
>>> internally only when starting dirty tracking, and thus to ensure that we cleanup
>>> all dirty bits before we enable dirty tracking to have a consistent snapshot as
>>> opposed to inheriting dirties from the past.
>>
>> It's okay since it serves a functional purpose. Can you please add some
>> comments around the code to explain the rationale.
>>
>
> I added this comment below:
>
> + /*
> + * IOMMUFD core calls into a dirty tracking disabled domain without an
> + * IOVA bitmap set in order to clean dirty bits in all PTEs that might
> + * have occurred when we stopped dirty tracking. This ensures that we
> + * never inherit dirtied bits from a previous cycle.
> + */
>
Yes. It's clear. Thank you!
> Also fixed an issue where I could theoretically clear the bit with
> IOMMU_NO_CLEAR. Essentially passed the read_and_clear_dirty flags and let
> dma_sl_pte_test_and_clear_dirty() to test and test-and-clear, similar to AMD:
>
> @@ -781,6 +788,16 @@ static inline bool dma_pte_present(struct dma_pte *pte)
> return (pte->val & 3) != 0;
> }
>
> +static inline bool dma_sl_pte_test_and_clear_dirty(struct dma_pte *pte,
> + unsigned long flags)
> +{
> + if (flags & IOMMU_DIRTY_NO_CLEAR)
> + return (pte->val & DMA_SL_PTE_DIRTY) != 0;
> +
> + return test_and_clear_bit(DMA_SL_PTE_DIRTY_BIT,
> + (unsigned long *)&pte->val);
> +}
> +
Yes. Sure.
> Anyhow, see below the full diff compared to this patch. Some things are in tree
> that is different to submitted from this patch.
[...]
> @@ -4113,7 +4123,7 @@ static void intel_iommu_domain_free(struct iommu_domain
> *domain)
> }
>
> static int prepare_domain_attach_device(struct iommu_domain *domain,
> - struct device *dev)
> + struct device *dev, ioasid_t pasid)
How about blocking pasid attaching in intel_iommu_set_dev_pasid().
> {
> struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> struct intel_iommu *iommu;
> @@ -4126,7 +4136,8 @@ static int prepare_domain_attach_device(struct
> iommu_domain *domain,
> if (dmar_domain->force_snooping && !ecap_sc_support(iommu->ecap))
> return -EINVAL;
>
> - if (domain->dirty_ops && !intel_iommu_slads_supported(iommu))
> + if (domain->dirty_ops &&
> + (!slads_supported(iommu) || pasid != IOMMU_NO_PASID))
> return -EINVAL;
>
> /* check if this iommu agaw is sufficient for max mapped address */
[...]
>
> @@ -4886,14 +4897,16 @@ static int intel_iommu_read_and_clear_dirty(struct
> iommu_domain *domain,
> unsigned long pgsize;
> bool ad_enabled;
>
> - spin_lock(&dmar_domain->lock);
> + /*
> + * IOMMUFD core calls into a dirty tracking disabled domain without an
> + * IOVA bitmap set in order to clean dirty bits in all PTEs that might
> + * have occurred when we stopped dirty tracking. This ensures that we
> + * never inherit dirtied bits from a previous cycle.
> + */
> ad_enabled = dmar_domain->dirty_tracking;
> - spin_unlock(&dmar_domain->lock);
> -
> if (!ad_enabled && dirty->bitmap)
How about
if (!dmar_domain->dirty_tracking && dirty->bitmap)
return -EINVAL;
?
> return -EINVAL;
>
> - rcu_read_lock();
> do {
> struct dma_pte *pte;
> int lvl = 0;
> @@ -4906,14 +4919,10 @@ static int intel_iommu_read_and_clear_dirty(struct
> iommu_domain *domain,
> continue;
> }
>
> - /* It is writable, set the bitmap */
> - if (((flags & IOMMU_DIRTY_NO_CLEAR) &&
> - dma_sl_pte_dirty(pte)) ||
> - dma_sl_pte_test_and_clear_dirty(pte))
> + if (dma_sl_pte_test_and_clear_dirty(pte, flags))
> iommu_dirty_bitmap_record(dirty, iova, pgsize);
> iova += pgsize;
> } while (iova < end);
> - rcu_read_unlock();
>
> return 0;
> }
> diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
> index bccd44db3316..0b390d9e669b 100644
> --- a/drivers/iommu/intel/iommu.h
> +++ b/drivers/iommu/intel/iommu.h
> @@ -542,6 +542,9 @@ enum {
> #define sm_supported(iommu) (intel_iommu_sm && ecap_smts((iommu)->ecap))
> #define pasid_supported(iommu) (sm_supported(iommu) && \
> ecap_pasid((iommu)->ecap))
> +#define slads_supported(iommu) (sm_supported(iommu) && \
> + ecap_slads((iommu)->ecap))
> +
>
> struct pasid_entry;
> struct pasid_state_entry;
> @@ -785,13 +788,12 @@ static inline bool dma_pte_present(struct dma_pte *pte)
> return (pte->val & 3) != 0;
> }
>
> -static inline bool dma_sl_pte_dirty(struct dma_pte *pte)
> +static inline bool dma_sl_pte_test_and_clear_dirty(struct dma_pte *pte,
> + unsigned long flags)
> {
> - return (pte->val & DMA_SL_PTE_DIRTY) != 0;
> -}
> + if (flags & IOMMU_DIRTY_NO_CLEAR)
> + return (pte->val & DMA_SL_PTE_DIRTY) != 0;
>
> -static inline bool dma_sl_pte_test_and_clear_dirty(struct dma_pte *pte)
> -{
> return test_and_clear_bit(DMA_SL_PTE_DIRTY_BIT,
> (unsigned long *)&pte->val);
> }
> diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
> index 03814942d59c..785384a59d55 100644
> --- a/drivers/iommu/intel/pasid.c
> +++ b/drivers/iommu/intel/pasid.c
> @@ -686,15 +686,29 @@ int intel_pasid_setup_dirty_tracking(struct intel_iommu
> *iommu,
>
> spin_lock(&iommu->lock);
>
> - did = domain_id_iommu(domain, iommu);
> pte = intel_pasid_get_entry(dev, pasid);
> if (!pte) {
> spin_unlock(&iommu->lock);
> - dev_err(dev, "Failed to get pasid entry of PASID %d\n", pasid);
> + dev_err_ratelimited(dev,
> + "Failed to get pasid entry of PASID %d\n",
> + pasid);
> return -ENODEV;
> }
>
> + did = domain_id_iommu(domain, iommu);
> pgtt = pasid_pte_get_pgtt(pte);
> + if (pgtt != PASID_ENTRY_PGTT_SL_ONLY && pgtt != PASID_ENTRY_PGTT_NESTED) {
> + spin_unlock(&iommu->lock);
> + dev_err_ratelimited(dev,
> + "Dirty tracking not supported on translation type %d\n",
> + pgtt);
> + return -EOPNOTSUPP;
> + }
> +
> + if (pasid_get_ssade(pte) == enabled) {
> + spin_unlock(&iommu->lock);
> + return 0;
> + }
>
> if (enabled)
> pasid_set_ssade(pte);
> @@ -702,6 +716,9 @@ int intel_pasid_setup_dirty_tracking(struct intel_iommu *iommu,
> pasid_clear_ssade(pte);
> spin_unlock(&iommu->lock);
>
> + if (!ecap_coherent(iommu->ecap))
> + clflush_cache_range(pte, sizeof(*pte));
> +
> /*
> * From VT-d spec table 25 "Guidance to Software for Invalidations":
> *
> @@ -720,8 +737,6 @@ int intel_pasid_setup_dirty_tracking(struct intel_iommu *iommu,
>
> if (pgtt == PASID_ENTRY_PGTT_SL_ONLY || pgtt == PASID_ENTRY_PGTT_NESTED)
> iommu->flush.flush_iotlb(iommu, did, 0, 0, DMA_TLB_DSI_FLUSH);
> - else
> - qi_flush_piotlb(iommu, did, pasid, 0, -1, 0);
>
> /* Device IOTLB doesn't need to be flushed in caching mode. */
> if (!cap_caching_mode(iommu->cap))
Others look good to me. Thanks a lot.
Best regards,
baolu
^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v3 19/19] iommu/intel: Access/Dirty bit support for SL domains
2023-10-17 12:49 ` Baolu Lu
@ 2023-10-17 14:19 ` Joao Martins
0 siblings, 0 replies; 140+ messages in thread
From: Joao Martins @ 2023-10-17 14:19 UTC (permalink / raw)
To: Baolu Lu, iommu
Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Yi Liu,
Yi Y Sun, Nicolin Chen, Joerg Roedel, Suravee Suthikulpanit,
Will Deacon, Robin Murphy, Alex Williamson, kvm
On 17/10/2023 13:49, Baolu Lu wrote:
> On 2023/10/17 19:22, Joao Martins wrote:
>> On 17/10/2023 03:08, Baolu Lu wrote:
>>> On 10/17/23 12:00 AM, Joao Martins wrote:
>>>>>> The iommu_dirty_bitmap is defined in iommu core. The iommu driver has no
>>>>>> need to understand it and check its member anyway.
>>>>>>
>>>>> (...) The iommu driver has no need to understand it.
>>>>> iommu_dirty_bitmap_record()
>>>>> already makes those checks in case there's no iova_bitmap to set bits to.
>>>>>
>>>> This is all true but the reason I am checking iommu_dirty_bitmap::bitmap is to
>>>> essentially not record anything in the iova bitmap and just clear the dirty
>>>> bits
>>>> from the IOPTEs, all when dirty tracking is technically disabled. This is done
>>>> internally only when starting dirty tracking, and thus to ensure that we
>>>> cleanup
>>>> all dirty bits before we enable dirty tracking to have a consistent snapshot as
>>>> opposed to inheriting dirties from the past.
>>>
>>> It's okay since it serves a functional purpose. Can you please add some
>>> comments around the code to explain the rationale.
>>>
>>
>> I added this comment below:
>>
>> + /*
>> + * IOMMUFD core calls into a dirty tracking disabled domain without an
>> + * IOVA bitmap set in order to clean dirty bits in all PTEs that might
>> + * have occurred when we stopped dirty tracking. This ensures that we
>> + * never inherit dirtied bits from a previous cycle.
>> + */
>>
>
> Yes. It's clear. Thank you!
>
>> Also fixed an issue where I could theoretically clear the bit with
>> IOMMU_NO_CLEAR. Essentially passed the read_and_clear_dirty flags and let
>> dma_sl_pte_test_and_clear_dirty() to test and test-and-clear, similar to AMD:
>>
>> @@ -781,6 +788,16 @@ static inline bool dma_pte_present(struct dma_pte *pte)
>> return (pte->val & 3) != 0;
>> }
>>
>> +static inline bool dma_sl_pte_test_and_clear_dirty(struct dma_pte *pte,
>> + unsigned long flags)
>> +{
>> + if (flags & IOMMU_DIRTY_NO_CLEAR)
>> + return (pte->val & DMA_SL_PTE_DIRTY) != 0;
>> +
>> + return test_and_clear_bit(DMA_SL_PTE_DIRTY_BIT,
>> + (unsigned long *)&pte->val);
>> +}
>> +
>
> Yes. Sure.
>
>> Anyhow, see below the full diff compared to this patch. Some things are in tree
>> that is different to submitted from this patch.
>
> [...]
>
>> @@ -4113,7 +4123,7 @@ static void intel_iommu_domain_free(struct iommu_domain
>> *domain)
>> }
>>
>> static int prepare_domain_attach_device(struct iommu_domain *domain,
>> - struct device *dev)
>> + struct device *dev, ioasid_t pasid)
>
> How about blocking pasid attaching in intel_iommu_set_dev_pasid().
>
OK
>> {
>> struct dmar_domain *dmar_domain = to_dmar_domain(domain);
>> struct intel_iommu *iommu;
>> @@ -4126,7 +4136,8 @@ static int prepare_domain_attach_device(struct
>> iommu_domain *domain,
>> if (dmar_domain->force_snooping && !ecap_sc_support(iommu->ecap))
>> return -EINVAL;
>>
>> - if (domain->dirty_ops && !intel_iommu_slads_supported(iommu))
>> + if (domain->dirty_ops &&
>> + (!slads_supported(iommu) || pasid != IOMMU_NO_PASID))
>> return -EINVAL;
>>
>> /* check if this iommu agaw is sufficient for max mapped address */
>
> [...]
>
>>
>> @@ -4886,14 +4897,16 @@ static int intel_iommu_read_and_clear_dirty(struct
>> iommu_domain *domain,
>> unsigned long pgsize;
>> bool ad_enabled;
>>
>> - spin_lock(&dmar_domain->lock);
>> + /*
>> + * IOMMUFD core calls into a dirty tracking disabled domain without an
>> + * IOVA bitmap set in order to clean dirty bits in all PTEs that might
>> + * have occurred when we stopped dirty tracking. This ensures that we
>> + * never inherit dirtied bits from a previous cycle.
>> + */
>> ad_enabled = dmar_domain->dirty_tracking;
>> - spin_unlock(&dmar_domain->lock);
>> -
>> if (!ad_enabled && dirty->bitmap)
>
> How about
> if (!dmar_domain->dirty_tracking && dirty->bitmap)
> return -EINVAL;
> ?
>
OK
>> return -EINVAL;
>>
>> - rcu_read_lock();
>> do {
>> struct dma_pte *pte;
>> int lvl = 0;
>> @@ -4906,14 +4919,10 @@ static int intel_iommu_read_and_clear_dirty(struct
>> iommu_domain *domain,
>> continue;
>> }
>>
>> - /* It is writable, set the bitmap */
>> - if (((flags & IOMMU_DIRTY_NO_CLEAR) &&
>> - dma_sl_pte_dirty(pte)) ||
>> - dma_sl_pte_test_and_clear_dirty(pte))
>> + if (dma_sl_pte_test_and_clear_dirty(pte, flags))
>> iommu_dirty_bitmap_record(dirty, iova, pgsize);
>> iova += pgsize;
>> } while (iova < end);
>> - rcu_read_unlock();
>>
>> return 0;
>> }
>> diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
>> index bccd44db3316..0b390d9e669b 100644
>> --- a/drivers/iommu/intel/iommu.h
>> +++ b/drivers/iommu/intel/iommu.h
>> @@ -542,6 +542,9 @@ enum {
>> #define sm_supported(iommu) (intel_iommu_sm && ecap_smts((iommu)->ecap))
>> #define pasid_supported(iommu) (sm_supported(iommu) && \
>> ecap_pasid((iommu)->ecap))
>> +#define slads_supported(iommu) (sm_supported(iommu) && \
>> + ecap_slads((iommu)->ecap))
>> +
>>
>> struct pasid_entry;
>> struct pasid_state_entry;
>> @@ -785,13 +788,12 @@ static inline bool dma_pte_present(struct dma_pte *pte)
>> return (pte->val & 3) != 0;
>> }
>>
>> -static inline bool dma_sl_pte_dirty(struct dma_pte *pte)
>> +static inline bool dma_sl_pte_test_and_clear_dirty(struct dma_pte *pte,
>> + unsigned long flags)
>> {
>> - return (pte->val & DMA_SL_PTE_DIRTY) != 0;
>> -}
>> + if (flags & IOMMU_DIRTY_NO_CLEAR)
>> + return (pte->val & DMA_SL_PTE_DIRTY) != 0;
>>
>> -static inline bool dma_sl_pte_test_and_clear_dirty(struct dma_pte *pte)
>> -{
>> return test_and_clear_bit(DMA_SL_PTE_DIRTY_BIT,
>> (unsigned long *)&pte->val);
>> }
>> diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
>> index 03814942d59c..785384a59d55 100644
>> --- a/drivers/iommu/intel/pasid.c
>> +++ b/drivers/iommu/intel/pasid.c
>> @@ -686,15 +686,29 @@ int intel_pasid_setup_dirty_tracking(struct intel_iommu
>> *iommu,
>>
>> spin_lock(&iommu->lock);
>>
>> - did = domain_id_iommu(domain, iommu);
>> pte = intel_pasid_get_entry(dev, pasid);
>> if (!pte) {
>> spin_unlock(&iommu->lock);
>> - dev_err(dev, "Failed to get pasid entry of PASID %d\n", pasid);
>> + dev_err_ratelimited(dev,
>> + "Failed to get pasid entry of PASID %d\n",
>> + pasid);
>> return -ENODEV;
>> }
>>
>> + did = domain_id_iommu(domain, iommu);
>> pgtt = pasid_pte_get_pgtt(pte);
>> + if (pgtt != PASID_ENTRY_PGTT_SL_ONLY && pgtt != PASID_ENTRY_PGTT_NESTED) {
>> + spin_unlock(&iommu->lock);
>> + dev_err_ratelimited(dev,
>> + "Dirty tracking not supported on translation type %d\n",
>> + pgtt);
>> + return -EOPNOTSUPP;
>> + }
>> +
>> + if (pasid_get_ssade(pte) == enabled) {
>> + spin_unlock(&iommu->lock);
>> + return 0;
>> + }
>>
>> if (enabled)
>> pasid_set_ssade(pte);
>> @@ -702,6 +716,9 @@ int intel_pasid_setup_dirty_tracking(struct intel_iommu
>> *iommu,
>> pasid_clear_ssade(pte);
>> spin_unlock(&iommu->lock);
>>
>> + if (!ecap_coherent(iommu->ecap))
>> + clflush_cache_range(pte, sizeof(*pte));
>> +
>> /*
>> * From VT-d spec table 25 "Guidance to Software for Invalidations":
>> *
>> @@ -720,8 +737,6 @@ int intel_pasid_setup_dirty_tracking(struct intel_iommu
>> *iommu,
>>
>> if (pgtt == PASID_ENTRY_PGTT_SL_ONLY || pgtt == PASID_ENTRY_PGTT_NESTED)
>> iommu->flush.flush_iotlb(iommu, did, 0, 0, DMA_TLB_DSI_FLUSH);
>> - else
>> - qi_flush_piotlb(iommu, did, pasid, 0, -1, 0);
>>
>> /* Device IOTLB doesn't need to be flushed in caching mode. */
>> if (!cap_caching_mode(iommu->cap))
>
> Others look good to me. Thanks a lot.
>
Joao
^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v3 19/19] iommu/intel: Access/Dirty bit support for SL domains
2023-10-17 11:22 ` Joao Martins
2023-10-17 12:49 ` Baolu Lu
@ 2023-10-17 13:10 ` Jason Gunthorpe
2023-10-17 14:11 ` Joao Martins
1 sibling, 1 reply; 140+ messages in thread
From: Jason Gunthorpe @ 2023-10-17 13:10 UTC (permalink / raw)
To: Joao Martins
Cc: Baolu Lu, iommu, Kevin Tian, Shameerali Kolothum Thodi, Yi Liu,
Yi Y Sun, Nicolin Chen, Joerg Roedel, Suravee Suthikulpanit,
Will Deacon, Robin Murphy, Alex Williamson, kvm
On Tue, Oct 17, 2023 at 12:22:34PM +0100, Joao Martins wrote:
> On 17/10/2023 03:08, Baolu Lu wrote:
> > On 10/17/23 12:00 AM, Joao Martins wrote:
> >>>> The iommu_dirty_bitmap is defined in iommu core. The iommu driver has no
> >>>> need to understand it and check its member anyway.
> >>>>
> >>> (...) The iommu driver has no need to understand it. iommu_dirty_bitmap_record()
> >>> already makes those checks in case there's no iova_bitmap to set bits to.
> >>>
> >> This is all true but the reason I am checking iommu_dirty_bitmap::bitmap is to
> >> essentially not record anything in the iova bitmap and just clear the dirty bits
> >> from the IOPTEs, all when dirty tracking is technically disabled. This is done
>> internally only when starting dirty tracking, and thus to ensure that we clean up
> >> all dirty bits before we enable dirty tracking to have a consistent snapshot as
> >> opposed to inheriting dirties from the past.
> >
> > It's okay since it serves a functional purpose. Can you please add some
> > comments around the code to explain the rationale.
> >
>
> I added this comment below:
>
> + /*
> + * IOMMUFD core calls into a dirty tracking disabled domain without an
> + * IOVA bitmap set in order to clean dirty bits in all PTEs that might
> + * have occurred when we stopped dirty tracking. This ensures that we
> + * never inherit dirtied bits from a previous cycle.
> + */
>
> Also fixed an issue where I could theoretically clear the bit with
> IOMMU_NO_CLEAR. Essentially I passed the read_and_clear_dirty flags and let
> dma_sl_pte_test_and_clear_dirty() either test or test-and-clear, similar to AMD:
How does all this work, does this leak into the uapi? Why would we
want to not clear the dirty bits upon enable/disable of dirty
tracking? I can understand that the driver needs help from the caller
due to the externalized locking, but do we leak this into userspace?
Jason
^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v3 19/19] iommu/intel: Access/Dirty bit support for SL domains
2023-10-17 13:10 ` Jason Gunthorpe
@ 2023-10-17 14:11 ` Joao Martins
2023-10-17 15:31 ` Jason Gunthorpe
0 siblings, 1 reply; 140+ messages in thread
From: Joao Martins @ 2023-10-17 14:11 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Baolu Lu, iommu, Kevin Tian, Shameerali Kolothum Thodi, Yi Liu,
Yi Y Sun, Nicolin Chen, Joerg Roedel, Suravee Suthikulpanit,
Will Deacon, Robin Murphy, Alex Williamson, kvm
On 17/10/2023 14:10, Jason Gunthorpe wrote:
> On Tue, Oct 17, 2023 at 12:22:34PM +0100, Joao Martins wrote:
>> On 17/10/2023 03:08, Baolu Lu wrote:
>>> On 10/17/23 12:00 AM, Joao Martins wrote:
>>>>>> The iommu_dirty_bitmap is defined in iommu core. The iommu driver has no
>>>>>> need to understand it and check its member anyway.
>>>>>>
>>>>> (...) The iommu driver has no need to understand it. iommu_dirty_bitmap_record()
>>>>> already makes those checks in case there's no iova_bitmap to set bits to.
>>>>>
>>>> This is all true but the reason I am checking iommu_dirty_bitmap::bitmap is to
>>>> essentially not record anything in the iova bitmap and just clear the dirty bits
>>>> from the IOPTEs, all when dirty tracking is technically disabled. This is done
>>>> internally only when starting dirty tracking, and thus to ensure that we clean up
>>>> all dirty bits before we enable dirty tracking to have a consistent snapshot as
>>>> opposed to inheriting dirties from the past.
>>>
>>> It's okay since it serves a functional purpose. Can you please add some
>>> comments around the code to explain the rationale.
>>>
>>
>> I added this comment below:
>>
>> + /*
>> + * IOMMUFD core calls into a dirty tracking disabled domain without an
>> + * IOVA bitmap set in order to clean dirty bits in all PTEs that might
>> + * have occurred when we stopped dirty tracking. This ensures that we
>> + * never inherit dirtied bits from a previous cycle.
>> + */
>>
>> Also fixed an issue where I could theoretically clear the bit with
>> IOMMU_NO_CLEAR. Essentially I passed the read_and_clear_dirty flags and let
>> dma_sl_pte_test_and_clear_dirty() either test or test-and-clear, similar to AMD:
>
> How does all this work, does this leak into the uapi?
UAPI is only ever expected to collect/clear dirty bits while dirty tracking is
enabled. And it requires valid bitmaps before it gets to the IOMMU driver.
The above, where I pass no dirty::bitmap (but with an iotlb_gather), is internal
usage only. Open to alternatives if this is prone to audit errors, e.g. 1) via
the iommu_dirty_bitmap structure, where I add one field which, if true, lets the
iommufd core call into the iommu driver in a "clear IOPTE" manner, or 2) via
the ::flags ... the thing is that the ::flags values are UAPI, so it feels weird
to use those flags for internal purposes.
With respect to IOMMU_NO_CLEAR, that is UAPI (a flag in read-and-clear) where the
user fetches bits but does not want to clear the hw IOPTE dirty bit (to avoid the
TLB flush).
> Why would we
> want to not clear the dirty bits upon enable/disable of dirty
> tracking?
For the unmap-and-read-dirty case, where you unmap and want to get the dirty
bits but don't care to clear them, as you will be unmapping anyway. My comment
above is not about that, btw; it is just my broken check where I either test
for dirty or test-and-clear. That's it.
> I can understand that the driver needs help from the caller
> due to the externalized locking, but do we leak this into userspace?
AFAICT no for the first; IOMMU_NO_CLEAR is UAPI. I take it you were talking
about the former.
^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v3 19/19] iommu/intel: Access/Dirty bit support for SL domains
2023-10-17 14:11 ` Joao Martins
@ 2023-10-17 15:31 ` Jason Gunthorpe
2023-10-17 15:54 ` Joao Martins
0 siblings, 1 reply; 140+ messages in thread
From: Jason Gunthorpe @ 2023-10-17 15:31 UTC (permalink / raw)
To: Joao Martins
Cc: Baolu Lu, iommu, Kevin Tian, Shameerali Kolothum Thodi, Yi Liu,
Yi Y Sun, Nicolin Chen, Joerg Roedel, Suravee Suthikulpanit,
Will Deacon, Robin Murphy, Alex Williamson, kvm
On Tue, Oct 17, 2023 at 03:11:46PM +0100, Joao Martins wrote:
> On 17/10/2023 14:10, Jason Gunthorpe wrote:
> > On Tue, Oct 17, 2023 at 12:22:34PM +0100, Joao Martins wrote:
> >> On 17/10/2023 03:08, Baolu Lu wrote:
> >>> On 10/17/23 12:00 AM, Joao Martins wrote:
> >>>>>> The iommu_dirty_bitmap is defined in iommu core. The iommu driver has no
> >>>>>> need to understand it and check its member anyway.
> >>>>>>
> >>>>> (...) The iommu driver has no need to understand it. iommu_dirty_bitmap_record()
> >>>>> already makes those checks in case there's no iova_bitmap to set bits to.
> >>>>>
> >>>> This is all true but the reason I am checking iommu_dirty_bitmap::bitmap is to
> >>>> essentially not record anything in the iova bitmap and just clear the dirty bits
> >>>> from the IOPTEs, all when dirty tracking is technically disabled. This is done
> >>>> internally only when starting dirty tracking, and thus to ensure that we clean up
> >>>> all dirty bits before we enable dirty tracking to have a consistent snapshot as
> >>>> opposed to inheriting dirties from the past.
> >>>
> >>> It's okay since it serves a functional purpose. Can you please add some
> >>> comments around the code to explain the rationale.
> >>>
> >>
> >> I added this comment below:
> >>
> >> + /*
> >> + * IOMMUFD core calls into a dirty tracking disabled domain without an
> >> + * IOVA bitmap set in order to clean dirty bits in all PTEs that might
> >> + * have occurred when we stopped dirty tracking. This ensures that we
> >> + * never inherit dirtied bits from a previous cycle.
> >> + */
> >>
> >> Also fixed an issue where I could theoretically clear the bit with
> >> IOMMU_NO_CLEAR. Essentially I passed the read_and_clear_dirty flags and let
> >> dma_sl_pte_test_and_clear_dirty() either test or test-and-clear, similar to AMD:
> >
> > How does all this work, does this leak into the uapi?
>
> UAPI is only ever expected to collect/clear dirty bits while dirty tracking is
> enabled. And it requires valid bitmaps before it gets to the IOMMU driver.
>
> The above, where I pass no dirty::bitmap (but with an iotlb_gather), is internal
> usage only. Open to alternatives if this is prone to audit errors, e.g. 1) via
> the iommu_dirty_bitmap structure, where I add one field which, if true, lets the
> iommufd core call into the iommu driver in a "clear IOPTE" manner, or 2) via
> the ::flags ... the thing is that the ::flags values are UAPI, so it feels weird
> to use those flags for internal purposes.
I think NULL to mean clear but not record is OK; it doesn't matter too
much, but ideally this would be sort of hidden in the iova APIs.
Jason
^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v3 19/19] iommu/intel: Access/Dirty bit support for SL domains
2023-10-17 15:31 ` Jason Gunthorpe
@ 2023-10-17 15:54 ` Joao Martins
0 siblings, 0 replies; 140+ messages in thread
From: Joao Martins @ 2023-10-17 15:54 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Baolu Lu, iommu, Kevin Tian, Shameerali Kolothum Thodi, Yi Liu,
Yi Y Sun, Nicolin Chen, Joerg Roedel, Suravee Suthikulpanit,
Will Deacon, Robin Murphy, Alex Williamson, kvm
On 17/10/2023 16:31, Jason Gunthorpe wrote:
> On Tue, Oct 17, 2023 at 03:11:46PM +0100, Joao Martins wrote:
>> On 17/10/2023 14:10, Jason Gunthorpe wrote:
>>> On Tue, Oct 17, 2023 at 12:22:34PM +0100, Joao Martins wrote:
>>>> On 17/10/2023 03:08, Baolu Lu wrote:
>>>>> On 10/17/23 12:00 AM, Joao Martins wrote:
>>>>>>>> The iommu_dirty_bitmap is defined in iommu core. The iommu driver has no
>>>>>>>> need to understand it and check its member anyway.
>>>>>>>>
>>>>>>> (...) The iommu driver has no need to understand it. iommu_dirty_bitmap_record()
>>>>>>> already makes those checks in case there's no iova_bitmap to set bits to.
>>>>>>>
>>>>>> This is all true but the reason I am checking iommu_dirty_bitmap::bitmap is to
>>>>>> essentially not record anything in the iova bitmap and just clear the dirty bits
>>>>>> from the IOPTEs, all when dirty tracking is technically disabled. This is done
>>>>>> internally only when starting dirty tracking, and thus to ensure that we clean up
>>>>>> all dirty bits before we enable dirty tracking to have a consistent snapshot as
>>>>>> opposed to inheriting dirties from the past.
>>>>>
>>>>> It's okay since it serves a functional purpose. Can you please add some
>>>>> comments around the code to explain the rationale.
>>>>>
>>>>
>>>> I added this comment below:
>>>>
>>>> + /*
>>>> + * IOMMUFD core calls into a dirty tracking disabled domain without an
>>>> + * IOVA bitmap set in order to clean dirty bits in all PTEs that might
>>>> + * have occurred when we stopped dirty tracking. This ensures that we
>>>> + * never inherit dirtied bits from a previous cycle.
>>>> + */
>>>>
>>>> Also fixed an issue where I could theoretically clear the bit with
>>>> IOMMU_NO_CLEAR. Essentially I passed the read_and_clear_dirty flags and let
>>>> dma_sl_pte_test_and_clear_dirty() either test or test-and-clear, similar to AMD:
>>>
>>> How does all this work, does this leak into the uapi?
>>
>> UAPI is only ever expected to collect/clear dirty bits while dirty tracking is
>> enabled. And it requires valid bitmaps before it gets to the IOMMU driver.
>>
>> The above, where I pass no dirty::bitmap (but with an iotlb_gather), is internal
>> usage only. Open to alternatives if this is prone to audit errors, e.g. 1) via
>> the iommu_dirty_bitmap structure, where I add one field which, if true, lets the
>> iommufd core call into the iommu driver in a "clear IOPTE" manner, or 2) via
>> the ::flags ... the thing is that the ::flags values are UAPI, so it feels weird
>> to use those flags for internal purposes.
>
> I think NULL to mean clear but not record is OK, it doesn't matter too
> much but ideally this would be sort of hidden in the iova APIs..
And it is hidden? Unless by hidden you mean that there should be explicit IOVA
APIs that do this?
Currently, iopt_clear_dirty_data() is where this internal-only usage of
iommu_read_and_clear() lives.
^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v3 19/19] iommu/intel: Access/Dirty bit support for SL domains
2023-09-23 1:25 ` [PATCH v3 19/19] iommu/intel: Access/Dirty bit support for SL domains Joao Martins
` (3 preceding siblings ...)
2023-10-16 2:07 ` Baolu Lu
@ 2023-10-16 2:21 ` Baolu Lu
2023-10-16 11:39 ` Joao Martins
4 siblings, 1 reply; 140+ messages in thread
From: Baolu Lu @ 2023-10-16 2:21 UTC (permalink / raw)
To: Joao Martins, iommu
Cc: baolu.lu, Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi,
Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel,
Suravee Suthikulpanit, Will Deacon, Robin Murphy, Alex Williamson,
kvm
On 9/23/23 9:25 AM, Joao Martins wrote:
[...]
> +/*
> + * Set up dirty tracking on a second only translation type.
Set up dirty tracking on a second only or nested translation type.
> + */
> +int intel_pasid_setup_dirty_tracking(struct intel_iommu *iommu,
> + struct dmar_domain *domain,
> + struct device *dev, u32 pasid,
> + bool enabled)
> +{
> + struct pasid_entry *pte;
> + u16 did, pgtt;
> +
> + spin_lock(&iommu->lock);
> +
> + did = domain_id_iommu(domain, iommu);
> + pte = intel_pasid_get_entry(dev, pasid);
> + if (!pte) {
> + spin_unlock(&iommu->lock);
> + dev_err(dev, "Failed to get pasid entry of PASID %d\n", pasid);
Use dev_err_ratelimited() to avoid a user DoS attack.
> + return -ENODEV;
> + }
Can we add a check to limit this interface to second-only and nested
translation types? These are the only valid use cases currently and for
the foreseeable future.
And, return directly if the pasid bit matches the target state.
[...]
	spin_lock(&iommu->lock);
	pte = intel_pasid_get_entry(dev, pasid);
	if (!pte) {
		spin_unlock(&iommu->lock);
		dev_err_ratelimited(dev, "Failed to get pasid entry of PASID %d\n",
				    pasid);
		return -ENODEV;
	}

	did = domain_id_iommu(domain, iommu);
	pgtt = pasid_pte_get_pgtt(pte);

	if (pgtt != PASID_ENTRY_PGTT_SL_ONLY && pgtt != PASID_ENTRY_PGTT_NESTED) {
		spin_unlock(&iommu->lock);
		dev_err_ratelimited(dev,
				    "Dirty tracking not supported on translation type %d\n",
				    pgtt);
		return -EOPNOTSUPP;
	}

	if (pasid_get_ssade(pte) == enabled) {
		spin_unlock(&iommu->lock);
		return 0;
	}

	if (enabled)
		pasid_set_ssade(pte);
	else
		pasid_clear_ssade(pte);
	spin_unlock(&iommu->lock);
[...]
> +
> + pgtt = pasid_pte_get_pgtt(pte);
> +
> + if (enabled)
> + pasid_set_ssade(pte);
> + else
> + pasid_clear_ssade(pte);
> + spin_unlock(&iommu->lock);
Add below here:
if (!ecap_coherent(iommu->ecap))
clflush_cache_range(pte, sizeof(*pte));
> +
> + /*
> + * From VT-d spec table 25 "Guidance to Software for Invalidations":
> + *
> + * - PASID-selective-within-Domain PASID-cache invalidation
> + * If (PGTT=SS or Nested)
> + * - Domain-selective IOTLB invalidation
> + * Else
> + * - PASID-selective PASID-based IOTLB invalidation
> + * - If (pasid is RID_PASID)
> + * - Global Device-TLB invalidation to affected functions
> + * Else
> + * - PASID-based Device-TLB invalidation (with S=1 and
> + * Addr[63:12]=0x7FFFFFFF_FFFFF) to affected functions
> + */
> + pasid_cache_invalidation_with_pasid(iommu, did, pasid);
> +
> + if (pgtt == PASID_ENTRY_PGTT_SL_ONLY || pgtt == PASID_ENTRY_PGTT_NESTED)
> + iommu->flush.flush_iotlb(iommu, did, 0, 0, DMA_TLB_DSI_FLUSH);
> + else
> + qi_flush_piotlb(iommu, did, pasid, 0, -1, 0);
Only "Domain-selective IOTLB invalidation" is needed here.
> +
> + /* Device IOTLB doesn't need to be flushed in caching mode. */
> + if (!cap_caching_mode(iommu->cap))
> + devtlb_invalidation_with_pasid(iommu, dev, pasid);
For the device IOTLB invalidation, we need to follow what the spec requires.
If (pasid is RID_PASID)
- Global Device-TLB invalidation to affected functions
Else
- PASID-based Device-TLB invalidation (with S=1 and
Addr[63:12]=0x7FFFFFFF_FFFFF) to affected functions
> +
> + return 0;
> +}
> +
Best regards,
baolu
^ permalink raw reply [flat|nested] 140+ messages in thread* Re: [PATCH v3 19/19] iommu/intel: Access/Dirty bit support for SL domains
2023-10-16 2:21 ` Baolu Lu
@ 2023-10-16 11:39 ` Joao Martins
2023-10-16 13:06 ` Baolu Lu
0 siblings, 1 reply; 140+ messages in thread
From: Joao Martins @ 2023-10-16 11:39 UTC (permalink / raw)
To: Baolu Lu, iommu
Cc: Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi, Yi Liu,
Yi Y Sun, Nicolin Chen, Joerg Roedel, Suravee Suthikulpanit,
Will Deacon, Robin Murphy, Alex Williamson, kvm
On 16/10/2023 03:21, Baolu Lu wrote:
> On 9/23/23 9:25 AM, Joao Martins wrote:
> [...]
>> +/*
>> + * Set up dirty tracking on a second only translation type.
>
> Set up dirty tracking on a second only or nested translation type.
>
>> + */
>> +int intel_pasid_setup_dirty_tracking(struct intel_iommu *iommu,
>> + struct dmar_domain *domain,
>> + struct device *dev, u32 pasid,
>> + bool enabled)
>> +{
>> + struct pasid_entry *pte;
>> + u16 did, pgtt;
>> +
>> + spin_lock(&iommu->lock);
>> +
>> + did = domain_id_iommu(domain, iommu);
>> + pte = intel_pasid_get_entry(dev, pasid);
>> + if (!pte) {
>> + spin_unlock(&iommu->lock);
>> + dev_err(dev, "Failed to get pasid entry of PASID %d\n", pasid);
>
> Use dev_err_ratelimited() to avoid a user DoS attack.
>
OK
>> + return -ENODEV;
>> + }
>
> Can we add a check to limit this interface to second-only and nested
> translation types? These are the only valid use cases currently and for
> the foreseeable future.
>
OK.
> And, return directly if the pasid bit matches the target state.
>
OK
> [...]
> 	spin_lock(&iommu->lock);
> 	pte = intel_pasid_get_entry(dev, pasid);
> 	if (!pte) {
> 		spin_unlock(&iommu->lock);
> 		dev_err_ratelimited(dev, "Failed to get pasid entry of PASID %d\n",
> 				    pasid);
> 		return -ENODEV;
> 	}
>
> 	did = domain_id_iommu(domain, iommu);
> 	pgtt = pasid_pte_get_pgtt(pte);
>
> 	if (pgtt != PASID_ENTRY_PGTT_SL_ONLY && pgtt != PASID_ENTRY_PGTT_NESTED) {
> 		spin_unlock(&iommu->lock);
> 		dev_err_ratelimited(dev,
> 				    "Dirty tracking not supported on translation type %d\n",
> 				    pgtt);
> 		return -EOPNOTSUPP;
> 	}
>
> 	if (pasid_get_ssade(pte) == enabled) {
> 		spin_unlock(&iommu->lock);
> 		return 0;
> 	}
>
> 	if (enabled)
> 		pasid_set_ssade(pte);
> 	else
> 		pasid_clear_ssade(pte);
> 	spin_unlock(&iommu->lock);
> [...]
>
OK
>> +
>> + pgtt = pasid_pte_get_pgtt(pte);
>> +
>> + if (enabled)
>> + pasid_set_ssade(pte);
>> + else
>> + pasid_clear_ssade(pte);
>> + spin_unlock(&iommu->lock);
>
>
> Add below here:
>
> if (!ecap_coherent(iommu->ecap))
> clflush_cache_range(pte, sizeof(*pte));
>
Got it
>> +
>> + /*
>> + * From VT-d spec table 25 "Guidance to Software for Invalidations":
>> + *
>> + * - PASID-selective-within-Domain PASID-cache invalidation
>> + * If (PGTT=SS or Nested)
>> + * - Domain-selective IOTLB invalidation
>> + * Else
>> + * - PASID-selective PASID-based IOTLB invalidation
>> + * - If (pasid is RID_PASID)
>> + * - Global Device-TLB invalidation to affected functions
>> + * Else
>> + * - PASID-based Device-TLB invalidation (with S=1 and
>> + * Addr[63:12]=0x7FFFFFFF_FFFFF) to affected functions
>> + */
>> + pasid_cache_invalidation_with_pasid(iommu, did, pasid);
>> +
>> + if (pgtt == PASID_ENTRY_PGTT_SL_ONLY || pgtt == PASID_ENTRY_PGTT_NESTED)
>> + iommu->flush.flush_iotlb(iommu, did, 0, 0, DMA_TLB_DSI_FLUSH);
>> + else
>> + qi_flush_piotlb(iommu, did, pasid, 0, -1, 0);
>
> Only "Domain-selective IOTLB invalidation" is needed here.
>
Will delete the qi_flush_piotlb() then.
>> +
>> + /* Device IOTLB doesn't need to be flushed in caching mode. */
>> + if (!cap_caching_mode(iommu->cap))
>> + devtlb_invalidation_with_pasid(iommu, dev, pasid);
>
> For the device IOTLB invalidation, we need to follow what the spec requires.
>
> If (pasid is RID_PASID)
> - Global Device-TLB invalidation to affected functions
> Else
> - PASID-based Device-TLB invalidation (with S=1 and
> Addr[63:12]=0x7FFFFFFF_FFFFF) to affected functions
>
devtlb_invalidation_with_pasid() underneath does:
if (pasid == PASID_RID2PASID)
qi_flush_dev_iotlb(iommu, sid, pfsid, qdep, 0, 64 - VTD_PAGE_SHIFT);
else
qi_flush_dev_iotlb_pasid(iommu, sid, pfsid, pasid, qdep, 0, 64 - VTD_PAGE_SHIFT);
... Which is what the spec suggests (IIUC).
Should I read your comment above as saying I should drop the
cap_caching_mode(iommu->cap) check?
^ permalink raw reply [flat|nested] 140+ messages in thread* Re: [PATCH v3 19/19] iommu/intel: Access/Dirty bit support for SL domains
2023-10-16 11:39 ` Joao Martins
@ 2023-10-16 13:06 ` Baolu Lu
0 siblings, 0 replies; 140+ messages in thread
From: Baolu Lu @ 2023-10-16 13:06 UTC (permalink / raw)
To: Joao Martins, iommu
Cc: baolu.lu, Jason Gunthorpe, Kevin Tian, Shameerali Kolothum Thodi,
Yi Liu, Yi Y Sun, Nicolin Chen, Joerg Roedel,
Suravee Suthikulpanit, Will Deacon, Robin Murphy, Alex Williamson,
kvm
On 2023/10/16 19:39, Joao Martins wrote:
>>> + /* Device IOTLB doesn't need to be flushed in caching mode. */
>>> + if (!cap_caching_mode(iommu->cap))
>>> + devtlb_invalidation_with_pasid(iommu, dev, pasid);
>> For the device IOTLB invalidation, we need to follow what the spec requires.
>>
>> If (pasid is RID_PASID)
>> - Global Device-TLB invalidation to affected functions
>> Else
>> - PASID-based Device-TLB invalidation (with S=1 and
>> Addr[63:12]=0x7FFFFFFF_FFFFF) to affected functions
>>
> devtlb_invalidation_with_pasid() underneath does:
>
> if (pasid == PASID_RID2PASID)
> qi_flush_dev_iotlb(iommu, sid, pfsid, qdep, 0, 64 - VTD_PAGE_SHIFT);
> else
> qi_flush_dev_iotlb_pasid(iommu, sid, pfsid, pasid, qdep, 0, 64 - VTD_PAGE_SHIFT);
>
> ... Which is what the spec suggests (IIUC).
Ah! I overlooked this. Sorry about it.
> Should I read your comment above as saying I should drop the
> cap_caching_mode(iommu->cap) check?
No. Your code is fine.
Best regards,
baolu
^ permalink raw reply [flat|nested] 140+ messages in thread
* RE: [PATCH v3 00/19] IOMMUFD Dirty Tracking
2023-09-23 1:24 [PATCH v3 00/19] IOMMUFD Dirty Tracking Joao Martins
` (18 preceding siblings ...)
2023-09-23 1:25 ` [PATCH v3 19/19] iommu/intel: Access/Dirty bit support for SL domains Joao Martins
@ 2023-09-26 8:58 ` Shameerali Kolothum Thodi
2023-10-13 16:29 ` Jason Gunthorpe
20 siblings, 0 replies; 140+ messages in thread
From: Shameerali Kolothum Thodi @ 2023-09-26 8:58 UTC (permalink / raw)
To: Joao Martins, iommu@lists.linux.dev
Cc: Jason Gunthorpe, Kevin Tian, Lu Baolu, Yi Liu, Yi Y Sun,
Nicolin Chen, Joerg Roedel, Suravee Suthikulpanit, Will Deacon,
Robin Murphy, Alex Williamson, kvm@vger.kernel.org
> -----Original Message-----
> From: Joao Martins [mailto:joao.m.martins@oracle.com]
> Sent: 23 September 2023 02:25
> To: iommu@lists.linux.dev
> Cc: Jason Gunthorpe <jgg@nvidia.com>; Kevin Tian <kevin.tian@intel.com>;
> Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>; Lu
> Baolu <baolu.lu@linux.intel.com>; Yi Liu <yi.l.liu@intel.com>; Yi Y Sun
> <yi.y.sun@intel.com>; Nicolin Chen <nicolinc@nvidia.com>; Joerg Roedel
> <joro@8bytes.org>; Suravee Suthikulpanit
> <suravee.suthikulpanit@amd.com>; Will Deacon <will@kernel.org>; Robin
> Murphy <robin.murphy@arm.com>; Alex Williamson
> <alex.williamson@redhat.com>; kvm@vger.kernel.org; Joao Martins
> <joao.m.martins@oracle.com>
> Subject: [PATCH v3 00/19] IOMMUFD Dirty Tracking
>
[...]
> For ARM-SMMU-v3 I've made adjustments from the RFCv2 but staged this
> into a branch[6] with all the changes but didn't include here as I can't
> test this besides compilation. Shameer, if you can pick up as chatted
> sometime ago it would be great as you have the hardware. Note that
> it depends on some patches from Nicolin for hw_info() and
> domain_alloc_user() base support coming from his nesting work.
Thanks Joao. Sure will do.
Shameer
^ permalink raw reply [flat|nested] 140+ messages in thread* Re: [PATCH v3 00/19] IOMMUFD Dirty Tracking
2023-09-23 1:24 [PATCH v3 00/19] IOMMUFD Dirty Tracking Joao Martins
` (19 preceding siblings ...)
2023-09-26 8:58 ` [PATCH v3 00/19] IOMMUFD Dirty Tracking Shameerali Kolothum Thodi
@ 2023-10-13 16:29 ` Jason Gunthorpe
2023-10-13 18:11 ` Joao Martins
20 siblings, 1 reply; 140+ messages in thread
From: Jason Gunthorpe @ 2023-10-13 16:29 UTC (permalink / raw)
To: Joao Martins, Kevin Tian
Cc: iommu, Shameerali Kolothum Thodi, Lu Baolu, Yi Liu, Yi Y Sun,
Nicolin Chen, Joerg Roedel, Suravee Suthikulpanit, Will Deacon,
Robin Murphy, Alex Williamson, kvm
On Sat, Sep 23, 2023 at 02:24:52AM +0100, Joao Martins wrote:
> Joao Martins (19):
> vfio/iova_bitmap: Export more API symbols
> vfio: Move iova_bitmap into iommu core
> iommu: Add iommu_domain ops for dirty tracking
> iommufd: Add a flag to enforce dirty tracking on attach
> iommufd/selftest: Expand mock_domain with dev_flags
> iommufd/selftest: Test IOMMU_HWPT_ALLOC_ENFORCE_DIRTY
> iommufd: Dirty tracking data support
> iommufd: Add IOMMU_HWPT_SET_DIRTY
> iommufd/selftest: Test IOMMU_HWPT_SET_DIRTY
> iommufd: Add IOMMU_HWPT_GET_DIRTY_IOVA
> iommufd/selftest: Test IOMMU_HWPT_GET_DIRTY_IOVA
> iommufd: Add capabilities to IOMMU_GET_HW_INFO
> iommufd/selftest: Test out_capabilities in IOMMU_GET_HW_INFO
> iommufd: Add a flag to skip clearing of IOPTE dirty
> iommufd/selftest: Test IOMMU_GET_DIRTY_IOVA_NO_CLEAR flag
> iommu/amd: Add domain_alloc_user based domain allocation
> iommu/amd: Access/Dirty bit support in IOPTEs
> iommu/amd: Print access/dirty bits if supported
> iommu/intel: Access/Dirty bit support for SL domains
I read through this and I'm happy with the design, small points aside.
Suggest fixing those and resending ASAP.
Kevin, you should check it too
If either AMD or Intel ack the driver part next week I would take it
this cycle. Otherwise at -rc1.
Also I recommend you push all the selftests to a block of patches at
the end of the series so the core code reads as one chunk. It doesn't
seem as large that way :)
Thanks,
Jason
^ permalink raw reply [flat|nested] 140+ messages in thread* Re: [PATCH v3 00/19] IOMMUFD Dirty Tracking
2023-10-13 16:29 ` Jason Gunthorpe
@ 2023-10-13 18:11 ` Joao Martins
2023-10-14 7:53 ` Baolu Lu
0 siblings, 1 reply; 140+ messages in thread
From: Joao Martins @ 2023-10-13 18:11 UTC (permalink / raw)
To: Jason Gunthorpe, Kevin Tian, Suravee Suthikulpanit, Vasant Hegde
Cc: iommu, Shameerali Kolothum Thodi, Lu Baolu, Yi Liu, Yi Y Sun,
Nicolin Chen, Joerg Roedel, Will Deacon, Robin Murphy,
Alex Williamson, kvm
On 13/10/2023 17:29, Jason Gunthorpe wrote:
> On Sat, Sep 23, 2023 at 02:24:52AM +0100, Joao Martins wrote:
>
>> Joao Martins (19):
>> vfio/iova_bitmap: Export more API symbols
>> vfio: Move iova_bitmap into iommu core
>> iommu: Add iommu_domain ops for dirty tracking
>> iommufd: Add a flag to enforce dirty tracking on attach
>> iommufd/selftest: Expand mock_domain with dev_flags
>> iommufd/selftest: Test IOMMU_HWPT_ALLOC_ENFORCE_DIRTY
>> iommufd: Dirty tracking data support
>> iommufd: Add IOMMU_HWPT_SET_DIRTY
>> iommufd/selftest: Test IOMMU_HWPT_SET_DIRTY
>> iommufd: Add IOMMU_HWPT_GET_DIRTY_IOVA
>> iommufd/selftest: Test IOMMU_HWPT_GET_DIRTY_IOVA
>> iommufd: Add capabilities to IOMMU_GET_HW_INFO
>> iommufd/selftest: Test out_capabilities in IOMMU_GET_HW_INFO
>> iommufd: Add a flag to skip clearing of IOPTE dirty
>> iommufd/selftest: Test IOMMU_GET_DIRTY_IOVA_NO_CLEAR flag
>> iommu/amd: Add domain_alloc_user based domain allocation
>> iommu/amd: Access/Dirty bit support in IOPTEs
>> iommu/amd: Print access/dirty bits if supported
>> iommu/intel: Access/Dirty bit support for SL domains
>
> I read through this and I'm happy with the design - small points aside
>
Great!
> Suggest to fix those and resend ASAP.
>
> Kevin, you should check it too
>
> If either AMD or Intel ack the driver part next week I would take it
> this cycle. Otherwise at -rc1.
>
FWIW, I feel more confident on the AMD parts as they have been exercised on real
hardware.
Suravee, Vasant, if you could take a look at the AMD driver patches -- you
looked at a past revision (RFCv1) and provided comments; I took those comments,
but I didn't get Suravee's ACK as things were in flux on the UAPI side. It
looks like v4 won't change much of the drivers.
> Also I recommend you push all the selftest to a block of patches at
> the end of the series so the core code reads as one chunk. It doesn't
> seem as large that way :)
>
Ah OK, interesting -- good to know, I can move them to the end. I thought the
desired way (for reviewing purposes) was to put the tests right after, such
that the reviewer has the code fresh while looking at the test code.
^ permalink raw reply [flat|nested] 140+ messages in thread
* Re: [PATCH v3 00/19] IOMMUFD Dirty Tracking
2023-10-13 18:11 ` Joao Martins
@ 2023-10-14 7:53 ` Baolu Lu
0 siblings, 0 replies; 140+ messages in thread
From: Baolu Lu @ 2023-10-14 7:53 UTC (permalink / raw)
To: Joao Martins, Jason Gunthorpe, Kevin Tian, Suravee Suthikulpanit,
Vasant Hegde
Cc: baolu.lu, iommu, Shameerali Kolothum Thodi, Yi Liu, Yi Y Sun,
Nicolin Chen, Joerg Roedel, Will Deacon, Robin Murphy,
Alex Williamson, kvm
On 2023/10/14 2:11, Joao Martins wrote:
>> If either AMD or Intel ack the driver part next week I would take it
>> this cycle. Otherwise at -rc1.
>>
> FWIW, I feel more confident on the AMD parts as they have been exercised on real
> hardware.
>
> Suravee, Vasant, if you could take a look at the AMD driver patches -- you
> looked at a past revision (RFCv1) and provided comments but while I took the
> comments I didn't get Suravee's ACK as things were in flux on the UAPI side. But
> it looks that v4 won't change much of the drivers
>
I will also take a look at the Intel driver part.
Best regards,
baolu
^ permalink raw reply [flat|nested] 140+ messages in thread