All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v1 0/9] iommufd/iova_bitmap: Bug fixes for IOMMU dirty tracking
@ 2024-02-02 13:34 Joao Martins
  2024-02-02 13:34 ` [PATCH v1 1/9] iommufd/iova_bitmap: Bounds check mapped::pages access Joao Martins
                   ` (9 more replies)
  0 siblings, 10 replies; 22+ messages in thread
From: Joao Martins @ 2024-02-02 13:34 UTC (permalink / raw)
  To: iommu
  Cc: Jason Gunthorpe, Kevin Tian, Suravee Suthikulpanit, Lu Baolu,
	Joao Martins, Avihai Horon, Alex Williamson,
	Shameerali Kolothum Thodi

Hey,

This relatively small series fixes a couple of bugs while extending the
selftests to test it all and one small improvement for in iova bitmap
number of iterations.

The main bug is that we miss marking part of the required dirty bits in
the migration stream when we record hugepage PTE dirty pages on a lower
page size bitmap. This is due to how we iterate IOVA bitmap ranges (see
patch 4) and that the end of the range might not be aligned to the huge
page size. As I tried to find a reproducer[0] (which were very hard in
my setup) fixed up other things around it related to the same subject.

I have a separate set of changes, that proved also instrumental in
reproducing some of these corner cases in a more predictable manner but
with IOMMU drivers instead, rather than IOMMUFD mock IOMMU driver...
without relying on VF specific behaviour. Though, it relies on being
able to map IOVAs as dirty. This is not included in this series, as I
am not sure yet if such direction would make sense (outside of testing).
Comments, appreciated.

The selftests in this series (a large majority of the series changes)
are meant to be kept in this order to be more clear in history that the
selftests exercise a given fix added earlier in the series.
The last patch is an efficiency improvement, but can act as workaround if
the IOVA ranges are smaller than 64G, which is why it is placed last.

[I think this should be targetted for -rc rather than -next]

Regards,
	Joao

[0] Using latest qemu draft here: https://github.com/jpemartins/qemu/commits/iommufd-v4

Joao Martins (9):
  iommufd/iova_bitmap: Bounds check mapped::pages access
  iommufd/iova_bitmap: Switch iova_bitmap::bitmap to an u8 array
  iommufd/selftest: Test u64 unaligned bitmaps
  iommufd/iova_bitmap: Handle recording beyond the mapped pages
  iommufd/selftest: Refactor dirty bitmap tests
  iommufd/selftest: Refactor mock_domain_read_and_clear_dirty()
  iommufd/selftest: Hugepage mock domain support
  iommufd/selftest: Add mock IO hugepages tests
  iommufd/iova_bitmap: Consider page offset for the pages to be pinned

 drivers/iommu/iommufd/iommufd_test.h          |  1 +
 drivers/iommu/iommufd/iova_bitmap.c           | 68 +++++++++++++---
 drivers/iommu/iommufd/selftest.c              | 79 ++++++++++++++-----
 tools/testing/selftests/iommu/iommufd.c       | 78 ++++++++++++++----
 tools/testing/selftests/iommu/iommufd_utils.h | 39 +++++----
 5 files changed, 205 insertions(+), 60 deletions(-)

-- 
2.17.2


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v1 1/9] iommufd/iova_bitmap: Bounds check mapped::pages access
  2024-02-02 13:34 [PATCH v1 0/9] iommufd/iova_bitmap: Bug fixes for IOMMU dirty tracking Joao Martins
@ 2024-02-02 13:34 ` Joao Martins
  2024-02-07 16:49   ` Avihai Horon
  2024-02-02 13:34 ` [PATCH v1 2/9] iommufd/iova_bitmap: Switch iova_bitmap::bitmap to an u8 array Joao Martins
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 22+ messages in thread
From: Joao Martins @ 2024-02-02 13:34 UTC (permalink / raw)
  To: iommu
  Cc: Jason Gunthorpe, Kevin Tian, Suravee Suthikulpanit, Lu Baolu,
	Joao Martins, Avihai Horon, Alex Williamson,
	Shameerali Kolothum Thodi

Dirty IOMMU hugepages on a base page page-size granularity can lead to an
attempt to set dirty pages in the bitmap beyond the limits that are pinned.

Bounds check the page index of the array we are trying to access is within
the limits before we kmap() and return otherwise.

While it is also a defensive check, this is also in preparation to defer
setting bits (outside the mapped range) to the next iteration(s) when the
pages become available.

Fixes: b058ea3ab5af ("vfio/iova_bitmap: refactor iova_bitmap_set() to better handle page boundaries")
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 drivers/iommu/iommufd/iova_bitmap.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/iommu/iommufd/iova_bitmap.c b/drivers/iommu/iommufd/iova_bitmap.c
index 0a92c9eeaf7f..a3606b4c2229 100644
--- a/drivers/iommu/iommufd/iova_bitmap.c
+++ b/drivers/iommu/iommufd/iova_bitmap.c
@@ -409,6 +409,7 @@ void iova_bitmap_set(struct iova_bitmap *bitmap,
 			mapped->pgshift) + mapped->pgoff * BITS_PER_BYTE;
 	unsigned long last_bit = (((iova + length - 1) - mapped->iova) >>
 			mapped->pgshift) + mapped->pgoff * BITS_PER_BYTE;
+	unsigned long last_page_idx = mapped->npages - 1;
 
 	do {
 		unsigned int page_idx = cur_bit / BITS_PER_PAGE;
@@ -417,6 +418,9 @@ void iova_bitmap_set(struct iova_bitmap *bitmap,
 					 last_bit - cur_bit + 1);
 		void *kaddr;
 
+		if (unlikely(page_idx > last_page_idx))
+			break;
+
 		kaddr = kmap_local_page(mapped->pages[page_idx]);
 		bitmap_set(kaddr, offset, nbits);
 		kunmap_local(kaddr);
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v1 2/9] iommufd/iova_bitmap: Switch iova_bitmap::bitmap to an u8 array
  2024-02-02 13:34 [PATCH v1 0/9] iommufd/iova_bitmap: Bug fixes for IOMMU dirty tracking Joao Martins
  2024-02-02 13:34 ` [PATCH v1 1/9] iommufd/iova_bitmap: Bounds check mapped::pages access Joao Martins
@ 2024-02-02 13:34 ` Joao Martins
  2024-02-07 16:50   ` Avihai Horon
  2024-02-02 13:34 ` [PATCH v1 3/9] iommufd/selftest: Test u64 unaligned bitmaps Joao Martins
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 22+ messages in thread
From: Joao Martins @ 2024-02-02 13:34 UTC (permalink / raw)
  To: iommu
  Cc: Jason Gunthorpe, Kevin Tian, Suravee Suthikulpanit, Lu Baolu,
	Joao Martins, Avihai Horon, Alex Williamson,
	Shameerali Kolothum Thodi

iova_bitmap_mapped_length() don't deal correctly with the small bitmaps
(< 2M bitmaps) that the starting address isn't u64 aligned, leading to
skipping a tiny part of the IOVA range. This is materialized as not
marking data dirty that should otherwise have been.

Fix that by using a u8* in the internal state of IOVA bitmap. Most
of the data structure uses the type of the bitmap to adjust its indexes,
thus changing the type of the bitmap decreases the granularity of the
bitmap indexes.

Fixes: b058ea3ab5af ("vfio/iova_bitmap: refactor iova_bitmap_set() to better handle page boundaries")
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 drivers/iommu/iommufd/iova_bitmap.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/iommufd/iova_bitmap.c b/drivers/iommu/iommufd/iova_bitmap.c
index a3606b4c2229..9d42ab51a6bb 100644
--- a/drivers/iommu/iommufd/iova_bitmap.c
+++ b/drivers/iommu/iommufd/iova_bitmap.c
@@ -100,7 +100,7 @@ struct iova_bitmap {
 	struct iova_bitmap_map mapped;
 
 	/* userspace address of the bitmap */
-	u64 __user *bitmap;
+	u8 __user *bitmap;
 
 	/* u64 index that @mapped points to */
 	unsigned long mapped_base_index;
@@ -162,7 +162,7 @@ static int iova_bitmap_get(struct iova_bitmap *bitmap)
 {
 	struct iova_bitmap_map *mapped = &bitmap->mapped;
 	unsigned long npages;
-	u64 __user *addr;
+	u8 __user *addr;
 	long ret;
 
 	/*
@@ -247,7 +247,7 @@ struct iova_bitmap *iova_bitmap_alloc(unsigned long iova, size_t length,
 
 	mapped = &bitmap->mapped;
 	mapped->pgshift = __ffs(page_size);
-	bitmap->bitmap = data;
+	bitmap->bitmap = (u8 __user *)data;
 	bitmap->mapped_total_index =
 		iova_bitmap_offset_to_index(bitmap, length - 1) + 1;
 	bitmap->iova = iova;
@@ -304,7 +304,7 @@ static unsigned long iova_bitmap_mapped_remaining(struct iova_bitmap *bitmap)
 
 	remaining = bitmap->mapped_total_index - bitmap->mapped_base_index;
 	remaining = min_t(unsigned long, remaining,
-			  bytes / sizeof(*bitmap->bitmap));
+			  DIV_ROUND_UP(bytes, sizeof(*bitmap->bitmap)));
 
 	return remaining;
 }
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v1 3/9] iommufd/selftest: Test u64 unaligned bitmaps
  2024-02-02 13:34 [PATCH v1 0/9] iommufd/iova_bitmap: Bug fixes for IOMMU dirty tracking Joao Martins
  2024-02-02 13:34 ` [PATCH v1 1/9] iommufd/iova_bitmap: Bounds check mapped::pages access Joao Martins
  2024-02-02 13:34 ` [PATCH v1 2/9] iommufd/iova_bitmap: Switch iova_bitmap::bitmap to an u8 array Joao Martins
@ 2024-02-02 13:34 ` Joao Martins
  2024-02-02 13:34 ` [PATCH v1 4/9] iommufd/iova_bitmap: Handle recording beyond the mapped pages Joao Martins
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: Joao Martins @ 2024-02-02 13:34 UTC (permalink / raw)
  To: iommu
  Cc: Jason Gunthorpe, Kevin Tian, Suravee Suthikulpanit, Lu Baolu,
	Joao Martins, Avihai Horon, Alex Williamson,
	Shameerali Kolothum Thodi

Exercise the dirty tracking bitmaps with byte unaligned addresses in
addition to the PAGE_SIZE unaligned bitmaps, using a address towards the
end of the page boundary.

In doing so, increase the tailroom we allocate for the bitmap from
MOCK_PAGE_SIZE(2K) into PAGE_SIZE(4K), such that we can test end of bitmap
boundary.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 tools/testing/selftests/iommu/iommufd.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index 1a881e7a21d1..49774a720314 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -1741,9 +1741,9 @@ FIXTURE_SETUP(iommufd_dirty_tracking)
 	self->bitmap_size =
 		variant->buffer_size / self->page_size / BITS_PER_BYTE;
 
-	/* Provision with an extra (MOCK_PAGE_SIZE) for the unaligned case */
+	/* Provision with an extra (PAGE_SIZE) for the unaligned case */
 	rc = posix_memalign(&self->bitmap, PAGE_SIZE,
-			    self->bitmap_size + MOCK_PAGE_SIZE);
+			    self->bitmap_size + PAGE_SIZE);
 	assert(!rc);
 	assert(self->bitmap);
 	assert((uintptr_t)self->bitmap % PAGE_SIZE == 0);
@@ -1873,6 +1873,13 @@ TEST_F(iommufd_dirty_tracking, get_dirty_bitmap)
 				self->bitmap + MOCK_PAGE_SIZE,
 				self->bitmap_size, 0, _metadata);
 
+	/* u64 unaligned bitmap */
+	test_mock_dirty_bitmaps(hwpt_id, variant->buffer_size,
+				MOCK_APERTURE_START, self->page_size,
+				self->bitmap + 0xff1,
+				self->bitmap_size, 0, _metadata);
+
+
 	test_ioctl_destroy(stddev_id);
 	test_ioctl_destroy(hwpt_id);
 }
@@ -1907,6 +1914,14 @@ TEST_F(iommufd_dirty_tracking, get_dirty_bitmap_no_clear)
 				IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR,
 				_metadata);
 
+	/* u64 unaligned bitmap */
+	test_mock_dirty_bitmaps(hwpt_id, variant->buffer_size,
+				MOCK_APERTURE_START, self->page_size,
+				self->bitmap + 0xff1,
+				self->bitmap_size,
+				IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR,
+				_metadata);
+
 	test_ioctl_destroy(stddev_id);
 	test_ioctl_destroy(hwpt_id);
 }
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v1 4/9] iommufd/iova_bitmap: Handle recording beyond the mapped pages
  2024-02-02 13:34 [PATCH v1 0/9] iommufd/iova_bitmap: Bug fixes for IOMMU dirty tracking Joao Martins
                   ` (2 preceding siblings ...)
  2024-02-02 13:34 ` [PATCH v1 3/9] iommufd/selftest: Test u64 unaligned bitmaps Joao Martins
@ 2024-02-02 13:34 ` Joao Martins
  2024-02-07 16:51   ` Avihai Horon
  2024-05-24 11:22   ` Joao Martins
  2024-02-02 13:34 ` [PATCH v1 5/9] iommufd/selftest: Refactor dirty bitmap tests Joao Martins
                   ` (5 subsequent siblings)
  9 siblings, 2 replies; 22+ messages in thread
From: Joao Martins @ 2024-02-02 13:34 UTC (permalink / raw)
  To: iommu
  Cc: Jason Gunthorpe, Kevin Tian, Suravee Suthikulpanit, Lu Baolu,
	Joao Martins, Avihai Horon, Alex Williamson,
	Shameerali Kolothum Thodi

IOVA bitmap is a zerocopy scheme of recording dirty bits that iterate the
different bitmap user pages at chunks of maximum of PAGE_SIZE/sizeof(struct
page*) pages.

When the iterations are split up into 64G, the end of the range may be
broken up in a way that's aligned with a non base page PTE size. This leads
to only part of the huge page being recorded in the bitmap. Note that in
pratice this is only a problem for IOMMU dirty tracking i.e. when the
backing PTEs are in IOMMU hugepages and the bitmap is in base page
granularity. So far this not something that affects VF dirty trackers
(which reports and records at the same granularity).

To fix that, if there is a remainder of bits left to set in which the
current IOVA bitmap doesn't cover, make a copy of the bitmap structure and
iterate-and-set the rest of the bits remaining. Finally, when advancing the
iterator, skip all the bits that were set ahead.

Reported-by: Avihai Horon <avihaih@nvidia.com>
Fixes: f35f22cc760e ("iommu/vt-d: Access/Dirty bit support for SS domains")
Fixes: 421a511a293f ("iommu/amd: Access/Dirty bit support in IOPTEs")
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 drivers/iommu/iommufd/iova_bitmap.c | 43 +++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/drivers/iommu/iommufd/iova_bitmap.c b/drivers/iommu/iommufd/iova_bitmap.c
index 9d42ab51a6bb..b370e8ee8866 100644
--- a/drivers/iommu/iommufd/iova_bitmap.c
+++ b/drivers/iommu/iommufd/iova_bitmap.c
@@ -113,6 +113,9 @@ struct iova_bitmap {
 
 	/* length of the IOVA range for the whole bitmap */
 	size_t length;
+
+	/* length of the IOVA range set ahead the pinned pages */
+	unsigned long set_ahead_length;
 };
 
 /*
@@ -341,6 +344,32 @@ static bool iova_bitmap_done(struct iova_bitmap *bitmap)
 	return bitmap->mapped_base_index >= bitmap->mapped_total_index;
 }
 
+static int iova_bitmap_set_ahead(struct iova_bitmap *bitmap,
+				 size_t set_ahead_length)
+{
+	int ret = 0;
+
+	while (set_ahead_length > 0 && !iova_bitmap_done(bitmap)) {
+		unsigned long length = iova_bitmap_mapped_length(bitmap);
+		unsigned long iova = iova_bitmap_mapped_iova(bitmap);
+
+		ret = iova_bitmap_get(bitmap);
+		if (ret)
+			break;
+
+		length = min(length, set_ahead_length);
+		iova_bitmap_set(bitmap, iova, length);
+
+		set_ahead_length -= length;
+		bitmap->mapped_base_index +=
+			iova_bitmap_offset_to_index(bitmap, length - 1) + 1;
+		iova_bitmap_put(bitmap);
+	}
+
+	bitmap->set_ahead_length = 0;
+	return ret;
+}
+
 /*
  * Advances to the next range, releases the current pinned
  * pages and pins the next set of bitmap pages.
@@ -357,6 +386,15 @@ static int iova_bitmap_advance(struct iova_bitmap *bitmap)
 	if (iova_bitmap_done(bitmap))
 		return 0;
 
+	/* Iterate, set and skip any bits requested for next iteration */
+	if (bitmap->set_ahead_length) {
+		int ret;
+
+		ret = iova_bitmap_set_ahead(bitmap, bitmap->set_ahead_length);
+		if (ret)
+			return ret;
+	}
+
 	/* When advancing the index we pin the next set of bitmap pages */
 	return iova_bitmap_get(bitmap);
 }
@@ -426,5 +464,10 @@ void iova_bitmap_set(struct iova_bitmap *bitmap,
 		kunmap_local(kaddr);
 		cur_bit += nbits;
 	} while (cur_bit <= last_bit);
+
+	if (unlikely(cur_bit <= last_bit)) {
+		bitmap->set_ahead_length =
+			((last_bit - cur_bit + 1) << bitmap->mapped.pgshift);
+	}
 }
 EXPORT_SYMBOL_NS_GPL(iova_bitmap_set, IOMMUFD);
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v1 5/9] iommufd/selftest: Refactor dirty bitmap tests
  2024-02-02 13:34 [PATCH v1 0/9] iommufd/iova_bitmap: Bug fixes for IOMMU dirty tracking Joao Martins
                   ` (3 preceding siblings ...)
  2024-02-02 13:34 ` [PATCH v1 4/9] iommufd/iova_bitmap: Handle recording beyond the mapped pages Joao Martins
@ 2024-02-02 13:34 ` Joao Martins
  2024-02-02 13:34 ` [PATCH v1 6/9] iommufd/selftest: Refactor mock_domain_read_and_clear_dirty() Joao Martins
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: Joao Martins @ 2024-02-02 13:34 UTC (permalink / raw)
  To: iommu
  Cc: Jason Gunthorpe, Kevin Tian, Suravee Suthikulpanit, Lu Baolu,
	Joao Martins, Avihai Horon, Alex Williamson,
	Shameerali Kolothum Thodi

Rework the functions that test and set the bitmaps to receive a new
parameter (the pte_page_size) that reflects the expected PTE size in the
page tables. The same scheme is still used i.e. even bits are dirty and odd
page indexes aren't dirty. Here it just refactors to consider the size of
the PTE rather than hardcoded to IOMMU mock base page assumptions.

While at it, refactor dirty bitmap tests to use the idev_id created by the
fixture instead of creating a new one.

This is in preparation for doing tests with IOMMU hugepages where multiple
bits set as part of recording a whole hugepage as dirty and thus the
pte_page_size will vary depending on io hugepages or io base pages.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 tools/testing/selftests/iommu/iommufd.c       | 28 ++++++-------
 tools/testing/selftests/iommu/iommufd_utils.h | 39 ++++++++++++-------
 2 files changed, 36 insertions(+), 31 deletions(-)

diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index 49774a720314..56c3e511a0ab 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -1849,7 +1849,7 @@ TEST_F(iommufd_dirty_tracking, device_dirty_capability)
 
 TEST_F(iommufd_dirty_tracking, get_dirty_bitmap)
 {
-	uint32_t stddev_id;
+	uint32_t page_size = MOCK_PAGE_SIZE;
 	uint32_t hwpt_id;
 	uint32_t ioas_id;
 
@@ -1859,34 +1859,31 @@ TEST_F(iommufd_dirty_tracking, get_dirty_bitmap)
 
 	test_cmd_hwpt_alloc(self->idev_id, ioas_id,
 			    IOMMU_HWPT_ALLOC_DIRTY_TRACKING, &hwpt_id);
-	test_cmd_mock_domain(hwpt_id, &stddev_id, NULL, NULL);
 
 	test_cmd_set_dirty_tracking(hwpt_id, true);
 
 	test_mock_dirty_bitmaps(hwpt_id, variant->buffer_size,
-				MOCK_APERTURE_START, self->page_size,
+				MOCK_APERTURE_START, self->page_size, page_size,
 				self->bitmap, self->bitmap_size, 0, _metadata);
 
 	/* PAGE_SIZE unaligned bitmap */
 	test_mock_dirty_bitmaps(hwpt_id, variant->buffer_size,
-				MOCK_APERTURE_START, self->page_size,
+				MOCK_APERTURE_START, self->page_size, page_size,
 				self->bitmap + MOCK_PAGE_SIZE,
 				self->bitmap_size, 0, _metadata);
 
 	/* u64 unaligned bitmap */
 	test_mock_dirty_bitmaps(hwpt_id, variant->buffer_size,
-				MOCK_APERTURE_START, self->page_size,
-				self->bitmap + 0xff1,
-				self->bitmap_size, 0, _metadata);
-
+				MOCK_APERTURE_START, self->page_size, page_size,
+				self->bitmap + 0xff1, self->bitmap_size, 0,
+				_metadata);
 
-	test_ioctl_destroy(stddev_id);
 	test_ioctl_destroy(hwpt_id);
 }
 
 TEST_F(iommufd_dirty_tracking, get_dirty_bitmap_no_clear)
 {
-	uint32_t stddev_id;
+	uint32_t page_size = MOCK_PAGE_SIZE;
 	uint32_t hwpt_id;
 	uint32_t ioas_id;
 
@@ -1896,19 +1893,18 @@ TEST_F(iommufd_dirty_tracking, get_dirty_bitmap_no_clear)
 
 	test_cmd_hwpt_alloc(self->idev_id, ioas_id,
 			    IOMMU_HWPT_ALLOC_DIRTY_TRACKING, &hwpt_id);
-	test_cmd_mock_domain(hwpt_id, &stddev_id, NULL, NULL);
 
 	test_cmd_set_dirty_tracking(hwpt_id, true);
 
 	test_mock_dirty_bitmaps(hwpt_id, variant->buffer_size,
-				MOCK_APERTURE_START, self->page_size,
+				MOCK_APERTURE_START, self->page_size, page_size,
 				self->bitmap, self->bitmap_size,
 				IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR,
 				_metadata);
 
 	/* Unaligned bitmap */
 	test_mock_dirty_bitmaps(hwpt_id, variant->buffer_size,
-				MOCK_APERTURE_START, self->page_size,
+				MOCK_APERTURE_START, self->page_size, page_size,
 				self->bitmap + MOCK_PAGE_SIZE,
 				self->bitmap_size,
 				IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR,
@@ -1916,13 +1912,11 @@ TEST_F(iommufd_dirty_tracking, get_dirty_bitmap_no_clear)
 
 	/* u64 unaligned bitmap */
 	test_mock_dirty_bitmaps(hwpt_id, variant->buffer_size,
-				MOCK_APERTURE_START, self->page_size,
-				self->bitmap + 0xff1,
-				self->bitmap_size,
+				MOCK_APERTURE_START, self->page_size, page_size,
+				self->bitmap + 0xff1, self->bitmap_size,
 				IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR,
 				_metadata);
 
-	test_ioctl_destroy(stddev_id);
 	test_ioctl_destroy(hwpt_id);
 }
 
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index c646264aa41f..8d2b46b2114d 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -344,16 +344,19 @@ static int _test_cmd_mock_domain_set_dirty(int fd, __u32 hwpt_id, size_t length,
 						  page_size, bitmap, nr))
 
 static int _test_mock_dirty_bitmaps(int fd, __u32 hwpt_id, size_t length,
-				    __u64 iova, size_t page_size, __u64 *bitmap,
+				    __u64 iova, size_t page_size,
+				    size_t pte_page_size, __u64 *bitmap,
 				    __u64 bitmap_size, __u32 flags,
 				    struct __test_metadata *_metadata)
 {
-	unsigned long i, nbits = bitmap_size * BITS_PER_BYTE;
-	unsigned long nr = nbits / 2;
+	unsigned long npte = pte_page_size / page_size, pteset = 2 * npte;
+	unsigned long nbits = bitmap_size * BITS_PER_BYTE;
+	unsigned long j, i, nr = nbits / pteset ?: 1;
 	__u64 out_dirty = 0;
 
 	/* Mark all even bits as dirty in the mock domain */
-	for (i = 0; i < nbits; i += 2)
+	memset(bitmap, 0, bitmap_size);
+	for (i = 0; i < nbits; i += pteset)
 		set_bit(i, (unsigned long *)bitmap);
 
 	test_cmd_mock_domain_set_dirty(fd, hwpt_id, length, iova, page_size,
@@ -365,8 +368,12 @@ static int _test_mock_dirty_bitmaps(int fd, __u32 hwpt_id, size_t length,
 	test_cmd_get_dirty_bitmap(fd, hwpt_id, length, iova, page_size, bitmap,
 				  flags);
 	/* Beware ASSERT_EQ() is two statements -- braces are not redundant! */
-	for (i = 0; i < nbits; i++) {
-		ASSERT_EQ(!(i % 2), test_bit(i, (unsigned long *)bitmap));
+	for (i = 0; i < nbits; i += pteset) {
+		for (j = 0; j < pteset; j++) {
+			ASSERT_EQ(j < npte,
+				  test_bit(i + j, (unsigned long *)bitmap));
+		}
+		ASSERT_EQ(!(i % pteset), test_bit(i, (unsigned long *)bitmap));
 	}
 
 	memset(bitmap, 0, bitmap_size);
@@ -374,19 +381,23 @@ static int _test_mock_dirty_bitmaps(int fd, __u32 hwpt_id, size_t length,
 				  flags);
 
 	/* It as read already -- expect all zeroes */
-	for (i = 0; i < nbits; i++) {
-		ASSERT_EQ(!(i % 2) && (flags &
-				       IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR),
-			  test_bit(i, (unsigned long *)bitmap));
+	for (i = 0; i < nbits; i += pteset) {
+		for (j = 0; j < pteset; j++) {
+			ASSERT_EQ(
+				(j < npte) &&
+					(flags &
+					 IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR),
+				test_bit(i + j, (unsigned long *)bitmap));
+		}
 	}
 
 	return 0;
 }
-#define test_mock_dirty_bitmaps(hwpt_id, length, iova, page_size, bitmap,      \
-				bitmap_size, flags, _metadata)                 \
+#define test_mock_dirty_bitmaps(hwpt_id, length, iova, page_size, pte_size,\
+				bitmap, bitmap_size, flags, _metadata)     \
 	ASSERT_EQ(0, _test_mock_dirty_bitmaps(self->fd, hwpt_id, length, iova, \
-					      page_size, bitmap, bitmap_size,  \
-					      flags, _metadata))
+					      page_size, pte_size, bitmap,     \
+					      bitmap_size, flags, _metadata))
 
 static int _test_cmd_create_access(int fd, unsigned int ioas_id,
 				   __u32 *access_id, unsigned int flags)
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v1 6/9] iommufd/selftest: Refactor mock_domain_read_and_clear_dirty()
  2024-02-02 13:34 [PATCH v1 0/9] iommufd/iova_bitmap: Bug fixes for IOMMU dirty tracking Joao Martins
                   ` (4 preceding siblings ...)
  2024-02-02 13:34 ` [PATCH v1 5/9] iommufd/selftest: Refactor dirty bitmap tests Joao Martins
@ 2024-02-02 13:34 ` Joao Martins
  2024-02-02 13:34 ` [PATCH v1 7/9] iommufd/selftest: Hugepage mock domain support Joao Martins
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: Joao Martins @ 2024-02-02 13:34 UTC (permalink / raw)
  To: iommu
  Cc: Jason Gunthorpe, Kevin Tian, Suravee Suthikulpanit, Lu Baolu,
	Joao Martins, Avihai Horon, Alex Williamson,
	Shameerali Kolothum Thodi

Move the clearing of the dirty bit of the mock domain into
mock_domain_test_and_clear_dirty() helper, simplifying the caller function.

Additionally, rework the mock_domain_read_and_clear_dirty() loop to iterate
over a potentially variable IO page size. No functional change intended
with the loop refactor.

This is in preparation for dirty tracking support for IOMMU hugepage mock
domains.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 drivers/iommu/iommufd/selftest.c | 64 ++++++++++++++++++++++----------
 1 file changed, 45 insertions(+), 19 deletions(-)

diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index 2fb2597e069f..182495041407 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -191,6 +191,34 @@ static int mock_domain_set_dirty_tracking(struct iommu_domain *domain,
 	return 0;
 }
 
+static bool mock_test_and_clear_dirty(struct mock_iommu_domain *mock,
+				      unsigned long iova, size_t page_size,
+				      unsigned long flags)
+{
+	unsigned long cur, end = iova + page_size - 1;
+	bool dirty = false;
+	void *ent, *old;
+
+	for (cur = iova; cur < end; cur += MOCK_IO_PAGE_SIZE) {
+		ent = xa_load(&mock->pfns, cur / MOCK_IO_PAGE_SIZE);
+		if (!ent || !(xa_to_value(ent) & MOCK_PFN_DIRTY_IOVA))
+			continue;
+
+		dirty = true;
+		/* Clear dirty */
+		if (!(flags & IOMMU_DIRTY_NO_CLEAR)) {
+			unsigned long val;
+
+			val = xa_to_value(ent) & ~MOCK_PFN_DIRTY_IOVA;
+			old = xa_store(&mock->pfns, cur / MOCK_IO_PAGE_SIZE,
+				       xa_mk_value(val), GFP_KERNEL);
+			WARN_ON_ONCE(ent != old);
+		}
+	}
+
+	return dirty;
+}
+
 static int mock_domain_read_and_clear_dirty(struct iommu_domain *domain,
 					    unsigned long iova, size_t size,
 					    unsigned long flags,
@@ -198,31 +226,29 @@ static int mock_domain_read_and_clear_dirty(struct iommu_domain *domain,
 {
 	struct mock_iommu_domain *mock =
 		container_of(domain, struct mock_iommu_domain, domain);
-	unsigned long i, max = size / MOCK_IO_PAGE_SIZE;
-	void *ent, *old;
+	unsigned long end = iova + size;
+	void *ent;
 
 	if (!(mock->flags & MOCK_DIRTY_TRACK) && dirty->bitmap)
 		return -EINVAL;
 
-	for (i = 0; i < max; i++) {
-		unsigned long cur = iova + i * MOCK_IO_PAGE_SIZE;
+	do {
+		unsigned long pgsize = MOCK_IO_PAGE_SIZE;
+		unsigned long head;
 
-		ent = xa_load(&mock->pfns, cur / MOCK_IO_PAGE_SIZE);
-		if (ent && (xa_to_value(ent) & MOCK_PFN_DIRTY_IOVA)) {
-			/* Clear dirty */
-			if (!(flags & IOMMU_DIRTY_NO_CLEAR)) {
-				unsigned long val;
-
-				val = xa_to_value(ent) & ~MOCK_PFN_DIRTY_IOVA;
-				old = xa_store(&mock->pfns,
-					       cur / MOCK_IO_PAGE_SIZE,
-					       xa_mk_value(val), GFP_KERNEL);
-				WARN_ON_ONCE(ent != old);
-			}
-			iommu_dirty_bitmap_record(dirty, cur,
-						  MOCK_IO_PAGE_SIZE);
+		ent = xa_load(&mock->pfns, iova / MOCK_IO_PAGE_SIZE);
+		if (!ent) {
+			iova += pgsize;
+			continue;
 		}
-	}
+
+		head = iova & ~(pgsize - 1);
+
+		/* Clear dirty */
+		if (mock_test_and_clear_dirty(mock, head, pgsize, flags))
+			iommu_dirty_bitmap_record(dirty, head, pgsize);
+		iova = head + pgsize;
+	} while (iova < end);
 
 	return 0;
 }
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v1 7/9] iommufd/selftest: Hugepage mock domain support
  2024-02-02 13:34 [PATCH v1 0/9] iommufd/iova_bitmap: Bug fixes for IOMMU dirty tracking Joao Martins
                   ` (5 preceding siblings ...)
  2024-02-02 13:34 ` [PATCH v1 6/9] iommufd/selftest: Refactor mock_domain_read_and_clear_dirty() Joao Martins
@ 2024-02-02 13:34 ` Joao Martins
  2024-02-02 13:34 ` [PATCH v1 8/9] iommufd/selftest: Add mock IO hugepages tests Joao Martins
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: Joao Martins @ 2024-02-02 13:34 UTC (permalink / raw)
  To: iommu
  Cc: Jason Gunthorpe, Kevin Tian, Suravee Suthikulpanit, Lu Baolu,
	Joao Martins, Avihai Horon, Alex Williamson,
	Shameerali Kolothum Thodi

Add support to mock iommu hugepages of 1M (for a 2K mock io page size). To
avoid breaking test suite defaults, the way this is done is by explicitly
creating a iommu mock device which has hugepage support (i.e. through
MOCK_FLAGS_DEVICE_HUGE_IOVA).

The same scheme is maintained of mock base page index tracking in the
XArray, except that an extra bit is added to mark it as a hugepage. One
subpage containing the dirty bit, means that the whole hugepage is dirty
(similar to AMD IOMMU non-standard page sizes). For clearing, same thing
applies, and it must clear all dirty subpages.

This is in preparation for dirty tracking to mark mock hugepages as
dirty to exercise all the iova-bitmap fixes.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 drivers/iommu/iommufd/iommufd_test.h |  1 +
 drivers/iommu/iommufd/selftest.c     | 15 +++++++++++++--
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h
index 482d4059f5db..e854d3f67205 100644
--- a/drivers/iommu/iommufd/iommufd_test.h
+++ b/drivers/iommu/iommufd/iommufd_test.h
@@ -45,6 +45,7 @@ enum {
 
 enum {
 	MOCK_FLAGS_DEVICE_NO_DIRTY = 1 << 0,
+	MOCK_FLAGS_DEVICE_HUGE_IOVA = 1 << 1,
 };
 
 enum {
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index 182495041407..066ba92fc069 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -41,6 +41,7 @@ static atomic_t mock_dev_num;
 enum {
 	MOCK_DIRTY_TRACK = 1,
 	MOCK_IO_PAGE_SIZE = PAGE_SIZE / 2,
+	MOCK_HUGE_PAGE_SIZE = 512 * MOCK_IO_PAGE_SIZE,
 
 	/*
 	 * Like a real page table alignment requires the low bits of the address
@@ -53,6 +54,7 @@ enum {
 	MOCK_PFN_START_IOVA = _MOCK_PFN_START,
 	MOCK_PFN_LAST_IOVA = _MOCK_PFN_START,
 	MOCK_PFN_DIRTY_IOVA = _MOCK_PFN_START << 1,
+	MOCK_PFN_HUGE_IOVA = _MOCK_PFN_START << 2,
 };
 
 /*
@@ -242,6 +244,8 @@ static int mock_domain_read_and_clear_dirty(struct iommu_domain *domain,
 			continue;
 		}
 
+		if (xa_to_value(ent) & MOCK_PFN_HUGE_IOVA)
+			pgsize = MOCK_HUGE_PAGE_SIZE;
 		head = iova & ~(pgsize - 1);
 
 		/* Clear dirty */
@@ -260,6 +264,7 @@ const struct iommu_dirty_ops dirty_ops = {
 
 static struct iommu_domain *mock_domain_alloc_paging(struct device *dev)
 {
+	struct mock_dev *mdev = container_of(dev, struct mock_dev, dev);
 	struct mock_iommu_domain *mock;
 
 	mock = kzalloc(sizeof(*mock), GFP_KERNEL);
@@ -268,6 +273,8 @@ static struct iommu_domain *mock_domain_alloc_paging(struct device *dev)
 	mock->domain.geometry.aperture_start = MOCK_APERTURE_START;
 	mock->domain.geometry.aperture_end = MOCK_APERTURE_LAST;
 	mock->domain.pgsize_bitmap = MOCK_IO_PAGE_SIZE;
+	if (dev && mdev->flags & MOCK_FLAGS_DEVICE_HUGE_IOVA)
+		mock->domain.pgsize_bitmap |= MOCK_HUGE_PAGE_SIZE;
 	mock->domain.ops = mock_ops.default_domain_ops;
 	mock->domain.type = IOMMU_DOMAIN_UNMANAGED;
 	xa_init(&mock->pfns);
@@ -313,7 +320,7 @@ mock_domain_alloc_user(struct device *dev, u32 flags,
 			return ERR_PTR(-EOPNOTSUPP);
 		if (user_data || (has_dirty_flag && no_dirty_ops))
 			return ERR_PTR(-EOPNOTSUPP);
-		domain = mock_domain_alloc_paging(NULL);
+		domain = mock_domain_alloc_paging(dev);
 		if (!domain)
 			return ERR_PTR(-ENOMEM);
 		if (has_dirty_flag)
@@ -376,6 +383,9 @@ static int mock_domain_map_pages(struct iommu_domain *domain,
 
 			if (pgcount == 1 && cur + MOCK_IO_PAGE_SIZE == pgsize)
 				flags = MOCK_PFN_LAST_IOVA;
+			if (pgsize != MOCK_IO_PAGE_SIZE) {
+				flags |= MOCK_PFN_HUGE_IOVA;
+			}
 			old = xa_store(&mock->pfns, iova / MOCK_IO_PAGE_SIZE,
 				       xa_mk_value((paddr / MOCK_IO_PAGE_SIZE) |
 						   flags),
@@ -632,7 +642,8 @@ static struct mock_dev *mock_dev_create(unsigned long dev_flags)
 	struct mock_dev *mdev;
 	int rc;
 
-	if (dev_flags & ~(MOCK_FLAGS_DEVICE_NO_DIRTY))
+	if (dev_flags &
+	    ~(MOCK_FLAGS_DEVICE_NO_DIRTY | MOCK_FLAGS_DEVICE_HUGE_IOVA))
 		return ERR_PTR(-EINVAL);
 
 	mdev = kzalloc(sizeof(*mdev), GFP_KERNEL);
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v1 8/9] iommufd/selftest: Add mock IO hugepages tests
  2024-02-02 13:34 [PATCH v1 0/9] iommufd/iova_bitmap: Bug fixes for IOMMU dirty tracking Joao Martins
                   ` (6 preceding siblings ...)
  2024-02-02 13:34 ` [PATCH v1 7/9] iommufd/selftest: Hugepage mock domain support Joao Martins
@ 2024-02-02 13:34 ` Joao Martins
  2024-02-02 13:34 ` [PATCH v1 9/9] iommufd/iova_bitmap: Consider page offset for the pages to be pinned Joao Martins
  2024-02-06 18:13 ` [PATCH v1 0/9] iommufd/iova_bitmap: Bug fixes for IOMMU dirty tracking Jason Gunthorpe
  9 siblings, 0 replies; 22+ messages in thread
From: Joao Martins @ 2024-02-02 13:34 UTC (permalink / raw)
  To: iommu
  Cc: Jason Gunthorpe, Kevin Tian, Suravee Suthikulpanit, Lu Baolu,
	Joao Martins, Avihai Horon, Alex Williamson,
	Shameerali Kolothum Thodi

Leverage previously added MOCK_FLAGS_DEVICE_HUGE_IOVA flag to create an
IOMMU domain with more than MOCK_IO_PAGE_SIZE supported.

Plumb the hugetlb backing memory for buffer allocation and change the
expected page size to MOCK_HUGE_PAGE_SIZE (1M) when hugepage variant test
cases are used. These so far are limited to 128M and 256M IOVA range tests
cases which is when 1M hugepages can be used.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 tools/testing/selftests/iommu/iommufd.c | 45 +++++++++++++++++++++++--
 1 file changed, 42 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index 56c3e511a0ab..edf1c99c9936 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -12,6 +12,7 @@
 static unsigned long HUGEPAGE_SIZE;
 
 #define MOCK_PAGE_SIZE (PAGE_SIZE / 2)
+#define MOCK_HUGE_PAGE_SIZE (512 * MOCK_PAGE_SIZE)
 
 static unsigned long get_huge_page_size(void)
 {
@@ -1716,10 +1717,12 @@ FIXTURE(iommufd_dirty_tracking)
 FIXTURE_VARIANT(iommufd_dirty_tracking)
 {
 	unsigned long buffer_size;
+	bool hugepages;
 };
 
 FIXTURE_SETUP(iommufd_dirty_tracking)
 {
+	int mmap_flags;
 	void *vrc;
 	int rc;
 
@@ -1732,9 +1735,17 @@ FIXTURE_SETUP(iommufd_dirty_tracking)
 			   variant->buffer_size, rc);
 	}
 
+	mmap_flags = MAP_SHARED | MAP_ANONYMOUS | MAP_FIXED;
+	if (variant->hugepages) {
+		/*
+		 * MAP_POPULATE will cause the kernel to fail mmap if THPs are
+		 * not available.
+		 */
+		mmap_flags |= MAP_HUGETLB | MAP_POPULATE;
+	}
 	assert((uintptr_t)self->buffer % HUGEPAGE_SIZE == 0);
 	vrc = mmap(self->buffer, variant->buffer_size, PROT_READ | PROT_WRITE,
-		   MAP_SHARED | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
+		   mmap_flags, -1, 0);
 	assert(vrc == self->buffer);
 
 	self->page_size = MOCK_PAGE_SIZE;
@@ -1749,8 +1760,16 @@ FIXTURE_SETUP(iommufd_dirty_tracking)
 	assert((uintptr_t)self->bitmap % PAGE_SIZE == 0);
 
 	test_ioctl_ioas_alloc(&self->ioas_id);
-	test_cmd_mock_domain(self->ioas_id, &self->stdev_id, &self->hwpt_id,
-			     &self->idev_id);
+	/* Enable 1M mock IOMMU hugepages */
+	if (variant->hugepages) {
+		test_cmd_mock_domain_flags(self->ioas_id,
+					   MOCK_FLAGS_DEVICE_HUGE_IOVA,
+					   &self->stdev_id, &self->hwpt_id,
+					   &self->idev_id);
+	} else {
+		test_cmd_mock_domain(self->ioas_id, &self->stdev_id,
+				     &self->hwpt_id, &self->idev_id);
+	}
 }
 
 FIXTURE_TEARDOWN(iommufd_dirty_tracking)
@@ -1784,12 +1803,26 @@ FIXTURE_VARIANT_ADD(iommufd_dirty_tracking, domain_dirty128M)
 	.buffer_size = 128UL * 1024UL * 1024UL,
 };
 
+FIXTURE_VARIANT_ADD(iommufd_dirty_tracking, domain_dirty128M_huge)
+{
+	/* 4K bitmap (128M IOVA range) */
+	.buffer_size = 128UL * 1024UL * 1024UL,
+	.hugepages = true,
+};
+
 FIXTURE_VARIANT_ADD(iommufd_dirty_tracking, domain_dirty256M)
 {
 	/* 8K bitmap (256M IOVA range) */
 	.buffer_size = 256UL * 1024UL * 1024UL,
 };
 
+FIXTURE_VARIANT_ADD(iommufd_dirty_tracking, domain_dirty256M_huge)
+{
+	/* 8K bitmap (256M IOVA range) */
+	.buffer_size = 256UL * 1024UL * 1024UL,
+	.hugepages = true,
+};
+
 TEST_F(iommufd_dirty_tracking, enforce_dirty)
 {
 	uint32_t ioas_id, stddev_id, idev_id;
@@ -1853,6 +1886,9 @@ TEST_F(iommufd_dirty_tracking, get_dirty_bitmap)
 	uint32_t hwpt_id;
 	uint32_t ioas_id;
 
+	if (variant->hugepages)
+		page_size = MOCK_HUGE_PAGE_SIZE;
+
 	test_ioctl_ioas_alloc(&ioas_id);
 	test_ioctl_ioas_map_fixed_id(ioas_id, self->buffer,
 				     variant->buffer_size, MOCK_APERTURE_START);
@@ -1887,6 +1923,9 @@ TEST_F(iommufd_dirty_tracking, get_dirty_bitmap_no_clear)
 	uint32_t hwpt_id;
 	uint32_t ioas_id;
 
+	if (variant->hugepages)
+		page_size = MOCK_HUGE_PAGE_SIZE;
+
 	test_ioctl_ioas_alloc(&ioas_id);
 	test_ioctl_ioas_map_fixed_id(ioas_id, self->buffer,
 				     variant->buffer_size, MOCK_APERTURE_START);
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v1 9/9] iommufd/iova_bitmap: Consider page offset for the pages to be pinned
  2024-02-02 13:34 [PATCH v1 0/9] iommufd/iova_bitmap: Bug fixes for IOMMU dirty tracking Joao Martins
                   ` (7 preceding siblings ...)
  2024-02-02 13:34 ` [PATCH v1 8/9] iommufd/selftest: Add mock IO hugepages tests Joao Martins
@ 2024-02-02 13:34 ` Joao Martins
  2024-02-07 16:52   ` Avihai Horon
  2024-02-06 18:13 ` [PATCH v1 0/9] iommufd/iova_bitmap: Bug fixes for IOMMU dirty tracking Jason Gunthorpe
  9 siblings, 1 reply; 22+ messages in thread
From: Joao Martins @ 2024-02-02 13:34 UTC (permalink / raw)
  To: iommu
  Cc: Jason Gunthorpe, Kevin Tian, Suravee Suthikulpanit, Lu Baolu,
	Joao Martins, Avihai Horon, Alex Williamson,
	Shameerali Kolothum Thodi

For small bitmaps that aren't PAGE_SIZE aligned *and* that are less than
512 pages in bitmap length, use an extra page to be able to cover the
entire range e.g. [1M..3G] would be iterated more efficiently in a
single iteration, rather than two.

Fixes: b058ea3ab5af ("vfio/iova_bitmap: refactor iova_bitmap_set() to better handle page boundaries")
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 drivers/iommu/iommufd/iova_bitmap.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/iommufd/iova_bitmap.c b/drivers/iommu/iommufd/iova_bitmap.c
index b370e8ee8866..db8c46bee155 100644
--- a/drivers/iommu/iommufd/iova_bitmap.c
+++ b/drivers/iommu/iommufd/iova_bitmap.c
@@ -178,18 +178,19 @@ static int iova_bitmap_get(struct iova_bitmap *bitmap)
 			       bitmap->mapped_base_index) *
 			       sizeof(*bitmap->bitmap), PAGE_SIZE);
 
-	/*
-	 * We always cap at max number of 'struct page' a base page can fit.
-	 * This is, for example, on x86 means 2M of bitmap data max.
-	 */
-	npages = min(npages,  PAGE_SIZE / sizeof(struct page *));
-
 	/*
 	 * Bitmap address to be pinned is calculated via pointer arithmetic
 	 * with bitmap u64 word index.
 	 */
 	addr = bitmap->bitmap + bitmap->mapped_base_index;
 
+	/*
+	 * We always cap at max number of 'struct page' a base page can fit.
+	 * This is, for example, on x86 means 2M of bitmap data max.
+	 */
+	npages = min(npages + !!offset_in_page(addr),
+		     PAGE_SIZE / sizeof(struct page *));
+
 	ret = pin_user_pages_fast((unsigned long)addr, npages,
 				  FOLL_WRITE, mapped->pages);
 	if (ret <= 0)
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 0/9] iommufd/iova_bitmap: Bug fixes for IOMMU dirty tracking
  2024-02-02 13:34 [PATCH v1 0/9] iommufd/iova_bitmap: Bug fixes for IOMMU dirty tracking Joao Martins
                   ` (8 preceding siblings ...)
  2024-02-02 13:34 ` [PATCH v1 9/9] iommufd/iova_bitmap: Consider page offset for the pages to be pinned Joao Martins
@ 2024-02-06 18:13 ` Jason Gunthorpe
  2024-02-06 18:32   ` Joao Martins
  9 siblings, 1 reply; 22+ messages in thread
From: Jason Gunthorpe @ 2024-02-06 18:13 UTC (permalink / raw)
  To: Joao Martins
  Cc: iommu, Kevin Tian, Suravee Suthikulpanit, Lu Baolu, Avihai Horon,
	Alex Williamson, Shameerali Kolothum Thodi

On Fri, Feb 02, 2024 at 01:34:06PM +0000, Joao Martins wrote:

> I have a separate set of changes, that proved also instrumental in
> reproducing some of these corner cases in a more predictable manner but
> with IOMMU drivers instead, rather than IOMMUFD mock IOMMU driver...
> without relying on VF specific behaviour. Though, it relies on being
> able to map IOVAs as dirty. This is not included in this series, as I
> am not sure yet if such direction would make sense (outside of testing).
> Comments, appreciated.

I would love to have a test suite that could drive real HW. I think
that would make a big improvement to iommu driver quality.
 
> The selftests in this series (a large majority of the series changes)
> are meant to be kept in this order to be more clear in history that the
> selftests exercise a given fix added earlier in the series.
> The last patch is an efficiency improvement, but can act as workaround if
> the IOVA ranges are smaller than 64G, which is why it is placed last.
> 
> [I think this should be targetted for -rc rather than -next]

OK, I took it for -rc

It is bit unfortunate that the number of huge pages needed to run the
test has increased but it seems necessary to reach all of those
corners...

Jason

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 0/9] iommufd/iova_bitmap: Bug fixes for IOMMU dirty tracking
  2024-02-06 18:13 ` [PATCH v1 0/9] iommufd/iova_bitmap: Bug fixes for IOMMU dirty tracking Jason Gunthorpe
@ 2024-02-06 18:32   ` Joao Martins
  2024-02-06 20:57     ` Joao Martins
  2024-02-07  8:41     ` Tian, Kevin
  0 siblings, 2 replies; 22+ messages in thread
From: Joao Martins @ 2024-02-06 18:32 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, Kevin Tian, Suravee Suthikulpanit, Lu Baolu, Avihai Horon,
	Alex Williamson, Shameerali Kolothum Thodi

On 06/02/2024 18:13, Jason Gunthorpe wrote:
> On Fri, Feb 02, 2024 at 01:34:06PM +0000, Joao Martins wrote:
> 
>> I have a separate set of changes, that proved also instrumental in
>> reproducing some of these corner cases in a more predictable manner but
>> with IOMMU drivers instead, rather than IOMMUFD mock IOMMU driver...
>> without relying on VF specific behaviour. Though, it relies on being
>> able to map IOVAs as dirty. This is not included in this series, as I
>> am not sure yet if such direction would make sense (outside of testing).
>> Comments, appreciated.
> 
> I would love to have a test suite that could drive real HW. I think
> that would make a big improvement to iommu driver quality.
>  

Ideally we would merge the selftests to use real hw domain rather than mock
domain, and skip what's not supported when hardware lacks. Specially as we add
more and more features (IOPF, DPT, nesting) and underlying testing, it seems the
most natural point of consolidation.

Dirty tracking is a little strange because one needs some sort of block/packet
I/O done to hit the path of having something allocated, which usually requires
some driver operating to mark things dirty plus page allocator luck in picking
the right pages. So the straightforward way to manipulate all this without any
of this complexity on real hardware is this is to "map as dirty" and check if
something is missing. It proved useful in these bugs as I couldn't reproduce
this by just using the thing (even sharing the same test as Avihai). In any
case, the gotcha is consuming a new perm bit.

>> The selftests in this series (a large majority of the series changes)
>> are meant to be kept in this order to be more clear in history that the
>> selftests exercise a given fix added earlier in the series.
>> The last patch is an efficiency improvement, but can act as workaround if
>> the IOVA ranges are smaller than 64G, which is why it is placed last.
>>
>> [I think this should be targetted for -rc rather than -next]
> 
> OK, I took it for -rc
> 
> It is bit unfortunate that the number of huge pages needed to run the
> test has increased but it seems necessary to reach all of those
> corners...
> 

One thing that I have here as TODO is to make sure we check if the necessary
amount of hugepages is set and instead of 'scarying' the user with 'test failed'
we SKIP("...") with some reporting of how many are missing for the test.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 0/9] iommufd/iova_bitmap: Bug fixes for IOMMU dirty tracking
  2024-02-06 18:32   ` Joao Martins
@ 2024-02-06 20:57     ` Joao Martins
  2024-02-07  8:41     ` Tian, Kevin
  1 sibling, 0 replies; 22+ messages in thread
From: Joao Martins @ 2024-02-06 20:57 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, Kevin Tian, Suravee Suthikulpanit, Lu Baolu, Avihai Horon,
	Alex Williamson, Shameerali Kolothum Thodi

On 06/02/2024 18:32, Joao Martins wrote:
> On 06/02/2024 18:13, Jason Gunthorpe wrote:
>> On Fri, Feb 02, 2024 at 01:34:06PM +0000, Joao Martins wrote:
>>
>>> I have a separate set of changes, that proved also instrumental in
>>> reproducing some of these corner cases in a more predictable manner but
>>> with IOMMU drivers instead, rather than IOMMUFD mock IOMMU driver...
>>> without relying on VF specific behaviour. Though, it relies on being
>>> able to map IOVAs as dirty. This is not included in this series, as I
>>> am not sure yet if such direction would make sense (outside of testing).
>>> Comments, appreciated.
>>
>> I would love to have a test suite that could drive real HW. I think
>> that would make a big improvement to iommu driver quality.
>>  
> 
> Ideally we would merge the selftests to use real hw domain rather than mock
> domain, and skip what's not supported when hardware lacks. Specially as we add
> more and more features (IOPF, DPT, nesting) and underlying testing, it seems the
> most natural point of consolidation.
> 
> Dirty tracking is a little strange because one needs some sort of block/packet
> I/O done to hit the path of having something allocated, which usually requires
> some driver operating to mark things dirty plus page allocator luck in picking
> the right pages. So the straightforward way to manipulate all this without any
> of this complexity on real hardware is this is to "map as dirty" and check if
> something is missing. It proved useful in these bugs as I couldn't reproduce
> this by just using the thing (even sharing the same test as Avihai). In any
> case, the gotcha is consuming a new perm bit.
> 
>>> The selftests in this series (a large majority of the series changes)
>>> are meant to be kept in this order to be more clear in history that the
>>> selftests exercise a given fix added earlier in the series.
>>> The last patch is an efficiency improvement, but can act as workaround if
>>> the IOVA ranges are smaller than 64G, which is why it is placed last.
>>>
>>> [I think this should be targetted for -rc rather than -next]
>>
>> OK, I took it for -rc
>>

I should note that, I have been testing this series on top of this patch:

https://lore.kernel.org/linux-iommu/20240111073213.180020-1-baolu.lu@linux.intel.com/

.. because otherwise the selftests wouldn't run (though it is all fine in v6.7).
But maybe that got resolved in some other way that I missed?

>> It is bit unfortunate that the number of huge pages needed to run the
>> test has increased but it seems necessary to reach all of those
>> corners...
>>
> 
> One thing that I have here as TODO is to make sure we check if the necessary
> amount of hugepages is set and instead of 'scarying' the user with 'test failed'
> we SKIP("...") with some reporting of how many are missing for the test.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH v1 0/9] iommufd/iova_bitmap: Bug fixes for IOMMU dirty tracking
  2024-02-06 18:32   ` Joao Martins
  2024-02-06 20:57     ` Joao Martins
@ 2024-02-07  8:41     ` Tian, Kevin
  2024-02-07 10:23       ` Joao Martins
  1 sibling, 1 reply; 22+ messages in thread
From: Tian, Kevin @ 2024-02-07  8:41 UTC (permalink / raw)
  To: Joao Martins, Jason Gunthorpe
  Cc: iommu@lists.linux.dev, Suravee Suthikulpanit, Lu Baolu,
	Avihai Horon, Alex Williamson, Shameerali Kolothum Thodi

> From: Joao Martins <joao.m.martins@oracle.com>
> Sent: Wednesday, February 7, 2024 2:32 AM
> 
> On 06/02/2024 18:13, Jason Gunthorpe wrote:
> > On Fri, Feb 02, 2024 at 01:34:06PM +0000, Joao Martins wrote:
> >
> >> I have a separate set of changes, that proved also instrumental in
> >> reproducing some of these corner cases in a more predictable manner but
> >> with IOMMU drivers instead, rather than IOMMUFD mock IOMMU driver...
> >> without relying on VF specific behaviour. Though, it relies on being
> >> able to map IOVAs as dirty. This is not included in this series, as I
> >> am not sure yet if such direction would make sense (outside of testing).
> >> Comments, appreciated.
> >
> > I would love to have a test suite that could drive real HW. I think
> > that would make a big improvement to iommu driver quality.
> >
> 
> Ideally we would merge the selftests to use real hw domain rather than mock
> domain, and skip what's not supported when hardware lacks. Specially as we
> add
> more and more features (IOPF, DPT, nesting) and underlying testing, it seems
> the
> most natural point of consolidation.
> 
> Dirty tracking is a little strange because one needs some sort of block/packet
> I/O done to hit the path of having something allocated, which usually

same for iopf

> requires
> some driver operating to mark things dirty plus page allocator luck in picking
> the right pages. So the straightforward way to manipulate all this without any
> of this complexity on real hardware is this is to "map as dirty" and check if
> something is missing. It proved useful in these bugs as I couldn't reproduce
> this by just using the thing (even sharing the same test as Avihai). In any
> case, the gotcha is consuming a new perm bit.
> 

is this something which virtio-iommu may come to help? e.g. virtio-iommu 
backend may be instrumented to fake DMA requests w/o relying on a specific
device to actually touch memory...

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 0/9] iommufd/iova_bitmap: Bug fixes for IOMMU dirty tracking
  2024-02-07  8:41     ` Tian, Kevin
@ 2024-02-07 10:23       ` Joao Martins
  2024-02-08  8:21         ` Tian, Kevin
  0 siblings, 1 reply; 22+ messages in thread
From: Joao Martins @ 2024-02-07 10:23 UTC (permalink / raw)
  To: Tian, Kevin, Jason Gunthorpe
  Cc: iommu@lists.linux.dev, Suravee Suthikulpanit, Lu Baolu,
	Avihai Horon, Alex Williamson, Shameerali Kolothum Thodi

On 07/02/2024 08:41, Tian, Kevin wrote:
>> From: Joao Martins <joao.m.martins@oracle.com>
>> Sent: Wednesday, February 7, 2024 2:32 AM
>>
>> On 06/02/2024 18:13, Jason Gunthorpe wrote:
>>> On Fri, Feb 02, 2024 at 01:34:06PM +0000, Joao Martins wrote:
>>>
>>>> I have a separate set of changes, that proved also instrumental in
>>>> reproducing some of these corner cases in a more predictable manner but
>>>> with IOMMU drivers instead, rather than IOMMUFD mock IOMMU driver...
>>>> without relying on VF specific behaviour. Though, it relies on being
>>>> able to map IOVAs as dirty. This is not included in this series, as I
>>>> am not sure yet if such direction would make sense (outside of testing).
>>>> Comments, appreciated.
>>>
>>> I would love to have a test suite that could drive real HW. I think
>>> that would make a big improvement to iommu driver quality.
>>>
>>
>> Ideally we would merge the selftests to use real hw domain rather than mock
>> domain, and skip what's not supported when hardware lacks. Specially as we
>> add
>> more and more features (IOPF, DPT, nesting) and underlying testing, it seems
>> the
>> most natural point of consolidation.
>>
>> Dirty tracking is a little strange because one needs some sort of block/packet
>> I/O done to hit the path of having something allocated, which usually
> 
> same for iopf
> 
>> requires
>> some driver operating to mark things dirty plus page allocator luck in picking
>> the right pages. So the straightforward way to manipulate all this without any
>> of this complexity on real hardware is this is to "map as dirty" and check if
>> something is missing. It proved useful in these bugs as I couldn't reproduce
>> this by just using the thing (even sharing the same test as Avihai). In any
>> case, the gotcha is consuming a new perm bit.
>>
> 
> is this something which virtio-iommu may come to help? e.g. virtio-iommu 
> backend may be instrumented to fake DMA requests w/o relying on a specific
> device to actually touch memory...

I would think that you don't need a specific IOMMU model to have VMM do fake DMA
(any model should do I think with the right VMM code). But well but then this
assumes guests, not applicable to baremetal. The issue with doing specific IOMMU
models (besides incomplete functionality in virtio-iommu vs arch one) is that it
is tied to a specific IOMMU driver rather.

Initially I was thinking a small kvm guest created and instrument the VF. Though
then the latter would like take the majority of code in getting there. And it
wouldn't look like a test suite and would likely be better acquainted for
specific cases.

For the purpose of code coverage (of iommu drivers) we could possibly have test
ops that instrument the missing hardware i/o path that a real PCI device would
do, though done in a real hw domain, much like how we do with mock test ops. I
gave the example for MAP as dirty (which I guess we could restrict it to
IOMMUFD_SELFTEST kconfig), I guess for IOPF one would fixup the user fault (much
like how we fault when we pin pages). I have some hopes that the selftest mock
domain adaptation into real hw domain could be done (but still need to attempt
at it once have spare cycles)... that would cover a huge portion in terms of
coverage. And perhaps have a tool for the subset of test cases that require
special handling (dirty-tracking, IOPFs, what else?).

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 1/9] iommufd/iova_bitmap: Bounds check mapped::pages access
  2024-02-02 13:34 ` [PATCH v1 1/9] iommufd/iova_bitmap: Bounds check mapped::pages access Joao Martins
@ 2024-02-07 16:49   ` Avihai Horon
  0 siblings, 0 replies; 22+ messages in thread
From: Avihai Horon @ 2024-02-07 16:49 UTC (permalink / raw)
  To: Joao Martins, iommu
  Cc: Jason Gunthorpe, Kevin Tian, Suravee Suthikulpanit, Lu Baolu,
	Alex Williamson, Shameerali Kolothum Thodi


On 02/02/2024 15:34, Joao Martins wrote:
> External email: Use caution opening links or attachments
>
>
> Dirty IOMMU hugepages on a base page page-size granularity can lead to an
> attempt to set dirty pages in the bitmap beyond the limits that are pinned.
>
> Bounds check the page index of the array we are trying to access is within
> the limits before we kmap() and return otherwise.
>
> While it is also a defensive check, this is also in preparation to defer
> setting bits (outside the mapped range) to the next iteration(s) when the
> pages become available.
>
> Fixes: b058ea3ab5af ("vfio/iova_bitmap: refactor iova_bitmap_set() to better handle page boundaries")
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>

Tested-by: Avihai Horon <avihaih@nvidia.com>


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 2/9] iommufd/iova_bitmap: Switch iova_bitmap::bitmap to an u8 array
  2024-02-02 13:34 ` [PATCH v1 2/9] iommufd/iova_bitmap: Switch iova_bitmap::bitmap to an u8 array Joao Martins
@ 2024-02-07 16:50   ` Avihai Horon
  0 siblings, 0 replies; 22+ messages in thread
From: Avihai Horon @ 2024-02-07 16:50 UTC (permalink / raw)
  To: Joao Martins, iommu
  Cc: Jason Gunthorpe, Kevin Tian, Suravee Suthikulpanit, Lu Baolu,
	Alex Williamson, Shameerali Kolothum Thodi


On 02/02/2024 15:34, Joao Martins wrote:
> External email: Use caution opening links or attachments
>
>
> iova_bitmap_mapped_length() don't deal correctly with the small bitmaps
> (< 2M bitmaps) that the starting address isn't u64 aligned, leading to
> skipping a tiny part of the IOVA range. This is materialized as not
> marking data dirty that should otherwise have been.
>
> Fix that by using a u8* in the internal state of IOVA bitmap. Most
> of the data structure uses the type of the bitmap to adjust its indexes,
> thus changing the type of the bitmap decreases the granularity of the
> bitmap indexes.
>
> Fixes: b058ea3ab5af ("vfio/iova_bitmap: refactor iova_bitmap_set() to better handle page boundaries")
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>

Tested-by: Avihai Horon <avihaih@nvidia.com>


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 4/9] iommufd/iova_bitmap: Handle recording beyond the mapped pages
  2024-02-02 13:34 ` [PATCH v1 4/9] iommufd/iova_bitmap: Handle recording beyond the mapped pages Joao Martins
@ 2024-02-07 16:51   ` Avihai Horon
  2024-05-24 11:22   ` Joao Martins
  1 sibling, 0 replies; 22+ messages in thread
From: Avihai Horon @ 2024-02-07 16:51 UTC (permalink / raw)
  To: Joao Martins, iommu
  Cc: Jason Gunthorpe, Kevin Tian, Suravee Suthikulpanit, Lu Baolu,
	Alex Williamson, Shameerali Kolothum Thodi


On 02/02/2024 15:34, Joao Martins wrote:
> External email: Use caution opening links or attachments
>
>
> IOVA bitmap is a zerocopy scheme of recording dirty bits that iterate the
> different bitmap user pages at chunks of maximum of PAGE_SIZE/sizeof(struct
> page*) pages.
>
> When the iterations are split up into 64G, the end of the range may be
> broken up in a way that's aligned with a non base page PTE size. This leads
> to only part of the huge page being recorded in the bitmap. Note that in
> pratice this is only a problem for IOMMU dirty tracking i.e. when the
> backing PTEs are in IOMMU hugepages and the bitmap is in base page
> granularity. So far this not something that affects VF dirty trackers
> (which reports and records at the same granularity).
>
> To fix that, if there is a remainder of bits left to set in which the
> current IOVA bitmap doesn't cover, make a copy of the bitmap structure and
> iterate-and-set the rest of the bits remaining. Finally, when advancing the
> iterator, skip all the bits that were set ahead.
>
> Reported-by: Avihai Horon <avihaih@nvidia.com>
> Fixes: f35f22cc760e ("iommu/vt-d: Access/Dirty bit support for SS domains")
> Fixes: 421a511a293f ("iommu/amd: Access/Dirty bit support in IOPTEs")
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Tested-by: Avihai Horon <avihaih@nvidia.com>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 9/9] iommufd/iova_bitmap: Consider page offset for the pages to be pinned
  2024-02-02 13:34 ` [PATCH v1 9/9] iommufd/iova_bitmap: Consider page offset for the pages to be pinned Joao Martins
@ 2024-02-07 16:52   ` Avihai Horon
  0 siblings, 0 replies; 22+ messages in thread
From: Avihai Horon @ 2024-02-07 16:52 UTC (permalink / raw)
  To: Joao Martins, iommu
  Cc: Jason Gunthorpe, Kevin Tian, Suravee Suthikulpanit, Lu Baolu,
	Alex Williamson, Shameerali Kolothum Thodi


On 02/02/2024 15:34, Joao Martins wrote:
> External email: Use caution opening links or attachments
>
>
> For small bitmaps that aren't PAGE_SIZE aligned *and* that are less than
> 512 pages in bitmap length, use an extra page to be able to cover the
> entire range e.g. [1M..3G] would be iterated more efficiently in a
> single iteration, rather than two.
>
> Fixes: b058ea3ab5af ("vfio/iova_bitmap: refactor iova_bitmap_set() to better handle page boundaries")
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>

Tested-by: Avihai Horon <avihaih@nvidia.com>


^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH v1 0/9] iommufd/iova_bitmap: Bug fixes for IOMMU dirty tracking
  2024-02-07 10:23       ` Joao Martins
@ 2024-02-08  8:21         ` Tian, Kevin
  2024-02-08 17:23           ` Jason Gunthorpe
  0 siblings, 1 reply; 22+ messages in thread
From: Tian, Kevin @ 2024-02-08  8:21 UTC (permalink / raw)
  To: Joao Martins, Jason Gunthorpe
  Cc: iommu@lists.linux.dev, Suravee Suthikulpanit, Lu Baolu,
	Avihai Horon, Alex Williamson, Shameerali Kolothum Thodi

> From: Joao Martins <joao.m.martins@oracle.com>
> Sent: Wednesday, February 7, 2024 6:24 PM
> 
> On 07/02/2024 08:41, Tian, Kevin wrote:
> >> From: Joao Martins <joao.m.martins@oracle.com>
> >> Sent: Wednesday, February 7, 2024 2:32 AM
> >>
> >> On 06/02/2024 18:13, Jason Gunthorpe wrote:
> >>> On Fri, Feb 02, 2024 at 01:34:06PM +0000, Joao Martins wrote:
> >>>
> >>>> I have a separate set of changes, that proved also instrumental in
> >>>> reproducing some of these corner cases in a more predictable manner
> but
> >>>> with IOMMU drivers instead, rather than IOMMUFD mock IOMMU
> driver...
> >>>> without relying on VF specific behaviour. Though, it relies on being
> >>>> able to map IOVAs as dirty. This is not included in this series, as I
> >>>> am not sure yet if such direction would make sense (outside of testing).
> >>>> Comments, appreciated.
> >>>
> >>> I would love to have a test suite that could drive real HW. I think
> >>> that would make a big improvement to iommu driver quality.
> >>>
> >>
> >> Ideally we would merge the selftests to use real hw domain rather than
> mock
> >> domain, and skip what's not supported when hardware lacks. Specially as
> we
> >> add
> >> more and more features (IOPF, DPT, nesting) and underlying testing, it
> seems
> >> the
> >> most natural point of consolidation.
> >>
> >> Dirty tracking is a little strange because one needs some sort of
> block/packet
> >> I/O done to hit the path of having something allocated, which usually
> >
> > same for iopf
> >
> >> requires
> >> some driver operating to mark things dirty plus page allocator luck in
> picking
> >> the right pages. So the straightforward way to manipulate all this without
> any
> >> of this complexity on real hardware is this is to "map as dirty" and check if
> >> something is missing. It proved useful in these bugs as I couldn't
> reproduce
> >> this by just using the thing (even sharing the same test as Avihai). In any
> >> case, the gotcha is consuming a new perm bit.
> >>
> >
> > is this something which virtio-iommu may come to help? e.g. virtio-iommu
> > backend may be instrumented to fake DMA requests w/o relying on a
> specific
> > device to actually touch memory...
> 
> I would think that you don't need a specific IOMMU model to have VMM do
> fake DMA
> (any model should do I think with the right VMM code). But well but then this
> assumes guests, not applicable to baremetal. The issue with doing specific
> IOMMU
> models (besides incomplete functionality in virtio-iommu vs arch one) is that
> it
> is tied to a specific IOMMU driver rather.

yes, that's the limitation.

> 
> Initially I was thinking a small kvm guest created and instrument the VF.
> Though
> then the latter would like take the majority of code in getting there. And it
> wouldn't look like a test suite and would likely be better acquainted for
> specific cases.

right. anything requiring a functional guest including driver and specific
VF hw cannot be a generic test suite.

> 
> For the purpose of code coverage (of iommu drivers) we could possibly have
> test
> ops that instrument the missing hardware i/o path that a real PCI device
> would
> do, though done in a real hw domain, much like how we do with mock test
> ops. I
> gave the example for MAP as dirty (which I guess we could restrict it to
> IOMMUFD_SELFTEST kconfig), I guess for IOPF one would fixup the user fault
> (much
> like how we fault when we pin pages). I have some hopes that the selftest
> mock
> domain adaptation into real hw domain could be done (but still need to
> attempt
> at it once have spare cycles)... that would cover a huge portion in terms of
> coverage. And perhaps have a tool for the subset of test cases that require
> special handling (dirty-tracking, IOPFs, what else?).

could be a framework in iommu core which calls @do_dma callback
provided by iommu drivers. Within the callback the driver specific
code can walk the page table, set dirty bit, or trigger IOPF, etc. Kind
of a sw page walker.

it requires some fixup to associate a mock device with real hw domain.
Once it's done then the selftest suite can call normal iommu API to get
domain prepared and then trigger @do_dma via mock device to 
exercise various dma patterns.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 0/9] iommufd/iova_bitmap: Bug fixes for IOMMU dirty tracking
  2024-02-08  8:21         ` Tian, Kevin
@ 2024-02-08 17:23           ` Jason Gunthorpe
  0 siblings, 0 replies; 22+ messages in thread
From: Jason Gunthorpe @ 2024-02-08 17:23 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Joao Martins, iommu@lists.linux.dev, Suravee Suthikulpanit,
	Lu Baolu, Avihai Horon, Alex Williamson,
	Shameerali Kolothum Thodi

On Thu, Feb 08, 2024 at 08:21:01AM +0000, Tian, Kevin wrote:

> could be a framework in iommu core which calls @do_dma callback
> provided by iommu drivers. Within the callback the driver specific
> code can walk the page table, set dirty bit, or trigger IOPF, etc. Kind
> of a sw page walker.

My next next? project is going to be looking at improving the page
table code to more completely consolidate it around common
algorithms. Joao and I had shared a few ideas last year and it turns
out there are quite a few needs out there.

A big point of that would be to create a selftest that could do the
kinds of manipulations you are talking about here.

Then the idea is mock could back the mock domains using the io page
table storage instead of the xarray and we'd get a big improvement to
the other big area of code iommu drivers have to implement.

We'll see how it looks, it may be too ambitious.

Jason

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 4/9] iommufd/iova_bitmap: Handle recording beyond the mapped pages
  2024-02-02 13:34 ` [PATCH v1 4/9] iommufd/iova_bitmap: Handle recording beyond the mapped pages Joao Martins
  2024-02-07 16:51   ` Avihai Horon
@ 2024-05-24 11:22   ` Joao Martins
  1 sibling, 0 replies; 22+ messages in thread
From: Joao Martins @ 2024-05-24 11:22 UTC (permalink / raw)
  To: Jason Gunthorpe, Kevin Tian
  Cc: Suravee Suthikulpanit, Lu Baolu, Avihai Horon, Alex Williamson,
	Shameerali Kolothum Thodi, iommu

On 02/02/2024 13:34, Joao Martins wrote:
> IOVA bitmap is a zerocopy scheme of recording dirty bits that iterate the
> different bitmap user pages at chunks of maximum of PAGE_SIZE/sizeof(struct
> page*) pages.
> 
> When the iterations are split up into 64G, the end of the range may be
> broken up in a way that's aligned with a non base page PTE size. This leads
> to only part of the huge page being recorded in the bitmap. 

Sigh, it seems I haven't fully fixed the problem rooted in this sentence with
IOMMU hugepages :/

When the end of the 64G page is dirty, this is OK, but the problem is when the
IOPTE isn't dirty: if we are dealing with iommu hugepages, imagine a non-dirty
2M page at the end of the current iteration window of 64G which is not fully
included in the 64G range; we test it non-dirty and finish read_and_clear(), and
move to the next iteration to check the next 64G; pin the next 2M of bitmap
pages and hit the second part of the hugepage, and we find that it's dirty. The
hugepage being dirty, we clear the dirty bit of the huge IOPTE *but* we just
record part of the hugepage in the bitmap. We end up checking twice the same
hugepage IOPTE but not recording the entire page as dirty -- and future calls to
GET_DIRTY_BITMAP don't check/send either because we cleared the IOPTE dirty bit.
So we end up not sending such data in the migration stream.

It's relatively rare to trigger this, but this is essentially the leftover bug here.

I have two alternate approaches to fix this; but this is
making me think whether I should just drop this fixed windowing scheme and make
it dynamic to avoid further bugs[*] i.e. pin on demand only when we have
something to dirty.

Here's the two different approaches to fix this bug:

1) Fix iova_bitmap_set_ahead such that the next iteration is able to skip fully
the hugepage we just dirty. Right now I increment the indexes, but the first bit
of the index might still be referencing the hugepage that crossed the iteration.
This is a relatively small cleanup, but the more important change is to have
IOMMU drivers skipping the second half if we are scanning in the middle of a
hugepage. This later part is what I find ugly considering the breaking of the
hugepage is self-induced due to the bitmap iteration in 64G chunks coupled with
the fact that we can't really guess the hugepage size until we walk the page table.

2) Alternatively: never pin anything at start and drop the iteration logic from
callers, and have iova_bitmap_set dynamically pin the range that contains the
bitmap region we want to set bits. It's simpler and the gist in iova_bitmap_set
would be something like this:

@@ -481,6 +487,25 @@ void iova_bitmap_set(struct iova_bitmap *bitmap,
                        mapped->pgshift) + mapped->pgoff * BITS_PER_BYTE;
        unsigned long last_page_idx = mapped->npages - 1;

+       if (unlikely(!mapped->npages ||
+               (iova < mapped->iova ||
+                (iova + length - 1) > (mapped->iova + mapped->length - 1)))) {
+
+               bitmap->mapped_base_index =
+                       iova_bitmap_offset_to_index(bitmap,
+                                                   iova - bitmap->iova);
+
+               iova_bitmap_put(bitmap);
+               if (iova_bitmap_get(bitmap))
+                       return;
+        }

[there are small changes around this to make it faster like cahing length in
mapped::length; but this is what makes the windowing dynamic]

The consequence of approach (2) is the added comparison(s) though I am not
measuring a meaningful difference with all-dirty cases. We would also need logic
to handle the theoretic >=64G hugepages with AMD iommu v1 pgtable and what I
introduced in set_ahead_length already does 99% of it. On the other hand non
dirty IOVA ranges become a tiny bit cheaper but the efficiency benefit is noise
compared to cost of walking the pagetable. The thing I don't like about approach
(1) is having iommu driver skip the partial[*] IOVA page.

With (2) we won't do the self-induced dirty subpages in a IOMMU hugepage due to
iterations, and neither of the two approaches leads to VMMs asking for a partial
range of what was mapped in the IOAS.

Any thoughts?

[*] Something important to note: GET_DIRTY_BITMAP ioctl can be called with a
subrange of the DMA range we mapped. I'll need to adjust the dirty-size in
callers of iommu_dirty_bitmap_record() in iommu drivers. Or introduce the
restriction which forces the user to issue separate ioas map calls in small chunks.

> Note that in
> pratice this is only a problem for IOMMU dirty tracking i.e. when the
> backing PTEs are in IOMMU hugepages and the bitmap is in base page
> granularity. So far this not something that affects VF dirty trackers
> (which reports and records at the same granularity).
> 
> To fix that, if there is a remainder of bits left to set in which the
> current IOVA bitmap doesn't cover, make a copy of the bitmap structure and
> iterate-and-set the rest of the bits remaining. Finally, when advancing the
> iterator, skip all the bits that were set ahead.
> 
> Reported-by: Avihai Horon <avihaih@nvidia.com>
> Fixes: f35f22cc760e ("iommu/vt-d: Access/Dirty bit support for SS domains")
> Fixes: 421a511a293f ("iommu/amd: Access/Dirty bit support in IOPTEs")
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
>  drivers/iommu/iommufd/iova_bitmap.c | 43 +++++++++++++++++++++++++++++
>  1 file changed, 43 insertions(+)
> 
> diff --git a/drivers/iommu/iommufd/iova_bitmap.c b/drivers/iommu/iommufd/iova_bitmap.c
> index 9d42ab51a6bb..b370e8ee8866 100644
> --- a/drivers/iommu/iommufd/iova_bitmap.c
> +++ b/drivers/iommu/iommufd/iova_bitmap.c
> @@ -113,6 +113,9 @@ struct iova_bitmap {
>  
>  	/* length of the IOVA range for the whole bitmap */
>  	size_t length;
> +
> +	/* length of the IOVA range set ahead the pinned pages */
> +	unsigned long set_ahead_length;
>  };
>  
>  /*
> @@ -341,6 +344,32 @@ static bool iova_bitmap_done(struct iova_bitmap *bitmap)
>  	return bitmap->mapped_base_index >= bitmap->mapped_total_index;
>  }
>  
> +static int iova_bitmap_set_ahead(struct iova_bitmap *bitmap,
> +				 size_t set_ahead_length)
> +{
> +	int ret = 0;
> +
> +	while (set_ahead_length > 0 && !iova_bitmap_done(bitmap)) {
> +		unsigned long length = iova_bitmap_mapped_length(bitmap);
> +		unsigned long iova = iova_bitmap_mapped_iova(bitmap);
> +
> +		ret = iova_bitmap_get(bitmap);
> +		if (ret)
> +			break;
> +
> +		length = min(length, set_ahead_length);
> +		iova_bitmap_set(bitmap, iova, length);
> +
> +		set_ahead_length -= length;
> +		bitmap->mapped_base_index +=
> +			iova_bitmap_offset_to_index(bitmap, length - 1) + 1;
> +		iova_bitmap_put(bitmap);
> +	}
> +
> +	bitmap->set_ahead_length = 0;
> +	return ret;
> +}
> +
>  /*
>   * Advances to the next range, releases the current pinned
>   * pages and pins the next set of bitmap pages.
> @@ -357,6 +386,15 @@ static int iova_bitmap_advance(struct iova_bitmap *bitmap)
>  	if (iova_bitmap_done(bitmap))
>  		return 0;
>  
> +	/* Iterate, set and skip any bits requested for next iteration */
> +	if (bitmap->set_ahead_length) {
> +		int ret;
> +
> +		ret = iova_bitmap_set_ahead(bitmap, bitmap->set_ahead_length);
> +		if (ret)
> +			return ret;
> +	}
> +
>  	/* When advancing the index we pin the next set of bitmap pages */
>  	return iova_bitmap_get(bitmap);
>  }
> @@ -426,5 +464,10 @@ void iova_bitmap_set(struct iova_bitmap *bitmap,
>  		kunmap_local(kaddr);
>  		cur_bit += nbits;
>  	} while (cur_bit <= last_bit);
> +
> +	if (unlikely(cur_bit <= last_bit)) {
> +		bitmap->set_ahead_length =
> +			((last_bit - cur_bit + 1) << bitmap->mapped.pgshift);
> +	}
>  }
>  EXPORT_SYMBOL_NS_GPL(iova_bitmap_set, IOMMUFD);


^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2024-05-24 11:23 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-02-02 13:34 [PATCH v1 0/9] iommufd/iova_bitmap: Bug fixes for IOMMU dirty tracking Joao Martins
2024-02-02 13:34 ` [PATCH v1 1/9] iommufd/iova_bitmap: Bounds check mapped::pages access Joao Martins
2024-02-07 16:49   ` Avihai Horon
2024-02-02 13:34 ` [PATCH v1 2/9] iommufd/iova_bitmap: Switch iova_bitmap::bitmap to an u8 array Joao Martins
2024-02-07 16:50   ` Avihai Horon
2024-02-02 13:34 ` [PATCH v1 3/9] iommufd/selftest: Test u64 unaligned bitmaps Joao Martins
2024-02-02 13:34 ` [PATCH v1 4/9] iommufd/iova_bitmap: Handle recording beyond the mapped pages Joao Martins
2024-02-07 16:51   ` Avihai Horon
2024-05-24 11:22   ` Joao Martins
2024-02-02 13:34 ` [PATCH v1 5/9] iommufd/selftest: Refactor dirty bitmap tests Joao Martins
2024-02-02 13:34 ` [PATCH v1 6/9] iommufd/selftest: Refactor mock_domain_read_and_clear_dirty() Joao Martins
2024-02-02 13:34 ` [PATCH v1 7/9] iommufd/selftest: Hugepage mock domain support Joao Martins
2024-02-02 13:34 ` [PATCH v1 8/9] iommufd/selftest: Add mock IO hugepages tests Joao Martins
2024-02-02 13:34 ` [PATCH v1 9/9] iommufd/iova_bitmap: Consider page offset for the pages to be pinned Joao Martins
2024-02-07 16:52   ` Avihai Horon
2024-02-06 18:13 ` [PATCH v1 0/9] iommufd/iova_bitmap: Bug fixes for IOMMU dirty tracking Jason Gunthorpe
2024-02-06 18:32   ` Joao Martins
2024-02-06 20:57     ` Joao Martins
2024-02-07  8:41     ` Tian, Kevin
2024-02-07 10:23       ` Joao Martins
2024-02-08  8:21         ` Tian, Kevin
2024-02-08 17:23           ` Jason Gunthorpe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.