public inbox for linux-mm@kvack.org
* [PATCH v2] selftests/mm: skip hugetlb_dio tests when DIO alignment is incompatible
@ 2026-03-27 12:03 Li Wang
  2026-03-27 19:13 ` Andrew Morton
  0 siblings, 1 reply; 4+ messages in thread
From: Li Wang @ 2026-03-27 12:03 UTC (permalink / raw)
  To: rppt, akpm, david, ljs, Liam.Howlett, vbabka, surenb, mhocko,
	shuah
  Cc: aubaker, liwang, linux-mm, linux-kselftest, linux-kernel

hugetlb_dio test uses sub-page offsets (pagesize / 2) to verify that
hugepages used as DIO user buffers are correctly unpinned at completion.

However, on filesystems with a logical block size larger than half the
page size (e.g., 4K-sector block devices), these unaligned DIO writes
are rejected with -EINVAL, causing the test to fail unexpectedly.

Add get_dio_alignment() to query the filesystem's required DIO alignment
via statx(STATX_DIOALIGN) and pass it to run_dio_using_hugetlb(). Skip
individual test cases whose write length is not a multiple of the
alignment, so that aligned cases are still tested.

=== Reproduce Steps ===

  # dd if=/dev/zero of=/tmp/test.img bs=1M count=512
  # losetup --sector-size 4096 /dev/loop0 /tmp/test.img
  # mkfs.xfs /dev/loop0
  # mkdir -p /mnt/dio_test
  # mount /dev/loop0 /mnt/dio_test

  // Modify test to open /mnt/dio_test and rebuild it:
  -       fd = open("/tmp", O_TMPFILE | O_RDWR | O_DIRECT, 0664);
  +       fd = open("/mnt/dio_test", O_TMPFILE | O_RDWR | O_DIRECT, 0664);

  # getconf PAGESIZE
  4096

  # echo 100 >/proc/sys/vm/nr_hugepages

  # ./hugetlb_dio
  TAP version 13
  1..4
  # No. Free pages before allocation : 100
  # No. Free pages after munmap : 100
  ok 1 free huge pages from 0-12288
  Bail out! Error writing to file
  : Invalid argument (22)
  # Planned tests != run tests (4 != 1)
  # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0

Signed-off-by: Li Wang <liwang@redhat.com>
---

Notes:
    v2:
    	- Pass dio_align as a parameter to run_dio_using_hugetlb()
    	  instead of generally page_size/2 alignment check.

 tools/testing/selftests/mm/hugetlb_dio.c | 56 +++++++++++++++++-------
 1 file changed, 40 insertions(+), 16 deletions(-)

diff --git a/tools/testing/selftests/mm/hugetlb_dio.c b/tools/testing/selftests/mm/hugetlb_dio.c
index 9ac62eb4c97d..638a8abe3ae2 100644
--- a/tools/testing/selftests/mm/hugetlb_dio.c
+++ b/tools/testing/selftests/mm/hugetlb_dio.c
@@ -20,7 +20,35 @@
 #include "vm_util.h"
 #include "kselftest.h"
 
-void run_dio_using_hugetlb(unsigned int start_off, unsigned int end_off)
+#ifndef STATX_DIOALIGN
+#define STATX_DIOALIGN		0x00002000U
+#endif
+
+unsigned int get_dio_alignment(void)
+{
+	int fd, ret;
+	struct statx stx;
+	unsigned int dio_align = 1;
+
+	fd = open("/tmp", O_TMPFILE | O_RDWR | O_DIRECT, 0664);
+	if (fd < 0)
+		ksft_exit_skip("Unable to allocate file: %s\n", strerror(errno));
+
+	ret = statx(fd, "", AT_EMPTY_PATH, STATX_DIOALIGN, &stx);
+	if (ret < 0) {
+		ksft_perror("statx() failed");
+	} else if ((stx.stx_mask & STATX_DIOALIGN) &&
+			stx.stx_dio_offset_align) {
+		dio_align = stx.stx_dio_offset_align;
+	}
+
+	close(fd);
+
+	return dio_align;
+}
+
+void run_dio_using_hugetlb(unsigned int start_off, unsigned int end_off,
+			unsigned int dio_align)
 {
 	int fd;
 	char *buffer =  NULL;
@@ -33,6 +61,11 @@ void run_dio_using_hugetlb(unsigned int start_off, unsigned int end_off)
 	const int mmap_prot  = PROT_READ | PROT_WRITE;
 
 	writesize = end_off - start_off;
+	if (writesize % dio_align != 0) {
+		ksft_test_result_skip("DIO alignment (%u) incompatible with offset %zu\n",
+				dio_align, writesize);
+		return;
+	}
 
 	/* Get the default huge page size */
 	h_pagesize = default_huge_page_size();
@@ -89,37 +122,28 @@ void run_dio_using_hugetlb(unsigned int start_off, unsigned int end_off)
 
 int main(void)
 {
-	size_t pagesize = 0;
-	int fd;
+	size_t pagesize = psize();
+	unsigned int dio_align = get_dio_alignment();
 
 	ksft_print_header();
 
-	/* Open the file to DIO */
-	fd = open("/tmp", O_TMPFILE | O_RDWR | O_DIRECT, 0664);
-	if (fd < 0)
-		ksft_exit_skip("Unable to allocate file: %s\n", strerror(errno));
-	close(fd);
-
 	/* Check if huge pages are free */
 	if (!get_free_hugepages())
 		ksft_exit_skip("No free hugepage, exiting\n");
 
 	ksft_set_plan(4);
 
-	/* Get base page size */
-	pagesize  = psize();
-
 	/* start and end is aligned to pagesize */
-	run_dio_using_hugetlb(0, (pagesize * 3));
+	run_dio_using_hugetlb(0, (pagesize * 3), dio_align);
 
 	/* start is aligned but end is not aligned */
-	run_dio_using_hugetlb(0, (pagesize * 3) - (pagesize / 2));
+	run_dio_using_hugetlb(0, (pagesize * 3) - (pagesize / 2), dio_align);
 
 	/* start is unaligned and end is aligned */
-	run_dio_using_hugetlb(pagesize / 2, (pagesize * 3));
+	run_dio_using_hugetlb(pagesize / 2, (pagesize * 3), dio_align);
 
 	/* both start and end are unaligned */
-	run_dio_using_hugetlb(pagesize / 2, (pagesize * 3) + (pagesize / 2));
+	run_dio_using_hugetlb(pagesize / 2, (pagesize * 3) + (pagesize / 2), dio_align);
 
 	ksft_finished();
 }
-- 
2.53.0




* Re: [PATCH v2] selftests/mm: skip hugetlb_dio tests when DIO alignment is incompatible
  2026-03-27 12:03 [PATCH v2] selftests/mm: skip hugetlb_dio tests when DIO alignment is incompatible Li Wang
@ 2026-03-27 19:13 ` Andrew Morton
  2026-03-28  0:28   ` Li Wang
  2026-03-30  5:19   ` Li Wang
  0 siblings, 2 replies; 4+ messages in thread
From: Andrew Morton @ 2026-03-27 19:13 UTC (permalink / raw)
  To: Li Wang
  Cc: rppt, david, ljs, Liam.Howlett, vbabka, surenb, mhocko, shuah,
	aubaker, linux-mm, linux-kselftest, linux-kernel

On Fri, 27 Mar 2026 20:03:05 +0800 Li Wang <liwang@redhat.com> wrote:

> hugetlb_dio test uses sub-page offsets (pagesize / 2) to verify that
> hugepages used as DIO user buffers are correctly unpinned at completion.
> 
> However, on filesystems with a logical block size larger than half the
> page size (e.g., 4K-sector block devices), these unaligned DIO writes
> are rejected with -EINVAL, causing the test to fail unexpectedly.
> 
> Add get_dio_alignment() to query the filesystem's required DIO alignment
> via statx(STATX_DIOALIGN) and pass it to run_dio_using_hugetlb(). Skip
> individual test cases whose write length is not a multiple of the
> alignment, so that aligned cases are still tested.
> 
> === Reproduce Steps ===
> 
>   # dd if=/dev/zero of=/tmp/test.img bs=1M count=512
>   # losetup --sector-size 4096 /dev/loop0 /tmp/test.img
>   # mkfs.xfs /dev/loop0
>   # mkdir -p /mnt/dio_test
>   # mount /dev/loop0 /mnt/dio_test
> 
>   // Modify test to open /mnt/dio_test and rebuild it:
>   -       fd = open("/tmp", O_TMPFILE | O_RDWR | O_DIRECT, 0664);
>   +       fd = open("/mnt/dio_test", O_TMPFILE | O_RDWR | O_DIRECT, 0664);
> 
>   # getconf PAGESIZE
>   4096
> 
>   # echo 100 >/proc/sys/vm/nr_hugepages
> 
>   # ./hugetlb_dio
>   TAP version 13
>   1..4
>   # No. Free pages before allocation : 100
>   # No. Free pages after munmap : 100
>   ok 1 free huge pages from 0-12288
>   Bail out! Error writing to file
>   : Invalid argument (22)
>   # Planned tests != run tests (4 != 1)
>   # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
> 
> ...
>
>     v2:
>     	- Pass dio_align as a parameter to run_dio_using_hugetlb()
>     	  instead of generally page_size/2 alignment check.
> 

Whee, AI review has decided that pre glibc-2.36 is a problem (last time
it was pre-2.37).

And it's forgotten the previous fs-doesnt-support-DIO issues.  Did you
alter that?

The get_dio_alignment-before-ksft_print_header thing seems legit.

https://sashiko.dev/#/patchset/20260327120305.58653-1-liwang@redhat.com



* Re: [PATCH v2] selftests/mm: skip hugetlb_dio tests when DIO alignment is incompatible
  2026-03-27 19:13 ` Andrew Morton
@ 2026-03-28  0:28   ` Li Wang
  2026-03-30  5:19   ` Li Wang
  1 sibling, 0 replies; 4+ messages in thread
From: Li Wang @ 2026-03-28  0:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: rppt, david, ljs, Liam.Howlett, vbabka, surenb, mhocko, shuah,
	aubaker, linux-mm, linux-kselftest, linux-kernel

On Fri, Mar 27, 2026 at 12:13:50PM -0700, Andrew Morton wrote:
> On Fri, 27 Mar 2026 20:03:05 +0800 Li Wang <liwang@redhat.com> wrote:
> 
> > hugetlb_dio test uses sub-page offsets (pagesize / 2) to verify that
> > hugepages used as DIO user buffers are correctly unpinned at completion.
> > 
> > However, on filesystems with a logical block size larger than half the
> > page size (e.g., 4K-sector block devices), these unaligned DIO writes
> > are rejected with -EINVAL, causing the test to fail unexpectedly.
> > 
> > Add get_dio_alignment() to query the filesystem's required DIO alignment
> > via statx(STATX_DIOALIGN) and pass it to run_dio_using_hugetlb(). Skip
> > individual test cases whose write length is not a multiple of the
> > alignment, so that aligned cases are still tested.
> > 
> > === Reproduce Steps ===
> > 
> >   # dd if=/dev/zero of=/tmp/test.img bs=1M count=512
> >   # losetup --sector-size 4096 /dev/loop0 /tmp/test.img
> >   # mkfs.xfs /dev/loop0
> >   # mkdir -p /mnt/dio_test
> >   # mount /dev/loop0 /mnt/dio_test
> > 
> >   // Modify test to open /mnt/dio_test and rebuild it:
> >   -       fd = open("/tmp", O_TMPFILE | O_RDWR | O_DIRECT, 0664);
> >   +       fd = open("/mnt/dio_test", O_TMPFILE | O_RDWR | O_DIRECT, 0664);
> > 
> >   # getconf PAGESIZE
> >   4096
> > 
> >   # echo 100 >/proc/sys/vm/nr_hugepages
> > 
> >   # ./hugetlb_dio
> >   TAP version 13
> >   1..4
> >   # No. Free pages before allocation : 100
> >   # No. Free pages after munmap : 100
> >   ok 1 free huge pages from 0-12288
> >   Bail out! Error writing to file
> >   : Invalid argument (22)
> >   # Planned tests != run tests (4 != 1)
> >   # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
> > 
> > ...
> >
> >     v2:
> >     	- Pass dio_align as a parameter to run_dio_using_hugetlb()
> >     	  instead of generally page_size/2 alignment check.
> > 
> 
> Whee, AI review has decided that pre glibc-2.36 is a problem (last time
> it was pre-2.37).

Ok, will handle this build issue with old glibc.

> And it's forgotten the previous fs-doesnt-support-DIO issues.  Did you
> alter that?

[Sorry, I forgot to mention this in the changelog.]

Yes, I added the O_DIRECT flag back to the first open(), so the test is
skipped rather than failing on filesystems that don't support O_DIRECT.

Also, statx() on a filesystem that doesn't support DIO sets
stx_dio_offset_align to 0, which could lead to a division by zero, so I
added a zero check:

	} else if ((stx.stx_mask & STATX_DIOALIGN) &&
			stx.stx_dio_offset_align) {


> The get_dio_alignment-before-ksft_print_header thing seems legit.

+1

Before sending v3, I'll leave this patch out for a few more days to
see if others comment.

-- 
Regards,
Li Wang




* Re: [PATCH v2] selftests/mm: skip hugetlb_dio tests when DIO alignment is incompatible
  2026-03-27 19:13 ` Andrew Morton
  2026-03-28  0:28   ` Li Wang
@ 2026-03-30  5:19   ` Li Wang
  1 sibling, 0 replies; 4+ messages in thread
From: Li Wang @ 2026-03-30  5:19 UTC (permalink / raw)
  To: Andrew Morton, linux-fsdevel
  Cc: rppt, david, ljs, Liam.Howlett, vbabka, surenb, mhocko, shuah,
	aubaker, linux-mm, linux-kselftest, linux-kernel

Replying to Sashiko's comment:

> https://sashiko.dev/#/patchset/20260327120305.58653-1-liwang@redhat.com
>
> > +	if (writesize % dio_align != 0) {
> > +		ksft_test_result_skip("DIO alignment (%u) incompatible with offset %zu\n",
> > +				dio_align, writesize);
> > +		return;
> > +	}
>
> Is this alignment check complete? 
>
> Direct I/O requires both the transfer length and the memory buffer address
> to be aligned. Later in this function, start_off is used as the buffer offset:
> 	buffer = orig_buffer;
> 	buffer += start_off;
> If start_off is pagesize / 2 (e.g., 2048) and writesize is pagesize * 3
> (e.g., 12288), writesize is a multiple of a 4096-byte alignment, so the test
> is not skipped.
>
> However, the memory buffer itself is only 2048-byte aligned. Will the
> subsequent write() still fail with -EINVAL on 4K-sector devices?

TL;DR: Yes, we should check the alignment of both the buffer address
       and the write size to satisfy all filesystem types.

Looking at the kernel code in fs/iomap/direct-io.c, the only alignment
check there is at line 413, which checks the file position and the
write length.

EXT4:

ext4_file_write_iter
  ext4_dio_write_iter
    iomap_dio_rw
      __iomap_dio_rw
        iomap_dio_iter
          iomap_dio_bio_iter

  390	static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
  391	{
  		...
  403
  404		/*
  405		 * File systems that write out of place and always allocate new blocks
  406		 * need each bio to be block aligned as that's the unit of allocation.
  407		 */
  408		if (dio->flags & IOMAP_DIO_FSBLOCK_ALIGNED)
  409			alignment = fs_block_size;
  410		else
  411			alignment = bdev_logical_block_size(iomap->bdev);
  412
  413		if ((pos | length) & (alignment - 1))
  414			return -EINVAL;
  415		...
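
As a rough illustration of what that mask test accepts and rejects, here
is a small userspace sketch. It is not kernel code: the helper name
dio_pos_len_aligned() is made up for this mail, and it assumes the
alignment is a non-zero power of two, as a logical block size would be.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (illustration only): mirrors the
 * (pos | length) & (alignment - 1) test quoted above.
 */
static bool dio_pos_len_aligned(uint64_t pos, uint64_t length,
				uint64_t alignment)
{
	/* The mask trick is only valid for a non-zero power of two. */
	assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

	/* OR-ing pos and length lets one mask catch a low bit set in either. */
	return ((pos | length) & (alignment - 1)) == 0;
}
```

With a 4096-byte alignment and the write starting at file position 0, a
12288-byte write passes this test while a 10240-byte write does not. Note
that the buffer address is not an input to this test at all.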

Sashiko points out that the buffer address should be alignment-checked
as well. I first suspected this was based on the extra filesystem check
done before iomap_dio_rw():

ext4_file_write_iter
  ext4_dio_write_iter
    ext4_should_use_dio
       iov_iter_alignment <--- do buffer/writesize alignment check

  842 unsigned long iov_iter_alignment(const struct iov_iter *i)
  843 {
  		...
  853                 return iov_iter_alignment_iovec(i);
  		...
  865 }

  799 static unsigned long iov_iter_alignment_iovec(const struct iov_iter *i)
  800 {
  		...
  809                   res |= (unsigned long)iov->iov_base + skip;
  812                   res |= len;
		...
  818         return res;
  819 }

But eventually I found that this check only decides whether to fall
back to buffered I/O when direct I/O is unsupported (see
ext4_buffered_write_iter). That wouldn't happen in the test, as it opens
the file with the O_DIRECT flag.

Then I turned to the Btrfs path:

btrfs_file_write_iter
  btrfs_do_write_iter
    btrfs_direct_write
      check_direct_IO <--- does the buffer alignment check
      ...
      btrfs_dio_write
        __iomap_dio_rw <--- does the same thing as ext4

  778 static ssize_t check_direct_IO(struct btrfs_fs_info *fs_info,
  779                                const struct iov_iter *iter, loff_t offset)
  780 {
  781         const u32 blocksize_mask = fs_info->sectorsize - 1;
  782
  783         if (offset & blocksize_mask)
  784                 return -EINVAL;
  785
  786         if (iov_iter_alignment(iter) & blocksize_mask)
  787                 return -EINVAL;
  788         return 0;
  789 }

Yes, here I found the evidence: the
iov_iter_alignment(iter) & blocksize_mask test does the buffer
alignment check.

Unlike ext4, which never reaches the check for normal files, btrfs
always checks buffer alignment for every DIO operation. And it's a hard
-EINVAL, not a silent fallback to buffered I/O.
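
A minimal sketch of the combined check discussed above, for
illustration only. The helper name dio_request_aligned() is hypothetical
and is not part of the posted patch. It also assumes a single alignment
value is applied to both the buffer address and the write length, which
is a simplification: statx() reports stx_dio_mem_align separately from
stx_dio_offset_align.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not in the posted patch): returns true only when
 * both the user buffer address and the write length are multiples of
 * dio_align.
 */
static bool dio_request_aligned(const void *buf, size_t len,
				unsigned int dio_align)
{
	/* A zero alignment would make the modulo below divide by zero. */
	assert(dio_align != 0);

	return ((uintptr_t)buf % dio_align) == 0 && (len % dio_align) == 0;
}
```

In run_dio_using_hugetlb() terms, the case Sashiko describes is a
2048-byte offset into the hugepage with a 12288-byte write: the length
is a multiple of 4096 but the address is not, so a check of this shape
would skip it on a 4K-sector device.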

-- 
Regards,
Li Wang


