From: Li Wang <liwang@redhat.com>
To: Andrew Morton <akpm@linux-foundation.org>, linux-fsdevel@vger.kernel.org
Cc: rppt@kernel.org, david@kernel.org, ljs@kernel.org,
Liam.Howlett@oracle.com, vbabka@kernel.org, surenb@google.com,
mhocko@suse.com, shuah@kernel.org, aubaker@redhat.com,
linux-mm@kvack.org, linux-kselftest@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] selftests/mm: skip hugetlb_dio tests when DIO alignment is incompatible
Date: Mon, 30 Mar 2026 13:19:41 +0800 [thread overview]
Message-ID: <acoH7TALgBmBkk2h@redhat.com> (raw)
In-Reply-To: <20260327121350.858a127fa49ed6e1eb4a40a7@linux-foundation.org>
Reply Sashiko comment:
> https://sashiko.dev/#/patchset/20260327120305.58653-1-liwang@redhat.com
>
> > + if (writesize % dio_align != 0) {
> + ksft_test_result_skip("DIO alignment (%u) incompatible with offset %zu\n",
> + dio_align, writesize);
> + return;
> + }
>
> Is this alignment check complete?
>
> Direct I/O requires both the transfer length and the memory buffer address
> to be aligned. Later in this function, start_off is used as the buffer offset:
> buffer = orig_buffer;
> buffer += start_off;
> If start_off is pagesize / 2 (e.g., 2048) and writesize is pagesize * 3
> (e.g., 12288), writesize is a multiple of a 4096-byte alignment, so the test
> is not skipped.
>
> However, the memory buffer itself is only 2048-byte aligned. Will the
> subsequent write() still fail with -EINVAL on 4K-sector devices?
TL;DR: Yes, we should do both buffer address and writesize alignment
checks to satisfy all FS types.
Looking at the kernel code: fs/iomap/direct-dio.c, the only alignment
check there is at line#413, which checks file's pos and write length.
EXT4:
ext4_file_write_iter
ext4_dio_write_iter
iomap_dio_rw
__iomap_dio_rw
iomap_dio_iter
iomap_dio_bio_iter
390 static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
391 {
...
403
404 /*
405 * File systems that write out of place and always allocate new blocks
406 * need each bio to be block aligned as that's the unit of allocation.
407 */
408 if (dio->flags & IOMAP_DIO_FSBLOCK_ALIGNED)
409 alignment = fs_block_size;
410 else
411 alignment = bdev_logical_block_size(iomap->bdev);
412
413 if ((pos | length) & (alignment - 1))
414 return -EINVAL;
415 ...
Sashiko points out the buffer-address should do alignment check as well,
I firstly suspect it based on the FS extra check before the iomap_dio_rw:
ext4_file_write_iter
ext4_dio_write_iter
ext4_should_use_dio
iov_iter_alignment <--- do buffer/writesize alignment check
842 unsigned long iov_iter_alignment(const struct iov_iter *i)
843 {
...
853 return iov_iter_alignment_iovec(i);
...
865 }
799 static unsigned long iov_iter_alignment_iovec(const struct iov_iter *i)
800 {
...
809 res |= (unsigned long)iov->iov_base + skip;
812 res |= len;
...
818 return res;
819 }
But eventually I found that this is only fallback to the buffer I/O
when the direct I/O is unsupported (go to: ext4_buffered_write_iter).
This wouldn't happen in the test as it open with O_DIRECT flag.
Then, I turned to look at Btrfs path:
btrfs_file_write_iter
btrfs_do_write_iter
btrfs_direct_write
check_direct_IO <--- do buffer alignment check
...
btrfs_dio_write
__iomap_dio_rw <--- do samething like ext4
778 static ssize_t check_direct_IO(struct btrfs_fs_info *fs_info,
779 const struct iov_iter *iter, loff_t offset)
780 {
781 const u32 blocksize_mask = fs_info->sectorsize - 1;
782
783 if (offset & blocksize_mask)
784 return -EINVAL;
785
786 if (iov_iter_alignment(iter) & blocksize_mask)
787 return -EINVAL;
788 return 0;
789 }
Yes, here I found the evendice that iov_iter_alignment(iter) & blocksize_mask)
do the alignment check.
Unlike ext4 which never reaches the check for normal files, btrfs always checks
buffer alignment for every DIO operation. And it's a hard -EINVAL, not a silent
fallback to buffered I/O.
--
Regards,
Li Wang
prev parent reply other threads:[~2026-03-30 5:19 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-27 12:03 [PATCH v2] selftests/mm: skip hugetlb_dio tests when DIO alignment is incompatible Li Wang
2026-03-27 19:13 ` Andrew Morton
2026-03-28 0:28 ` Li Wang
2026-03-30 5:19 ` Li Wang [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=acoH7TALgBmBkk2h@redhat.com \
--to=liwang@redhat.com \
--cc=Liam.Howlett@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=aubaker@redhat.com \
--cc=david@kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=ljs@kernel.org \
--cc=mhocko@suse.com \
--cc=rppt@kernel.org \
--cc=shuah@kernel.org \
--cc=surenb@google.com \
--cc=vbabka@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.