public inbox for linux-fsdevel@vger.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH v2] selftests/mm: skip hugetlb_dio tests when DIO alignment is incompatible
       [not found] ` <20260327121350.858a127fa49ed6e1eb4a40a7@linux-foundation.org>
@ 2026-03-30  5:19   ` Li Wang
  0 siblings, 0 replies; only message in thread
From: Li Wang @ 2026-03-30  5:19 UTC (permalink / raw)
  To: Andrew Morton, linux-fsdevel
  Cc: rppt, david, ljs, Liam.Howlett, vbabka, surenb, mhocko, shuah,
	aubaker, linux-mm, linux-kselftest, linux-kernel

Reply Sashiko comment:

> https://sashiko.dev/#/patchset/20260327120305.58653-1-liwang@redhat.com
>
> > +	if (writesize % dio_align != 0) {
> +		ksft_test_result_skip("DIO alignment (%u) incompatible with offset %zu\n",
> +				dio_align, writesize);
> +		return;
> +	}
>
> Is this alignment check complete? 
>
> Direct I/O requires both the transfer length and the memory buffer address
> to be aligned. Later in this function, start_off is used as the buffer offset:
> 	buffer = orig_buffer;
> 	buffer += start_off;
> If start_off is pagesize / 2 (e.g., 2048) and writesize is pagesize * 3
> (e.g., 12288), writesize is a multiple of a 4096-byte alignment, so the test
> is not skipped.
>
> However, the memory buffer itself is only 2048-byte aligned. Will the
> subsequent write() still fail with -EINVAL on 4K-sector devices?

TL;DR: Yes, we should do both buffer address and writesize alignment
       checks to satisfy all FS types.

Looking at the kernel code: fs/iomap/direct-dio.c, the only alignment
check there is at line#413, which checks file's pos and write length.

EXT4:

ext4_file_write_iter
  ext4_dio_write_iter
    iomap_dio_rw
      __iomap_dio_rw
        iomap_dio_iter
          iomap_dio_bio_iter

  390	static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
  391	{
  		...
  403
  404		/*
  405		 * File systems that write out of place and always allocate new blocks
  406		 * need each bio to be block aligned as that's the unit of allocation.
  407		 */
  408		if (dio->flags & IOMAP_DIO_FSBLOCK_ALIGNED)
  409			alignment = fs_block_size;
  410		else
  411			alignment = bdev_logical_block_size(iomap->bdev);
  412
  413		if ((pos | length) & (alignment - 1))
  414			return -EINVAL;
  415		...

Sashiko points out the buffer-address should do alignment check as well,
I firstly suspect it based on the FS extra check before the iomap_dio_rw:

ext4_file_write_iter
  ext4_dio_write_iter
    ext4_should_use_dio
       iov_iter_alignment <--- do buffer/writesize alignment check

  842 unsigned long iov_iter_alignment(const struct iov_iter *i)
  843 {
  		...
  853                 return iov_iter_alignment_iovec(i);
  		...
  865 }

  799 static unsigned long iov_iter_alignment_iovec(const struct iov_iter *i)
  800 {
  		...
  809                   res |= (unsigned long)iov->iov_base + skip;
  812                   res |= len;
		...
  818         return res;
  819 }

But eventually I found that this is only fallback to the buffer I/O
when the direct I/O is unsupported (go to: ext4_buffered_write_iter).
This wouldn't happen in the test as it open with O_DIRECT flag.

Then, I turned to look at Btrfs path:

btrfs_file_write_iter
  btrfs_do_write_iter
    btrfs_direct_write
      check_direct_IO <--- do buffer alignment check
      ...
      btrfs_dio_write
        __iomap_dio_rw <--- do samething like ext4

  778 static ssize_t check_direct_IO(struct btrfs_fs_info *fs_info,
  779                                const struct iov_iter *iter, loff_t offset)
  780 {
  781         const u32 blocksize_mask = fs_info->sectorsize - 1;
  782
  783         if (offset & blocksize_mask)
  784                 return -EINVAL;
  785
  786         if (iov_iter_alignment(iter) & blocksize_mask)
  787                 return -EINVAL;
  788         return 0;
  789 }

Yes, here I found the evendice that iov_iter_alignment(iter) & blocksize_mask)
do the alignment check.

Unlike ext4 which never reaches the check for normal files, btrfs always checks
buffer alignment for every DIO operation. And it's a hard -EINVAL, not a silent
fallback to buffered I/O.

-- 
Regards,
Li Wang


^ permalink raw reply	[flat|nested] only message in thread

only message in thread, other threads:[~2026-03-30  5:19 UTC | newest]

Thread overview: (only message) (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20260327120305.58653-1-liwang@redhat.com>
     [not found] ` <20260327121350.858a127fa49ed6e1eb4a40a7@linux-foundation.org>
2026-03-30  5:19   ` [PATCH v2] selftests/mm: skip hugetlb_dio tests when DIO alignment is incompatible Li Wang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox