linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Qu Wenruo <wqu@suse.com>
To: Zorro Lang <zlang@redhat.com>
Cc: linux-btrfs@vger.kernel.org, fstests@vger.kernel.org
Subject: Re: [PATCH] fstests: add a new generic test case to verify direct io read
Date: Sun, 2 Nov 2025 07:42:45 +1030	[thread overview]
Message-ID: <9acb730d-6174-4ae6-a5d8-d1bca947462b@suse.com> (raw)
In-Reply-To: <20251101155702.5e2g42ixg3qvh5b5@dell-per750-06-vm-08.rhts.eng.pek2.redhat.com>



在 2025/11/2 02:27, Zorro Lang 写道:
> On Fri, Sep 26, 2025 at 07:20:13PM +0930, Qu Wenruo wrote:
>> [POSSIBLE PROBLEM]
>> For filesystems with data checksum, a user space program can provide a
>> block of aligned memory as read buffer, and modify the buffer during the
>> read.
>>
>> Btrfs used to utilize such memory directly during checksum
>> verification, and since the content can be modified by user space, the
>> checksum verification can fail easily when such modification happened.
>>
>> This will cause a false checksum mismatch alert, and if there is no more
>> mirror, the read will fail.
>> And even if there is an extra mirror, checksum verification can still
>> fail due to user space interference.
>>
>> [NEW TEST CASE]
>> The new test case is pretty much the same as the existing generic/761,
>> utilizing a user space multi-thread program to do a direct read,
>> meanwhile modifying the buffer at the same time.
>>
>> The new program, dio-read-race, is almost the same, with some changes:
>>
>> - O_RDRW flag to open the file
>> - Populate the contents of the file
>> - Do read and modify
>>    Instead of write and modify
>>
>> [EXPECTED RESULTS]
>> For unpatched kernels, the new test case fails like this:
>>
>>   generic/772       - output mismatch (see /home/adam/xfstests/results//generic/772.out.bad)
>>       --- tests/generic/772.out	2025-09-26 19:07:37.319000000 +0930
>>       +++ /home/adam/xfstests/results//generic/772.out.bad	2025-09-26 19:08:33.873401495 +0930
>>       @@ -1,2 +1,3 @@
>>        QA output created by 772
>>       +io thread failed
>>        Silence is golden
>>       ...
>>       (Run 'diff -u /home/adam/xfstests/tests/generic/772.out /home/adam/xfstests/results//generic/772.out.bad'  to see the entire diff)
>>
>>   HINT: You _MAY_ be missing kernel fix:
>>         xxxxxxxxxxxx btrfs: fallback to buffered read if the inode has data checksum
>>
>> With dmesg recording some checksum mismatch:
>>
>>   BTRFS warning (device dm-3): csum failed root 5 ino 257 off 241664 csum 0x13fec125 expected csum 0x8941f998 mirror 1
>>   BTRFS error (device dm-3): bdev /dev/mapper/test-scratch1 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
>>   BTRFS warning (device dm-3): direct IO failed ino 257 op 0x8800 offset 0x3b000 len 4096 err no 10
>>
>> For patched kernels, the new test case passes, without any error in
>> dmesg:
>>
>>   generic/772 6s ...  6s
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
> 
> This patch has been there more than 1 month. It works on my side, and
> didn't bring in any regressions. I'd like to merge this patch,
> 
> Reviewed-by: Zorro Lang <zlang@redhat.com>

Thanks for the review, I guess the reason the patch and test case didn't 
get reviewed is the performance drop.

When we enables buffered fallback for direct writes, it indeed caused 
performance drop, and end users noticed (mostly by benchmark tools).

But thankfully it's only affecting those benchmark tools, and those 
tools are adapting to btrfs by using NOCOW flags, so no real world 
performance regression (yet).

I'll keep pushing the fix so that the test case can be merged with an 
upstream fix.

Thanks,
Qu

> 
>>   .gitignore            |   1 +
>>   src/Makefile          |   2 +-
>>   src/dio-read-race.c   | 167 ++++++++++++++++++++++++++++++++++++++++++
>>   tests/generic/772     |  41 +++++++++++
>>   tests/generic/772.out |   2 +
>>   5 files changed, 212 insertions(+), 1 deletion(-)
>>   create mode 100644 src/dio-read-race.c
>>   create mode 100755 tests/generic/772
>>   create mode 100644 tests/generic/772.out
>>
>> diff --git a/.gitignore b/.gitignore
>> index 82c57f41..3e950079 100644
>> --- a/.gitignore
>> +++ b/.gitignore
>> @@ -210,6 +210,7 @@ tags
>>   /src/fiemap-fault
>>   /src/min_dio_alignment
>>   /src/dio-writeback-race
>> +/src/dio-read-race
>>   /src/unlink-fsync
>>   /src/file_attr
>>   
>> diff --git a/src/Makefile b/src/Makefile
>> index 711dbb91..e7dd4db5 100644
>> --- a/src/Makefile
>> +++ b/src/Makefile
>> @@ -21,7 +21,7 @@ TARGETS = dirstress fill fill2 getpagesize holes lstat64 \
>>   	t_mmap_writev_overlap checkpoint_journal mmap-rw-fault allocstale \
>>   	t_mmap_cow_memory_failure fake-dump-rootino dio-buf-fault rewinddir-test \
>>   	readdir-while-renames dio-append-buf-fault dio-write-fsync-same-fd \
>> -	dio-writeback-race unlink-fsync
>> +	dio-writeback-race dio-read-race unlink-fsync
>>   
>>   LINUX_TARGETS = xfsctl bstat t_mtab getdevicesize preallo_rw_pattern_reader \
>>   	preallo_rw_pattern_writer ftrunc trunc fs_perms testx looptest \
>> diff --git a/src/dio-read-race.c b/src/dio-read-race.c
>> new file mode 100644
>> index 00000000..7c6389e0
>> --- /dev/null
>> +++ b/src/dio-read-race.c
>> @@ -0,0 +1,167 @@
>> +// SPDX-License-Identifier: GPL-2.0+
>> +/*
>> + * dio-read-race -- test direct IO read with the contents of
>> + * the buffer changed during the read, which should not cause any read error.
>> + *
>> + * Copyright (C) 2025 SUSE Linux Products GmbH.
>> + */
>> +
>> +/*
>> + * dio-read-race
>> + *
>> + * Compile:
>> + *
>> + *   gcc -Wall -pthread -o dio-read-race dio-read-race.c
>> + *
>> + * Make sure filesystems with explicit data checksum can handle direct IO
>> + * reads correctly, so that even the contents of the direct IO buffer changes
>> + * during read, the fs should still work fine without data checksum mismatch.
>> + * (Normally by falling back to buffer IO to keep the checksum and contents
>> + *  consistent)
>> + *
>> + * Usage:
>> + *
>> + *   dio-read-race [-b <buffersize>] [-s <filesize>] <filename>
>> + *
>> + * Where:
>> + *
>> + *   <filename> is the name of the test file to create, if the file exists
>> + *   it will be overwritten.
>> + *
>> + *   <buffersize> is the buffer size for direct IO. Users are responsible to
>> + *   probe the correct direct IO size and alignment requirement.
>> + *   If not specified, the default value will be 4096.
>> + *
>> + *   <filesize> is the total size of the test file. If not aligned to <blocksize>
>> + *   the result file size will be rounded up to <blocksize>.
>> + *   If not specified, the default value will be 64MiB.
>> + */
>> +
>> +#include <fcntl.h>
>> +#include <stdlib.h>
>> +#include <stdio.h>
>> +#include <pthread.h>
>> +#include <unistd.h>
>> +#include <getopt.h>
>> +#include <string.h>
>> +#include <errno.h>
>> +#include <sys/stat.h>
>> +
>> +static int fd = -1;
>> +static void *buf = NULL;
>> +static unsigned long long filesize = 64 * 1024 * 1024;
>> +static int blocksize = 4096;
>> +
>> +const int orig_pattern = 0x00;
>> +const int modify_pattern = 0xff;
>> +
>> +static void *do_io(void *arg)
>> +{
>> +	ssize_t ret;
>> +
>> +	ret = read(fd, buf, blocksize);
>> +	pthread_exit((void *)ret);
>> +}
>> +
>> +static void *do_modify(void *arg)
>> +{
>> +	memset(buf, modify_pattern, blocksize);
>> +	pthread_exit(NULL);
>> +}
>> +
>> +int main (int argc, char *argv[])
>> +{
>> +	pthread_t io_thread;
>> +	pthread_t modify_thread;
>> +	unsigned long long filepos;
>> +	int opt;
>> +	int ret = -EINVAL;
>> +
>> +	while ((opt = getopt(argc, argv, "b:s:")) > 0) {
>> +		switch (opt) {
>> +		case 'b':
>> +			blocksize = atoi(optarg);
>> +			if (blocksize == 0) {
>> +				fprintf(stderr, "invalid blocksize '%s'\n", optarg);
>> +				goto error;
>> +			}
>> +			break;
>> +		case 's':
>> +			filesize = atoll(optarg);
>> +			if (filesize == 0) {
>> +				fprintf(stderr, "invalid filesize '%s'\n", optarg);
>> +				goto error;
>> +			}
>> +			break;
>> +		default:
>> +			fprintf(stderr, "unknown option '%c'\n", opt);
>> +			goto error;
>> +		}
>> +	}
>> +	if (optind >= argc) {
>> +		fprintf(stderr, "missing argument\n");
>> +		goto error;
>> +	}
>> +	ret = posix_memalign(&buf, sysconf(_SC_PAGESIZE), blocksize);
>> +	if (!buf) {
>> +		fprintf(stderr, "failed to allocate aligned memory\n");
>> +		exit(EXIT_FAILURE);
>> +	}
>> +	fd = open(argv[optind], O_DIRECT | O_RDWR | O_CREAT, 0600);
>> +	if (fd < 0) {
>> +		fprintf(stderr, "failed to open file '%s': %m\n", argv[optind]);
>> +		goto error;
>> +	}
>> +
>> +	memset(buf, orig_pattern, blocksize);
>> +	/* Create the original file. */
>> +	for (filepos = 0; filepos < filesize; filepos += blocksize) {
>> +
>> +		ret = write(fd, buf, blocksize);
>> +		if (ret < 0) {
>> +			ret = -errno;
>> +			fprintf(stderr, "failed to write the initial content: %m");
>> +			goto error;
>> +		}
>> +	}
>> +	ret = lseek(fd, 0, SEEK_SET);
>> +	if (ret < 0) {
>> +		ret = -errno;
>> +		fprintf(stderr, "failed to set file offset to 0: %m");
>> +		goto error;
>> +	}
>> +
>> +	/* Do the read race. */
>> +	for (filepos = 0; filepos < filesize; filepos += blocksize) {
>> +		void *retval = NULL;
>> +
>> +		memset(buf, orig_pattern, blocksize);
>> +		pthread_create(&io_thread, NULL, do_io, NULL);
>> +		pthread_create(&modify_thread, NULL, do_modify, NULL);
>> +		ret = pthread_join(io_thread, &retval);
>> +		if (ret) {
>> +			errno = ret;
>> +			ret = -ret;
>> +			fprintf(stderr, "failed to join io thread: %m\n");
>> +			goto error;
>> +		}
>> +		if ((ssize_t )retval < blocksize) {
>> +			ret = -EIO;
>> +			fprintf(stderr, "io thread failed\n");
>> +			goto error;
>> +		}
>> +		ret = pthread_join(modify_thread, NULL);
>> +		if (ret) {
>> +			errno = ret;
>> +			ret = -ret;
>> +			fprintf(stderr, "failed to join modify thread: %m\n");
>> +			goto error;
>> +		}
>> +	}
>> +error:
>> +	close(fd);
>> +	free(buf);
>> +	if (ret < 0)
>> +		return EXIT_FAILURE;
>> +	return EXIT_SUCCESS;
>> +}
>> diff --git a/tests/generic/772 b/tests/generic/772
>> new file mode 100755
>> index 00000000..46593536
>> --- /dev/null
>> +++ b/tests/generic/772
>> @@ -0,0 +1,41 @@
>> +#! /bin/bash
>> +# SPDX-License-Identifier: GPL-2.0
>> +# Copyright (c) 2025 2025 SUSE Linux Products GmbH.  All Rights Reserved.
>> +#
>> +# FS QA Test 772
>> +#
>> +# Making sure direct IO (O_DIRECT) reads won't cause any data checksum mismatch,
>> +# even if the contents of the buffer changes during read.
>> +#
>> +# This is mostly for filesystems with data checksum support, as the content can
>> +# be modified by user space during checksum verification.
>> +#
>> +# To avoid such false alerts, such filesystems should fallback to buffer IO to
>> +# avoid inconsistency, and never trust user space memory when checksum is involved.
>> +# For filesystems without data checksum support, nothing needs to be bothered.
>> +#
>> +. ./common/preamble
>> +_begin_fstest auto quick
>> +
>> +_require_scratch
>> +_require_odirect
>> +_require_test_program dio-read-race
>> +
>> +
>> +[ "$FSTYP" = "btrfs" ] && _fixed_by_kernel_commit xxxxxxxxxxxx \
>> +	"btrfs: fallback to buffered read if the inode has data checksum"
>> +
>> +_scratch_mkfs > $seqres.full 2>&1
>> +_scratch_mount
>> +
>> +blocksize=$(_get_file_block_size $SCRATCH_MNT)
>> +filesize=$(( 64 * 1024 * 1024))
>> +
>> +echo "blocksize=$blocksize filesize=$filesize" >> $seqres.full
>> +
>> +$here/src/dio-read-race -b $blocksize -s $filesize $SCRATCH_MNT/foobar
>> +
>> +echo "Silence is golden"
>> +
>> +# success, all done
>> +_exit 0
>> diff --git a/tests/generic/772.out b/tests/generic/772.out
>> new file mode 100644
>> index 00000000..98c13968
>> --- /dev/null
>> +++ b/tests/generic/772.out
>> @@ -0,0 +1,2 @@
>> +QA output created by 772
>> +Silence is golden
>> -- 
>> 2.51.0
>>
>>
> 


  reply	other threads:[~2025-11-01 21:12 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-26  9:50 [PATCH] fstests: add a new generic test case to verify direct io read Qu Wenruo
2025-11-01 15:57 ` Zorro Lang
2025-11-01 21:12   ` Qu Wenruo [this message]
2025-11-02 10:08     ` Zorro Lang
2025-11-02 21:00       ` Qu Wenruo
2025-11-04  6:30         ` Zorro Lang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9acb730d-6174-4ae6-a5d8-d1bca947462b@suse.com \
    --to=wqu@suse.com \
    --cc=fstests@vger.kernel.org \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=zlang@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).