From: Anand Jain <anand.jain@oracle.com>
To: Qu Wenruo <wqu@suse.com>,
linux-btrfs@vger.kernel.org, fstests@vger.kernel.org
Cc: axboe@kernel.dk
Subject: Re: [PATCH v2] fstests: fsstress: wait interrupted aio to finish
Date: Tue, 22 Aug 2023 12:05:21 +0800
Message-ID: <590727d0-5452-5469-711f-2b8cdff25719@oracle.com>
In-Reply-To: <20230821230129.31723-1-wqu@suse.com>
On 22/08/2023 07:01, Qu Wenruo wrote:
> [BUG]
> There is a very low chance to hit data csum mismatch (caught by scrub)
> during test case btrfs/06[234567].
>
> After some extra digging, it turns out that plain fsstress itself is
> enough to cause the problem:
>
> ```
> workload()
> {
>         mkfs.btrfs -f -m single -d single --csum sha256 $dev1 > /dev/null
>         mount $dev1 $mnt
>
>         $fsstress -p 10 -n 1000 -w -d $mnt
>         umount $mnt
>         btrfs check --check-data-csum $dev1 || fail
> }
>
> runtime=1024
> for (( i = 0; i < $runtime; i++ )); do
>         echo "=== $i / $runtime ==="
>         workload
> done
> ```
>
> Inside a VM with only 6 cores, the above script can trigger the problem
> with roughly a 1-in-20 probability.
>
> [CAUSE]
> Locally I found a much smaller workload that reproduces the problem:
>
> $fsstress -p 7 -n 50 -s 1691396493 -w -d $mnt -v > /tmp/fsstress
>
> With extra kernel trace_printk() output added for the buffered and direct
> writes, it turns out that the following direct write is always the cause:
>
> btrfs_do_write_iter: r/i=5/283 buffered fileoff=708608(709121) len=12288(7712)
> btrfs_do_write_iter: r/i=5/283 direct fileoff=8192(8192) len=73728(73728) <<<<<
> btrfs_do_write_iter: r/i=5/283 direct fileoff=589824(589824) len=16384(16384)
>
> With the byte numbers involved, it's easy to pin down the fsstress
> operation:
>
> 0/31: writev d0/f3[285 2 0 0 296 1457078] [709121,8,964] 0
> 0/32: chown d0/f2 308134/1763236 0
> 0/33: do_aio_rw - xfsctl(XFS_IOC_DIOINFO) d0/f2[285 2 308134 1763236 320 1457078] return 25, fallback to stat()
> 0/33: awrite - io_getevents failed -4 <<<<
> 0/34: dwrite - xfsctl(XFS_IOC_DIOINFO) d0/f3[285 2 308134 1763236 320 1457078] return 25, fallback to stat()
>
> Note operation 0/33: whenever the data csum mismatch triggers, it always
> fails with -4 (-EINTR).
>
> It looks like, with sufficiently unlucky concurrency, we can reach the
> following situation inside fsstress:
>
> Process A                           | Process B
> ------------------------------------+---------------------------------------
> do_aio_rw()                         |
> |- io_submit();                     |
> |- io_getevents();                  |
> |  Returned -EINTR, but the IO      |
> |  hasn't finished yet.             |
> `- free(buf);                       | malloc();
>                                     | Got the same memory as @buf from
>                                     | process A.
>                                     | Modifies the memory.
>                                     | Now the buffer is changed while
>                                     | still under IO.
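
[Purely for illustration, not part of the patch: a minimal libaio sketch of
the racy pattern described above. The file name, buffer size and error
handling are made up; this is not the actual fsstress code.]

```
/*
 * Sketch: treat -EINTR from io_getevents() as failure and free the buffer
 * while the direct write may still be in flight.  Another thread's malloc()
 * can then hand out the same memory and modify it under IO, which btrfs
 * reports as a data csum mismatch.
 */
#define _GNU_SOURCE
#include <libaio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>

static int racy_dio_write(const char *path)
{
	io_context_t ctx = 0;
	struct iocb cb, *cbs[1] = { &cb };
	struct io_event event;
	void *buf = NULL;
	int fd, ret;

	fd = open(path, O_WRONLY | O_DIRECT);
	if (fd < 0)
		return -1;
	if (posix_memalign(&buf, 4096, 65536)) {
		close(fd);
		return -1;
	}
	memset(buf, 0xaa, 65536);

	io_setup(1, &ctx);
	io_prep_pwrite(&cb, fd, buf, 65536, 0);
	io_submit(ctx, 1, cbs);

	ret = io_getevents(ctx, 1, 1, &event, NULL);
	/*
	 * BUG: a signal (e.g. io_uring's signal-based notifications elsewhere
	 * in the same process) can make io_getevents() return -EINTR while
	 * the write is still in flight, yet @buf is freed below regardless.
	 */
	free(buf);

	io_destroy(ctx);
	close(fd);
	return ret;
}
```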
>
> This is the typical case of a buffer being modified during direct IO,
> which causes a csum mismatch for btrfs, and btrfs properly detects it.
>
> This is the direct cause of the problem.
>
> The root cause is that io_uring uses signals to handle
> submission/completion of IOs.
> Thus io_uring operations can interrupt the AIO wait (io_getevents()
> returns -EINTR), causing the above problem.
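
[Also just for illustration: the mechanism is the ordinary POSIX one where a
signal handler installed without SA_RESTART makes a blocking syscall fail
with EINTR. A tiny standalone example, hypothetical and unrelated to
fsstress:]

```
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static void on_signal(int sig)
{
	(void)sig;	/* delivery alone is enough to interrupt a syscall */
}

int main(void)
{
	struct sigaction sa;
	char c;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_signal;
	/* no SA_RESTART: interrupted syscalls return -1 with errno == EINTR */
	sigaction(SIGUSR1, &sa, NULL);

	/* blocks until input; run "kill -USR1 <pid>" from another shell */
	if (read(STDIN_FILENO, &c, 1) < 0)
		perror("read");	/* prints "read: Interrupted system call" */
	return 0;
}
```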
>
> [FIX]
> To fix the problem, simply retry io_getevents() when it returns -EINTR,
> so that we properly wait for the IO to finish.
>
> This prevents us from modifying the IO buffer before the write really
> completes.
>
> With this fix, I can no longer reproduce the data corruption.
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> Changelog:
> v2:
> - Fix all call sites of io_getevents()
Should the io_getevents() calls in aio-stress.c and fsx.c also use
io_get_single_event()?
Thanks, Anand
> - Update the commit message to show the root cause
> Thanks a lot to Jens Axboe for pointing out the root problem.
> ---
> ltp/fsstress.c | 20 ++++++++++++++++++--
> 1 file changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/ltp/fsstress.c b/ltp/fsstress.c
> index 6641a525..abe28742 100644
> --- a/ltp/fsstress.c
> +++ b/ltp/fsstress.c
> @@ -2072,6 +2072,22 @@ void inode_info(char *str, size_t sz, struct stat64 *s, int verbose)
> (long long) s->st_blocks, (long long) s->st_size);
> }
>
> +#ifdef AIO
> +static int io_get_single_event(struct io_event *event)
> +{
> + int ret;
> +
> + /*
> + * We can get -EINTR if competing with io_uring using signal
> + * based notifications. For that case, just retry the wait.
> + */
> + do {
> + ret = io_getevents(io_ctx, 1, 1, event, NULL);
> + } while (ret == -EINTR);
> + return ret;
> +}
> +#endif
> +
> void
> afsync_f(opnum_t opno, long r)
> {
> @@ -2111,7 +2127,7 @@ afsync_f(opnum_t opno, long r)
> close(fd);
> return;
> }
> - if ((e = io_getevents(io_ctx, 1, 1, &event, NULL)) != 1) {
> + if ((e = io_get_single_event(&event)) != 1) {
> if (v)
> printf("%d/%lld: afsync - io_getevents failed %d\n",
> procid, opno, e);
> @@ -2223,7 +2239,7 @@ do_aio_rw(opnum_t opno, long r, int flags)
> procid, opno, iswrite ? "awrite" : "aread", e);
> goto aio_out;
> }
> - if ((e = io_getevents(io_ctx, 1, 1, &event, NULL)) != 1) {
> + if ((e = io_get_single_event(&event)) != 1) {
> if (v)
> printf("%d/%lld: %s - io_getevents failed %d\n",
> procid, opno, iswrite ? "awrite" : "aread", e);