Linux Btrfs filesystem development
 help / color / mirror / Atom feed
From: Qu Wenruo <wqu@suse.com>
To: linux-btrfs@vger.kernel.org, fstests@vger.kernel.org
Subject: [PATCH] fstests: fsstress: wait interrupted aio to finish
Date: Sun, 20 Aug 2023 09:02:19 +0800	[thread overview]
Message-ID: <20230820010219.12907-1-wqu@suse.com> (raw)

[BUG]
There is a very low chance to hit data csum mismatch (caught by scrub)
during test case btrfs/06[234567].

After some extra digging, it turns out that plain fsstress itself is
enough to cause the problem:

```
workload()
{
	mkfs.btrfs -f -m single -d single --csum sha256 $dev1 > /dev/null
	mount $dev1 $mnt

	#$fsstress -p 10 -n 1000 -w -d $mnt
	umount $mnt
	btrfs check --check-data-csum $dev1 || fail
}

runtime=1024
for (( i = 0; i < $runtime; i++ )); do
	echo "=== $i / $runtime ==="
	workload
done
```

Inside a VM which has only 6 cores, above script can trigger with 1/20
possibility.

[CAUSE]
Locally I got a much smaller workload to reproduce:

	$fsstress -p 7 -n 50 -s 1691396493 -w -d $mnt -v > /tmp/fsstress

With extra kernel trace_prinkt() on the buffered/direct writes.

It turns out that the following direct write is always the cause:

  btrfs_do_write_iter: r/i=5/283 buffered fileoff=708608(709121) len=12288(7712)

  btrfs_do_write_iter: r/i=5/283 direct fileoff=8192(8192) len=73728(73728) <<<<<

  btrfs_do_write_iter: r/i=5/283 direct fileoff=589824(589824) len=16384(16384)

With the involved byte number, it's easy to pin down the fsstress
opeartion:

 0/31: writev d0/f3[285 2 0 0 296 1457078] [709121,8,964] 0
 0/32: chown d0/f2 308134/1763236 0

 0/33: do_aio_rw - xfsctl(XFS_IOC_DIOINFO) d0/f2[285 2 308134 1763236 320 1457078] return 25, fallback to stat()
 0/33: awrite - io_getevents failed -4 <<<<

 0/34: dwrite - xfsctl(XFS_IOC_DIOINFO) d0/f3[285 2 308134 1763236 320 1457078] return 25, fallback to stat()

Note the 0/33, when the data csum mismatch triggered, it always fail
with -4 (-EINTR).

It looks like with lucky enough concurrency, we can get to the following
situation inside fsstress:

          Process A                 |               Process B
 -----------------------------------+---------------------------------------
 do_aio_rw()                        |
 |- io_sumit();                     |
 |- io_get_events();                |
 |  Returned -EINTR, but IO hasn't  |
 |  finished.                       |
 `- free(buf);                      | malloc();
                                    |  Got the same memory of @buf from
                                    |  thread A.
                                    | Modify the memory
                                    | Now the buffer is changed while
                                    | still under IO

This is the typical buffer modification during direct IO, which is going
to cause csum mismatch for btrfs, and btrfs properly detects it.

This is the direct cause of the problem.

However I'm still not 100% sure on why we got -EINTR for io_getevents().
My previous tests shows that if I disable uring_write, then no more such
data corruption, thus I guess io_uring has something affecting aio?

[FIX]
To fix the problem, we can just retry io_getevents() so that we can
properly wait for the IO.

This prevents us to modify the IO buffer before writeback really
finishes.

With this fixes, I can no longer reproduce the data corruption.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 ltp/fsstress.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/ltp/fsstress.c b/ltp/fsstress.c
index 6641a525..f9f132bf 100644
--- a/ltp/fsstress.c
+++ b/ltp/fsstress.c
@@ -2223,7 +2223,15 @@ do_aio_rw(opnum_t opno, long r, int flags)
 			       procid, opno, iswrite ? "awrite" : "aread", e);
 		goto aio_out;
 	}
-	if ((e = io_getevents(io_ctx, 1, 1, &event, NULL)) != 1) {
+	do {
+		e = io_getevents(io_ctx, 1, 1, &event, NULL);
+		if (e == -EINTR) {
+			if (v)
+				printf("%d/%lld: %s - io_getevents interruped %d, retrying\n",
+				       procid, opno, iswrite ? "awrite" : "aread", e);
+		}
+	} while (e == -EINTR);
+	if (e != 1) {
 		if (v)
 			printf("%d/%lld: %s - io_getevents failed %d\n",
 			       procid, opno, iswrite ? "awrite" : "aread", e);
-- 
2.41.0


             reply	other threads:[~2023-08-20  2:40 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-20  1:02 Qu Wenruo [this message]
2023-08-21  8:45 ` [PATCH] fstests: fsstress: wait interrupted aio to finish Anand Jain
2023-08-21  9:26   ` Qu Wenruo
2023-08-21  9:37     ` Anand Jain

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230820010219.12907-1-wqu@suse.com \
    --to=wqu@suse.com \
    --cc=fstests@vger.kernel.org \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox