Re: [PATCH] selftests: ublk: io-reorder triggered in test_generic_01.sh

From: Ming Lei <ming.lei@redhat.com>
To: Alexander Atanasov <alex@zazolabs.com>
Cc: Shuah Khan <shuah@kernel.org>,
	linux-block@vger.kernel.org, linux-kselftest@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] selftests: ublk: io-reorder triggered in test_generic_01.sh
Date: Sun, 25 Jan 2026 23:28:41 +0800
Message-ID: <aXY2qdYtfVTSc7jB@fedora>
In-Reply-To: <03C8F5B9-B2E1-4C78-A043-B4F9422508B7@zazolabs.com>

On Fri, Jan 23, 2026 at 05:00:33PM +0200, Alexander Atanasov wrote:
> 
> 
> > On 23 Jan 2026, at 16:33, Ming Lei <ming.lei@redhat.com> wrote:
> > 
> > On Fri, Jan 23, 2026 at 03:59:31PM +0200, Alexander Atanasov wrote:
> >> On 23 Jan 2026, at 15:33, Ming Lei <ming.lei@redhat.com> wrote:
> >>> 
> >>> On Fri, Jan 23, 2026 at 11:20:36AM +0000, Alexander Atanasov wrote:
> >>>> Create a temp dir for temporary files and use it instead of
> >>>> placing them inside source tree.
> >>> 
> >>> Many temporary files are backing files for the file storage target. So far
> >>> the code requires O_DIRECT, and the files can be fairly large.
> >>> 
> >>> If the temp dir is on ramfs/tmpfs, this may cause problems for the tests.
> >>> 
> >> 
> >> I am aware of O_DIRECT problem but you can export different TMPDIR that has working O_DIRECT.
> > 
> > Can you share how to export TMPDIR capable of O_DIRECT?
> 
> 
> +export TMPDIR=$(mktemp -d ${TMPDIR:-/tmp}/ublktest-dir.XXXXXX)
> 
> I made the tests run in their own TMPDIR, which is created under the already
> set TMPDIR or, if TMPDIR is not set, under /tmp.
> 
> export TMPDIR=/path/to/odirect/capable
> before running, and the tests will run in:
> /path/to/odirect/capable/ublktest-dir.XXXXXX
> 
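
For illustration only, a minimal sketch of how a candidate TMPDIR could be
probed for O_DIRECT support before the tests run. The helper name
supports_odirect is hypothetical and not part of the selftests; it assumes
GNU dd, which provides oflag=direct. It returns 0 if a 4 KiB direct write
succeeds, 1 if the write fails, and 2 if no probe file can be created.

```shell
# Hypothetical helper, not part of the ublk selftests.
# Attempt a single 4 KiB O_DIRECT write in the given directory.
supports_odirect() {
	local dir=$1 probe
	# mktemp fails if the directory is missing or not writable
	probe=$(mktemp "$dir/odirect-probe.XXXXXX" 2>/dev/null) || return 2
	if dd if=/dev/zero of="$probe" bs=4096 count=1 oflag=direct >/dev/null 2>&1; then
		rm -f "$probe"
		return 0
	fi
	rm -f "$probe"
	return 1
}
```

It could be called as: supports_odirect "${TMPDIR:-/tmp}" || echo "TMPDIR lacks O_DIRECT"
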
> 
> > 
> >> 
> >> I use an sshfs mount of the build to run the tests, and that is a problem: sshfs/fuse does not
> >> do O_DIRECT either.
> >> 
> >> I think test_generic_06.sh is the only one that fails due to this (though I still have to investigate).
> >> 
> >> If O_DIRECT is required by the tests, it may be possible to go through a RAM disk, which does support it,
> >> so it works everywhere.
> >> 
> >> Another option is to keep working in the source tree as it is now and just add a variable to specify the working directory,
> >> UBLK_TMPDIR or something.
> >> 
> >> 
> >> I get a lot of out-of-order IO, between 0 and 10 on average, on my test setup:
> >> tools/testing/selftests/ublk/test_generic_01.sh 
> >> Attached 3 probes
> >> io_out_of_order: exp 564688 actual 564648
> >> io_out_of_order: exp 564648 actual 565584
> >> io_out_of_order: exp 565584 actual 564688
> >> io_out_of_order: exp 565592 actual 564688
> >> io_out_of_order: exp 566328 actual 565592
> >> io_out_of_order: exp 882256 actual 882248
> >> io_out_of_order: exp 883032 actual 882912
> >> io_out_of_order: exp 882912 actual 883040
> >> io_out_of_order: exp 883040 actual 883032
> >> 
> >> 
> >> generic_01 : [FAIL]
> >> 
> >> All rqs are there, just reordered. AFAIK blk-mq does not guarantee that requests will be completed in order; what's the idea to catch this and
> > 
> > If there are just 0 ~ 10, it could be fine. But if all are reordered,
> > something must be wrong. One improvement could be to check whether there
> > are too many reorders...
> > 
> > Actually what I am trying to test is to make sure same order is observed
> > from both ublk driver dispatch code path and ublk target io handling code
> > path, because io_uring task work schedule uses llist, which may introduce io
> > reorder.
> 
> There are for sure other places where reordering can be introduced, so the code should be ready for and expect
> it. (For my case see below.) Is preserving the order required for some reason for ublk?
> 
> > 
> > However, that involves ublk kprobe/kfunc trace, which may not be stable,
> > so I simply check the end-to-end IO order. Sometimes blk-mq IO queue/dispatch
> > may re-order IO.
> > 
> > I guess the following change may avoid the re-order, but batch IO case may
> > not be covered:
> > 
> > diff --git a/tools/testing/selftests/ublk/test_generic_01.sh b/tools/testing/selftests/ublk/test_generic_01.sh
> > index 21a31cd5491a..5805da4c84c5 100755
> > --- a/tools/testing/selftests/ublk/test_generic_01.sh
> > +++ b/tools/testing/selftests/ublk/test_generic_01.sh
> > @@ -29,14 +29,8 @@ if ! kill -0 "$btrace_pid" > /dev/null 2>&1; then
> >        exit "$UBLK_SKIP_CODE"
> > fi
> > 
> > -# run fio over this ublk disk
> > -fio --name=write_seq \
> > -    --filename=/dev/ublkb"${dev_id}" \
> > -    --ioengine=libaio --iodepth=16 \
> > -    --rw=write \
> > -    --size=512M \
> > -    --direct=1 \
> > -    --bs=4k > /dev/null 2>&1
> > +taskset -c 0 dd if=/dev/zero of=/dev/ublkb"${dev_id}" bs=1M count=256 oflag=direct > /dev/null 2>&1
> > +
> > 
> > 
> >> consider it an error? (Latest tree with batch io and batch io fixes on top, if that matters)
> > 
> > I have never observed a generic_01 failure on my test VM or hardware.
> > 
> > My kernel config is based on Fedora, maybe scheduler config option makes the difference.
> 
> Fedora 43 default config with some debugging options enabled, but no changes in schedulers.
> Test VM storage is on a networked NAS over iSCSI - both boxes VM host and NAS have two NICs,
> I get the errors when I load the network. So I believe the requests really complete out of 
> order due to the network in my case. All tests that have the bpftrace check fail on occasion.

Can you test the following patch and see if reordering can still happen?


diff --git a/tools/testing/selftests/ublk/test_generic_01.sh b/tools/testing/selftests/ublk/test_generic_01.sh
index 26cf3c7ceeb5..26d5e52ece29 100755
--- a/tools/testing/selftests/ublk/test_generic_01.sh
+++ b/tools/testing/selftests/ublk/test_generic_01.sh
@@ -13,7 +13,7 @@ if ! _have_program fio; then
 	exit "$UBLK_SKIP_CODE"
 fi
 
-_prep_test "null" "sequential io order"
+_prep_test "null" "ublk dispatch won't reorder IO"
 
 dev_id=$(_add_ublk_dev -t null)
 _check_add_dev $TID $?
@@ -39,9 +39,13 @@ fio --name=write_seq \
 ERR_CODE=$?
 kill "$btrace_pid"
 wait
-if grep -q "io_out_of_order" "$UBLK_TMP"; then
-	cat "$UBLK_TMP"
+
+# Check for out-of-order completions detected by bpftrace
+if grep -q "^out_of_order:" "$UBLK_TMP"; then
+	echo "I/O reordering detected:"
+	grep "^out_of_order:" "$UBLK_TMP"
 	ERR_CODE=255
 fi
+
 _cleanup_test "null"
 _show_result $TID $ERR_CODE
diff --git a/tools/testing/selftests/ublk/trace/seq_io.bt b/tools/testing/selftests/ublk/trace/seq_io.bt
index b2f60a92b118..60ac40e66606 100644
--- a/tools/testing/selftests/ublk/trace/seq_io.bt
+++ b/tools/testing/selftests/ublk/trace/seq_io.bt
@@ -2,23 +2,45 @@
 	$1: 	dev_t
 	$2: 	RWBS
 	$3:     strlen($2)
+
+	Track request order between block_io_start and block_rq_complete.
+	For each request, record its start sequence number and verify
+	completions happen in the same order.
 */
+
 BEGIN {
-	@last_rw[$1, str($2)] = (uint64)0;
+	@start_seq = (uint64)0;
+	@complete_seq = (uint64)0;
+	@out_of_order = (uint64)0;
+}
+
+tracepoint:block:block_io_start
+{
+	if ((int64)args.dev == $1 && !strncmp(args.rwbs, str($2), $3)) {
+		@start_order[args.sector] = @start_seq;
+		@start_seq = @start_seq + 1;
+	}
 }
+
 tracepoint:block:block_rq_complete
 {
-	$dev = $1;
 	if ((int64)args.dev == $1 && !strncmp(args.rwbs, str($2), $3)) {
-		$last = @last_rw[$dev, str($2)];
-		if ((uint64)args.sector != $last) {
-			printf("io_out_of_order: exp %llu actual %llu\n",
-				args.sector, $last);
+		$expected_order = @start_order[args.sector];
+		if ($expected_order != @complete_seq) {
+			printf("out_of_order: sector %llu started at seq %llu but completed at seq %llu\n",
+				args.sector, $expected_order, @complete_seq);
+			@out_of_order = @out_of_order + 1;
 		}
-		@last_rw[$dev, str($2)] = (args.sector + args.nr_sector);
+		delete(@start_order[args.sector]);
+		@complete_seq = @complete_seq + 1;
 	}
 }
 
 END {
-	clear(@last_rw);
+	printf("total_start: %llu total_complete: %llu out_of_order: %llu\n",
+		@start_seq, @complete_seq, @out_of_order);
+	clear(@start_order);
+	clear(@start_seq);
+	clear(@complete_seq);
+	clear(@out_of_order);
 }
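
For illustration only (not part of the patch): a minimal sketch that models
the sequence check from the new seq_io.bt in awk, so the logic can be tried
without bpftrace. The function name check_order and the "start SECTOR" /
"complete SECTOR" input format are assumptions made up for this sketch; the
real script reads the block tracepoints instead.

```shell
# Toy model of the seq_io.bt logic. Input lines are either
# "start SECTOR" or "complete SECTOR" (hypothetical format).
check_order() {
	awk '
	BEGIN { start_seq = 0; complete_seq = 0; out_of_order = 0 }
	# Remember the start sequence number of each request, keyed by sector
	$1 == "start" { start_order[$2] = start_seq; start_seq++ }
	# On completion, the recorded start sequence should equal the
	# running completion sequence if no reordering happened
	$1 == "complete" {
		if (start_order[$2] != complete_seq) {
			printf "out_of_order: sector %s started at seq %d but completed at seq %d\n", $2, start_order[$2], complete_seq
			out_of_order++
		}
		delete start_order[$2]
		complete_seq++
	}
	END {
		printf "total_start: %d total_complete: %d out_of_order: %d\n", start_seq, complete_seq, out_of_order
	}'
}
```

For example, feeding it "start 0", "start 8", "complete 8", "complete 0"
reports both sectors, since a single swapped pair shifts the position of
each request in it. The test fails on any such line, so the number of
lines overstates the number of swaps.
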

Thanks,
Ming
