FS/XFS testing framework
 help / color / mirror / Atom feed
* [PATCH v3 0/3] use shuf to choose a random file
@ 2023-08-21  7:12 Naohiro Aota
  2023-08-21  7:12 ` [PATCH v3 1/3] common/rc: introduce _random_file() helper Naohiro Aota
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Naohiro Aota @ 2023-08-21  7:12 UTC (permalink / raw)
  To: fstests; +Cc: linux-btrfs, Naohiro Aota

Currently, we use "ls ... | sort -R | head -n1" (or tail) to choose a
random file in a directory.It sorts the files with "ls", sort it randomly
and pick the first line, which wastes the "ls" sort.

Also, using "sort -R | head -n1" is inefficient. Furthermore, even without
"head" or "tail", "shuf" is faster than "sort -R".

This series introduces a new helper _random_file() to choose a file in a
directory randomly. Also, replace "sort -R" with _random_file() or "shuf".

Changes:
- v2
  - Introduce _random_file() helper
  - Rewrite other "sort -R" with _random_file() or "shuf"
- v3
  - Fix _random_file() helper to add the base directory as prefix

Naohiro Aota (3):
  common/rc: introduce _random_file() helper
  fstests/btrfs: use _random_file() helper
  btrfs/004: use shuf to shuffle the file lines

 common/rc       |  7 +++++++
 tests/btrfs/004 |  2 +-
 tests/btrfs/179 |  9 ++++-----
 tests/btrfs/192 | 14 ++++----------
 4 files changed, 16 insertions(+), 16 deletions(-)

-- 
2.41.0


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v3 1/3] common/rc: introduce _random_file() helper
  2023-08-21  7:12 [PATCH v3 0/3] use shuf to choose a random file Naohiro Aota
@ 2023-08-21  7:12 ` Naohiro Aota
  2023-08-21  9:24   ` Anand Jain
  2023-08-25 13:39   ` Zorro Lang
  2023-08-21  7:12 ` [PATCH v3 2/3] fstests/btrfs: use " Naohiro Aota
  2023-08-21  7:12 ` [PATCH v3 3/3] btrfs/004: use shuf to shuffle the file lines Naohiro Aota
  2 siblings, 2 replies; 10+ messages in thread
From: Naohiro Aota @ 2023-08-21  7:12 UTC (permalink / raw)
  To: fstests; +Cc: linux-btrfs, Naohiro Aota

Currently, we use "ls ... | sort -R | head -n1" (or tail) to choose a
random file in a directory.It sorts the files with "ls", sort it randomly
and pick the first line, which wastes the "ls" sort.

Also, using "sort -R | head -n1" is inefficient. For example, in a
directory with 1000000 files, it takes more than 15 seconds to pick a file.

  $ time bash -c "ls -U | sort -R | head -n 1 >/dev/null"
  bash -c "ls -U | sort -R | head -n 1 >/dev/null"  15.38s user 0.14s system 99% cpu 15.536 total

  $ time bash -c "ls -U | shuf -n 1 >/dev/null"
  bash -c "ls -U | shuf -n 1 >/dev/null"  0.30s user 0.12s system 138% cpu 0.306 total

So, we should just use "ls -U" and "shuf -n 1" to choose a random file.
Introduce _random_file() helper to do it properly.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 common/rc | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/common/rc b/common/rc
index 5c4429ed0425..4d414955f6d9 100644
--- a/common/rc
+++ b/common/rc
@@ -5224,6 +5224,13 @@ _soak_loop_running() {
 	return 0
 }
 
+# Return a random file in a directory. A directory is *not* followed
+# recursively.
+_random_file() {
+	local basedir=$1
+	echo "$basedir/$(ls -U $basedir | shuf -n 1)"
+}
+
 init_rc
 
 ################################################################################
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v3 2/3] fstests/btrfs: use _random_file() helper
  2023-08-21  7:12 [PATCH v3 0/3] use shuf to choose a random file Naohiro Aota
  2023-08-21  7:12 ` [PATCH v3 1/3] common/rc: introduce _random_file() helper Naohiro Aota
@ 2023-08-21  7:12 ` Naohiro Aota
  2023-08-21  9:25   ` Anand Jain
  2023-08-21  7:12 ` [PATCH v3 3/3] btrfs/004: use shuf to shuffle the file lines Naohiro Aota
  2 siblings, 1 reply; 10+ messages in thread
From: Naohiro Aota @ 2023-08-21  7:12 UTC (permalink / raw)
  To: fstests; +Cc: linux-btrfs, Naohiro Aota

Use _random_file() helper to choose a random file in a directory.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 tests/btrfs/179 |  9 ++++-----
 tests/btrfs/192 | 14 ++++----------
 2 files changed, 8 insertions(+), 15 deletions(-)

diff --git a/tests/btrfs/179 b/tests/btrfs/179
index 2f17c9f9fb4a..479667f05fd2 100755
--- a/tests/btrfs/179
+++ b/tests/btrfs/179
@@ -45,8 +45,8 @@ fill_workload()
 
 		# Randomly remove some files for every 5 loop
 		if [ $(( $i % 5 )) -eq 0 ]; then
-			victim=$(ls "$SCRATCH_MNT/src" | sort -R | head -n1)
-			rm "$SCRATCH_MNT/src/$victim"
+			victim=$(_random_file "$SCRATCH_MNT/src")
+			rm "$victim"
 		fi
 		i=$((i + 1))
 	done
@@ -69,13 +69,12 @@ delete_workload()
 	trap "wait; exit" SIGTERM
 	while true; do
 		sleep $((sleep_time * 2))
-		victim=$(ls "$SCRATCH_MNT/snapshots" | sort -R | head -n1)
+		victim=$(_random_file "$SCRATCH_MNT/snapshots")
 		if [ -z "$victim" ]; then
 			# No snapshots available, sleep and retry later.
 			continue
 		fi
-		$BTRFS_UTIL_PROG subvolume delete \
-			"$SCRATCH_MNT/snapshots/$victim" > /dev/null
+		$BTRFS_UTIL_PROG subvolume delete "$victim" > /dev/null
 	done
 }
 
diff --git a/tests/btrfs/192 b/tests/btrfs/192
index bcf14ebb8e3b..7324c9e39833 100755
--- a/tests/btrfs/192
+++ b/tests/btrfs/192
@@ -69,12 +69,6 @@ $BTRFS_UTIL_PROG subvolume create $SCRATCH_MNT/src > /dev/null
 mkdir -p $SCRATCH_MNT/snapshots
 mkdir -p $SCRATCH_MNT/src/padding
 
-random_file()
-{
-	local basedir=$1
-	echo "$basedir/$(ls $basedir | sort -R | tail -1)"
-}
-
 snapshot_workload()
 {
 	trap "wait; exit" SIGTERM
@@ -85,9 +79,9 @@ snapshot_workload()
 			$SCRATCH_MNT/src $SCRATCH_MNT/snapshots/$i \
 			> /dev/null
 		# Do something small to make snapshots different
-		rm -f "$(random_file $SCRATCH_MNT/src/padding)"
-		rm -f "$(random_file $SCRATCH_MNT/src/padding)"
-		touch "$(random_file $SCRATCH_MNT/src/padding)"
+		rm -f "$(_random_file $SCRATCH_MNT/src/padding)"
+		rm -f "$(_random_file $SCRATCH_MNT/src/padding)"
+		touch "$(_random_file $SCRATCH_MNT/src/padding)"
 		touch "$SCRATCH_MNT/src/padding/random_$RANDOM"
 
 		i=$(($i + 1))
@@ -102,7 +96,7 @@ delete_workload()
 	while true; do
 		sleep 2
 		$BTRFS_UTIL_PROG subvolume delete \
-			"$(random_file $SCRATCH_MNT/snapshots)" \
+			"$(_random_file $SCRATCH_MNT/snapshots)" \
 			> /dev/null 2>&1
 	done
 }
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v3 3/3] btrfs/004: use shuf to shuffle the file lines
  2023-08-21  7:12 [PATCH v3 0/3] use shuf to choose a random file Naohiro Aota
  2023-08-21  7:12 ` [PATCH v3 1/3] common/rc: introduce _random_file() helper Naohiro Aota
  2023-08-21  7:12 ` [PATCH v3 2/3] fstests/btrfs: use " Naohiro Aota
@ 2023-08-21  7:12 ` Naohiro Aota
  2023-08-21  9:25   ` Anand Jain
  2 siblings, 1 reply; 10+ messages in thread
From: Naohiro Aota @ 2023-08-21  7:12 UTC (permalink / raw)
  To: fstests; +Cc: linux-btrfs, Naohiro Aota

The "sort -R" is slower than "shuf" even with the full output because
"sort -R" actually sort them to group the identical keys.

  $ time bash -c "seq 1000000 | shuf >/dev/null"
  bash -c "seq 1000000 | shuf >/dev/null"  0.18s user 0.03s system 104% cpu 0.196 total

  $ time bash -c "seq 1000000 | sort -R >/dev/null"
  bash -c "seq 1000000 | sort -R >/dev/null"  19.61s user 0.03s system 99% cpu 19.739 total

Since the "find"'s outputs never be identical, we can just use "shuf" to
optimize the selection.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 tests/btrfs/004 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/btrfs/004 b/tests/btrfs/004
index ea40dbf62880..78df6a3af6b1 100755
--- a/tests/btrfs/004
+++ b/tests/btrfs/004
@@ -201,7 +201,7 @@ workout()
 	cnt=0
 	errcnt=0
 	dir="$SCRATCH_MNT/$snap_name/"
-	for file in `find $dir -name f\* -size +0 | sort -R`; do
+	for file in `find $dir -name f\* -size +0 | shuf`; do
 		extents=`_check_file_extents $file`
 		ret=$?
 		if [ $ret -ne 0 ]; then
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 1/3] common/rc: introduce _random_file() helper
  2023-08-21  7:12 ` [PATCH v3 1/3] common/rc: introduce _random_file() helper Naohiro Aota
@ 2023-08-21  9:24   ` Anand Jain
  2023-08-25 13:39   ` Zorro Lang
  1 sibling, 0 replies; 10+ messages in thread
From: Anand Jain @ 2023-08-21  9:24 UTC (permalink / raw)
  To: Naohiro Aota, fstests; +Cc: linux-btrfs

LGTM

Reviewed-by: Anand Jain <anand.jain@oracle.com>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 2/3] fstests/btrfs: use _random_file() helper
  2023-08-21  7:12 ` [PATCH v3 2/3] fstests/btrfs: use " Naohiro Aota
@ 2023-08-21  9:25   ` Anand Jain
  0 siblings, 0 replies; 10+ messages in thread
From: Anand Jain @ 2023-08-21  9:25 UTC (permalink / raw)
  To: Naohiro Aota, fstests; +Cc: linux-btrfs

LGTM

Reviewed-by: Anand Jain <anand.jain@oracle.com>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 3/3] btrfs/004: use shuf to shuffle the file lines
  2023-08-21  7:12 ` [PATCH v3 3/3] btrfs/004: use shuf to shuffle the file lines Naohiro Aota
@ 2023-08-21  9:25   ` Anand Jain
  0 siblings, 0 replies; 10+ messages in thread
From: Anand Jain @ 2023-08-21  9:25 UTC (permalink / raw)
  To: Naohiro Aota, fstests; +Cc: linux-btrfs

LGTM

Reviewed-by: Anand Jain <anand.jain@oracle.com>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 1/3] common/rc: introduce _random_file() helper
  2023-08-21  7:12 ` [PATCH v3 1/3] common/rc: introduce _random_file() helper Naohiro Aota
  2023-08-21  9:24   ` Anand Jain
@ 2023-08-25 13:39   ` Zorro Lang
  2023-08-25 14:00     ` Zorro Lang
  1 sibling, 1 reply; 10+ messages in thread
From: Zorro Lang @ 2023-08-25 13:39 UTC (permalink / raw)
  To: Naohiro Aota; +Cc: fstests, linux-btrfs

On Mon, Aug 21, 2023 at 04:12:11PM +0900, Naohiro Aota wrote:
> Currently, we use "ls ... | sort -R | head -n1" (or tail) to choose a
> random file in a directory.It sorts the files with "ls", sort it randomly
> and pick the first line, which wastes the "ls" sort.
> 
> Also, using "sort -R | head -n1" is inefficient. For example, in a
> directory with 1000000 files, it takes more than 15 seconds to pick a file.
> 
>   $ time bash -c "ls -U | sort -R | head -n 1 >/dev/null"
>   bash -c "ls -U | sort -R | head -n 1 >/dev/null"  15.38s user 0.14s system 99% cpu 15.536 total
> 
>   $ time bash -c "ls -U | shuf -n 1 >/dev/null"
>   bash -c "ls -U | shuf -n 1 >/dev/null"  0.30s user 0.12s system 138% cpu 0.306 total
> 
> So, we should just use "ls -U" and "shuf -n 1" to choose a random file.
> Introduce _random_file() helper to do it properly.
> 
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> ---
>  common/rc | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/common/rc b/common/rc
> index 5c4429ed0425..4d414955f6d9 100644
> --- a/common/rc
> +++ b/common/rc
> @@ -5224,6 +5224,13 @@ _soak_loop_running() {
>  	return 0
>  }
>  
> +# Return a random file in a directory. A directory is *not* followed
> +# recursively.
> +_random_file() {
> +	local basedir=$1
> +	echo "$basedir/$(ls -U $basedir | shuf -n 1)"

I think the "1" can be the second argument, for we might want to get a random
file list sometimes. For example:

  local basedir=$1
  local num=$2
  local opt

  if [ -n "$num" ];then
	  opt="-n $num"
  fi
  echo "$basedir/$(ls -U $basedir | shuf $opt)"

What do you think?

Thanks,
Zorro

> +}
> +
>  init_rc
>  
>  ################################################################################
> -- 
> 2.41.0
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 1/3] common/rc: introduce _random_file() helper
  2023-08-25 13:39   ` Zorro Lang
@ 2023-08-25 14:00     ` Zorro Lang
  2023-08-28  1:37       ` Naohiro Aota
  0 siblings, 1 reply; 10+ messages in thread
From: Zorro Lang @ 2023-08-25 14:00 UTC (permalink / raw)
  To: Naohiro Aota; +Cc: fstests, linux-btrfs

On Fri, Aug 25, 2023 at 09:39:48PM +0800, Zorro Lang wrote:
> On Mon, Aug 21, 2023 at 04:12:11PM +0900, Naohiro Aota wrote:
> > Currently, we use "ls ... | sort -R | head -n1" (or tail) to choose a
> > random file in a directory.It sorts the files with "ls", sort it randomly
> > and pick the first line, which wastes the "ls" sort.
> > 
> > Also, using "sort -R | head -n1" is inefficient. For example, in a
> > directory with 1000000 files, it takes more than 15 seconds to pick a file.
> > 
> >   $ time bash -c "ls -U | sort -R | head -n 1 >/dev/null"
> >   bash -c "ls -U | sort -R | head -n 1 >/dev/null"  15.38s user 0.14s system 99% cpu 15.536 total
> > 
> >   $ time bash -c "ls -U | shuf -n 1 >/dev/null"
> >   bash -c "ls -U | shuf -n 1 >/dev/null"  0.30s user 0.12s system 138% cpu 0.306 total
> > 
> > So, we should just use "ls -U" and "shuf -n 1" to choose a random file.
> > Introduce _random_file() helper to do it properly.
> > 
> > Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> > ---
> >  common/rc | 7 +++++++
> >  1 file changed, 7 insertions(+)
> > 
> > diff --git a/common/rc b/common/rc
> > index 5c4429ed0425..4d414955f6d9 100644
> > --- a/common/rc
> > +++ b/common/rc
> > @@ -5224,6 +5224,13 @@ _soak_loop_running() {
> >  	return 0
> >  }
> >  
> > +# Return a random file in a directory. A directory is *not* followed
> > +# recursively.
> > +_random_file() {
> > +	local basedir=$1
> > +	echo "$basedir/$(ls -U $basedir | shuf -n 1)"
> 
> I think the "1" can be the second argument, for we might want to get a random
> file list sometimes. For example:
> 
>   local basedir=$1
>   local num=$2
>   local opt
> 
>   if [ -n "$num" ];then
> 	  opt="-n $num"
>   fi
>   echo "$basedir/$(ls -U $basedir | shuf $opt)"
> 
> What do you think?

Hmm... nack my review point. Looks like this makes a simple change to be
complicated, especially multiple output lines. I'll merge this patchset
at first, then we can support that second argument If we need that
feature in one day. Or if you're interested in it.

Thanks,
Zorro

> 
> Thanks,
> Zorro
> 
> > +}
> > +
> >  init_rc
> >  
> >  ################################################################################
> > -- 
> > 2.41.0
> > 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 1/3] common/rc: introduce _random_file() helper
  2023-08-25 14:00     ` Zorro Lang
@ 2023-08-28  1:37       ` Naohiro Aota
  0 siblings, 0 replies; 10+ messages in thread
From: Naohiro Aota @ 2023-08-28  1:37 UTC (permalink / raw)
  To: Zorro Lang; +Cc: fstests@vger.kernel.org, linux-btrfs@vger.kernel.org

On Fri, Aug 25, 2023 at 10:00:09PM +0800, Zorro Lang wrote:
> On Fri, Aug 25, 2023 at 09:39:48PM +0800, Zorro Lang wrote:
> > On Mon, Aug 21, 2023 at 04:12:11PM +0900, Naohiro Aota wrote:
> > > Currently, we use "ls ... | sort -R | head -n1" (or tail) to choose a
> > > random file in a directory.It sorts the files with "ls", sort it randomly
> > > and pick the first line, which wastes the "ls" sort.
> > > 
> > > Also, using "sort -R | head -n1" is inefficient. For example, in a
> > > directory with 1000000 files, it takes more than 15 seconds to pick a file.
> > > 
> > >   $ time bash -c "ls -U | sort -R | head -n 1 >/dev/null"
> > >   bash -c "ls -U | sort -R | head -n 1 >/dev/null"  15.38s user 0.14s system 99% cpu 15.536 total
> > > 
> > >   $ time bash -c "ls -U | shuf -n 1 >/dev/null"
> > >   bash -c "ls -U | shuf -n 1 >/dev/null"  0.30s user 0.12s system 138% cpu 0.306 total
> > > 
> > > So, we should just use "ls -U" and "shuf -n 1" to choose a random file.
> > > Introduce _random_file() helper to do it properly.
> > > 
> > > Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> > > ---
> > >  common/rc | 7 +++++++
> > >  1 file changed, 7 insertions(+)
> > > 
> > > diff --git a/common/rc b/common/rc
> > > index 5c4429ed0425..4d414955f6d9 100644
> > > --- a/common/rc
> > > +++ b/common/rc
> > > @@ -5224,6 +5224,13 @@ _soak_loop_running() {
> > >  	return 0
> > >  }
> > >  
> > > +# Return a random file in a directory. A directory is *not* followed
> > > +# recursively.
> > > +_random_file() {
> > > +	local basedir=$1
> > > +	echo "$basedir/$(ls -U $basedir | shuf -n 1)"
> > 
> > I think the "1" can be the second argument, for we might want to get a random
> > file list sometimes. For example:
> > 
> >   local basedir=$1
> >   local num=$2
> >   local opt
> > 
> >   if [ -n "$num" ];then
> > 	  opt="-n $num"
> >   fi
> >   echo "$basedir/$(ls -U $basedir | shuf $opt)"
> > 
> > What do you think?
> 
> Hmm... nack my review point. Looks like this makes a simple change to be
> complicated, especially multiple output lines. I'll merge this patchset
> at first, then we can support that second argument If we need that
> feature in one day. Or if you're interested in it.

Yeah, I think so too. Currently, we don't have "shuf -n N" usage where N >
1. Also, while there is a plain "shuf" usage, it is used with "find".

So, I agree we can extent the function when we need it.

> 
> Thanks,
> Zorro
> 
> > 
> > Thanks,
> > Zorro
> > 
> > > +}
> > > +
> > >  init_rc
> > >  
> > >  ################################################################################
> > > -- 
> > > 2.41.0
> > > 
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2023-08-28  1:38 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-08-21  7:12 [PATCH v3 0/3] use shuf to choose a random file Naohiro Aota
2023-08-21  7:12 ` [PATCH v3 1/3] common/rc: introduce _random_file() helper Naohiro Aota
2023-08-21  9:24   ` Anand Jain
2023-08-25 13:39   ` Zorro Lang
2023-08-25 14:00     ` Zorro Lang
2023-08-28  1:37       ` Naohiro Aota
2023-08-21  7:12 ` [PATCH v3 2/3] fstests/btrfs: use " Naohiro Aota
2023-08-21  9:25   ` Anand Jain
2023-08-21  7:12 ` [PATCH v3 3/3] btrfs/004: use shuf to shuffle the file lines Naohiro Aota
2023-08-21  9:25   ` Anand Jain

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox