* [PATCH v3 0/3] use shuf to choose a random file
@ 2023-08-21 7:12 Naohiro Aota
2023-08-21 7:12 ` [PATCH v3 1/3] common/rc: introduce _random_file() helper Naohiro Aota
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Naohiro Aota @ 2023-08-21 7:12 UTC (permalink / raw)
To: fstests; +Cc: linux-btrfs, Naohiro Aota
Currently, we use "ls ... | sort -R | head -n1" (or tail) to choose a
random file in a directory.It sorts the files with "ls", sort it randomly
and pick the first line, which wastes the "ls" sort.
Also, using "sort -R | head -n1" is inefficient. Furthermore, even without
"head" or "tail", "shuf" is faster than "sort -R".
This series introduces a new helper _random_file() to choose a file in a
directory randomly. Also, replace "sort -R" with _random_file() or "shuf".
Changes:
- v2
- Introduce _random_file() helper
- Rewrite other "sort -R" with _random_file() or "shuf"
- v3
- Fix _random_file() helper to add the base directory as prefix
Naohiro Aota (3):
common/rc: introduce _random_file() helper
fstests/btrfs: use _random_file() helper
btrfs/004: use shuf to shuffle the file lines
common/rc | 7 +++++++
tests/btrfs/004 | 2 +-
tests/btrfs/179 | 9 ++++-----
tests/btrfs/192 | 14 ++++----------
4 files changed, 16 insertions(+), 16 deletions(-)
--
2.41.0
^ permalink raw reply [flat|nested] 10+ messages in thread* [PATCH v3 1/3] common/rc: introduce _random_file() helper 2023-08-21 7:12 [PATCH v3 0/3] use shuf to choose a random file Naohiro Aota @ 2023-08-21 7:12 ` Naohiro Aota 2023-08-21 9:24 ` Anand Jain 2023-08-25 13:39 ` Zorro Lang 2023-08-21 7:12 ` [PATCH v3 2/3] fstests/btrfs: use " Naohiro Aota 2023-08-21 7:12 ` [PATCH v3 3/3] btrfs/004: use shuf to shuffle the file lines Naohiro Aota 2 siblings, 2 replies; 10+ messages in thread From: Naohiro Aota @ 2023-08-21 7:12 UTC (permalink / raw) To: fstests; +Cc: linux-btrfs, Naohiro Aota Currently, we use "ls ... | sort -R | head -n1" (or tail) to choose a random file in a directory.It sorts the files with "ls", sort it randomly and pick the first line, which wastes the "ls" sort. Also, using "sort -R | head -n1" is inefficient. For example, in a directory with 1000000 files, it takes more than 15 seconds to pick a file. $ time bash -c "ls -U | sort -R | head -n 1 >/dev/null" bash -c "ls -U | sort -R | head -n 1 >/dev/null" 15.38s user 0.14s system 99% cpu 15.536 total $ time bash -c "ls -U | shuf -n 1 >/dev/null" bash -c "ls -U | shuf -n 1 >/dev/null" 0.30s user 0.12s system 138% cpu 0.306 total So, we should just use "ls -U" and "shuf -n 1" to choose a random file. Introduce _random_file() helper to do it properly. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> --- common/rc | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/common/rc b/common/rc index 5c4429ed0425..4d414955f6d9 100644 --- a/common/rc +++ b/common/rc @@ -5224,6 +5224,13 @@ _soak_loop_running() { return 0 } +# Return a random file in a directory. A directory is *not* followed +# recursively. +_random_file() { + local basedir=$1 + echo "$basedir/$(ls -U $basedir | shuf -n 1)" +} + init_rc ################################################################################ -- 2.41.0 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v3 1/3] common/rc: introduce _random_file() helper 2023-08-21 7:12 ` [PATCH v3 1/3] common/rc: introduce _random_file() helper Naohiro Aota @ 2023-08-21 9:24 ` Anand Jain 2023-08-25 13:39 ` Zorro Lang 1 sibling, 0 replies; 10+ messages in thread From: Anand Jain @ 2023-08-21 9:24 UTC (permalink / raw) To: Naohiro Aota, fstests; +Cc: linux-btrfs LGTM Reviewed-by: Anand Jain <anand.jain@oracle.com> ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v3 1/3] common/rc: introduce _random_file() helper 2023-08-21 7:12 ` [PATCH v3 1/3] common/rc: introduce _random_file() helper Naohiro Aota 2023-08-21 9:24 ` Anand Jain @ 2023-08-25 13:39 ` Zorro Lang 2023-08-25 14:00 ` Zorro Lang 1 sibling, 1 reply; 10+ messages in thread From: Zorro Lang @ 2023-08-25 13:39 UTC (permalink / raw) To: Naohiro Aota; +Cc: fstests, linux-btrfs On Mon, Aug 21, 2023 at 04:12:11PM +0900, Naohiro Aota wrote: > Currently, we use "ls ... | sort -R | head -n1" (or tail) to choose a > random file in a directory.It sorts the files with "ls", sort it randomly > and pick the first line, which wastes the "ls" sort. > > Also, using "sort -R | head -n1" is inefficient. For example, in a > directory with 1000000 files, it takes more than 15 seconds to pick a file. > > $ time bash -c "ls -U | sort -R | head -n 1 >/dev/null" > bash -c "ls -U | sort -R | head -n 1 >/dev/null" 15.38s user 0.14s system 99% cpu 15.536 total > > $ time bash -c "ls -U | shuf -n 1 >/dev/null" > bash -c "ls -U | shuf -n 1 >/dev/null" 0.30s user 0.12s system 138% cpu 0.306 total > > So, we should just use "ls -U" and "shuf -n 1" to choose a random file. > Introduce _random_file() helper to do it properly. > > Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> > --- > common/rc | 7 +++++++ > 1 file changed, 7 insertions(+) > > diff --git a/common/rc b/common/rc > index 5c4429ed0425..4d414955f6d9 100644 > --- a/common/rc > +++ b/common/rc > @@ -5224,6 +5224,13 @@ _soak_loop_running() { > return 0 > } > > +# Return a random file in a directory. A directory is *not* followed > +# recursively. > +_random_file() { > + local basedir=$1 > + echo "$basedir/$(ls -U $basedir | shuf -n 1)" I think the "1" can be the second argument, for we might want to get a random file list sometimes. For example: local basedir=$1 local num=$2 local opt if [ -n "$num" ];then opt="-n $num" fi echo "$basedir/$(ls -U $basedir | shuf $opt)" What do you think? Thanks, Zorro > +} > + > init_rc > > ################################################################################ > -- > 2.41.0 > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v3 1/3] common/rc: introduce _random_file() helper 2023-08-25 13:39 ` Zorro Lang @ 2023-08-25 14:00 ` Zorro Lang 2023-08-28 1:37 ` Naohiro Aota 0 siblings, 1 reply; 10+ messages in thread From: Zorro Lang @ 2023-08-25 14:00 UTC (permalink / raw) To: Naohiro Aota; +Cc: fstests, linux-btrfs On Fri, Aug 25, 2023 at 09:39:48PM +0800, Zorro Lang wrote: > On Mon, Aug 21, 2023 at 04:12:11PM +0900, Naohiro Aota wrote: > > Currently, we use "ls ... | sort -R | head -n1" (or tail) to choose a > > random file in a directory.It sorts the files with "ls", sort it randomly > > and pick the first line, which wastes the "ls" sort. > > > > Also, using "sort -R | head -n1" is inefficient. For example, in a > > directory with 1000000 files, it takes more than 15 seconds to pick a file. > > > > $ time bash -c "ls -U | sort -R | head -n 1 >/dev/null" > > bash -c "ls -U | sort -R | head -n 1 >/dev/null" 15.38s user 0.14s system 99% cpu 15.536 total > > > > $ time bash -c "ls -U | shuf -n 1 >/dev/null" > > bash -c "ls -U | shuf -n 1 >/dev/null" 0.30s user 0.12s system 138% cpu 0.306 total > > > > So, we should just use "ls -U" and "shuf -n 1" to choose a random file. > > Introduce _random_file() helper to do it properly. > > > > Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> > > --- > > common/rc | 7 +++++++ > > 1 file changed, 7 insertions(+) > > > > diff --git a/common/rc b/common/rc > > index 5c4429ed0425..4d414955f6d9 100644 > > --- a/common/rc > > +++ b/common/rc > > @@ -5224,6 +5224,13 @@ _soak_loop_running() { > > return 0 > > } > > > > +# Return a random file in a directory. A directory is *not* followed > > +# recursively. > > +_random_file() { > > + local basedir=$1 > > + echo "$basedir/$(ls -U $basedir | shuf -n 1)" > > I think the "1" can be the second argument, for we might want to get a random > file list sometimes. For example: > > local basedir=$1 > local num=$2 > local opt > > if [ -n "$num" ];then > opt="-n $num" > fi > echo "$basedir/$(ls -U $basedir | shuf $opt)" > > What do you think? Hmm... nack my review point. Looks like this makes a simple change to be complicated, especially multiple output lines. I'll merge this patchset at first, then we can support that second argument If we need that feature in one day. Or if you're interested in it. Thanks, Zorro > > Thanks, > Zorro > > > +} > > + > > init_rc > > > > ################################################################################ > > -- > > 2.41.0 > > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v3 1/3] common/rc: introduce _random_file() helper 2023-08-25 14:00 ` Zorro Lang @ 2023-08-28 1:37 ` Naohiro Aota 0 siblings, 0 replies; 10+ messages in thread From: Naohiro Aota @ 2023-08-28 1:37 UTC (permalink / raw) To: Zorro Lang; +Cc: fstests@vger.kernel.org, linux-btrfs@vger.kernel.org On Fri, Aug 25, 2023 at 10:00:09PM +0800, Zorro Lang wrote: > On Fri, Aug 25, 2023 at 09:39:48PM +0800, Zorro Lang wrote: > > On Mon, Aug 21, 2023 at 04:12:11PM +0900, Naohiro Aota wrote: > > > Currently, we use "ls ... | sort -R | head -n1" (or tail) to choose a > > > random file in a directory.It sorts the files with "ls", sort it randomly > > > and pick the first line, which wastes the "ls" sort. > > > > > > Also, using "sort -R | head -n1" is inefficient. For example, in a > > > directory with 1000000 files, it takes more than 15 seconds to pick a file. > > > > > > $ time bash -c "ls -U | sort -R | head -n 1 >/dev/null" > > > bash -c "ls -U | sort -R | head -n 1 >/dev/null" 15.38s user 0.14s system 99% cpu 15.536 total > > > > > > $ time bash -c "ls -U | shuf -n 1 >/dev/null" > > > bash -c "ls -U | shuf -n 1 >/dev/null" 0.30s user 0.12s system 138% cpu 0.306 total > > > > > > So, we should just use "ls -U" and "shuf -n 1" to choose a random file. > > > Introduce _random_file() helper to do it properly. > > > > > > Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> > > > --- > > > common/rc | 7 +++++++ > > > 1 file changed, 7 insertions(+) > > > > > > diff --git a/common/rc b/common/rc > > > index 5c4429ed0425..4d414955f6d9 100644 > > > --- a/common/rc > > > +++ b/common/rc > > > @@ -5224,6 +5224,13 @@ _soak_loop_running() { > > > return 0 > > > } > > > > > > +# Return a random file in a directory. A directory is *not* followed > > > +# recursively. > > > +_random_file() { > > > + local basedir=$1 > > > + echo "$basedir/$(ls -U $basedir | shuf -n 1)" > > > > I think the "1" can be the second argument, for we might want to get a random > > file list sometimes. For example: > > > > local basedir=$1 > > local num=$2 > > local opt > > > > if [ -n "$num" ];then > > opt="-n $num" > > fi > > echo "$basedir/$(ls -U $basedir | shuf $opt)" > > > > What do you think? > > Hmm... nack my review point. Looks like this makes a simple change to be > complicated, especially multiple output lines. I'll merge this patchset > at first, then we can support that second argument If we need that > feature in one day. Or if you're interested in it. Yeah, I think so too. Currently, we don't have "shuf -n N" usage where N > 1. Also, while there is a plain "shuf" usage, it is used with "find". So, I agree we can extent the function when we need it. > > Thanks, > Zorro > > > > > Thanks, > > Zorro > > > > > +} > > > + > > > init_rc > > > > > > ################################################################################ > > > -- > > > 2.41.0 > > > > ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v3 2/3] fstests/btrfs: use _random_file() helper 2023-08-21 7:12 [PATCH v3 0/3] use shuf to choose a random file Naohiro Aota 2023-08-21 7:12 ` [PATCH v3 1/3] common/rc: introduce _random_file() helper Naohiro Aota @ 2023-08-21 7:12 ` Naohiro Aota 2023-08-21 9:25 ` Anand Jain 2023-08-21 7:12 ` [PATCH v3 3/3] btrfs/004: use shuf to shuffle the file lines Naohiro Aota 2 siblings, 1 reply; 10+ messages in thread From: Naohiro Aota @ 2023-08-21 7:12 UTC (permalink / raw) To: fstests; +Cc: linux-btrfs, Naohiro Aota Use _random_file() helper to choose a random file in a directory. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> --- tests/btrfs/179 | 9 ++++----- tests/btrfs/192 | 14 ++++---------- 2 files changed, 8 insertions(+), 15 deletions(-) diff --git a/tests/btrfs/179 b/tests/btrfs/179 index 2f17c9f9fb4a..479667f05fd2 100755 --- a/tests/btrfs/179 +++ b/tests/btrfs/179 @@ -45,8 +45,8 @@ fill_workload() # Randomly remove some files for every 5 loop if [ $(( $i % 5 )) -eq 0 ]; then - victim=$(ls "$SCRATCH_MNT/src" | sort -R | head -n1) - rm "$SCRATCH_MNT/src/$victim" + victim=$(_random_file "$SCRATCH_MNT/src") + rm "$victim" fi i=$((i + 1)) done @@ -69,13 +69,12 @@ delete_workload() trap "wait; exit" SIGTERM while true; do sleep $((sleep_time * 2)) - victim=$(ls "$SCRATCH_MNT/snapshots" | sort -R | head -n1) + victim=$(_random_file "$SCRATCH_MNT/snapshots") if [ -z "$victim" ]; then # No snapshots available, sleep and retry later. continue fi - $BTRFS_UTIL_PROG subvolume delete \ - "$SCRATCH_MNT/snapshots/$victim" > /dev/null + $BTRFS_UTIL_PROG subvolume delete "$victim" > /dev/null done } diff --git a/tests/btrfs/192 b/tests/btrfs/192 index bcf14ebb8e3b..7324c9e39833 100755 --- a/tests/btrfs/192 +++ b/tests/btrfs/192 @@ -69,12 +69,6 @@ $BTRFS_UTIL_PROG subvolume create $SCRATCH_MNT/src > /dev/null mkdir -p $SCRATCH_MNT/snapshots mkdir -p $SCRATCH_MNT/src/padding -random_file() -{ - local basedir=$1 - echo "$basedir/$(ls $basedir | sort -R | tail -1)" -} - snapshot_workload() { trap "wait; exit" SIGTERM @@ -85,9 +79,9 @@ snapshot_workload() $SCRATCH_MNT/src $SCRATCH_MNT/snapshots/$i \ > /dev/null # Do something small to make snapshots different - rm -f "$(random_file $SCRATCH_MNT/src/padding)" - rm -f "$(random_file $SCRATCH_MNT/src/padding)" - touch "$(random_file $SCRATCH_MNT/src/padding)" + rm -f "$(_random_file $SCRATCH_MNT/src/padding)" + rm -f "$(_random_file $SCRATCH_MNT/src/padding)" + touch "$(_random_file $SCRATCH_MNT/src/padding)" touch "$SCRATCH_MNT/src/padding/random_$RANDOM" i=$(($i + 1)) @@ -102,7 +96,7 @@ delete_workload() while true; do sleep 2 $BTRFS_UTIL_PROG subvolume delete \ - "$(random_file $SCRATCH_MNT/snapshots)" \ + "$(_random_file $SCRATCH_MNT/snapshots)" \ > /dev/null 2>&1 done } -- 2.41.0 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v3 2/3] fstests/btrfs: use _random_file() helper 2023-08-21 7:12 ` [PATCH v3 2/3] fstests/btrfs: use " Naohiro Aota @ 2023-08-21 9:25 ` Anand Jain 0 siblings, 0 replies; 10+ messages in thread From: Anand Jain @ 2023-08-21 9:25 UTC (permalink / raw) To: Naohiro Aota, fstests; +Cc: linux-btrfs LGTM Reviewed-by: Anand Jain <anand.jain@oracle.com> ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v3 3/3] btrfs/004: use shuf to shuffle the file lines 2023-08-21 7:12 [PATCH v3 0/3] use shuf to choose a random file Naohiro Aota 2023-08-21 7:12 ` [PATCH v3 1/3] common/rc: introduce _random_file() helper Naohiro Aota 2023-08-21 7:12 ` [PATCH v3 2/3] fstests/btrfs: use " Naohiro Aota @ 2023-08-21 7:12 ` Naohiro Aota 2023-08-21 9:25 ` Anand Jain 2 siblings, 1 reply; 10+ messages in thread From: Naohiro Aota @ 2023-08-21 7:12 UTC (permalink / raw) To: fstests; +Cc: linux-btrfs, Naohiro Aota The "sort -R" is slower than "shuf" even with the full output because "sort -R" actually sort them to group the identical keys. $ time bash -c "seq 1000000 | shuf >/dev/null" bash -c "seq 1000000 | shuf >/dev/null" 0.18s user 0.03s system 104% cpu 0.196 total $ time bash -c "seq 1000000 | sort -R >/dev/null" bash -c "seq 1000000 | sort -R >/dev/null" 19.61s user 0.03s system 99% cpu 19.739 total Since the "find"'s outputs never be identical, we can just use "shuf" to optimize the selection. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> --- tests/btrfs/004 | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tests/btrfs/004 b/tests/btrfs/004 index ea40dbf62880..78df6a3af6b1 100755 --- a/tests/btrfs/004 +++ b/tests/btrfs/004 @@ -201,7 +201,7 @@ workout() cnt=0 errcnt=0 dir="$SCRATCH_MNT/$snap_name/" - for file in `find $dir -name f\* -size +0 | sort -R`; do + for file in `find $dir -name f\* -size +0 | shuf`; do extents=`_check_file_extents $file` ret=$? if [ $ret -ne 0 ]; then -- 2.41.0 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v3 3/3] btrfs/004: use shuf to shuffle the file lines 2023-08-21 7:12 ` [PATCH v3 3/3] btrfs/004: use shuf to shuffle the file lines Naohiro Aota @ 2023-08-21 9:25 ` Anand Jain 0 siblings, 0 replies; 10+ messages in thread From: Anand Jain @ 2023-08-21 9:25 UTC (permalink / raw) To: Naohiro Aota, fstests; +Cc: linux-btrfs LGTM Reviewed-by: Anand Jain <anand.jain@oracle.com> ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2023-08-28 1:38 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-08-21 7:12 [PATCH v3 0/3] use shuf to choose a random file Naohiro Aota 2023-08-21 7:12 ` [PATCH v3 1/3] common/rc: introduce _random_file() helper Naohiro Aota 2023-08-21 9:24 ` Anand Jain 2023-08-25 13:39 ` Zorro Lang 2023-08-25 14:00 ` Zorro Lang 2023-08-28 1:37 ` Naohiro Aota 2023-08-21 7:12 ` [PATCH v3 2/3] fstests/btrfs: use " Naohiro Aota 2023-08-21 9:25 ` Anand Jain 2023-08-21 7:12 ` [PATCH v3 3/3] btrfs/004: use shuf to shuffle the file lines Naohiro Aota 2023-08-21 9:25 ` Anand Jain
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox