* [PATCH] btrfs/179: optimize remove file selection
@ 2023-08-17 5:13 Naohiro Aota
2023-08-17 12:58 ` Josef Bacik
2023-08-18 19:35 ` Zorro Lang
0 siblings, 2 replies; 4+ messages in thread
From: Naohiro Aota @ 2023-08-17 5:13 UTC (permalink / raw)
To: fstests; +Cc: linux-btrfs, Naohiro Aota
Currently, we use "ls ... | sort -R | head -n1" to choose a removing
victim. It sorts the files with "ls", sort it randomly and pick the first
line, which wastes the "ls" sort.
Also, using "sort -R | head -n1" is inefficient. For example, in a
directory with 1000000 files, it takes more than 15 seconds to pick a file.
$ time bash -c "ls -U | sort -R | head -n 1 >/dev/null"
bash -c "ls -U | sort -R | head -n 1 >/dev/null" 15.38s user 0.14s system 99% cpu 15.536 total
$ time bash -c "ls -U | shuf -n 1 >/dev/null"
bash -c "ls -U | shuf -n 1 >/dev/null" 0.30s user 0.12s system 138% cpu 0.306 total
So, just use "ls -U" and "shuf -n 1" to choose a victim.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
tests/btrfs/179 | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tests/btrfs/179 b/tests/btrfs/179
index 2f17c9f9fb4a..0fbd875cf01b 100755
--- a/tests/btrfs/179
+++ b/tests/btrfs/179
@@ -45,7 +45,7 @@ fill_workload()
# Randomly remove some files for every 5 loop
if [ $(( $i % 5 )) -eq 0 ]; then
- victim=$(ls "$SCRATCH_MNT/src" | sort -R | head -n1)
+ victim=$(ls -U "$SCRATCH_MNT/src" | shuf -n 1)
rm "$SCRATCH_MNT/src/$victim"
fi
i=$((i + 1))
--
2.41.0
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH] btrfs/179: optimize remove file selection
2023-08-17 5:13 [PATCH] btrfs/179: optimize remove file selection Naohiro Aota
@ 2023-08-17 12:58 ` Josef Bacik
2023-08-18 19:35 ` Zorro Lang
1 sibling, 0 replies; 4+ messages in thread
From: Josef Bacik @ 2023-08-17 12:58 UTC (permalink / raw)
To: Naohiro Aota; +Cc: fstests, linux-btrfs
On Thu, Aug 17, 2023 at 02:13:17PM +0900, Naohiro Aota wrote:
> Currently, we use "ls ... | sort -R | head -n1" to choose a removing
> victim. It sorts the files with "ls", sort it randomly and pick the first
> line, which wastes the "ls" sort.
>
> Also, using "sort -R | head -n1" is inefficient. For example, in a
> directory with 1000000 files, it takes more than 15 seconds to pick a file.
>
> $ time bash -c "ls -U | sort -R | head -n 1 >/dev/null"
> bash -c "ls -U | sort -R | head -n 1 >/dev/null" 15.38s user 0.14s system 99% cpu 15.536 total
>
> $ time bash -c "ls -U | shuf -n 1 >/dev/null"
> bash -c "ls -U | shuf -n 1 >/dev/null" 0.30s user 0.12s system 138% cpu 0.306 total
>
> So, just use "ls -U" and "shuf -n 1" to choose a victim.
>
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Thanks,
Josef
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] btrfs/179: optimize remove file selection
2023-08-17 5:13 [PATCH] btrfs/179: optimize remove file selection Naohiro Aota
2023-08-17 12:58 ` Josef Bacik
@ 2023-08-18 19:35 ` Zorro Lang
2023-08-21 5:17 ` Naohiro Aota
1 sibling, 1 reply; 4+ messages in thread
From: Zorro Lang @ 2023-08-18 19:35 UTC (permalink / raw)
To: Naohiro Aota; +Cc: fstests, linux-btrfs
On Thu, Aug 17, 2023 at 02:13:17PM +0900, Naohiro Aota wrote:
> Currently, we use "ls ... | sort -R | head -n1" to choose a removing
> victim. It sorts the files with "ls", sort it randomly and pick the first
> line, which wastes the "ls" sort.
>
> Also, using "sort -R | head -n1" is inefficient. For example, in a
> directory with 1000000 files, it takes more than 15 seconds to pick a file.
>
> $ time bash -c "ls -U | sort -R | head -n 1 >/dev/null"
> bash -c "ls -U | sort -R | head -n 1 >/dev/null" 15.38s user 0.14s system 99% cpu 15.536 total
>
> $ time bash -c "ls -U | shuf -n 1 >/dev/null"
> bash -c "ls -U | shuf -n 1 >/dev/null" 0.30s user 0.12s system 138% cpu 0.306 total
>
> So, just use "ls -U" and "shuf -n 1" to choose a victim.
>
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> ---
> tests/btrfs/179 | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tests/btrfs/179 b/tests/btrfs/179
> index 2f17c9f9fb4a..0fbd875cf01b 100755
> --- a/tests/btrfs/179
> +++ b/tests/btrfs/179
> @@ -45,7 +45,7 @@ fill_workload()
>
> # Randomly remove some files for every 5 loop
> if [ $(( $i % 5 )) -eq 0 ]; then
> - victim=$(ls "$SCRATCH_MNT/src" | sort -R | head -n1)
> + victim=$(ls -U "$SCRATCH_MNT/src" | shuf -n 1)
Thanks for this improvement. This case has two lines have this similar logic,
Why not change them both?
And btrfs/192 has a similar line too:
$ grep -rsn -- "sort -R" tests
tests/btrfs/179:48: victim=$(ls "$SCRATCH_MNT/src" | sort -R | head -n1)
tests/btrfs/179:72: victim=$(ls "$SCRATCH_MNT/snapshots" | sort -R | head -n1)
tests/btrfs/192:75: echo "$basedir/$(ls $basedir | sort -R | tail -1)"
tests/btrfs/004:204: for file in `find $dir -name f\* -size +0 | sort -R`; do
Do we need to change that too? And a common helper might help, if more cases
would like to have this helper?
Thanks,
Zorro
> rm "$SCRATCH_MNT/src/$victim"
> fi
> i=$((i + 1))
> --
> 2.41.0
>
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] btrfs/179: optimize remove file selection
2023-08-18 19:35 ` Zorro Lang
@ 2023-08-21 5:17 ` Naohiro Aota
0 siblings, 0 replies; 4+ messages in thread
From: Naohiro Aota @ 2023-08-21 5:17 UTC (permalink / raw)
To: Zorro Lang; +Cc: fstests@vger.kernel.org, linux-btrfs@vger.kernel.org
On Sat, Aug 19, 2023 at 03:35:33AM +0800, Zorro Lang wrote:
> On Thu, Aug 17, 2023 at 02:13:17PM +0900, Naohiro Aota wrote:
> > Currently, we use "ls ... | sort -R | head -n1" to choose a removing
> > victim. It sorts the files with "ls", sort it randomly and pick the first
> > line, which wastes the "ls" sort.
> >
> > Also, using "sort -R | head -n1" is inefficient. For example, in a
> > directory with 1000000 files, it takes more than 15 seconds to pick a file.
> >
> > $ time bash -c "ls -U | sort -R | head -n 1 >/dev/null"
> > bash -c "ls -U | sort -R | head -n 1 >/dev/null" 15.38s user 0.14s system 99% cpu 15.536 total
> >
> > $ time bash -c "ls -U | shuf -n 1 >/dev/null"
> > bash -c "ls -U | shuf -n 1 >/dev/null" 0.30s user 0.12s system 138% cpu 0.306 total
> >
> > So, just use "ls -U" and "shuf -n 1" to choose a victim.
> >
> > Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> > ---
> > tests/btrfs/179 | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/tests/btrfs/179 b/tests/btrfs/179
> > index 2f17c9f9fb4a..0fbd875cf01b 100755
> > --- a/tests/btrfs/179
> > +++ b/tests/btrfs/179
> > @@ -45,7 +45,7 @@ fill_workload()
> >
> > # Randomly remove some files for every 5 loop
> > if [ $(( $i % 5 )) -eq 0 ]; then
> > - victim=$(ls "$SCRATCH_MNT/src" | sort -R | head -n1)
> > + victim=$(ls -U "$SCRATCH_MNT/src" | shuf -n 1)
>
> Thanks for this improvement. This case has two lines have this similar logic,
> Why not change them both?
Indeed. I forgot to pick it.
> And btrfs/192 has a similar line too:
>
> $ grep -rsn -- "sort -R" tests
> tests/btrfs/179:48: victim=$(ls "$SCRATCH_MNT/src" | sort -R | head -n1)
> tests/btrfs/179:72: victim=$(ls "$SCRATCH_MNT/snapshots" | sort -R | head -n1)
> tests/btrfs/192:75: echo "$basedir/$(ls $basedir | sort -R | tail -1)"
> tests/btrfs/004:204: for file in `find $dir -name f\* -size +0 | sort -R`; do
Yeah, we can use "shuf" there too. I found even the simple "sort -R"
(without head or tail) is really slower than "shuf". I'll address them too.
>
> Do we need to change that too? And a common helper might help, if more cases
> would like to have this helper?
Yeah, maybe, we can define _random_file() as named in btrfs/192.
> Thanks,
> Zorro
>
> > rm "$SCRATCH_MNT/src/$victim"
> > fi
> > i=$((i + 1))
> > --
> > 2.41.0
> >
>
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2023-08-21 5:17 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-08-17 5:13 [PATCH] btrfs/179: optimize remove file selection Naohiro Aota
2023-08-17 12:58 ` Josef Bacik
2023-08-18 19:35 ` Zorro Lang
2023-08-21 5:17 ` Naohiro Aota
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox