From: "Darrick J. Wong" <djwong@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: fstests@vger.kernel.org
Subject: Re: [PATCH 1/3] populate: fix horrible performance due to excessive forking
Date: Tue, 10 Jan 2023 22:02:37 -0800
Message-ID: <Y75Q/dBXOIeIPonK@magnolia>
In-Reply-To: <20230110224906.1171483-2-david@fromorbit.com>
On Wed, Jan 11, 2023 at 09:49:04AM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> xfs/155 is taking close on 4 minutes to populate the filesystem,
> and most of that is because the populate functions are coded without
> consideration of performance.
>
> Most of the operations can be executed in parallel as they operate on
> separate files or in separate directories.
>
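A minimal sketch of the parallel pattern described above, assuming each populate step is an independent shell function that takes no arguments. The helper name run_parallel is hypothetical and is not part of common/populate. Unlike a bare "wait", it waits on each PID in turn so that a failing step is not silently ignored:

```shell
# Hypothetical helper: run each named function as a background job,
# then wait for all of them and report whether any of them failed.
run_parallel() {
	local pids=""
	local cmd pid rc=0

	for cmd in "$@"; do
		"$cmd" &		# each step runs in its own subshell
		pids="${pids} $!"	# remember the PID of the job just started
	done

	# "wait PID" returns that job's exit status; a bare "wait" would not.
	for pid in ${pids}; do
		wait "${pid}" || rc=1
	done
	return "${rc}"
}
```

This only helps when the steps really are independent, i.e. they touch separate files or separate directories.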
> Creating a zero length file in a shell script can be very fast if we
> do the creation within the shell, but running touch, xfs_io or some
> other process to create the file is extremely slow - performance is
> limited by the process creation/destruction rate, not the filesystem
> create rate. Same goes for unlinking files.
>
> We can use 'echo -n > $file' to create or truncate an existing file
> to zero length from within the shell. This is much, much faster than
> calling touch.
>
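For illustration, a sketch of creating empty files without forking a process per file. The function name create_empty_files and the "fileN" naming are assumptions made for this example, and ": > file" is used as a portable spelling of the same truncating redirection as 'echo -n > $file':

```shell
# Hypothetical helper: create "count" zero-length files under "dir".
# The loop body uses only shell builtins, so no process is forked per file.
create_empty_files() {
	local dir="$1"
	local count="$2"
	local i=0

	mkdir -p "${dir}"	# one external process, outside the loop
	while [ "$i" -lt "$count" ]; do
		: > "${dir}/file${i}"	# create, or truncate to zero length
		i=$((i + 1))
	done
}
```

Note that, unlike touch, the redirection truncates a file that already exists and has data in it.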
> For removing lots of files, there is no shell built in to do this
> without forking, but we can easily build a file list and pipe it
> to 'xargs rm -f' to execute rm with as many files as possible in one
> execution.
>
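A sketch of the batching idea: the loop only prints file names, and xargs runs rm with as many names as fit on one command line. The function name remove_odd_files and the "fileN" naming are assumptions for this example. It also swaps the space-separated list for NUL-separated names with "xargs -0" (a GNU/BSD extension) so that paths containing whitespace survive:

```shell
# Hypothetical helper: remove every second file (file1, file3, ...) in "dir".
# printf is a shell builtin, so the loop itself forks nothing; rm is only
# executed once per batch of names by xargs.
remove_odd_files() {
	local dir="$1"
	local count="$2"
	local i=1

	while [ "$i" -lt "$count" ]; do
		printf '%s\0' "${dir}/file${i}"	# NUL-terminated file name
		i=$((i + 2))
	done | xargs -0 rm -f
}
```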
> Doing this removes approximately 50,000 process create/destroy cycles
> from populating the filesystem, reducing system time from ~200s to
> ~35s. Along with running operations in parallel, this brings the
> population time down from ~235s to less than 45s.
Hmm. I took the nerdsnipe bait and came up with my own approach: I
replaced the shell loops with a perl script. I didn't parallelize
anything, but the perl script alone cut the runtime down to about 35s.
> The long tail of that 45s runtime is the btree format attribute
> tree create. That executes setfattr a very large number of times,
> taking 44s to run and consuming 36s of system time mostly just
> creating and destroying thousands of setfattr process contexts.
> There's no easy shell coding solution to that issue, so that's for
> another rainy day.
...well it's pouring on the west coast here, so I'll post my solution
that uses setfattr --restore tomorrow when I get it back from QA.
Granted, I hadn't found a solution to the removexattr stuff yet, so I
might keep working on that.
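To illustrate the setfattr --restore idea (this is only a rough sketch, not the solution referred to above): setfattr can read a file in the format written by "getfattr --dump" and set every attribute listed in it from a single process. The helper name gen_attr_dump and the user.NNNNNNNN names and values are assumptions for this example:

```shell
# Hypothetical helper: print a getfattr --dump style listing that sets
# "count" user attributes on "file". Only shell builtins are used, so
# generating the list forks no processes.
gen_attr_dump() {
	local file="$1"
	local count="$2"
	local i=0

	printf '# file: %s\n' "${file}"
	while [ "$i" -lt "$count" ]; do
		printf 'user.%08d="%08d"\n' "$i" "$i"	# name="value"
		i=$((i + 1))
	done
	printf '\n'	# a blank line ends the record for this file
}

# Usage (needs setfattr from the attr package and a filesystem that
# supports user xattrs); "-" makes setfattr read the dump from stdin:
#   gen_attr_dump ATTR.FMT_BTREE 1000 | setfattr --restore=-
```

This replaces one setfattr process per attribute with a single one; it does nothing for removing attributes.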
(removexattr looks like a pain in perl though...)
Anyway it's late now, I'll look at the diff tomorrow.
--D
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
> common/populate | 179 ++++++++++++++++++++++++++++--------------------
> 1 file changed, 104 insertions(+), 75 deletions(-)
>
> diff --git a/common/populate b/common/populate
> index 44b4af166..9b60fa5c1 100644
> --- a/common/populate
> +++ b/common/populate
> @@ -52,23 +52,64 @@ __populate_fragment_file() {
> test -f "${fname}" && $here/src/punch-alternating "${fname}"
> }
>
> -# Create a large directory
> -__populate_create_dir() {
> - name="$1"
> - nr="$2"
> - missing="$3"
> +# Create a specified number of files or until the maximum extent count is
> +# reached. If the extent count is reached, return the number of files created.
> +# This is optimised for speed - do not add anything that executes a separate
> +# process in every loop as this will slow it down by a factor of at least 5.
> +__populate_create_nfiles() {
> + local name="$1"
> + local nr="$2"
> + local max_nextents="$3"
> + local d=0
>
> mkdir -p "${name}"
> - seq 0 "${nr}" | while read d; do
> - creat=mkdir
> - test "$((d % 20))" -eq 0 && creat=touch
> - $creat "${name}/$(printf "%.08d" "$d")"
> + for d in `seq 0 "${nr}"`; do
> + local fname=""
> + printf -v fname "${name}/%.08d" "$d"
> +
> + if [ "$((d % 20))" -eq 0 ]; then
> + mkdir ${fname}
> + else
> + echo -n > ${fname}
> + fi
> +
> + if [ "${max_nextents}" -eq 0 ]; then
> + continue
> + fi
> + if [ "$((d % 40))" -ne 0 ]; then
> + continue
> + fi
> +
> + local nextents="$(_xfs_get_fsxattr nextents $name)"
> + if [ "${nextents}" -gt "${max_nextents}" ]; then
> + echo ${d}
> + break
> + fi
> done
> +}
> +
> +# remove every second file in the given directory. This is optimised for speed -
> +# do not add anything that executes a separate process in each loop as this will
> +# slow it down by at least factor of 10.
> +__populate_remove_nfiles() {
> + local name="$1"
> + local nr="$2"
> + local d=1
> +
> + for d in `seq 1 2 "${nr}"`; do
> + printf "${name}/%.08d " "$d"
> + done | xargs rm -f
> +}
>
> +# Create a large directory
> +__populate_create_dir() {
> + local name="$1"
> + local nr="$2"
> + local missing="$3"
> +
> + __populate_create_nfiles "${name}" "${nr}" 0
> test -z "${missing}" && return
> - seq 1 2 "${nr}" | while read d; do
> - rm -rf "${name}/$(printf "%.08d" "$d")"
> - done
> + __populate_remove_nfiles "${name}" "${nr}"
> }
>
> # Create a large directory and ensure that it's a btree format
> @@ -82,31 +123,18 @@ __populate_xfs_create_btree_dir() {
> # watch for when the extent count exceeds the space after the
> # inode core.
> local max_nextents="$(((isize - icore_size) / 16))"
> - local nr=0
> -
> - mkdir -p "${name}"
> - while true; do
> - local creat=mkdir
> - test "$((nr % 20))" -eq 0 && creat=touch
> - $creat "${name}/$(printf "%.08d" "$nr")"
> - if [ "$((nr % 40))" -eq 0 ]; then
> - local nextents="$(_xfs_get_fsxattr nextents $name)"
> - [ $nextents -gt $max_nextents ] && break
> - fi
> - nr=$((nr+1))
> - done
> + local nr=100000
>
> + nr=$(__populate_create_nfiles "${name}" "${nr}" "${max_nextents}")
> test -z "${missing}" && return
> - seq 1 2 "${nr}" | while read d; do
> - rm -rf "${name}/$(printf "%.08d" "$d")"
> - done
> + __populate_remove_nfiles "${name}" "${nr}"
> }
>
> # Add a bunch of attrs to a file
> __populate_create_attr() {
> - name="$1"
> - nr="$2"
> - missing="$3"
> + local name="$1"
> + local nr="$2"
> + local missing="$3"
>
> touch "${name}"
> seq 0 "${nr}" | while read d; do
> @@ -121,17 +149,18 @@ __populate_create_attr() {
>
> # Fill up some percentage of the remaining free space
> __populate_fill_fs() {
> - dir="$1"
> - pct="$2"
> + local dir="$1"
> + local pct="$2"
> + local nr=0
> test -z "${pct}" && pct=60
>
> mkdir -p "${dir}/test/1"
> cp -pRdu "${dir}"/S_IFREG* "${dir}/test/1/"
>
> - SRC_SZ="$(du -ks "${dir}/test/1" | cut -f 1)"
> - FS_SZ="$(( $(stat -f "${dir}" -c '%a * %S') / 1024 ))"
> + local SRC_SZ="$(du -ks "${dir}/test/1" | cut -f 1)"
> + local FS_SZ="$(( $(stat -f "${dir}" -c '%a * %S') / 1024 ))"
>
> - NR="$(( (FS_SZ * ${pct} / 100) / SRC_SZ ))"
> + local NR="$(( (FS_SZ * ${pct} / 100) / SRC_SZ ))"
>
> echo "FILL FS"
> echo "src_sz $SRC_SZ fs_sz $FS_SZ nr $NR"
> @@ -220,45 +249,45 @@ _scratch_xfs_populate() {
> # Data:
>
> # Fill up the root inode chunk
> - echo "+ fill root ino chunk"
> + ( echo "+ fill root ino chunk"
> seq 1 64 | while read f; do
> - $XFS_IO_PROG -f -c "truncate 0" "${SCRATCH_MNT}/dummy${f}"
> - done
> + echo -n > "${SCRATCH_MNT}/dummy${f}"
> + done ) &
>
> # Regular files
> # - FMT_EXTENTS
> echo "+ extents file"
> - __populate_create_file $blksz "${SCRATCH_MNT}/S_IFREG.FMT_EXTENTS"
> + __populate_create_file $blksz "${SCRATCH_MNT}/S_IFREG.FMT_EXTENTS" &
>
> # - FMT_BTREE
> echo "+ btree extents file"
> nr="$((blksz * 2 / 16))"
> - __populate_create_file $((blksz * nr)) "${SCRATCH_MNT}/S_IFREG.FMT_BTREE"
> + __populate_create_file $((blksz * nr)) "${SCRATCH_MNT}/S_IFREG.FMT_BTREE" &
>
> # Directories
> # - INLINE
> - echo "+ inline dir"
> - __populate_create_dir "${SCRATCH_MNT}/S_IFDIR.FMT_INLINE" 1
> + echo "+ inline dir"
> + __populate_create_dir "${SCRATCH_MNT}/S_IFDIR.FMT_INLINE" 1 "" &
>
> # - BLOCK
> echo "+ block dir"
> - __populate_create_dir "${SCRATCH_MNT}/S_IFDIR.FMT_BLOCK" "$((dblksz / 40))"
> + __populate_create_dir "${SCRATCH_MNT}/S_IFDIR.FMT_BLOCK" "$((dblksz / 40))" "" &
>
> # - LEAF
> echo "+ leaf dir"
> - __populate_create_dir "${SCRATCH_MNT}/S_IFDIR.FMT_LEAF" "$((dblksz / 12))"
> + __populate_create_dir "${SCRATCH_MNT}/S_IFDIR.FMT_LEAF" "$((dblksz / 12))" "" &
>
> # - LEAFN
> echo "+ leafn dir"
> - __populate_create_dir "${SCRATCH_MNT}/S_IFDIR.FMT_LEAFN" "$(( ((dblksz - leaf_hdr_size) / 8) - 3 ))"
> + __populate_create_dir "${SCRATCH_MNT}/S_IFDIR.FMT_LEAFN" "$(( ((dblksz - leaf_hdr_size) / 8) - 3 ))" "" &
>
> # - NODE
> echo "+ node dir"
> - __populate_create_dir "${SCRATCH_MNT}/S_IFDIR.FMT_NODE" "$((16 * dblksz / 40))" true
> + __populate_create_dir "${SCRATCH_MNT}/S_IFDIR.FMT_NODE" "$((16 * dblksz / 40))" true &
>
> # - BTREE
> echo "+ btree dir"
> - __populate_xfs_create_btree_dir "${SCRATCH_MNT}/S_IFDIR.FMT_BTREE" "$isize" true
> + __populate_xfs_create_btree_dir "${SCRATCH_MNT}/S_IFDIR.FMT_BTREE" "$isize" true &
>
> # Symlinks
> # - FMT_LOCAL
> @@ -280,20 +309,20 @@ _scratch_xfs_populate() {
>
> # Attribute formats
> # LOCAL
> - echo "+ local attr"
> - __populate_create_attr "${SCRATCH_MNT}/ATTR.FMT_LOCAL" 1
> + echo "+ local attr"
> + __populate_create_attr "${SCRATCH_MNT}/ATTR.FMT_LOCAL" 1 "" &
>
> # LEAF
> - echo "+ leaf attr"
> - __populate_create_attr "${SCRATCH_MNT}/ATTR.FMT_LEAF" "$((blksz / 40))"
> + echo "+ leaf attr"
> + __populate_create_attr "${SCRATCH_MNT}/ATTR.FMT_LEAF" "$((blksz / 40))" "" &
>
> # NODE
> echo "+ node attr"
> - __populate_create_attr "${SCRATCH_MNT}/ATTR.FMT_NODE" "$((8 * blksz / 40))"
> + __populate_create_attr "${SCRATCH_MNT}/ATTR.FMT_NODE" "$((8 * blksz / 40))" "" &
>
> # BTREE
> echo "+ btree attr"
> - __populate_create_attr "${SCRATCH_MNT}/ATTR.FMT_BTREE" "$((64 * blksz / 40))" true
> + __populate_create_attr "${SCRATCH_MNT}/ATTR.FMT_BTREE" "$((64 * blksz / 40))" true &
>
> # trusted namespace
> touch ${SCRATCH_MNT}/ATTR.TRUSTED
> @@ -321,68 +350,68 @@ _scratch_xfs_populate() {
> rm -rf "${SCRATCH_MNT}/attrvalfile"
>
> # Make an unused inode
> - echo "+ empty file"
> + ( echo "+ empty file"
> touch "${SCRATCH_MNT}/unused"
> $XFS_IO_PROG -f -c 'fsync' "${SCRATCH_MNT}/unused"
> - rm -rf "${SCRATCH_MNT}/unused"
> + rm -rf "${SCRATCH_MNT}/unused" ) &
>
> # Free space btree
> echo "+ freesp btree"
> nr="$((blksz * 2 / 8))"
> - __populate_create_file $((blksz * nr)) "${SCRATCH_MNT}/BNOBT"
> + __populate_create_file $((blksz * nr)) "${SCRATCH_MNT}/BNOBT" &
>
> # Inode btree
> - echo "+ inobt btree"
> + ( echo "+ inobt btree"
> local ino_per_rec=64
> local rec_per_btblock=16
> local nr="$(( 2 * (blksz / rec_per_btblock) * ino_per_rec ))"
> local dir="${SCRATCH_MNT}/INOBT"
> - mkdir -p "${dir}"
> - seq 0 "${nr}" | while read f; do
> - touch "${dir}/${f}"
> - done
> -
> - seq 0 2 "${nr}" | while read f; do
> - rm -f "${dir}/${f}"
> - done
> + __populate_create_dir "${SCRATCH_MNT}/INOBT" "${nr}" true
> + ) &
>
> # Reverse-mapping btree
> is_rmapbt="$(_xfs_has_feature "$SCRATCH_MNT" rmapbt -v)"
> if [ $is_rmapbt -gt 0 ]; then
> - echo "+ rmapbt btree"
> + ( echo "+ rmapbt btree"
> nr="$((blksz * 2 / 24))"
> __populate_create_file $((blksz * nr)) "${SCRATCH_MNT}/RMAPBT"
> + ) &
> fi
>
> # Realtime Reverse-mapping btree
> is_rt="$(_xfs_get_rtextents "$SCRATCH_MNT")"
> if [ $is_rmapbt -gt 0 ] && [ $is_rt -gt 0 ]; then
> - echo "+ rtrmapbt btree"
> + ( echo "+ rtrmapbt btree"
> nr="$((blksz * 2 / 32))"
> $XFS_IO_PROG -R -f -c 'truncate 0' "${SCRATCH_MNT}/RTRMAPBT"
> __populate_create_file $((blksz * nr)) "${SCRATCH_MNT}/RTRMAPBT"
> + ) &
> fi
>
> # Reference-count btree
> is_reflink="$(_xfs_has_feature "$SCRATCH_MNT" reflink -v)"
> if [ $is_reflink -gt 0 ]; then
> - echo "+ reflink btree"
> + ( echo "+ reflink btree"
> nr="$((blksz * 2 / 12))"
> __populate_create_file $((blksz * nr)) "${SCRATCH_MNT}/REFCOUNTBT"
> cp --reflink=always "${SCRATCH_MNT}/REFCOUNTBT" "${SCRATCH_MNT}/REFCOUNTBT2"
> + ) &
> fi
>
> # Copy some real files (xfs tests, I guess...)
> echo "+ real files"
> test $fill -ne 0 && __populate_fill_fs "${SCRATCH_MNT}" 5
>
> - # Make sure we get all the fragmentation we asked for
> - __populate_fragment_file "${SCRATCH_MNT}/S_IFREG.FMT_BTREE"
> - __populate_fragment_file "${SCRATCH_MNT}/BNOBT"
> - __populate_fragment_file "${SCRATCH_MNT}/RMAPBT"
> - __populate_fragment_file "${SCRATCH_MNT}/RTRMAPBT"
> - __populate_fragment_file "${SCRATCH_MNT}/REFCOUNTBT"
> + # Wait for all file creation to complete before we start fragmenting
> + # the files as needed.
> + wait
> + __populate_fragment_file "${SCRATCH_MNT}/S_IFREG.FMT_BTREE" &
> + __populate_fragment_file "${SCRATCH_MNT}/BNOBT" &
> + __populate_fragment_file "${SCRATCH_MNT}/RMAPBT" &
> + __populate_fragment_file "${SCRATCH_MNT}/RTRMAPBT" &
> + __populate_fragment_file "${SCRATCH_MNT}/REFCOUNTBT" &
>
> + wait
> umount "${SCRATCH_MNT}"
> }
>
> --
> 2.38.1
>
Thread overview:
2023-01-10 22:49 [PATCH 0/3] fstests: filesystem population fixes Dave Chinner
2023-01-10 22:49 ` [PATCH 1/3] populate: fix horrible performance due to excessive forking Dave Chinner
2023-01-11 6:02 ` Darrick J. Wong [this message]
2023-01-12 1:58 ` Darrick J. Wong
2023-01-12 10:24 ` [PATCH 1/3] more python dependence. was: " David Disseldorp
2023-01-12 17:07 ` Darrick J. Wong
2023-01-12 20:23 ` David Disseldorp
2023-01-12 20:42 ` Zorro Lang
2023-01-15 18:33 ` Darrick J. Wong
2023-01-10 22:49 ` [PATCH 2/3] populate: ensure btree directories are created reliably Dave Chinner
2023-01-11 5:47 ` Darrick J. Wong
2023-01-12 5:42 ` Gao Xiang
2023-01-10 22:49 ` [PATCH 3/3] xfs/294: performance is unreasonably slow Dave Chinner
2023-01-11 20:29 ` David Disseldorp
2023-01-12 8:39 ` Zorro Lang