* [PATCH 0/5] *** Introduce new space allocation algorithm ***
@ 2024-11-04 1:40 zhangshida
2024-11-04 6:52 ` frag.sh zhangshida
2024-11-04 6:53 ` auto_frag.sh zhangshida
0 siblings, 2 replies; 4+ messages in thread
From: zhangshida @ 2024-11-04 1:40 UTC (permalink / raw)
To: djwong, dchinner, leo.lilong, wozizhi, osandov, xiang,
zhangjiachen.jaycee
Cc: linux-xfs, linux-kernel, zhangshida, starzhangzsd
From: Shida Zhang <zhangshida@kylinos.cn>
Hi all,
Recently, two of our major users have repeatedly run into
the same xfs problem.
In both cases the symptom is the same: an xfs filesystem
cannot create a new file even though nearly half of its
space is still free, and even with sparse inodes enabled.
It turns out that the filesystem is too fragmented to have
enough contiguous free space to create a new file.
Life still has to go on.
But from our users' perspective, an xfs that is unusable is
worse than an xfs that is hard to use, since not even a
single file can be created now.
So we would like to introduce a new space allocation
algorithm to solve this.
To achieve that, we propose a new concept, Allocation Fields.
The name is borrowed from the mathematical concepts
(Groups, Rings, Fields), and it is abbreviated as AF in the
rest of this letter.
What is an AF?
A one-picture summary:
|<--------+ af 0 +-------->|<--+ af 1 +-->| af 2|
|------------------------------------------------+
| ag 0 | ag 1 | ag 2 | ag 3| ag 4 | ag 5 | ag 6 |
+------------------------------------------------+
A text-based definition of AF:
1. An AF is an incore-only concept, in contrast to the
on-disk AG concept.
2. An AF consists of a contiguous series of AGs.
3. An allocation in a lower AF will NEVER go to a higher AF
if it can be completed in the current AF.
Rule 3 serves as a barrier between AFs that slows down the
rapid spread of fragmentation across the filesystem.
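To make Rule 3 concrete, here is a toy model in bash. It is only a sketch and not the kernel code: the pick_ag helper and the per-AG "largest free extent" numbers are made up for illustration, the AF boundaries are taken from the picture above (af 0 = ag 0..3, af 1 = ag 4..5, af 2 = ag 6), and everything else the real allocator considers is ignored.

```shell
#!/bin/bash
# Toy model only, not kernel code. AF boundaries follow the picture above.
af_first=(0 4 6)   # first agno of af 0, af 1, af 2
af_last=(3 5 6)    # last agno of af 0, af 1, af 2

# pick_ag START_AF WANT FREE_AG0 FREE_AG1 ...
# Print the first AG whose largest free extent (in blocks) can hold WANT.
# The starting AF is searched completely before any higher AF is tried.
pick_ag() {
    local af=$1 want=$2
    shift 2
    local free=("$@")
    local ag
    while [ "$af" -lt "${#af_first[@]}" ]; do
        for ((ag = af_first[af]; ag <= af_last[af]; ag++)); do
            if [ "${free[ag]}" -ge "$want" ]; then
                echo "$ag"
                return 0
            fi
        done
        af=$((af + 1))
    done
    return 1    # no AG can hold the request: ENOSPC in this model
}

# Hypothetical largest free extent per AG, in blocks.
free=(1 1 16 1 64 64 64)
pick_ag 0 8 "${free[@]}"     # prints 2: af 0 can satisfy it, so it stays there
pick_ag 0 32 "${free[@]}"    # prints 4: nothing in af 0 fits, so af 1 is tried
```

In this model the large free extents in af 1 and af 2 are left alone for as long as af 0 can still serve the request, which is the barrier effect described above.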
With these patches applied, the code logic is exactly the
same as the original code logic unless you mount with the
extra mount option. For example:
mount -o af1=1 $dev $mnt
That will change the default AF layout:
|<--------+ af 0 +--------->|
|----------------------------
| ag 0 | ag 1 | ag 2 | ag 3 |
+----------------------------
to:
|<-----+ af 0 +----->|<af 1>|
|----------------------------
| ag 0 | ag 1 | ag 2 | ag 3 |
+----------------------------
So 'af1=1' here means that af 1 starts one AG below
m_sb.agcount, i.e. at agno (m_sb.agcount - 1).
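The arithmetic can be checked with a few lines of bash. This is a sketch under the assumption stated above, that af 1 starts at agno (agcount - af1); the af_layout helper is hypothetical and not part of the patch set, and how the patches treat an out-of-range value is not covered here.

```shell
#!/bin/bash
# Hypothetical helper, not part of the patch set: print which AGs fall
# into each AF, assuming af 1 starts at agno (agcount - af1).
af_layout() {
    local agcount=$1 af1=$2
    if [ "$af1" -le 0 ] || [ "$af1" -ge "$agcount" ]; then
        # Assumption for this sketch: an out-of-range value keeps the
        # default layout with a single AF covering every AG.
        echo "af 0: ag 0..$((agcount - 1))"
        return 0
    fi
    local start=$((agcount - af1))
    echo "af 0: ag 0..$((start - 1))"
    echo "af 1: ag ${start}..$((agcount - 1))"
}

# The 4-AG example from the pictures above, mounted with af1=1.
af_layout 4 1
```

For 4 AGs and af1=1 this prints "af 0: ag 0..2" and "af 1: ag 3..3", which matches the second picture.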
We ran some tests to verify this. You can verify it yourself
by running the following commands:
1. Create a 1 GiB image file and format it as xfs:
dd if=/dev/zero of=test.img bs=1M count=1024
mkfs.xfs -f test.img
sync
2. Make a mount directory:
mkdir mnt
3. Run the auto_frag.sh script, which calls another script,
frag.sh. Both scripts are attached to this thread.
To enable the AF, run:
./auto_frag.sh 1
To disable the AF, run:
./auto_frag.sh 0
Please feel free to communicate with us if you have any thoughts
about these problems.
Cheers,
Shida
Shida Zhang (5):
xfs: add two wrappers for iterating ags in a AF
xfs: add two mp member to record the alloction field layout
xfs: add mount options as a way to change the AF layout
xfs: add infrastructure to support AF allocation algorithm
xfs: modify the logic to comply with AF rules
fs/xfs/libxfs/xfs_ag.h | 17 ++++++++++++
fs/xfs/libxfs/xfs_alloc.c | 20 ++++++++++++++-
fs/xfs/libxfs/xfs_alloc.h | 2 ++
fs/xfs/libxfs/xfs_bmap.c | 47 ++++++++++++++++++++++++++++++++--
fs/xfs/libxfs/xfs_bmap_btree.c | 2 ++
fs/xfs/xfs_mount.h | 3 +++
fs/xfs/xfs_super.c | 12 ++++++++-
7 files changed, 99 insertions(+), 4 deletions(-)
--
2.33.0
* frag.sh
2024-11-04 1:40 [PATCH 0/5] *** Introduce new space allocation algorithm *** zhangshida
@ 2024-11-04 6:52 ` zhangshida
2024-11-04 12:21 ` frag.sh Dave Chinner
2024-11-04 6:53 ` auto_frag.sh zhangshida
1 sibling, 1 reply; 4+ messages in thread
From: zhangshida @ 2024-11-04 6:52 UTC (permalink / raw)
To: starzhangzsd
Cc: dchinner, djwong, leo.lilong, linux-kernel, linux-xfs, osandov,
wozizhi, xiang, zhangjiachen.jaycee, zhangshida
From: Shida Zhang <zhangshida@kylinos.cn>
#!/bin/bash
#usage: ./frag.sh $dev $dir $size_k $filename $enable_af
cleanup() {
echo "Ctrl+C detected. Killing child processes..." >&2
pkill -P $$ # Kill all child processes
echo "exit...umount ${test_dev}" >&2
umount ${test_dev}
exit 1
}
trap cleanup SIGINT SIGTERM
test_dev=$1
if [ -z "$test_dev" ]; then
echo "test_dev can't be null"
echo "usage: ./frag.sh [test_dev] [test_dir] [file_size_k] [filename] [enable_af]"
exit 1
fi
test_mnt=$2
if [ -z "$test_mnt" ]; then
echo "test_mnt can't be null"
echo "usage: ./frag.sh [test_dev] [test_dir] [file_size_k] [filename] [enable_af]"
exit 1
fi
file_size_k=$3
if [ -z "${file_size_k}" ]; then
echo "file_size_k can't be null"
echo "usage: ./frag.sh [test_dev] [test_dir] [file_size_k] [filename] [enable_af]"
exit 1
fi
echo "test_dev:${test_dev} test_mnt:${test_mnt} file_size:${file_size_k}KB"
#mkfs.xfs -f ${test_dev}
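# $5 selects the layout: 0 (or unset) mounts with the default AF layout,
# any other number mounts with af1=1.
if [ "${5:-0}" -eq 0 ]; then
echo "mount ${test_dev} ${test_mnt}"
mount "$test_dev" "$test_mnt"
else
echo "mount -o af1=1 ${test_dev} ${test_mnt}"
mount -o af1=1 "$test_dev" "$test_mnt"
fi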
# Parameters
FILE_NAME=${4:-fragmented_file} # File name, defaults to "fragmented_file"
FILE=${test_mnt}/${FILE_NAME}
echo "$FILE"
TOTAL_SIZE=${file_size_k} # Total size in KB
CHUNK_SIZE=4 # Size of each punch operation in KB
# Create a big file with allocated space
xfs_io -f -c "falloc 0 $((TOTAL_SIZE))k" $FILE
# Calculate total number of punches needed
NUM_PUNCHES=$(( TOTAL_SIZE / (CHUNK_SIZE * 2) ))
last_percentage=-1
# Punch holes alternately to create fragmentation
for ((i=0; i<NUM_PUNCHES; i++)); do
OFFSET=$(( i * CHUNK_SIZE * 2 * 1024 ))
xfs_io -c "fpunch $OFFSET ${CHUNK_SIZE}k" $FILE
# Calculate current percentage and print if changed
PERCENTAGE=$(( (i + 1) * 100 / NUM_PUNCHES ))
if [ "$PERCENTAGE" -ne "$last_percentage" ]; then
#echo "Processing...${PERCENTAGE}%"
last_percentage=$PERCENTAGE
fi
done
# Verify the extent list (to see fragmentation)
# echo "Extent list for the file:"
# xfs_bmap -v $FILE
df -Th ${test_mnt}
echo "umount ${test_dev}"
umount $test_dev
xfs_db -c 'freesp' $test_dev
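
As a quick sanity check of the loop arithmetic in frag.sh, the following sketch reproduces the numbers without touching a file. It assumes the 500 MB size that auto_frag.sh passes for the "frag" file; it only prints values and does not call xfs_io.

```shell
#!/bin/bash
# Illustration only: reproduce the punch arithmetic from frag.sh.
TOTAL_SIZE=$((500 * 1024))   # KB, the size auto_frag.sh passes for "frag"
CHUNK_SIZE=4                 # KB, as in frag.sh

# One hole is punched per 2 * CHUNK_SIZE KB of file.
NUM_PUNCHES=$(( TOTAL_SIZE / (CHUNK_SIZE * 2) ))
echo "punches: ${NUM_PUNCHES}"

# Byte offsets of the first three holes. Each hole is CHUNK_SIZE KB long
# and is followed by CHUNK_SIZE KB that stays allocated.
for ((i = 0; i < 3; i++)); do
    echo "hole ${i}: offset $(( i * CHUNK_SIZE * 2 * 1024 ))"
done
```

So the 500 MB file ends up with 64000 holes of 4 KB each at byte offsets 0, 8192, 16384 and so on, with 4 KB of allocated data between neighbouring holes.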
* auto_frag.sh
2024-11-04 1:40 [PATCH 0/5] *** Introduce new space allocation algorithm *** zhangshida
2024-11-04 6:52 ` frag.sh zhangshida
@ 2024-11-04 6:53 ` zhangshida
1 sibling, 0 replies; 4+ messages in thread
From: zhangshida @ 2024-11-04 6:53 UTC (permalink / raw)
To: starzhangzsd
Cc: dchinner, djwong, leo.lilong, linux-kernel, linux-xfs, osandov,
wozizhi, xiang, zhangjiachen.jaycee, zhangshida
From: Shida Zhang <zhangshida@kylinos.cn>
#!/bin/bash
cleanup() {
echo "Ctrl+C detected. Killing child processes..." >&2
pkill -P $$ # Kill all child processes
exit 1
}
trap cleanup SIGINT SIGTERM
./frag.sh test.img mnt/ $((500*1024)) frag $1
./frag.sh test.img mnt/ $((200*1024)) frag2 $1
./frag.sh test.img mnt/ $((100*1024)) frag3 $1
./frag.sh test.img mnt/ $((100*1024)) frag4 $1
./frag.sh test.img mnt/ $((100*1024)) frag5 $1
./frag.sh test.img mnt/ $((100*1024)) frag6 $1
./frag.sh test.img mnt/ $((100*1024)) frag7 $1
./frag.sh test.img mnt/ $((100*1024)) frag8 $1
./frag.sh test.img mnt/ $((100*1024)) frag9 $1
* Re: frag.sh
2024-11-04 6:52 ` frag.sh zhangshida
@ 2024-11-04 12:21 ` Dave Chinner
0 siblings, 0 replies; 4+ messages in thread
From: Dave Chinner @ 2024-11-04 12:21 UTC (permalink / raw)
To: zhangshida
Cc: dchinner, djwong, leo.lilong, linux-kernel, linux-xfs, osandov,
wozizhi, xiang, zhangjiachen.jaycee, zhangshida
On Mon, Nov 04, 2024 at 02:52:14PM +0800, zhangshida wrote:
> From: Shida Zhang <zhangshida@kylinos.cn>
>
> #usage: ./frag.sh $dev $dir $size_k $filename
> #!/bin/bash
.....
> # Create a big file with allocated space
> xfs_io -f -c "falloc 0 $((TOTAL_SIZE))k" $FILE
>
> # Calculate total number of punches needed
> NUM_PUNCHES=$(( TOTAL_SIZE / (CHUNK_SIZE * 2) ))
>
> last_percentage=-1
> # Punch holes alternately to create fragmentation
> for ((i=0; i<NUM_PUNCHES; i++)); do
> OFFSET=$(( i * CHUNK_SIZE * 2 * 1024 ))
> xfs_io -c "fpunch $OFFSET ${CHUNK_SIZE}k" $FILE
>
> # Calculate current percentage and print if changed
> PERCENTAGE=$(( (i + 1) * 100 / NUM_PUNCHES ))
> if [ "$PERCENTAGE" -ne "$last_percentage" ]; then
> #echo "Processing...${PERCENTAGE}%"
> last_percentage=$PERCENTAGE
> fi
> done
Yup, that re-invents fstests::src/punch-alternating.c pretty much
exactly.
The fact that there is a production workload that is generating this
exact operational pattern and running it to ENOSPC repeatedly is
horrifying....
-Dave.
--
Dave Chinner
david@fromorbit.com