public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Josef Bacik <josef@toxicpanda.com>, Anand Jain <anand.jain@oracle.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH RFC] btrfs-progs: mkfs: optimize file descriptor usage in mkfs.btrfs
Date: Thu, 1 Feb 2024 19:19:15 +1030	[thread overview]
Message-ID: <04a9a9cf-0c77-4f1e-b9f3-12cceeb7ef57@gmx.com> (raw)
In-Reply-To: <20240131204800.GB3203388@perftesting>



On 2024/2/1 07:18, Josef Bacik wrote:
> On Wed, Jan 31, 2024 at 11:49:28PM +0800, Anand Jain wrote:
>> I've reproduced the missing udev events issue without device partitions
>> using the test case as below. The test waits for the creation of 'by-uuid'
>> and, waiting indefinitely means successful reproduction. as below:
>>
>> --------------------------------------------------
>> #!/bin/bash
>> usage()
>> {
>> 	echo
>> 	echo "Usage: ./t1 sdb btrfs"
>> 	exit 1
>> }
>>
>> : ${1?"arg1 <dev> is missing"} || usage
>> dev=$1
>>
>> : ${2?"arg2 <fstype> is missing"} || usage
>> fstype=$2
>>
>> systemd=$(rpm -q --queryformat='%{NAME}-%{VERSION}-%{RELEASE}' systemd)
>>
>> run_testcase()
>> {
>> 	local cnt=$1
>> 	local ret=0
>> 	local sleepcnt=0
>>
>> 	local newuuid=""
>> 	local logpid=""
>> 	local log=""
>> 	local logfile="./udev_log_${systemd}_${fstype}.out"
>>
>> 	>$logfile
>>
>> 	wipefs -a /dev/${dev}* &>/dev/null
>> 	sync
>> 	udevadm settle /dev/$dev
>>
>> 	udevadm monitor -k -u -p > $logfile &
>> 	logpid=$!
>> 	>strace.out
>> 	run "strace -f -ttT -o strace.out mkfs.$fstype -q -f /dev/$dev"
>>
>> 	newuuid=$(blkid -p /dev/$dev | awk '{print $2}' | sed -e 's/UUID=//' -e 's/\"//g')
>>
>> 	kill $logpid
>> 	sync $logfile
>>
>> 	ret=-1
>> 	while [ $ret != 0 ]
>> 	do
>> 		run -s -q "ls -lt /dev/disk/by-uuid | grep $newuuid"
>> 		ret=$?
>> 		((sleepcnt++))
>> 		sleep 1
>> 	done
>>
>> 	#for systemd-239-78.0.3.el8
>> 	log=$(cat $logfile|grep ID_FS_TYPE_NEW)
>> 	#for systemd-252-18.0.1.el9.x86_64
>> 	#log=$(grep --text "ID_FS_UUID=${newuuid}" $logfile)
>>
>> 	echo $cnt sleepcnt=$sleepcnt newuuid=$newuuid ret=$ret log=$log
>> }
>>
>> echo Test case: t1: version 3.
>> echo
>>
>> run -o cat /etc/system-release
>> run -o uname -a
>> run -o rpm -q systemd
>> if [ $fstype == "btrfs" ]; then
>> 	run -o mkfs.btrfs --version
>> elif [ $fstype == "xfs" ]; then
>> 	run -o mkfs.xfs -V
>> else
>> 	echo unknown fstype $fstype
>> fi
>> echo
>>
>> for ((cnt = 0; cnt < 13; cnt++)); do
>> 	run_testcase $cnt
>> done
>> -----------------------------------------------------------------
>>
>>
>> It appears that the problem is due to the convoluted nested device open
>> and device close in mkfs.btrfs as shown below:
>>
>> -------------
>>   prepare_ctx opens all devices <-- fd1
>>     zero the super-block
>>     writes temp-sb to the first device.
>>
>>   open_ctree opens the first device again <-- fd2
>>
>>   prepare_ctx writes temp-sb to all the remaining devs <-- fd1
>>
>>   fs_info->finalize_on_close = 1;
>>   close_ctree_fs_info()<-- writes valid <--- fd2
>>
>>   prepare_ctx is closed <--- fd1.
>> -------------

The problem is, even if we change the sequence, it doesn't make much
difference.

There are several things involved, and most of them are out of our control:

- The udev scan is triggered on writable fd close().
- The udev scan is always executed on the parent block device
   Not the partition device.
- The udev scan the whole disk, not just the partition

With those involved, changing the nested behavior would not change anything.

The write in another partition of the same parent block device can still
triggered a scan meanwhile we're making fs on our partition.

If you really want to change the behavior, you need to reuse all device
fd, and even with the reuse, you still can not solve the problem as mkfs
on another partition can still lead to the race.

Thus the proper way to do, is just to follow the udev's doc, to use
flock() on the parent block device.

Thanks,
Qu
>>
>> This cleanup patch reuses the main file descriptor (fd1) in open_ctree(),
>> and with this change both the test cases (with partition and without
>> partition) now runs fine.
>>
>> I've done an initial tests only, not validated with the multi-device mkfs.
>> More cleanup is possible but pending feedback;  marking this patch as an RFC.
>
> I'd like to see the cleaned up version of this patch, but I have a few comments.
>
> 1) I think re-using the fd is reasonable, tho could this just be reworked to
>     create the temp-sb and write this to all the devs, close the file
>     descriptors, and then call open_ctree?
>
> 2) I hate adding another thing into a core file that we'll have to figure out
>     how to undo later as we sync more code from the kernel into btrfs-progs, I'm
>     not sure if there's a way around this, but thinking harder about adding
>     something to disk-io.c that is for userspace only would be good.
>
> Thanks,
>
> Josef
>

  reply	other threads:[~2024-02-01  8:49 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-31 15:49 [PATCH RFC] btrfs-progs: mkfs: optimize file descriptor usage in mkfs.btrfs Anand Jain
2024-01-31 20:48 ` Josef Bacik
2024-02-01  8:49   ` Qu Wenruo [this message]
2024-02-01 19:07     ` David Sterba
2024-02-01 20:50       ` Qu Wenruo
2024-02-01 19:18   ` David Sterba
2024-02-02  1:52     ` Anand Jain
2024-02-01 21:59   ` Qu Wenruo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=04a9a9cf-0c77-4f1e-b9f3-12cceeb7ef57@gmx.com \
    --to=quwenruo.btrfs@gmx.com \
    --cc=anand.jain@oracle.com \
    --cc=josef@toxicpanda.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox