From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Boris Burkov <boris@bur.io>, Qu Wenruo <wqu@suse.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 2/3] btrfs-progs: mkfs: rework how we traverse rootdir
Date: Thu, 1 Aug 2024 08:49:39 +0930 [thread overview]
Message-ID: <dd3444f9-c8be-46e7-97b1-8f95a161c709@gmx.com> (raw)
In-Reply-To: <20240731225935.GB3808023@zen.localdomain>
在 2024/8/1 08:29, Boris Burkov 写道:
> On Wed, Jul 31, 2024 at 07:08:47PM +0930, Qu Wenruo wrote:
>> There are several hidden pitfalls of the existing traverse_directory():
>>
>> - Hand written preorder traversal
>> Meanwhile there is already a better written standard library function,
>> nftw() doing exactly what we need.
>
> This is great!
>
>>
>> - Over-designed path list
>> To properly handle the directory change, we have structure
>> directory_name_entry, to record every inode until rootdir.
>>
>> But it has two string members, dir_name and path, which is a little
>> overkilled.
>> As for preorder traversal, we will never need to read the parent's
>> filename, just its inode number.
>>
>> And it's exported while no one utilizes it out of mkfs/rootdir.c.
>>
>> - Weird inode numbers
>> We use the inode number from st->st_ino, with an extra offset.
>> This by itself is not safe, if the rootdir has child directory in
>> another filesystem.
>
> Can you explain what you mean by not safe? As far as I can tell, the
> +256 is to handle that particular invariant of btrfs. Are you worried
> about duplicate inode numbers in the vein of the usual st_dev/st_ino
> debates?
I'm worried about this case:
rootdir (on fs1)
|- dir1
| |- file1 (fs1, ino 1024)
|- dir2 (on fs2)
|- file2 (fs2, ino 1024)
We do not have any cross mount point checks.
I created a case like this:
# fallocate -l 1G 1.img
# fallocate -l 1G 2.img
# mkfs.ext4 1.img
# mkfs.ext4 2.img
# mount 1.img mnt
# mkdir mnt/dir1 mnt/dir2
# touch mnt/dir1/file1 mnt/file1
# umount mnt
# mount 2.img mnt
# mkdir mnt/dir1 mnt/dir2
# touch mnt/dir1/file1 mnt/file1
# umount mnt
# mkdir rootdir
# mount 1.img rootdir
# mount 2.img rootdir/dir2
# mkfs.btrfs -f --rootdir rootdir test.img
ERROR: item file1 already exists but has wrong st_nlink 1 <= 1
ERROR: unable to traverse directory rootdir/: 1
ERROR: error while filling filesystem: 1
And it failed as expeceted.
>
> In that case, if this is a bugfix, I think it makes sense to write a
> regression test that highlights it. It would be nice to pull the fix
> out of the refactor, too, but I suppose that with such a deep refactor,
> that might not be possible/worth it.
Unfortunately I didn't even notice how serious these problems are, until
I tried...
Will add two new test cases. One is the above one, the other is the
hardlink out of rootdir bug I mentioned in 3/3.
>
>>
>> And this results very weird inode numbers, e.g:
>>
>> item 0 key (256 INODE_ITEM 0) itemoff 16123 itemsize 160
>> item 6 key (88347519 INODE_ITEM 0) itemoff 15815 itemsize 160
>> item 16 key (88347520 INODE_ITEM 0) itemoff 15363 itemsize 160
>> item 20 key (88347521 INODE_ITEM 0) itemoff 15119 itemsize 160
>> item 24 key (88347522 INODE_ITEM 0) itemoff 14875 itemsize 160
>> item 26 key (88347523 INODE_ITEM 0) itemoff 14700 itemsize 160
>> item 28 key (88347524 INODE_ITEM 0) itemoff 14525 itemsize 160
>> item 30 key (88347557 INODE_ITEM 0) itemoff 14350 itemsize 160
>> item 32 key (88347566 INODE_ITEM 0) itemoff 14175 itemsize 160
>>
>> Which is far from a regular fs created by copying the data.
>
> +1, especially since it doesn't even *preserve* the inode numbers, which
> would be a comprehensible (if not working) behavior. Creating the oids
> in "btrfs order" seems like a nice improvement!
>
>>
>> - Weird directory inode size calculation
>> Unlike kernel, which updated the directory inode size every time new
>> child inodes are added, we calculate the directory inode size by
>> searching all its children first, then later new inodes linked to this
>> directory won't touch the inode size.
>>
>> Enhance all these points by:
>>
>> - Use nftw() to do the preorder traversal
>> It also provides the extra level detection, which is pretty handy.
>>
>> - Use a simple local inode_entry to record each parent
>> The only value is a u64 to record the inode number.
>> And one simple rootdir_path structure to record the list of
>> inode_entry, alone with the current level.
>>
>> This rootdir_path structure along with two helpers,
>> rootdir_path_push() and rootdir_path_pop(), along with the
>> preorder traversal provided by nftw(), are enough for us to record
>> all the parent directories until the rootdir.
>>
>> - Grab new inode number properly
>> Just call btrfs_get_free_objectid() to grab a proper inode number,
>> other than using some weird calculated value.
>>
>> - Use btrfs_insert_inode() and btrfs_add_link() to update directory
>> inode automatically
>>
>> With all the refactor, the code is shorter and easier to read.
>
> Agreed on all counts, I think this is a lovely refactor.
>
> For a change of this magnitude, I think it is helpful to justify the
> correctness of the change by documenting the testing plan. Are there
> existing unit tests of mkfs that cover all the cases we are modifying
> here?
We have existing rootdir test cases, from the regular functional tests,
to corner cases.
But we lack corner cases of corner cases, like duplicating inode number
(cross mnt) and hard links.
> Is there some --rootdir master case that covers everything?
It's inside tests/mkfs-tests/
A quick grep shows:
004-rootdir-keeps-size
009-special-files-for-rootdir
011-rootdir-create-file
012-rootdir-no-shrink
014-rootdir-inline-extent
016-rootdir-bad-symbolic-link
021-rfeatures-quota-rootdir
022-rootdir-size
027-rootdir-inode
> Is it
> in fstests? Knowing that all of the new code has at least run correctly
> would go a long way to feeling confident in the details of the
> transformation.
We have btrfs-progs github CI, runs the all the selftests on each PR.
And it's all green (except a typo caught by code spell).
[...]
>> - parent_inum = highest_inum + BTRFS_FIRST_FREE_OBJECTID;
>> - dir_entry->inum = parent_inum;
>> - list_add_tail(&dir_entry->list, &dir_head->list);
>> + /*
>> + * If our path level is higher than this level - 1, this means
>> + * we have changed directory.
>> + * Poping out the unrelated inodes.
>
> Popping
Exposed by the CI, and fixed immediately in github.
>
> Also, this took me like 15 minutes of working examples to figure out why
> it worked. I think it could definitely do with some deeper explanation
> of the invariant, like:
>
> ftwbuf->level - 1 is the level of the parent of the current traversed
> inode. ftw will traverse all inodes in a directory before leaving it,
> and will never traverse an inode without first traversing its dir,
> so if we see a level less than or equal to the last directory we saw,
> we are finished with that directory and can pop it.
>
> Perhaps with an annotated drawing like:
>
> 0: /
> 1: /foo/
> 2: /foo/bar/
> 3: /foo/bar/f
> 2: /foo/baz/ POP! 2 > (2 - 1); done with /foo/bar/
> 3: /foo/baz/g
> 1: /h POP! 2 > (1 - 1); done with /foo/baz/
>
> To help make it clearer. I honestly even think just changing to >= makes
> it clearer? Not sure about that.
I'll add comments with an example to explain the workflow.
BTW, David and I are working with Github review system a lot recently:
https://github.com/kdave/btrfs-progs/pull/855
We do not force anyone to use specific system to do anything, but you
may find it a little easier to comment, and feel free to fall back to
the mail based review workflow at any time.
Thanks a lot for the detailed review!
Qu
next prev parent reply other threads:[~2024-07-31 23:19 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-07-31 9:38 [PATCH 0/3] btrfs-progs: rework how we traverse rootdir Qu Wenruo
2024-07-31 9:38 ` [PATCH 1/3] btrfs-progs: constify the name parameter of btrfs_add_link() Qu Wenruo
2024-07-31 9:38 ` [PATCH 2/3] btrfs-progs: mkfs: rework how we traverse rootdir Qu Wenruo
2024-07-31 22:59 ` Boris Burkov
2024-07-31 23:19 ` Qu Wenruo [this message]
2024-07-31 23:42 ` Boris Burkov
2024-08-01 0:06 ` Qu Wenruo
2024-07-31 9:38 ` [PATCH 3/3] btrfs-progs: rootdir: reject hard links Qu Wenruo
2024-07-31 22:17 ` Boris Burkov
2024-07-31 22:55 ` Qu Wenruo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=dd3444f9-c8be-46e7-97b1-8f95a161c709@gmx.com \
--to=quwenruo.btrfs@gmx.com \
--cc=boris@bur.io \
--cc=linux-btrfs@vger.kernel.org \
--cc=wqu@suse.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox