From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Filipe Manana <fdmanana@kernel.org>, Qu Wenruo <wqu@suse.com>
Cc: linux-btrfs@vger.kernel.org, erosca@de.adit-jv.com,
Filipe Manana <fdmanana@suse.com>, Rob Landley <rob@landley.net>,
stable@vger.kernel.org, David Sterba <dsterba@suse.com>
Subject: Re: [PATCH v2] btrfs: fix infinite directory reads
Date: Fri, 26 Jan 2024 09:36:26 +1030 [thread overview]
Message-ID: <d52c41e0-1f82-4936-bfd4-e6e989560edf@gmx.com> (raw)
In-Reply-To: <CAL3q7H77i3kv7C352k2R6nr-m-cgh_cdCCeTkXna+v1yjpMuoA@mail.gmail.com>
On 2024/1/25 20:32, Filipe Manana wrote:
> On Thu, Jan 25, 2024 at 9:51 AM Qu Wenruo <wqu@suse.com> wrote:
>>
>> From: Filipe Manana <fdmanana@suse.com>
>>
>> [ Upstream commit 9b378f6ad48cfa195ed868db9123c09ee7ec5ea2 ]
>>
>> The readdir implementation currently processes always up to the last index
>> it finds. This however can result in an infinite loop if the directory has
>> a large number of entries such that they won't all fit in the given buffer
>> passed to the readdir callback, that is, dir_emit() returns a non-zero
>> value. Because in that case readdir() will be called again and if in the
>> meanwhile new directory entries were added and we still can't put all the
>> remaining entries in the buffer, we keep repeating this over and over.
>>
>> The following C program and test script reproduce the problem:
>>
>> $ cat /mnt/readdir_prog.c
>> #include <sys/types.h>
>> #include <dirent.h>
>> #include <stdio.h>
>>
>> int main(int argc, char *argv[])
>> {
>> DIR *dir = opendir(".");
>> struct dirent *dd;
>>
>> while ((dd = readdir(dir))) {
>> printf("%s\n", dd->d_name);
>> rename(dd->d_name, "TEMPFILE");
>> rename("TEMPFILE", dd->d_name);
>> }
>> closedir(dir);
>> }
>>
>> $ gcc -o /mnt/readdir_prog /mnt/readdir_prog.c
>>
>> $ cat test.sh
>> #!/bin/bash
>>
>> DEV=/dev/sdi
>> MNT=/mnt/sdi
>>
>> mkfs.btrfs -f $DEV &> /dev/null
>> #mkfs.xfs -f $DEV &> /dev/null
>> #mkfs.ext4 -F $DEV &> /dev/null
>>
>> mount $DEV $MNT
>>
>> mkdir $MNT/testdir
>> for ((i = 1; i <= 2000; i++)); do
>> echo -n > $MNT/testdir/file_$i
>> done
>>
>> cd $MNT/testdir
>> /mnt/readdir_prog
>>
>> cd /mnt
>>
>> umount $MNT
>>
>> This behaviour is surprising to applications and it's unlike ext4, xfs,
>> tmpfs, vfat and other filesystems, which always finish. In this case where
>> new entries were added due to renames, some file names may be reported
>> more than once, but this varies according to each filesystem - for example
>> ext4 never reported the same file more than once while xfs reports the
>> first 13 file names twice.
>>
>> So change our readdir implementation to track the last index number when
>> opendir() is called and then make readdir() never process beyond that
>> index number. This gives the same behaviour as ext4.
>>
>> Reported-by: Rob Landley <rob@landley.net>
>> Link: https://lore.kernel.org/linux-btrfs/2c8c55ec-04c6-e0dc-9c5c-8c7924778c35@landley.net/
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217681
>> CC: stable@vger.kernel.org # 5.15
>> Signed-off-by: Filipe Manana <fdmanana@suse.com>
>> Signed-off-by: David Sterba <dsterba@suse.com>
>> [ Resolve a conflict due to member changes in 96d89923fa94 ]
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>
> Thanks for the backport, and running the corresponding test case from
> fstests to verify it's working.
>
> However when backporting a commit, one should also check if there are
> fixes for that commit, as they
> often introduce regressions or have some other bug - and that's the
> case here. We also need to backport
> the following 3 commits:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=357950361cbc6d54fb68ed878265c647384684ae
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e60aa5da14d01fed8411202dbe4adf6c44bd2a57
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8e7f82deb0c0386a03b62e30082574347f8b57d5
>
> One regression, the one regarding rewinddir(3), even has a test case
> in fstests too (generic/471) and would have been caught
> when running the "dir" group tests in fstests:
My bad, I get used to be informed by our internal building system about
missing fixes.
And obviously there is no such automatic systems checking missing fixes
here...
>
> https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/commit/?h=for-next&id=68b958f5dc4ab13cfd86f7fb82621f9f022b7626
>
> I'll work on making backports of those 3 other patches on top of your
> backport, and then send all of them in a series,
> including your patch, to make it easier to follow and apply all at once.
Thanks a lot, that's much appreciated.
Thanks,
Qu
>
> Thanks.
>
>
>> fs/btrfs/ctree.h | 1 +
>> fs/btrfs/delayed-inode.c | 5 +-
>> fs/btrfs/delayed-inode.h | 1 +
>> fs/btrfs/inode.c | 131 +++++++++++++++++++++++----------------
>> 4 files changed, 84 insertions(+), 54 deletions(-)
>> ---
>> Changelog:
>> v2:
>> - Fix the upstream commit hash
>>
>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>> index 1467bf439cb4..7905f178efa3 100644
>> --- a/fs/btrfs/ctree.h
>> +++ b/fs/btrfs/ctree.h
>> @@ -1361,6 +1361,7 @@ struct btrfs_drop_extents_args {
>>
>> struct btrfs_file_private {
>> void *filldir_buf;
>> + u64 last_index;
>> };
>>
>>
>> diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
>> index fd951aeaeac5..5a98c5da1225 100644
>> --- a/fs/btrfs/delayed-inode.c
>> +++ b/fs/btrfs/delayed-inode.c
>> @@ -1513,6 +1513,7 @@ int btrfs_inode_delayed_dir_index_count(struct btrfs_inode *inode)
>> }
>>
>> bool btrfs_readdir_get_delayed_items(struct inode *inode,
>> + u64 last_index,
>> struct list_head *ins_list,
>> struct list_head *del_list)
>> {
>> @@ -1532,14 +1533,14 @@ bool btrfs_readdir_get_delayed_items(struct inode *inode,
>>
>> mutex_lock(&delayed_node->mutex);
>> item = __btrfs_first_delayed_insertion_item(delayed_node);
>> - while (item) {
>> + while (item && item->key.offset <= last_index) {
>> refcount_inc(&item->refs);
>> list_add_tail(&item->readdir_list, ins_list);
>> item = __btrfs_next_delayed_item(item);
>> }
>>
>> item = __btrfs_first_delayed_deletion_item(delayed_node);
>> - while (item) {
>> + while (item && item->key.offset <= last_index) {
>> refcount_inc(&item->refs);
>> list_add_tail(&item->readdir_list, del_list);
>> item = __btrfs_next_delayed_item(item);
>> diff --git a/fs/btrfs/delayed-inode.h b/fs/btrfs/delayed-inode.h
>> index b2412160c5bc..a9cfce856d2e 100644
>> --- a/fs/btrfs/delayed-inode.h
>> +++ b/fs/btrfs/delayed-inode.h
>> @@ -123,6 +123,7 @@ void btrfs_destroy_delayed_inodes(struct btrfs_fs_info *fs_info);
>>
>> /* Used for readdir() */
>> bool btrfs_readdir_get_delayed_items(struct inode *inode,
>> + u64 last_index,
>> struct list_head *ins_list,
>> struct list_head *del_list);
>> void btrfs_readdir_put_delayed_items(struct inode *inode,
>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>> index 95af29634e55..1df374ce829b 100644
>> --- a/fs/btrfs/inode.c
>> +++ b/fs/btrfs/inode.c
>> @@ -6121,6 +6121,74 @@ static struct dentry *btrfs_lookup(struct inode *dir, struct dentry *dentry,
>> return d_splice_alias(inode, dentry);
>> }
>>
>> +/*
>> + * Find the highest existing sequence number in a directory and then set the
>> + * in-memory index_cnt variable to the first free sequence number.
>> + */
>> +static int btrfs_set_inode_index_count(struct btrfs_inode *inode)
>> +{
>> + struct btrfs_root *root = inode->root;
>> + struct btrfs_key key, found_key;
>> + struct btrfs_path *path;
>> + struct extent_buffer *leaf;
>> + int ret;
>> +
>> + key.objectid = btrfs_ino(inode);
>> + key.type = BTRFS_DIR_INDEX_KEY;
>> + key.offset = (u64)-1;
>> +
>> + path = btrfs_alloc_path();
>> + if (!path)
>> + return -ENOMEM;
>> +
>> + ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
>> + if (ret < 0)
>> + goto out;
>> + /* FIXME: we should be able to handle this */
>> + if (ret == 0)
>> + goto out;
>> + ret = 0;
>> +
>> + if (path->slots[0] == 0) {
>> + inode->index_cnt = BTRFS_DIR_START_INDEX;
>> + goto out;
>> + }
>> +
>> + path->slots[0]--;
>> +
>> + leaf = path->nodes[0];
>> + btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
>> +
>> + if (found_key.objectid != btrfs_ino(inode) ||
>> + found_key.type != BTRFS_DIR_INDEX_KEY) {
>> + inode->index_cnt = BTRFS_DIR_START_INDEX;
>> + goto out;
>> + }
>> +
>> + inode->index_cnt = found_key.offset + 1;
>> +out:
>> + btrfs_free_path(path);
>> + return ret;
>> +}
>> +
>> +static int btrfs_get_dir_last_index(struct btrfs_inode *dir, u64 *index)
>> +{
>> + if (dir->index_cnt == (u64)-1) {
>> + int ret;
>> +
>> + ret = btrfs_inode_delayed_dir_index_count(dir);
>> + if (ret) {
>> + ret = btrfs_set_inode_index_count(dir);
>> + if (ret)
>> + return ret;
>> + }
>> + }
>> +
>> + *index = dir->index_cnt;
>> +
>> + return 0;
>> +}
>> +
>> /*
>> * All this infrastructure exists because dir_emit can fault, and we are holding
>> * the tree lock when doing readdir. For now just allocate a buffer and copy
>> @@ -6133,10 +6201,17 @@ static struct dentry *btrfs_lookup(struct inode *dir, struct dentry *dentry,
>> static int btrfs_opendir(struct inode *inode, struct file *file)
>> {
>> struct btrfs_file_private *private;
>> + u64 last_index;
>> + int ret;
>> +
>> + ret = btrfs_get_dir_last_index(BTRFS_I(inode), &last_index);
>> + if (ret)
>> + return ret;
>>
>> private = kzalloc(sizeof(struct btrfs_file_private), GFP_KERNEL);
>> if (!private)
>> return -ENOMEM;
>> + private->last_index = last_index;
>> private->filldir_buf = kzalloc(PAGE_SIZE, GFP_KERNEL);
>> if (!private->filldir_buf) {
>> kfree(private);
>> @@ -6205,7 +6280,8 @@ static int btrfs_real_readdir(struct file *file, struct dir_context *ctx)
>>
>> INIT_LIST_HEAD(&ins_list);
>> INIT_LIST_HEAD(&del_list);
>> - put = btrfs_readdir_get_delayed_items(inode, &ins_list, &del_list);
>> + put = btrfs_readdir_get_delayed_items(inode, private->last_index,
>> + &ins_list, &del_list);
>>
>> again:
>> key.type = BTRFS_DIR_INDEX_KEY;
>> @@ -6238,6 +6314,8 @@ static int btrfs_real_readdir(struct file *file, struct dir_context *ctx)
>> break;
>> if (found_key.offset < ctx->pos)
>> goto next;
>> + if (found_key.offset > private->last_index)
>> + break;
>> if (btrfs_should_delete_dir_index(&del_list, found_key.offset))
>> goto next;
>> di = btrfs_item_ptr(leaf, slot, struct btrfs_dir_item);
>> @@ -6371,57 +6449,6 @@ static int btrfs_update_time(struct inode *inode, struct timespec64 *now,
>> return dirty ? btrfs_dirty_inode(inode) : 0;
>> }
>>
>> -/*
>> - * find the highest existing sequence number in a directory
>> - * and then set the in-memory index_cnt variable to reflect
>> - * free sequence numbers
>> - */
>> -static int btrfs_set_inode_index_count(struct btrfs_inode *inode)
>> -{
>> - struct btrfs_root *root = inode->root;
>> - struct btrfs_key key, found_key;
>> - struct btrfs_path *path;
>> - struct extent_buffer *leaf;
>> - int ret;
>> -
>> - key.objectid = btrfs_ino(inode);
>> - key.type = BTRFS_DIR_INDEX_KEY;
>> - key.offset = (u64)-1;
>> -
>> - path = btrfs_alloc_path();
>> - if (!path)
>> - return -ENOMEM;
>> -
>> - ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
>> - if (ret < 0)
>> - goto out;
>> - /* FIXME: we should be able to handle this */
>> - if (ret == 0)
>> - goto out;
>> - ret = 0;
>> -
>> - if (path->slots[0] == 0) {
>> - inode->index_cnt = BTRFS_DIR_START_INDEX;
>> - goto out;
>> - }
>> -
>> - path->slots[0]--;
>> -
>> - leaf = path->nodes[0];
>> - btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
>> -
>> - if (found_key.objectid != btrfs_ino(inode) ||
>> - found_key.type != BTRFS_DIR_INDEX_KEY) {
>> - inode->index_cnt = BTRFS_DIR_START_INDEX;
>> - goto out;
>> - }
>> -
>> - inode->index_cnt = found_key.offset + 1;
>> -out:
>> - btrfs_free_path(path);
>> - return ret;
>> -}
>> -
>> /*
>> * helper to find a free sequence number in a given directory. This current
>> * code is very simple, later versions will do smarter things in the btree
>> --
>> 2.43.0
>>
>>
>
prev parent reply other threads:[~2024-01-25 23:06 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-01-25 9:50 [PATCH v2] btrfs: fix infinite directory reads Qu Wenruo
2024-01-25 10:02 ` Filipe Manana
2024-01-25 11:33 ` Eugeniu Rosca
2024-01-25 12:03 ` Filipe Manana
2024-01-25 23:06 ` Qu Wenruo [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=d52c41e0-1f82-4936-bfd4-e6e989560edf@gmx.com \
--to=quwenruo.btrfs@gmx.com \
--cc=dsterba@suse.com \
--cc=erosca@de.adit-jv.com \
--cc=fdmanana@kernel.org \
--cc=fdmanana@suse.com \
--cc=linux-btrfs@vger.kernel.org \
--cc=rob@landley.net \
--cc=stable@vger.kernel.org \
--cc=wqu@suse.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox