Linux Btrfs filesystem development
 help / color / mirror / Atom feed
From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Filipe Manana <fdmanana@kernel.org>, Qu Wenruo <wqu@suse.com>
Cc: linux-btrfs@vger.kernel.org, erosca@de.adit-jv.com,
	Filipe Manana <fdmanana@suse.com>, Rob Landley <rob@landley.net>,
	stable@vger.kernel.org, David Sterba <dsterba@suse.com>
Subject: Re: [PATCH v2] btrfs: fix infinite directory reads
Date: Fri, 26 Jan 2024 09:36:26 +1030	[thread overview]
Message-ID: <d52c41e0-1f82-4936-bfd4-e6e989560edf@gmx.com> (raw)
In-Reply-To: <CAL3q7H77i3kv7C352k2R6nr-m-cgh_cdCCeTkXna+v1yjpMuoA@mail.gmail.com>



On 2024/1/25 20:32, Filipe Manana wrote:
> On Thu, Jan 25, 2024 at 9:51 AM Qu Wenruo <wqu@suse.com> wrote:
>>
>> From: Filipe Manana <fdmanana@suse.com>
>>
>> [ Upstream commit 9b378f6ad48cfa195ed868db9123c09ee7ec5ea2 ]
>>
>> The readdir implementation currently processes always up to the last index
>> it finds. This however can result in an infinite loop if the directory has
>> a large number of entries such that they won't all fit in the given buffer
>> passed to the readdir callback, that is, dir_emit() returns a non-zero
>> value. Because in that case readdir() will be called again and if in the
>> meanwhile new directory entries were added and we still can't put all the
>> remaining entries in the buffer, we keep repeating this over and over.
>>
>> The following C program and test script reproduce the problem:
>>
>>    $ cat /mnt/readdir_prog.c
>>    #include <sys/types.h>
>>    #include <dirent.h>
>>    #include <stdio.h>
>>
>>    int main(int argc, char *argv[])
>>    {
>>      DIR *dir = opendir(".");
>>      struct dirent *dd;
>>
>>      while ((dd = readdir(dir))) {
>>        printf("%s\n", dd->d_name);
>>        rename(dd->d_name, "TEMPFILE");
>>        rename("TEMPFILE", dd->d_name);
>>      }
>>      closedir(dir);
>>    }
>>
>>    $ gcc -o /mnt/readdir_prog /mnt/readdir_prog.c
>>
>>    $ cat test.sh
>>    #!/bin/bash
>>
>>    DEV=/dev/sdi
>>    MNT=/mnt/sdi
>>
>>    mkfs.btrfs -f $DEV &> /dev/null
>>    #mkfs.xfs -f $DEV &> /dev/null
>>    #mkfs.ext4 -F $DEV &> /dev/null
>>
>>    mount $DEV $MNT
>>
>>    mkdir $MNT/testdir
>>    for ((i = 1; i <= 2000; i++)); do
>>        echo -n > $MNT/testdir/file_$i
>>    done
>>
>>    cd $MNT/testdir
>>    /mnt/readdir_prog
>>
>>    cd /mnt
>>
>>    umount $MNT
>>
>> This behaviour is surprising to applications and it's unlike ext4, xfs,
>> tmpfs, vfat and other filesystems, which always finish. In this case where
>> new entries were added due to renames, some file names may be reported
>> more than once, but this varies according to each filesystem - for example
>> ext4 never reported the same file more than once while xfs reports the
>> first 13 file names twice.
>>
>> So change our readdir implementation to track the last index number when
>> opendir() is called and then make readdir() never process beyond that
>> index number. This gives the same behaviour as ext4.
>>
>> Reported-by: Rob Landley <rob@landley.net>
>> Link: https://lore.kernel.org/linux-btrfs/2c8c55ec-04c6-e0dc-9c5c-8c7924778c35@landley.net/
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217681
>> CC: stable@vger.kernel.org # 5.15
>> Signed-off-by: Filipe Manana <fdmanana@suse.com>
>> Signed-off-by: David Sterba <dsterba@suse.com>
>> [ Resolve a conflict due to member changes in 96d89923fa94 ]
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>
> Thanks for the backport, and running the corresponding test case from
> fstests to verify it's working.
>
> However when backporting a commit, one should also check if there are
> fixes for that commit, as they
> often introduce regressions or have some other bug - and that's the
> case here. We also need to backport
> the following 3 commits:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=357950361cbc6d54fb68ed878265c647384684ae
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e60aa5da14d01fed8411202dbe4adf6c44bd2a57
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8e7f82deb0c0386a03b62e30082574347f8b57d5
>
> One regression, the one regarding rewinddir(3), even has a test case
> in fstests too (generic/471) and would have been caught
> when running the "dir" group tests in fstests:

My bad, I get used to be informed by our internal building system about
missing fixes.

And obviously there is no such automatic systems checking missing fixes
here...
>
> https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/commit/?h=for-next&id=68b958f5dc4ab13cfd86f7fb82621f9f022b7626
>
> I'll work on making backports of those 3 other patches on top of your
> backport, and then send all of them in a series,
> including your patch, to make it easier to follow and apply all at once.

Thanks a lot, that's much appreciated.

Thanks,
Qu
>
> Thanks.
>
>
>>   fs/btrfs/ctree.h         |   1 +
>>   fs/btrfs/delayed-inode.c |   5 +-
>>   fs/btrfs/delayed-inode.h |   1 +
>>   fs/btrfs/inode.c         | 131 +++++++++++++++++++++++----------------
>>   4 files changed, 84 insertions(+), 54 deletions(-)
>> ---
>> Changelog:
>> v2:
>> - Fix the upstream commit hash
>>
>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>> index 1467bf439cb4..7905f178efa3 100644
>> --- a/fs/btrfs/ctree.h
>> +++ b/fs/btrfs/ctree.h
>> @@ -1361,6 +1361,7 @@ struct btrfs_drop_extents_args {
>>
>>   struct btrfs_file_private {
>>          void *filldir_buf;
>> +       u64 last_index;
>>   };
>>
>>
>> diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
>> index fd951aeaeac5..5a98c5da1225 100644
>> --- a/fs/btrfs/delayed-inode.c
>> +++ b/fs/btrfs/delayed-inode.c
>> @@ -1513,6 +1513,7 @@ int btrfs_inode_delayed_dir_index_count(struct btrfs_inode *inode)
>>   }
>>
>>   bool btrfs_readdir_get_delayed_items(struct inode *inode,
>> +                                    u64 last_index,
>>                                       struct list_head *ins_list,
>>                                       struct list_head *del_list)
>>   {
>> @@ -1532,14 +1533,14 @@ bool btrfs_readdir_get_delayed_items(struct inode *inode,
>>
>>          mutex_lock(&delayed_node->mutex);
>>          item = __btrfs_first_delayed_insertion_item(delayed_node);
>> -       while (item) {
>> +       while (item && item->key.offset <= last_index) {
>>                  refcount_inc(&item->refs);
>>                  list_add_tail(&item->readdir_list, ins_list);
>>                  item = __btrfs_next_delayed_item(item);
>>          }
>>
>>          item = __btrfs_first_delayed_deletion_item(delayed_node);
>> -       while (item) {
>> +       while (item && item->key.offset <= last_index) {
>>                  refcount_inc(&item->refs);
>>                  list_add_tail(&item->readdir_list, del_list);
>>                  item = __btrfs_next_delayed_item(item);
>> diff --git a/fs/btrfs/delayed-inode.h b/fs/btrfs/delayed-inode.h
>> index b2412160c5bc..a9cfce856d2e 100644
>> --- a/fs/btrfs/delayed-inode.h
>> +++ b/fs/btrfs/delayed-inode.h
>> @@ -123,6 +123,7 @@ void btrfs_destroy_delayed_inodes(struct btrfs_fs_info *fs_info);
>>
>>   /* Used for readdir() */
>>   bool btrfs_readdir_get_delayed_items(struct inode *inode,
>> +                                    u64 last_index,
>>                                       struct list_head *ins_list,
>>                                       struct list_head *del_list);
>>   void btrfs_readdir_put_delayed_items(struct inode *inode,
>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>> index 95af29634e55..1df374ce829b 100644
>> --- a/fs/btrfs/inode.c
>> +++ b/fs/btrfs/inode.c
>> @@ -6121,6 +6121,74 @@ static struct dentry *btrfs_lookup(struct inode *dir, struct dentry *dentry,
>>          return d_splice_alias(inode, dentry);
>>   }
>>
>> +/*
>> + * Find the highest existing sequence number in a directory and then set the
>> + * in-memory index_cnt variable to the first free sequence number.
>> + */
>> +static int btrfs_set_inode_index_count(struct btrfs_inode *inode)
>> +{
>> +       struct btrfs_root *root = inode->root;
>> +       struct btrfs_key key, found_key;
>> +       struct btrfs_path *path;
>> +       struct extent_buffer *leaf;
>> +       int ret;
>> +
>> +       key.objectid = btrfs_ino(inode);
>> +       key.type = BTRFS_DIR_INDEX_KEY;
>> +       key.offset = (u64)-1;
>> +
>> +       path = btrfs_alloc_path();
>> +       if (!path)
>> +               return -ENOMEM;
>> +
>> +       ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
>> +       if (ret < 0)
>> +               goto out;
>> +       /* FIXME: we should be able to handle this */
>> +       if (ret == 0)
>> +               goto out;
>> +       ret = 0;
>> +
>> +       if (path->slots[0] == 0) {
>> +               inode->index_cnt = BTRFS_DIR_START_INDEX;
>> +               goto out;
>> +       }
>> +
>> +       path->slots[0]--;
>> +
>> +       leaf = path->nodes[0];
>> +       btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
>> +
>> +       if (found_key.objectid != btrfs_ino(inode) ||
>> +           found_key.type != BTRFS_DIR_INDEX_KEY) {
>> +               inode->index_cnt = BTRFS_DIR_START_INDEX;
>> +               goto out;
>> +       }
>> +
>> +       inode->index_cnt = found_key.offset + 1;
>> +out:
>> +       btrfs_free_path(path);
>> +       return ret;
>> +}
>> +
>> +static int btrfs_get_dir_last_index(struct btrfs_inode *dir, u64 *index)
>> +{
>> +       if (dir->index_cnt == (u64)-1) {
>> +               int ret;
>> +
>> +               ret = btrfs_inode_delayed_dir_index_count(dir);
>> +               if (ret) {
>> +                       ret = btrfs_set_inode_index_count(dir);
>> +                       if (ret)
>> +                               return ret;
>> +               }
>> +       }
>> +
>> +       *index = dir->index_cnt;
>> +
>> +       return 0;
>> +}
>> +
>>   /*
>>    * All this infrastructure exists because dir_emit can fault, and we are holding
>>    * the tree lock when doing readdir.  For now just allocate a buffer and copy
>> @@ -6133,10 +6201,17 @@ static struct dentry *btrfs_lookup(struct inode *dir, struct dentry *dentry,
>>   static int btrfs_opendir(struct inode *inode, struct file *file)
>>   {
>>          struct btrfs_file_private *private;
>> +       u64 last_index;
>> +       int ret;
>> +
>> +       ret = btrfs_get_dir_last_index(BTRFS_I(inode), &last_index);
>> +       if (ret)
>> +               return ret;
>>
>>          private = kzalloc(sizeof(struct btrfs_file_private), GFP_KERNEL);
>>          if (!private)
>>                  return -ENOMEM;
>> +       private->last_index = last_index;
>>          private->filldir_buf = kzalloc(PAGE_SIZE, GFP_KERNEL);
>>          if (!private->filldir_buf) {
>>                  kfree(private);
>> @@ -6205,7 +6280,8 @@ static int btrfs_real_readdir(struct file *file, struct dir_context *ctx)
>>
>>          INIT_LIST_HEAD(&ins_list);
>>          INIT_LIST_HEAD(&del_list);
>> -       put = btrfs_readdir_get_delayed_items(inode, &ins_list, &del_list);
>> +       put = btrfs_readdir_get_delayed_items(inode, private->last_index,
>> +                                             &ins_list, &del_list);
>>
>>   again:
>>          key.type = BTRFS_DIR_INDEX_KEY;
>> @@ -6238,6 +6314,8 @@ static int btrfs_real_readdir(struct file *file, struct dir_context *ctx)
>>                          break;
>>                  if (found_key.offset < ctx->pos)
>>                          goto next;
>> +               if (found_key.offset > private->last_index)
>> +                       break;
>>                  if (btrfs_should_delete_dir_index(&del_list, found_key.offset))
>>                          goto next;
>>                  di = btrfs_item_ptr(leaf, slot, struct btrfs_dir_item);
>> @@ -6371,57 +6449,6 @@ static int btrfs_update_time(struct inode *inode, struct timespec64 *now,
>>          return dirty ? btrfs_dirty_inode(inode) : 0;
>>   }
>>
>> -/*
>> - * find the highest existing sequence number in a directory
>> - * and then set the in-memory index_cnt variable to reflect
>> - * free sequence numbers
>> - */
>> -static int btrfs_set_inode_index_count(struct btrfs_inode *inode)
>> -{
>> -       struct btrfs_root *root = inode->root;
>> -       struct btrfs_key key, found_key;
>> -       struct btrfs_path *path;
>> -       struct extent_buffer *leaf;
>> -       int ret;
>> -
>> -       key.objectid = btrfs_ino(inode);
>> -       key.type = BTRFS_DIR_INDEX_KEY;
>> -       key.offset = (u64)-1;
>> -
>> -       path = btrfs_alloc_path();
>> -       if (!path)
>> -               return -ENOMEM;
>> -
>> -       ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
>> -       if (ret < 0)
>> -               goto out;
>> -       /* FIXME: we should be able to handle this */
>> -       if (ret == 0)
>> -               goto out;
>> -       ret = 0;
>> -
>> -       if (path->slots[0] == 0) {
>> -               inode->index_cnt = BTRFS_DIR_START_INDEX;
>> -               goto out;
>> -       }
>> -
>> -       path->slots[0]--;
>> -
>> -       leaf = path->nodes[0];
>> -       btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
>> -
>> -       if (found_key.objectid != btrfs_ino(inode) ||
>> -           found_key.type != BTRFS_DIR_INDEX_KEY) {
>> -               inode->index_cnt = BTRFS_DIR_START_INDEX;
>> -               goto out;
>> -       }
>> -
>> -       inode->index_cnt = found_key.offset + 1;
>> -out:
>> -       btrfs_free_path(path);
>> -       return ret;
>> -}
>> -
>>   /*
>>    * helper to find a free sequence number in a given directory.  This current
>>    * code is very simple, later versions will do smarter things in the btree
>> --
>> 2.43.0
>>
>>
>

      parent reply	other threads:[~2024-01-25 23:06 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-25  9:50 [PATCH v2] btrfs: fix infinite directory reads Qu Wenruo
2024-01-25 10:02 ` Filipe Manana
2024-01-25 11:33   ` Eugeniu Rosca
2024-01-25 12:03     ` Filipe Manana
2024-01-25 23:06   ` Qu Wenruo [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d52c41e0-1f82-4936-bfd4-e6e989560edf@gmx.com \
    --to=quwenruo.btrfs@gmx.com \
    --cc=dsterba@suse.com \
    --cc=erosca@de.adit-jv.com \
    --cc=fdmanana@kernel.org \
    --cc=fdmanana@suse.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=rob@landley.net \
    --cc=stable@vger.kernel.org \
    --cc=wqu@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox