From: "Jitindar SIngh, Suraj" <surajjs@amazon.com>
To: "linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>,
"tytso@mit.edu" <tytso@mit.edu>
Cc: "paulmck@kernel.org" <paulmck@kernel.org>
Subject: Re: [PATCH RFC] ext4: fix potential race between online resizing and write operations
Date: Wed, 19 Feb 2020 03:09:07 +0000 [thread overview]
Message-ID: <6ad43fbad38c8f986f35995ed61f9077abd3b0cc.camel@amazon.com> (raw)
In-Reply-To: <20200215233817.GA670792@mit.edu>
On Sat, 2020-02-15 at 18:38 -0500, Theodore Y. Ts'o wrote:
> This is a revision of a proposed patch[1] to fix a bug[2] to fix a
> reported crash caused by the fact that we are growing an array, it's
> possible for another process to try to dereference that array, get
> the
> old copy of the array, and then before it fetch an element of the
> array and use it, it could get reused for something else.
>
> [1] https://bugzilla.kernel.org/attachment.cgi?id=287189
> [2] https://bugzilla.kernel.org/show_bug.cgi?id=206443
>
> So this is a pretty classical case of RCU, and in the original
> version
> of the patch[1], it used synchronize_rcu_expedited() followed by a
> call kvfree(). If you read the RCU documentation it states that you
> really shouldn't call synchronize_rcu() and kfree() in a loop, and
> while synchronize_rcu_expedited() does speed things up, it does so by
> impacting the performance of all the other CPU's.
>
> And unfortunately add_new_gdb() get's called in a loop. If you
> expand
> a file system by say, 1TB, add_new_gdb() and/or add_new_gdb_meta_gb()
> will get called 8,192 times.
>
> To fix this, I added ext4_kvfree_array_rcu() which allocates an
> object
> containing a void *ptr and the rcu_head, and then uses call_rcu() to
> free the pointer and the stub object. I'm cc'ing Paul because I'm a
> bit surprised no one else has needed something like this before; so
> I'm wondering if I'm missing something. If not, would it make sense
> to make something like kvfree_array_rcu as a more general facility?
>
> - Ted
>
> From 5ab7e4d38318c125246a4aa899dd614a37c803ef Mon Sep 17 00:00:00
> 2001
> From: Theodore Ts'o <tytso@mit.edu>
> Date: Sat, 15 Feb 2020 16:40:37 -0500
> Subject: [PATCH] ext4: fix potential race between online resizing and
> write operations
>
> During an online resize an array of pointers to buffer heads gets
> replaced so it can get enlarged. If there is a racing block
> allocation or deallocation which uses the old array, and the old
> array
> has gotten reused this can lead to a GPF or some other random kernel
> memory getting modified.
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206443
> Reported-by: Suraj Jitindar Singh <surajjs@amazon.com>
> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
One comment below where I think you free the wrong object.
With that fixed up:
Tested-by: Suraj Jitindar Singh <surajjs@amazon.com>
> Cc: stable@kernel.org
> ---
> fs/ext4/resize.c | 35 +++++++++++++++++++++++++++++++----
> fs/ext4/balloc.c | 15 ++++++++++++---
> fs/ext4/ext4.h | 1 +
> 3 files changed, 44 insertions(+), 7 deletions(-)
>
> diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
> index 86a2500ed292..98d3b4ec3422 100644
> --- a/fs/ext4/resize.c
> +++ b/fs/ext4/resize.c
> @@ -17,6 +17,33 @@
>
> #include "ext4_jbd2.h"
>
> +struct ext4_rcu_ptr {
> + struct rcu_head rcu;
> + void *ptr;
> +};
> +
> +static void ext4_rcu_ptr_callback(struct rcu_head *head)
> +{
> + struct ext4_rcu_ptr *ptr;
> +
> + ptr = container_of(head, struct ext4_rcu_ptr, rcu);
> + kvfree(ptr->ptr);
> + kfree(ptr);
> +}
> +
> +void ext4_kvfree_array_rcu(void *to_free)
> +{
> + struct ext4_rcu_ptr *ptr = kzalloc(sizeof(*ptr), GFP_KERNEL);
> +
> + if (ptr) {
> + ptr->ptr = to_free;
> + call_rcu(&ptr->rcu, ext4_rcu_ptr_callback);
> + return;
> + }
> + synchronize_rcu();
The below needs to be:
kvfree(to_free);
> + kvfree(ptr);
> +}
> +
> int ext4_resize_begin(struct super_block *sb)
> {
> struct ext4_sb_info *sbi = EXT4_SB(sb);
> @@ -864,9 +891,9 @@ static int add_new_gdb(handle_t *handle, struct
> inode *inode,
> memcpy(n_group_desc, o_group_desc,
> EXT4_SB(sb)->s_gdb_count * sizeof(struct buffer_head
> *));
> n_group_desc[gdb_num] = gdb_bh;
> - EXT4_SB(sb)->s_group_desc = n_group_desc;
> + rcu_assign_pointer(EXT4_SB(sb)->s_group_desc, n_group_desc);
> EXT4_SB(sb)->s_gdb_count++;
> - kvfree(o_group_desc);
> + ext4_kvfree_array_rcu(o_group_desc);
>
> le16_add_cpu(&es->s_reserved_gdt_blocks, -1);
> err = ext4_handle_dirty_super(handle, sb);
> @@ -922,9 +949,9 @@ static int add_new_gdb_meta_bg(struct super_block
> *sb,
> return err;
> }
>
> - EXT4_SB(sb)->s_group_desc = n_group_desc;
> + rcu_assign_pointer(EXT4_SB(sb)->s_group_desc, n_group_desc);
> EXT4_SB(sb)->s_gdb_count++;
> - kvfree(o_group_desc);
> + ext4_kvfree_array_rcu(o_group_desc);
> return err;
> }
>
> diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
> index 5f993a411251..5368bf67300b 100644
> --- a/fs/ext4/balloc.c
> +++ b/fs/ext4/balloc.c
> @@ -270,6 +270,7 @@ struct ext4_group_desc *
> ext4_get_group_desc(struct super_block *sb,
> ext4_group_t ngroups = ext4_get_groups_count(sb);
> struct ext4_group_desc *desc;
> struct ext4_sb_info *sbi = EXT4_SB(sb);
> + struct buffer_head *bh_p;
>
> if (block_group >= ngroups) {
> ext4_error(sb, "block_group >= groups_count -
> block_group = %u,"
> @@ -280,7 +281,15 @@ struct ext4_group_desc *
> ext4_get_group_desc(struct super_block *sb,
>
> group_desc = block_group >> EXT4_DESC_PER_BLOCK_BITS(sb);
> offset = block_group & (EXT4_DESC_PER_BLOCK(sb) - 1);
> - if (!sbi->s_group_desc[group_desc]) {
> + rcu_read_lock();
> + bh_p = rcu_dereference(sbi->s_group_desc)[group_desc];
> + /*
> + * We can unlock here since the pointer being dereferenced
> won't be
> + * dereferenced again. By looking at the usage in add_new_gdb()
> the
> + * value isn't modified, just the pointer, and so it remains
> valid.
> + */
> + rcu_read_unlock();
> + if (!bh_p) {
> ext4_error(sb, "Group descriptor not loaded - "
> "block_group = %u, group_desc = %u, desc =
> %u",
> block_group, group_desc, offset);
> @@ -288,10 +297,10 @@ struct ext4_group_desc *
> ext4_get_group_desc(struct super_block *sb,
> }
>
> desc = (struct ext4_group_desc *)(
> - (__u8 *)sbi->s_group_desc[group_desc]->b_data +
> + (__u8 *)bh_p->b_data +
> offset * EXT4_DESC_SIZE(sb));
> if (bh)
> - *bh = sbi->s_group_desc[group_desc];
> + *bh = bh_p;
> return desc;
> }
>
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 4441331d06cc..b7824d56b968 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -2730,8 +2730,8 @@ extern int ext4_generic_delete_entry(handle_t
> *handle,
> extern bool ext4_empty_dir(struct inode *inode);
>
> /* resize.c */
> +extern void ext4_kvfree_array_rcu(void *to_free);
> extern int ext4_group_add(struct super_block *sb,
> struct ext4_new_group_data *input);
> extern int ext4_group_extend(struct super_block *sb,
>
next prev parent reply other threads:[~2020-02-19 3:09 UTC|newest]
Thread overview: 39+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-02-15 23:38 [PATCH RFC] ext4: fix potential race between online resizing and write operations Theodore Y. Ts'o
2020-02-16 12:12 ` Paul E. McKenney
2020-02-16 20:32 ` Theodore Y. Ts'o
2020-02-17 16:08 ` Uladzislau Rezki
2020-02-17 19:33 ` Theodore Y. Ts'o
2020-02-18 17:08 ` Uladzislau Rezki
2020-02-20 4:52 ` Theodore Y. Ts'o
2020-02-21 0:30 ` Paul E. McKenney
2020-02-21 13:14 ` Uladzislau Rezki
2020-02-21 20:22 ` Paul E. McKenney
2020-02-22 22:24 ` Joel Fernandes
2020-02-23 1:10 ` Paul E. McKenney
2020-02-24 17:40 ` Uladzislau Rezki
2020-02-25 2:07 ` Joel Fernandes
2020-02-25 3:55 ` Paul E. McKenney
2020-02-25 14:17 ` Joel Fernandes
2020-02-25 16:38 ` Paul E. McKenney
2020-02-25 17:00 ` Joel Fernandes
2020-02-25 18:54 ` Uladzislau Rezki
2020-02-25 22:47 ` Paul E. McKenney
2020-02-26 13:04 ` Uladzislau Rezki
2020-02-26 15:06 ` Paul E. McKenney
2020-02-26 15:53 ` Uladzislau Rezki
2020-02-27 14:08 ` Joel Fernandes
2020-03-01 11:13 ` Uladzislau Rezki
2020-02-27 13:37 ` Joel Fernandes
2020-03-01 11:08 ` Uladzislau Rezki
2020-03-01 12:07 ` Uladzislau Rezki
2020-02-25 2:11 ` Joel Fernandes
2020-02-21 12:06 ` Joel Fernandes
2020-02-21 13:28 ` Joel Fernandes
2020-02-21 19:21 ` Uladzislau Rezki
2020-02-21 19:25 ` Uladzislau Rezki
2020-02-22 22:12 ` Joel Fernandes
2020-02-24 17:02 ` Uladzislau Rezki
2020-02-24 23:14 ` Paul E. McKenney
2020-02-25 1:48 ` Joel Fernandes
2020-02-19 3:09 ` Jitindar SIngh, Suraj [this message]
2020-02-20 4:34 ` Theodore Y. Ts'o
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=6ad43fbad38c8f986f35995ed61f9077abd3b0cc.camel@amazon.com \
--to=surajjs@amazon.com \
--cc=linux-ext4@vger.kernel.org \
--cc=paulmck@kernel.org \
--cc=tytso@mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).