linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Jitindar SIngh, Suraj" <surajjs@amazon.com>
To: "linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>,
	"tytso@mit.edu" <tytso@mit.edu>
Cc: "paulmck@kernel.org" <paulmck@kernel.org>
Subject: Re: [PATCH RFC] ext4: fix potential race between online resizing and write operations
Date: Wed, 19 Feb 2020 03:09:07 +0000	[thread overview]
Message-ID: <6ad43fbad38c8f986f35995ed61f9077abd3b0cc.camel@amazon.com> (raw)
In-Reply-To: <20200215233817.GA670792@mit.edu>

On Sat, 2020-02-15 at 18:38 -0500, Theodore Y. Ts'o wrote:
> This is a revision of a proposed patch[1] to fix a bug[2] to fix a
> reported crash caused by the fact that we are growing an array, it's
> possible for another process to try to dereference that array, get
> the
> old copy of the array, and then before it fetch an element of the
> array and use it, it could get reused for something else.
> 
> [1] https://bugzilla.kernel.org/attachment.cgi?id=287189
> [2] https://bugzilla.kernel.org/show_bug.cgi?id=206443
> 
> So this is a pretty classical case of RCU, and in the original
> version
> of the patch[1], it used synchronize_rcu_expedited() followed by a
> call kvfree().  If you read the RCU documentation it states that you
> really shouldn't call synchronize_rcu() and kfree() in a loop, and
> while synchronize_rcu_expedited() does speed things up, it does so by
> impacting the performance of all the other CPU's.
> 
> And unfortunately add_new_gdb() get's called in a loop.  If you
> expand
> a file system by say, 1TB, add_new_gdb() and/or add_new_gdb_meta_gb()
> will get called 8,192 times.
> 
> To fix this, I added ext4_kvfree_array_rcu() which allocates an
> object
> containing a void *ptr and the rcu_head, and then uses call_rcu() to
> free the pointer and the stub object.  I'm cc'ing Paul because I'm a
> bit surprised no one else has needed something like this before; so
> I'm wondering if I'm missing something.  If not, would it make sense
> to make something like kvfree_array_rcu as a more general facility?
> 
>    		       			   - Ted
> 
> From 5ab7e4d38318c125246a4aa899dd614a37c803ef Mon Sep 17 00:00:00
> 2001
> From: Theodore Ts'o <tytso@mit.edu>
> Date: Sat, 15 Feb 2020 16:40:37 -0500
> Subject: [PATCH] ext4: fix potential race between online resizing and
> write operations
> 
> During an online resize an array of pointers to buffer heads gets
> replaced so it can get enlarged.  If there is a racing block
> allocation or deallocation which uses the old array, and the old
> array
> has gotten reused this can lead to a GPF or some other random kernel
> memory getting modified.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206443
> Reported-by: Suraj Jitindar Singh <surajjs@amazon.com>
> Signed-off-by: Theodore Ts'o <tytso@mit.edu>

One comment below where I think you free the wrong object.

With that fixed up:
Tested-by: Suraj Jitindar Singh <surajjs@amazon.com>

> Cc: stable@kernel.org
> ---
>  fs/ext4/resize.c | 35 +++++++++++++++++++++++++++++++----
>  fs/ext4/balloc.c | 15 ++++++++++++---
>  fs/ext4/ext4.h   |  1 +
>  3 files changed, 44 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
> index 86a2500ed292..98d3b4ec3422 100644
> --- a/fs/ext4/resize.c
> +++ b/fs/ext4/resize.c
> @@ -17,6 +17,33 @@
>  
>  #include "ext4_jbd2.h"
>  
> +struct ext4_rcu_ptr {
> +	struct rcu_head rcu;
> +	void *ptr;
> +};
> +
> +static void ext4_rcu_ptr_callback(struct rcu_head *head)
> +{
> +	struct ext4_rcu_ptr *ptr;
> +
> +	ptr = container_of(head, struct ext4_rcu_ptr, rcu);
> +	kvfree(ptr->ptr);
> +	kfree(ptr);
> +}
> +
> +void ext4_kvfree_array_rcu(void *to_free)
> +{
> +	struct ext4_rcu_ptr *ptr = kzalloc(sizeof(*ptr), GFP_KERNEL);
> +
> +	if (ptr) {
> +		ptr->ptr = to_free;
> +		call_rcu(&ptr->rcu, ext4_rcu_ptr_callback);
> +		return;
> +	}
> +	synchronize_rcu();

The below needs to be:
kvfree(to_free);

> +	kvfree(ptr);
> +}
> +
>  int ext4_resize_begin(struct super_block *sb)
>  {
>  	struct ext4_sb_info *sbi = EXT4_SB(sb);
> @@ -864,9 +891,9 @@ static int add_new_gdb(handle_t *handle, struct
> inode *inode,
>  	memcpy(n_group_desc, o_group_desc,
>  	       EXT4_SB(sb)->s_gdb_count * sizeof(struct buffer_head
> *));
>  	n_group_desc[gdb_num] = gdb_bh;
> -	EXT4_SB(sb)->s_group_desc = n_group_desc;
> +	rcu_assign_pointer(EXT4_SB(sb)->s_group_desc, n_group_desc);
>  	EXT4_SB(sb)->s_gdb_count++;
> -	kvfree(o_group_desc);
> +	ext4_kvfree_array_rcu(o_group_desc);
>  
>  	le16_add_cpu(&es->s_reserved_gdt_blocks, -1);
>  	err = ext4_handle_dirty_super(handle, sb);
> @@ -922,9 +949,9 @@ static int add_new_gdb_meta_bg(struct super_block
> *sb,
>  		return err;
>  	}
>  
> -	EXT4_SB(sb)->s_group_desc = n_group_desc;
> +	rcu_assign_pointer(EXT4_SB(sb)->s_group_desc, n_group_desc);
>  	EXT4_SB(sb)->s_gdb_count++;
> -	kvfree(o_group_desc);
> +	ext4_kvfree_array_rcu(o_group_desc);
>  	return err;
>  }
>  
> diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
> index 5f993a411251..5368bf67300b 100644
> --- a/fs/ext4/balloc.c
> +++ b/fs/ext4/balloc.c
> @@ -270,6 +270,7 @@ struct ext4_group_desc *
> ext4_get_group_desc(struct super_block *sb,
>  	ext4_group_t ngroups = ext4_get_groups_count(sb);
>  	struct ext4_group_desc *desc;
>  	struct ext4_sb_info *sbi = EXT4_SB(sb);
> +	struct buffer_head *bh_p;
>  
>  	if (block_group >= ngroups) {
>  		ext4_error(sb, "block_group >= groups_count -
> block_group = %u,"
> @@ -280,7 +281,15 @@ struct ext4_group_desc *
> ext4_get_group_desc(struct super_block *sb,
>  
>  	group_desc = block_group >> EXT4_DESC_PER_BLOCK_BITS(sb);
>  	offset = block_group & (EXT4_DESC_PER_BLOCK(sb) - 1);
> -	if (!sbi->s_group_desc[group_desc]) {
> +	rcu_read_lock();
> +	bh_p = rcu_dereference(sbi->s_group_desc)[group_desc];
> +	/*
> +	 * We can unlock here since the pointer being dereferenced
> won't be
> +	 * dereferenced again. By looking at the usage in add_new_gdb()
> the
> +	 * value isn't modified, just the pointer, and so it remains
> valid.
> +	 */
> +	rcu_read_unlock();
> +	if (!bh_p) {
>  		ext4_error(sb, "Group descriptor not loaded - "
>  			   "block_group = %u, group_desc = %u, desc =
> %u",
>  			   block_group, group_desc, offset);
> @@ -288,10 +297,10 @@ struct ext4_group_desc *
> ext4_get_group_desc(struct super_block *sb,
>  	}
>  
>  	desc = (struct ext4_group_desc *)(
> -		(__u8 *)sbi->s_group_desc[group_desc]->b_data +
> +		(__u8 *)bh_p->b_data +
>  		offset * EXT4_DESC_SIZE(sb));
>  	if (bh)
> -		*bh = sbi->s_group_desc[group_desc];
> +		*bh = bh_p;
>  	return desc;
>  }
>  
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 4441331d06cc..b7824d56b968 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -2730,8 +2730,8 @@ extern int ext4_generic_delete_entry(handle_t
> *handle,
>  extern bool ext4_empty_dir(struct inode *inode);
>  
>  /* resize.c */
> +extern void ext4_kvfree_array_rcu(void *to_free);
>  extern int ext4_group_add(struct super_block *sb,
>  				struct ext4_new_group_data *input);
>  extern int ext4_group_extend(struct super_block *sb,
> 

  parent reply	other threads:[~2020-02-19  3:09 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-15 23:38 [PATCH RFC] ext4: fix potential race between online resizing and write operations Theodore Y. Ts'o
2020-02-16 12:12 ` Paul E. McKenney
2020-02-16 20:32   ` Theodore Y. Ts'o
2020-02-17 16:08   ` Uladzislau Rezki
2020-02-17 19:33     ` Theodore Y. Ts'o
2020-02-18 17:08       ` Uladzislau Rezki
2020-02-20  4:52         ` Theodore Y. Ts'o
2020-02-21  0:30           ` Paul E. McKenney
2020-02-21 13:14             ` Uladzislau Rezki
2020-02-21 20:22               ` Paul E. McKenney
2020-02-22 22:24                 ` Joel Fernandes
2020-02-23  1:10                   ` Paul E. McKenney
2020-02-24 17:40                     ` Uladzislau Rezki
2020-02-25  2:07                       ` Joel Fernandes
2020-02-25  3:55                         ` Paul E. McKenney
2020-02-25 14:17                           ` Joel Fernandes
2020-02-25 16:38                             ` Paul E. McKenney
2020-02-25 17:00                               ` Joel Fernandes
2020-02-25 18:54                         ` Uladzislau Rezki
2020-02-25 22:47                           ` Paul E. McKenney
2020-02-26 13:04                             ` Uladzislau Rezki
2020-02-26 15:06                               ` Paul E. McKenney
2020-02-26 15:53                                 ` Uladzislau Rezki
2020-02-27 14:08                                   ` Joel Fernandes
2020-03-01 11:13                                     ` Uladzislau Rezki
2020-02-27 13:37                           ` Joel Fernandes
2020-03-01 11:08                             ` Uladzislau Rezki
2020-03-01 12:07                               ` Uladzislau Rezki
2020-02-25  2:11                     ` Joel Fernandes
2020-02-21 12:06         ` Joel Fernandes
2020-02-21 13:28           ` Joel Fernandes
2020-02-21 19:21             ` Uladzislau Rezki
2020-02-21 19:25               ` Uladzislau Rezki
2020-02-22 22:12               ` Joel Fernandes
2020-02-24 17:02                 ` Uladzislau Rezki
2020-02-24 23:14                   ` Paul E. McKenney
2020-02-25  1:48                   ` Joel Fernandes
2020-02-19  3:09 ` Jitindar SIngh, Suraj [this message]
2020-02-20  4:34   ` Theodore Y. Ts'o

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6ad43fbad38c8f986f35995ed61f9077abd3b0cc.camel@amazon.com \
    --to=surajjs@amazon.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=paulmck@kernel.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).