All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: Yufen Yu <yuyufen@huawei.com>
Cc: Tejun Heo <tj@kernel.org>,
	axboe@kernel.dk, linux-block@vger.kernel.org, jack@suse.cz,
	bvanassche@acm.org
Subject: Re: [PATCH] bdi: fix use-after-free for bdi device
Date: Wed, 19 Feb 2020 13:55:05 +0100	[thread overview]
Message-ID: <20200219125505.GP16121@quack2.suse.cz> (raw)
In-Reply-To: <32a14db2-65e0-5d5c-6c53-45b3474d841d@huawei.com>

Hi!

On Sat 15-02-20 21:54:08, Yufen Yu wrote:
> On 2020/2/14 22:05, Tejun Heo wrote:
> > Hello,
> > 
> > On Fri, Feb 14, 2020 at 10:50:01AM +0800, Yufen Yu wrote:
> > > > So, unregistering can leave ->dev along and re-registering can test
> > > > whether it's NULL and if not put the existing one and put a new one
> > > > there. Wouldn't that work?
> > > 
> > > Do you mean set bdi->dev as 'NULL' in call_rcu() callback function
> > > (i.e. bdi_release_device()) and test 'bdi->dev' in bdi_register_va()?
> > > 
> > > I think that may do not work.
> > > We cannot make sure the order of rcu callback function and re-registering.
> > > Then bdi_release_device() may put the new allocated device by re-registering.
> > 
> > No, I meant not freeing bdi->dev on deregistration and only doing so
> > when it actually needs to - on re-registration or release. So, sth
> > like the following.
> > 
> > * Unregister: Unregister bdi->dev but don't free it. Leave the pointer
> >    alone.
> > 
> > * Re-register: If bdi->dev is not null, initiate RCU-free and update
> >    bdi->dev to the new dev.
> >
> > * Release: If bdi->dev is not NULL, initiate RCU-free of it.
> 
> Okay, I think we can do that.
> 
> When do re-register, we need to update bdi->dev as 'NULL' or the new dev
> before invoking call_rcu() to free '->dev'. So readers started after call_rcu()
> cannot read the old dev, like:
> 
> bdi_register_va()
> {
> 	...
> 	if (bdi->dev) {
> 		//rcu_assgin_pointer(bdi->dev, new_dev);
> 		rcu_assgin_pointer(bdi->dev, NULL);
> 		call_rcu();//rcu callback function will free the old dev
> 	}
> 	...
> }
> 
> After assigning new value for bdi->dev, rcu callback function cannot get
> the old device. So I think we may need to replace the '->dev' with a new
> struct pointer, in which includes '->dev' and 'rcu_head'. We pass the
> struct.rcu_head pointer to call_rcu() and then it can get old dev address.
> 
> IMO, maybe we can maintain the original code logic, fix the problem like:

I've now noticed there's commit 68f23b8906 "memcg: fix a crash in wb_workfn
when a device disappears" from end of January which tries to address the
issue you're looking into. Now AFAIU the code is till somewhat racy after
that commit so I wanted to mention this mostly so that you fixup also the
new bdi_dev_name() while you're fixing blkg_dev_name().

Also I was wondering about one thing: If we really care about bdi->dev only
for the name, won't we be much better off with just copying the name to
bdi->name on registration? Sure it would consume a bit of memory for the
name copy but I don't think we really care and things would be IMO *much*
simpler that way... Yufen, Tejun, what do you think?

								Honza

> 
> diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h
> index 4fc87dee005a..e2de4a4e5392 100644
> --- a/include/linux/backing-dev-defs.h
> +++ b/include/linux/backing-dev-defs.h
> @@ -185,6 +185,11 @@ struct bdi_writeback {
>  #endif
>  };
> 
> +struct bdi_rcu_device {
> +	struct device *dev;
> +	struct rcu_head rcu_head;
> +};
> +
>  struct backing_dev_info {
>  	u64 id;
>  	struct rb_node rb_node; /* keyed by ->id */
> @@ -219,7 +224,7 @@ struct backing_dev_info {
>  #endif
>  	wait_queue_head_t wb_waitq;
> 
> -	struct device *dev;
> +	struct bdi_rcu_device *rcu_dev;
>  	struct device *owner;
> 
>  	struct timer_list laptop_mode_wb_timer;
> diff --git a/mm/backing-dev.c b/mm/backing-dev.c
> index 62f05f605fb5..05f07ce19091 100644
> --- a/mm/backing-dev.c
> +++ b/mm/backing-dev.c
> @@ -850,7 +850,7 @@ static int bdi_init(struct backing_dev_info *bdi)
>  {
>  	int ret;
> 
> -	bdi->dev = NULL;
> +	bdi->rcu_dev = NULL;
> 
>  	kref_init(&bdi->refcnt);
>  	bdi->min_ratio = 0;
> @@ -932,20 +932,28 @@ struct backing_dev_info *bdi_get_by_id(u64 id)
> 
>  int bdi_register_va(struct backing_dev_info *bdi, const char *fmt, va_list args)
>  {
> -	struct device *dev;
>  	struct rb_node *parent, **p;
> +	struct bdi_rcu_device *rcu_dev;
> 
> -	if (bdi->dev)	/* The driver needs to use separate queues per device */
> +	/* The driver needs to use separate queues per device */
> +	if (bdi->rcu_dev)
>  		return 0;
> 
> -	dev = device_create_vargs(bdi_class, NULL, MKDEV(0, 0), bdi, fmt, args);
> -	if (IS_ERR(dev))
> -		return PTR_ERR(dev);
> +	rcu_dev = kzalloc(sizeof(struct bdi_rcu_device), GFP_KERNEL);
> +	if (!rcu_dev)
> +		return -ENOMEM;
> +
> +	rcu_dev->dev = device_create_vargs(bdi_class, NULL, MKDEV(0, 0),
> +						bdi, fmt, args);
> +	if (IS_ERR(rcu_dev->dev)) {
> +		kfree(rcu_dev);
> +		return PTR_ERR(rcu_dev->dev);
> +	}
> 
>  	cgwb_bdi_register(bdi);
> -	bdi->dev = dev;
> +	bdi->rcu_dev = rcu_dev;
> 
> -	bdi_debug_register(bdi, dev_name(dev));
> +	bdi_debug_register(bdi, dev_name(rcu_dev->dev));
>  	set_bit(WB_registered, &bdi->wb.state);
> 
>  	spin_lock_bh(&bdi_lock);
> @@ -1005,17 +1013,28 @@ static void bdi_remove_from_list(struct backing_dev_info *bdi)
>  	synchronize_rcu_expedited();
>  }
> 
> +static void bdi_put_device_rcu(struct rcu_head *rcu)
> +{
> +	struct bdi_rcu_device *rcu_dev = container_of(rcu,
> +			struct bdi_rcu_device, rcu_head);
> +	put_device(rcu_dev->dev);
> +	kfree(rcu_dev);
> +}
> +
>  void bdi_unregister(struct backing_dev_info *bdi)
>  {
> +	struct bdi_rcu_device *rcu_dev = bdi->rcu_dev;
>  	/* make sure nobody finds us on the bdi_list anymore */
>  	bdi_remove_from_list(bdi);
>  	wb_shutdown(&bdi->wb);
>  	cgwb_bdi_unregister(bdi);
> 
> -	if (bdi->dev) {
> +	if (rcu_dev) {
>  		bdi_debug_unregister(bdi);
> -		device_unregister(bdi->dev);
> -		bdi->dev = NULL;
> +		get_device(rcu_dev->dev);
> +		device_unregister(rcu_dev->dev);
> +		rcu_assign_pointer(bdi->rcu_dev, NULL);
> +		call_rcu(&rcu_dev->rcu_head, bdi_put_device_rcu);
>  	}
> 
>  	if (bdi->owner) {
> @@ -1031,7 +1050,7 @@ static void release_bdi(struct kref *ref)
> 
>  	if (test_bit(WB_registered, &bdi->wb.state))
>  		bdi_unregister(bdi);
> -	WARN_ON_ONCE(bdi->dev);
> +	WARN_ON_ONCE(bdi->rcu_dev);
>  	wb_exit(&bdi->wb);
>  	cgwb_bdi_exit(bdi);
>  	kfree(bdi);
> 
> 
> 
> 
> 
> 
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

  reply	other threads:[~2020-02-19 12:55 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-11 14:00 [PATCH] bdi: fix use-after-free for bdi device Yufen Yu
2020-02-11 13:57 ` Yufen Yu
2020-02-12 21:33   ` Tejun Heo
2020-02-13  2:46     ` Yufen Yu
2020-02-13  3:48       ` Tejun Heo
2020-02-13  7:51         ` Yufen Yu
2020-02-13 13:58           ` Tejun Heo
2020-02-14  2:50             ` Yufen Yu
2020-02-14 14:05               ` Tejun Heo
2020-02-15 13:54                 ` Yufen Yu
2020-02-19 12:55                   ` Jan Kara [this message]
2020-02-19 15:12                     ` Tejun Heo
2020-02-20 11:07                     ` Yufen Yu
2020-02-20 12:07                       ` Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200219125505.GP16121@quack2.suse.cz \
    --to=jack@suse.cz \
    --cc=axboe@kernel.dk \
    --cc=bvanassche@acm.org \
    --cc=linux-block@vger.kernel.org \
    --cc=tj@kernel.org \
    --cc=yuyufen@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.