All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrei Vagin <avagin@gmail.com>
To: David Howells <dhowells@redhat.com>
Cc: viro@zeniv.linux.org.uk, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 3/5] cgroup: Fix refcounting
Date: Fri, 4 Jan 2019 00:16:39 -0800	[thread overview]
Message-ID: <20190104081638.GA17865@gmail.com> (raw)
In-Reply-To: <154655905608.3032.5762393419161551596.stgit@warthog.procyon.org.uk>

On Thu, Jan 03, 2019 at 11:44:16PM +0000, David Howells wrote:
> Fix cgroup refcounting by the following means:
> 
>  (1) Don't use PERCPU_REF_INIT_DEAD and percpu_ref_reinit().  Using this
>      causes a problem should kernfs_get_tree() create a superblock and then
>      fail to create a root inode.  The superblock destructor will be invoked
>      before kernfs_get_tree() returns - and the refcount isn't reinit'd till
>      after that.
> 
>      To this end, cgroup_setup_root() no longer needs a ref_flags argument.
> 
>  (2) Provide a flag, CSS_CREATING, that is used to prevent concurrent access
>      to a cgroup root that is being set up and for which the superblock is
>      still being set up.  This appears to be necessary to hide the fact that
>      the cgroup is accessible before the superblock is fully initialised.
> 
>  (3) Set CSS_CREATING in cgroup1_get_tree() on a new cgroup_root.  This is
>      then cleared in cgroup_do_get_tree().
> 
>  (4) cgroup_get_tree() is made to call cgroup_get() on the root it sets for a
>      v2 cgroup.  Admittedly, this doesn't do anything because CSS_NO_REF is
>      set, but it future proofs it in case this is changed in future.
> 
>  (5) cgroup_fill_super() is created and passed to kernfs_get_tree() in the
>      kernfs_fs_context struct.  This takes an extra ref on the root for the
>      superblock in the event that a superblock is created.
> 
>      struct cgroup_fs_context::root then holds a single ref on the root right
>      through till it is freed.
> 
> Note that new_root is transferred into the cgroup_fs_context as
> is_new_root, though this is probably unnecessary as it's only used to clear
> CSS_CREATING - and no one else can gain access to the root until we've
> cleared the flag.
> 

I haven't read the patch yet, but all my tests passed without any
visible problems with this series. Thanks!

https://travis-ci.org/avagin/linux/builds/475101517

Tested-by: Andrei Vagin <avagin@gmail.com>
Reported-by: Andrei Vagin <avagin@gmail.com>

> Fixes: b3678086951a ("kernfs, sysfs, cgroup, intel_rdt: Support fs_context")
> Signed-off-by: David Howells <dhowells@redhat.com>
> ---
> 
>  include/linux/cgroup-defs.h     |    1 +
>  include/linux/cgroup.h          |    4 ++++
>  kernel/cgroup/cgroup-internal.h |    3 ++-
>  kernel/cgroup/cgroup-v1.c       |   25 +++++++++----------------
>  kernel/cgroup/cgroup.c          |   24 +++++++++++++++++++++---
>  5 files changed, 37 insertions(+), 20 deletions(-)
> 
> diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
> index 5e1694fe035b..f5da2396a809 100644
> --- a/include/linux/cgroup-defs.h
> +++ b/include/linux/cgroup-defs.h
> @@ -52,6 +52,7 @@ enum {
>  	CSS_RELEASED	= (1 << 2), /* refcnt reached zero, released */
>  	CSS_VISIBLE	= (1 << 3), /* css is visible to userland */
>  	CSS_DYING	= (1 << 4), /* css is dying */
> +	CSS_CREATING	= (1 << 5), /* Root css is being constructed */
>  };
>  
>  /* bits in struct cgroup flags field */
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index bb0c7da50ed2..5708ad663572 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -333,6 +333,8 @@ static inline void css_get_many(struct cgroup_subsys_state *css, unsigned int n)
>   */
>  static inline bool css_tryget(struct cgroup_subsys_state *css)
>  {
> +	if (css->flags & CSS_CREATING)
> +		return false;
>  	if (!(css->flags & CSS_NO_REF))
>  		return percpu_ref_tryget(&css->refcnt);
>  	return true;
> @@ -350,6 +352,8 @@ static inline bool css_tryget(struct cgroup_subsys_state *css)
>   */
>  static inline bool css_tryget_online(struct cgroup_subsys_state *css)
>  {
> +	if (css->flags & CSS_CREATING)
> +		return false;
>  	if (!(css->flags & CSS_NO_REF))
>  		return percpu_ref_tryget_live(&css->refcnt);
>  	return true;
> diff --git a/kernel/cgroup/cgroup-internal.h b/kernel/cgroup/cgroup-internal.h
> index ff86943ea1c8..b22c3d95d8eb 100644
> --- a/kernel/cgroup/cgroup-internal.h
> +++ b/kernel/cgroup/cgroup-internal.h
> @@ -46,6 +46,7 @@ struct cgroup_fs_context {
>  	unsigned int	flags;			/* CGRP_ROOT_* flags */
>  
>  	/* cgroup1 bits */
> +	bool		is_new_root;		/* ->root is new and needs refcnt init */
>  	bool		cpuset_clone_children;
>  	bool		none;			/* User explicitly requested empty subsystem */
>  	bool		all_ss;			/* Seen 'all' option */
> @@ -214,7 +215,7 @@ int cgroup_path_ns_locked(struct cgroup *cgrp, char *buf, size_t buflen,
>  
>  void cgroup_free_root(struct cgroup_root *root);
>  void init_cgroup_root(struct cgroup_fs_context *ctx);
> -int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask, int ref_flags);
> +int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask);
>  int rebind_subsystems(struct cgroup_root *dst_root, u16 ss_mask);
>  int cgroup_do_get_tree(struct fs_context *fc);
>  
> diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c
> index 4b189e821cad..0fbbde86a64d 100644
> --- a/kernel/cgroup/cgroup-v1.c
> +++ b/kernel/cgroup/cgroup-v1.c
> @@ -1156,7 +1156,6 @@ int cgroup1_get_tree(struct fs_context *fc)
>  	struct cgroup_root *root;
>  	struct cgroup_subsys *ss;
>  	int i, ret = cgroup1_validate(fc);
> -	bool new_root = false;
>  
>  	if (ret)
>  		return ret;
> @@ -1261,12 +1260,19 @@ int cgroup1_get_tree(struct fs_context *fc)
>  		ret = -ENOMEM;
>  		goto err_unlock;
>  	}
> -	new_root = true;
>  	ctx->root = root;
> +	ctx->is_new_root = true;
>  
>  	init_cgroup_root(ctx);
>  
> -	ret = cgroup_setup_root(root, ctx->subsys_mask, PERCPU_REF_INIT_DEAD);
> +	/*
> +	 * There's a race window after we release cgroup_mutex and before
> +	 * allocating a superblock. Make sure a concurrent process won't be
> +	 * able to re-use the root during this window by setting CSS_CREATING.
> +	 */
> +	root->cgrp.self.flags |= CSS_CREATING;
> +
> +	ret = cgroup_setup_root(root, ctx->subsys_mask);
>  	if (ret)
>  		goto err_unlock;
>  
> @@ -1275,19 +1281,6 @@ int cgroup1_get_tree(struct fs_context *fc)
>  
>  	ret = cgroup_do_get_tree(fc);
>  
> -	/*
> -	 * There's a race window after we release cgroup_mutex and before
> -	 * allocating a superblock. Make sure a concurrent process won't
> -	 * be able to re-use the root during this window by delaying the
> -	 * initialization of root refcnt.
> -	 */
> -	if (new_root) {
> -		mutex_lock(&cgroup_mutex);
> -		percpu_ref_reinit(&root->cgrp.self.refcnt);
> -		mutex_unlock(&cgroup_mutex);
> -		cgroup_get(&root->cgrp);
> -	}
> -
>  	/*
>  	 * If @pinned_sb, we're reusing an existing root and holding an
>  	 * extra ref on its sb.  Mount is complete.  Put the extra ref.
> diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
> index 2e5150412ae0..091e7eca3661 100644
> --- a/kernel/cgroup/cgroup.c
> +++ b/kernel/cgroup/cgroup.c
> @@ -1909,7 +1909,7 @@ void init_cgroup_root(struct cgroup_fs_context *ctx)
>  		set_bit(CGRP_CPUSET_CLONE_CHILDREN, &root->cgrp.flags);
>  }
>  
> -int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask, int ref_flags)
> +int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask)
>  {
>  	LIST_HEAD(tmp_links);
>  	struct cgroup *root_cgrp = &root->cgrp;
> @@ -1926,7 +1926,7 @@ int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask, int ref_flags)
>  	root_cgrp->ancestor_ids[0] = ret;
>  
>  	ret = percpu_ref_init(&root_cgrp->self.refcnt, css_release,
> -			      ref_flags, GFP_KERNEL);
> +			      0, GFP_KERNEL);
>  	if (ret)
>  		goto out;
>  
> @@ -2010,12 +2010,23 @@ int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask, int ref_flags)
>  	return ret;
>  }
>  
> +static int cgroup_fill_super(struct super_block *sb, struct fs_context *fc)
> +{
> +	struct cgroup_fs_context *ctx = cgroup_fc2context(fc);
> +
> +	mutex_lock(&cgroup_mutex);
> +	cgroup_get(&ctx->root->cgrp);
> +	mutex_unlock(&cgroup_mutex);
> +	return 0;
> +}
> +
>  int cgroup_do_get_tree(struct fs_context *fc)
>  {
>  	struct cgroup_fs_context *ctx = cgroup_fc2context(fc);
>  	int ret;
>  
>  	ctx->kfc.root = ctx->root->kf_root;
> +	ctx->kfc.fill_super = cgroup_fill_super;
>  
>  	ret = kernfs_get_tree(fc);
>  	if (ret < 0)
> @@ -2046,6 +2057,12 @@ int cgroup_do_get_tree(struct fs_context *fc)
>  
>  	if (ctx->version == 2)
>  		apply_cgroup_root_flags(ctx->flags);
> +
> +	if (ctx->is_new_root) {
> +		mutex_lock(&cgroup_mutex);
> +		ctx->root->cgrp.self.flags &= ~CSS_CREATING;
> +		mutex_unlock(&cgroup_mutex);
> +	}
>  	ret = 0;
>  out_cgrp:
>  	return ret;
> @@ -2071,6 +2088,7 @@ static int cgroup_get_tree(struct fs_context *fc)
>  		cgroup_get_live(&cgrp_dfl_root.cgrp);
>  
>  		ctx->root = &cgrp_dfl_root;
> +		cgroup_get(&ctx->root->cgrp);
>  		return cgroup_do_get_tree(fc);
>  
>  	default:
> @@ -5420,7 +5438,7 @@ int __init cgroup_init(void)
>  	hash_add(css_set_table, &init_css_set.hlist,
>  		 css_set_hash(init_css_set.subsys));
>  
> -	BUG_ON(cgroup_setup_root(&cgrp_dfl_root, 0, 0));
> +	BUG_ON(cgroup_setup_root(&cgrp_dfl_root, 0));
>  
>  	mutex_unlock(&cgroup_mutex);
>  
> 

  reply	other threads:[~2019-01-04  8:16 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-01-03 23:44 [PATCH 1/5] cgroup2: Always apply root flags on successful superblock obtainance David Howells
2019-01-03 23:44 ` [PATCH 2/5] kernfs: Provide a fill_super hook for the subclass filesystem David Howells
2019-01-03 23:44 ` [PATCH 3/5] cgroup: Fix refcounting David Howells
2019-01-04  8:16   ` Andrei Vagin [this message]
2019-01-05  6:37   ` Al Viro
2019-01-05  7:21     ` David Howells
2019-01-05 16:05     ` Al Viro
2019-01-03 23:44 ` [PATCH 4/5] cgroup: refcount fixes David Howells
2019-01-03 23:56   ` David Howells
2019-01-03 23:44 ` [PATCH 5/5] percpu: Kill off PERCPU_REF_INIT_DEAD and percpu_ref_reinit() David Howells

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190104081638.GA17865@gmail.com \
    --to=avagin@gmail.com \
    --cc=dhowells@redhat.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.