All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCHSET block/for-linus] blkcg: blkcg_policy_data fixes
@ 2015-07-09 20:39 Tejun Heo
  2015-07-09 20:39 ` [PATCH 1/4] blkcg: allow blkcg_pol_mutex to be grabbed from cgroup [file] methods Tejun Heo
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Tejun Heo @ 2015-07-09 20:39 UTC (permalink / raw)
  To: axboe; +Cc: vgoyal, linux-kernel, avanzini.arianna, kernel-team

During 4.2-rc1 merge window, e48453c386f3 ("block, cgroup: implement
policy-specific per-blkcg data") got merged.  It had several bugs and
a322baad1003 ("block/blk-cgroup.c: free per-blkcg data when freeing
the blkcg") fixed one of them but there are still a couple remaining.

* blkcg_policy[] iteration during css_alloc isn't protected.

* Lazy allocation scheme is broken and policy data may be missing or
  an old policy data may get reused by the wrong policy.

This patchset contains the following four patches to fix the above
issues.

 0001-blkcg-allow-blkcg_pol_mutex-to-be-grabbed-from-cgrou.patch
 0002-blkcg-blkcg_css_alloc-should-grab-blkcg_pol_mutex-wh.patch
 0003-blkcg-implement-all_blkcgs-list.patch
 0004-blkcg-fix-blkcg_policy_data-allocation-bug.patch

0001 and 0003 are prep patches.  0002 and 0004 are the fixes.

This patchset is on top of block/for-linus a322baad1003
("block/blk-cgroup.c: free per-blkcg data when freeing the blkcg") and
available in the following git branch.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-blkcg-cpd-fixes

diffstat follows, thanks.

 block/blk-cgroup.c         |  131 +++++++++++++++++++++++++--------------------
 include/linux/blk-cgroup.h |   11 +--
 2 files changed, 78 insertions(+), 64 deletions(-)

--
tejun

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/4] blkcg: allow blkcg_pol_mutex to be grabbed from cgroup [file] methods
  2015-07-09 20:39 [PATCHSET block/for-linus] blkcg: blkcg_policy_data fixes Tejun Heo
@ 2015-07-09 20:39 ` Tejun Heo
  2015-07-09 20:39 ` [PATCH 2/4] blkcg: blkcg_css_alloc() should grab blkcg_pol_mutex while iterating blkcg_policy[] Tejun Heo
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Tejun Heo @ 2015-07-09 20:39 UTC (permalink / raw)
  To: axboe; +Cc: vgoyal, linux-kernel, avanzini.arianna, kernel-team, Tejun Heo

blkcg_pol_mutex primarily protects the blkcg_policy array.  It also
protects cgroup file type [un]registration during policy addition /
removal.  This puts blkcg_pol_mutex outside cgroup internal
synchronization and in turn makes it impossible to grab from blkcg's
cgroup methods as that leads to cyclic dependency.

Another problematic dependency arising from this is through cgroup
interface file deactivation.  Removing a cftype requires removing all
files of the type which in turn involves draining all on-going
invocations of the file methods.  This means that an interface file
implementation can't grab blkcg_pol_mutex as draining can lead to AA
deadlock.

blkcg_reset_stats() is already in this situation.  It currently
trylocks blkcg_pol_mutex and then unwinds and retries the whole
operation on failure, which is cumbersome at best.  It has a lengthy
comment explaining how cgroup internal synchronization is involved and
expected to be updated but as explained above this doesn't need cgroup
internal locking to deadlock.  It's a self-contained AA deadlock.

The described circular dependencies can be easily broken by moving
cftype [un]registration out of blkcg_pol_mutex and protect them with
an outer mutex.  This patch introduces blkcg_pol_register_mutex which
wraps entire policy [un]registration including cftype operations and
shrinks blkcg_pol_mutex critical section.  This also makes the trylock
dancing in blkcg_reset_stats() unnecessary.  Removed.

This patch is necessary for the following blkcg_policy_data allocation
bug fixes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Arianna Avanzini <avanzini.arianna@gmail.com>
---
 block/blk-cgroup.c | 40 +++++++++++++++++++++-------------------
 1 file changed, 21 insertions(+), 19 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 5e2723f..2ff74ff 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -29,6 +29,14 @@
 
 #define MAX_KEY_LEN 100
 
+/*
+ * blkcg_pol_mutex protects blkcg_policy[] and policy [de]activation.
+ * blkcg_pol_register_mutex nests outside of it and synchronizes entire
+ * policy [un]register operations including cgroup file additions /
+ * removals.  Putting cgroup file registration outside blkcg_pol_mutex
+ * allows grabbing it from cgroup callbacks.
+ */
+static DEFINE_MUTEX(blkcg_pol_register_mutex);
 static DEFINE_MUTEX(blkcg_pol_mutex);
 
 struct blkcg blkcg_root;
@@ -453,20 +461,7 @@ static int blkcg_reset_stats(struct cgroup_subsys_state *css,
 	struct blkcg_gq *blkg;
 	int i;
 
-	/*
-	 * XXX: We invoke cgroup_add/rm_cftypes() under blkcg_pol_mutex
-	 * which ends up putting cgroup's internal cgroup_tree_mutex under
-	 * it; however, cgroup_tree_mutex is nested above cgroup file
-	 * active protection and grabbing blkcg_pol_mutex from a cgroup
-	 * file operation creates a possible circular dependency.  cgroup
-	 * internal locking is planned to go through further simplification
-	 * and this issue should go away soon.  For now, let's trylock
-	 * blkcg_pol_mutex and restart the write on failure.
-	 *
-	 * http://lkml.kernel.org/g/5363C04B.4010400@oracle.com
-	 */
-	if (!mutex_trylock(&blkcg_pol_mutex))
-		return restart_syscall();
+	mutex_lock(&blkcg_pol_mutex);
 	spin_lock_irq(&blkcg->lock);
 
 	/*
@@ -1190,6 +1185,7 @@ int blkcg_policy_register(struct blkcg_policy *pol)
 	if (WARN_ON(pol->pd_size < sizeof(struct blkg_policy_data)))
 		return -EINVAL;
 
+	mutex_lock(&blkcg_pol_register_mutex);
 	mutex_lock(&blkcg_pol_mutex);
 
 	/* find an empty slot */
@@ -1198,19 +1194,23 @@ int blkcg_policy_register(struct blkcg_policy *pol)
 		if (!blkcg_policy[i])
 			break;
 	if (i >= BLKCG_MAX_POLS)
-		goto out_unlock;
+		goto err_unlock;
 
 	/* register and update blkgs */
 	pol->plid = i;
 	blkcg_policy[i] = pol;
+	mutex_unlock(&blkcg_pol_mutex);
 
 	/* everything is in place, add intf files for the new policy */
 	if (pol->cftypes)
 		WARN_ON(cgroup_add_legacy_cftypes(&blkio_cgrp_subsys,
 						  pol->cftypes));
-	ret = 0;
-out_unlock:
+	mutex_unlock(&blkcg_pol_register_mutex);
+	return 0;
+
+err_unlock:
 	mutex_unlock(&blkcg_pol_mutex);
+	mutex_unlock(&blkcg_pol_register_mutex);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(blkcg_policy_register);
@@ -1223,7 +1223,7 @@ EXPORT_SYMBOL_GPL(blkcg_policy_register);
  */
 void blkcg_policy_unregister(struct blkcg_policy *pol)
 {
-	mutex_lock(&blkcg_pol_mutex);
+	mutex_lock(&blkcg_pol_register_mutex);
 
 	if (WARN_ON(blkcg_policy[pol->plid] != pol))
 		goto out_unlock;
@@ -1233,8 +1233,10 @@ void blkcg_policy_unregister(struct blkcg_policy *pol)
 		cgroup_rm_cftypes(pol->cftypes);
 
 	/* unregister and update blkgs */
+	mutex_lock(&blkcg_pol_mutex);
 	blkcg_policy[pol->plid] = NULL;
-out_unlock:
 	mutex_unlock(&blkcg_pol_mutex);
+out_unlock:
+	mutex_unlock(&blkcg_pol_register_mutex);
 }
 EXPORT_SYMBOL_GPL(blkcg_policy_unregister);
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/4] blkcg: blkcg_css_alloc() should grab blkcg_pol_mutex while iterating blkcg_policy[]
  2015-07-09 20:39 [PATCHSET block/for-linus] blkcg: blkcg_policy_data fixes Tejun Heo
  2015-07-09 20:39 ` [PATCH 1/4] blkcg: allow blkcg_pol_mutex to be grabbed from cgroup [file] methods Tejun Heo
@ 2015-07-09 20:39 ` Tejun Heo
  2015-07-09 20:39 ` [PATCH 3/4] blkcg: implement all_blkcgs list Tejun Heo
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Tejun Heo @ 2015-07-09 20:39 UTC (permalink / raw)
  To: axboe; +Cc: vgoyal, linux-kernel, avanzini.arianna, kernel-team, Tejun Heo

An entry in blkcg_policy[] is stable while there are non-bypassing
in-flight IOs on a request_queue which has the policy activated.  This
is why most derefs of blkcg_policy[] don't need explicit locking;
however, blkcg_css_alloc() isn't invoked from IO path and thus doesn't
have this protection and may race policies being added and removed.

Fix it by adding explicit blkcg_pol_mutex protection around
blkcg_policy[] iteration in blkcg_css_alloc().

Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: e48453c386f3 ("block, cgroup: implement policy-specific per-blkcg data")
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Arianna Avanzini <avanzini.arianna@gmail.com>
---
 block/blk-cgroup.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 2ff74ff..05b893d 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -844,6 +844,8 @@ blkcg_css_alloc(struct cgroup_subsys_state *parent_css)
 		goto free_blkcg;
 	}
 
+	mutex_lock(&blkcg_pol_mutex);
+
 	for (i = 0; i < BLKCG_MAX_POLS ; i++) {
 		struct blkcg_policy *pol = blkcg_policy[i];
 		struct blkcg_policy_data *cpd;
@@ -860,6 +862,7 @@ blkcg_css_alloc(struct cgroup_subsys_state *parent_css)
 		BUG_ON(blkcg->pd[i]);
 		cpd = kzalloc(pol->cpd_size, GFP_KERNEL);
 		if (!cpd) {
+			mutex_unlock(&blkcg_pol_mutex);
 			ret = ERR_PTR(-ENOMEM);
 			goto free_pd_blkcg;
 		}
@@ -868,6 +871,7 @@ blkcg_css_alloc(struct cgroup_subsys_state *parent_css)
 		pol->cpd_init_fn(blkcg);
 	}
 
+	mutex_unlock(&blkcg_pol_mutex);
 done:
 	spin_lock_init(&blkcg->lock);
 	INIT_RADIX_TREE(&blkcg->blkg_tree, GFP_ATOMIC);
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 3/4] blkcg: implement all_blkcgs list
  2015-07-09 20:39 [PATCHSET block/for-linus] blkcg: blkcg_policy_data fixes Tejun Heo
  2015-07-09 20:39 ` [PATCH 1/4] blkcg: allow blkcg_pol_mutex to be grabbed from cgroup [file] methods Tejun Heo
  2015-07-09 20:39 ` [PATCH 2/4] blkcg: blkcg_css_alloc() should grab blkcg_pol_mutex while iterating blkcg_policy[] Tejun Heo
@ 2015-07-09 20:39 ` Tejun Heo
  2015-07-09 20:39 ` [PATCH 4/4] blkcg: fix blkcg_policy_data allocation bug Tejun Heo
  2015-07-10 14:28 ` [PATCHSET block/for-linus] blkcg: blkcg_policy_data fixes Jens Axboe
  4 siblings, 0 replies; 6+ messages in thread
From: Tejun Heo @ 2015-07-09 20:39 UTC (permalink / raw)
  To: axboe; +Cc: vgoyal, linux-kernel, avanzini.arianna, kernel-team, Tejun Heo

Add all_blkcgs list goes through blkcg->all_blkcgs_node and is
protected by blkcg_pol_mutex.  This will be used to fix
blkcg_policy_data allocation bug.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Arianna Avanzini <avanzini.arianna@gmail.com>
---
 block/blk-cgroup.c         | 17 ++++++++++++-----
 include/linux/blk-cgroup.h |  1 +
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 05b893d..42ff436 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -46,6 +46,8 @@ struct cgroup_subsys_state * const blkcg_root_css = &blkcg_root.css;
 
 static struct blkcg_policy *blkcg_policy[BLKCG_MAX_POLS];
 
+static LIST_HEAD(all_blkcgs);		/* protected by blkcg_pol_mutex */
+
 static bool blkcg_policy_enabled(struct request_queue *q,
 				 const struct blkcg_policy *pol)
 {
@@ -817,6 +819,10 @@ static void blkcg_css_free(struct cgroup_subsys_state *css)
 {
 	struct blkcg *blkcg = css_to_blkcg(css);
 
+	mutex_lock(&blkcg_pol_mutex);
+	list_del(&blkcg->all_blkcgs_node);
+	mutex_unlock(&blkcg_pol_mutex);
+
 	if (blkcg != &blkcg_root) {
 		int i;
 
@@ -833,6 +839,8 @@ blkcg_css_alloc(struct cgroup_subsys_state *parent_css)
 	struct cgroup_subsys_state *ret;
 	int i;
 
+	mutex_lock(&blkcg_pol_mutex);
+
 	if (!parent_css) {
 		blkcg = &blkcg_root;
 		goto done;
@@ -844,8 +852,6 @@ blkcg_css_alloc(struct cgroup_subsys_state *parent_css)
 		goto free_blkcg;
 	}
 
-	mutex_lock(&blkcg_pol_mutex);
-
 	for (i = 0; i < BLKCG_MAX_POLS ; i++) {
 		struct blkcg_policy *pol = blkcg_policy[i];
 		struct blkcg_policy_data *cpd;
@@ -862,7 +868,6 @@ blkcg_css_alloc(struct cgroup_subsys_state *parent_css)
 		BUG_ON(blkcg->pd[i]);
 		cpd = kzalloc(pol->cpd_size, GFP_KERNEL);
 		if (!cpd) {
-			mutex_unlock(&blkcg_pol_mutex);
 			ret = ERR_PTR(-ENOMEM);
 			goto free_pd_blkcg;
 		}
@@ -871,7 +876,6 @@ blkcg_css_alloc(struct cgroup_subsys_state *parent_css)
 		pol->cpd_init_fn(blkcg);
 	}
 
-	mutex_unlock(&blkcg_pol_mutex);
 done:
 	spin_lock_init(&blkcg->lock);
 	INIT_RADIX_TREE(&blkcg->blkg_tree, GFP_ATOMIC);
@@ -879,14 +883,17 @@ blkcg_css_alloc(struct cgroup_subsys_state *parent_css)
 #ifdef CONFIG_CGROUP_WRITEBACK
 	INIT_LIST_HEAD(&blkcg->cgwb_list);
 #endif
+	list_add_tail(&blkcg->all_blkcgs_node, &all_blkcgs);
+
+	mutex_unlock(&blkcg_pol_mutex);
 	return &blkcg->css;
 
 free_pd_blkcg:
 	for (i--; i >= 0; i--)
 		kfree(blkcg->pd[i]);
-
 free_blkcg:
 	kfree(blkcg);
+	mutex_unlock(&blkcg_pol_mutex);
 	return ret;
 }
 
diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h
index 58cfab8..cf3e7bc 100644
--- a/include/linux/blk-cgroup.h
+++ b/include/linux/blk-cgroup.h
@@ -47,6 +47,7 @@ struct blkcg {
 
 	struct blkcg_policy_data	*pd[BLKCG_MAX_POLS];
 
+	struct list_head		all_blkcgs_node;
 #ifdef CONFIG_CGROUP_WRITEBACK
 	struct list_head		cgwb_list;
 #endif
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 4/4] blkcg: fix blkcg_policy_data allocation bug
  2015-07-09 20:39 [PATCHSET block/for-linus] blkcg: blkcg_policy_data fixes Tejun Heo
                   ` (2 preceding siblings ...)
  2015-07-09 20:39 ` [PATCH 3/4] blkcg: implement all_blkcgs list Tejun Heo
@ 2015-07-09 20:39 ` Tejun Heo
  2015-07-10 14:28 ` [PATCHSET block/for-linus] blkcg: blkcg_policy_data fixes Jens Axboe
  4 siblings, 0 replies; 6+ messages in thread
From: Tejun Heo @ 2015-07-09 20:39 UTC (permalink / raw)
  To: axboe; +Cc: vgoyal, linux-kernel, avanzini.arianna, kernel-team, Tejun Heo

e48453c386f3 ("block, cgroup: implement policy-specific per-blkcg
data") updated per-blkcg policy data to be dynamically allocated.
When a policy is registered, its policy data aren't created.  Instead,
when the policy is activated on a queue, the policy data are allocated
if there are blkg's (blkcg_gq's) which are attached to a given blkcg.
This is buggy.  Consider the following scenario.

1. A blkcg is created.  No blkg's attached yet.

2. The policy is registered.  No policy data is allocated.

3. The policy is activated on a queue.  As the above blkcg doesn't
   have any blkg's, it won't allocate the matching blkcg_policy_data.

4. An IO is issued from the blkcg and blkg is created and the blkcg
   still doesn't have the matching policy data allocated.

With cfq-iosched, this leads to an oops.

It also doesn't free policy data on policy unregistration assuming
that freeing of all policy data on blkcg destruction should take care
of it; however, this also is incorrect.

1. A blkcg has policy data.

2. The policy gets unregistered but the policy data remains.

3. Another policy gets registered on the same slot.

4. Later, the new policy tries to allocate policy data on the previous
   blkcg but the slot is already occupied and gets skipped.  The
   policy ends up operating on the policy data of the previous policy.

There's no reason to manage blkcg_policy_data lazily.  The reason we
do lazy allocation of blkg's is that the number of all possible blkg's
is the product of cgroups and block devices which can reach a
surprising level.  blkcg_policy_data is contrained by the number of
cgroups and shouldn't be a problem.

This patch makes blkcg_policy_data to be allocated for all existing
blkcg's on policy registration and freed on unregistration and removes
blkcg_policy_data handling from policy [de]activation paths.  This
makes that blkcg_policy_data are created and removed with the policy
they belong to and fixes the above described problems.

Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: e48453c386f3 ("block, cgroup: implement policy-specific per-blkcg data")
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Arianna Avanzini <avanzini.arianna@gmail.com>
---
 block/blk-cgroup.c         | 78 +++++++++++++++++++++++++---------------------
 include/linux/blk-cgroup.h | 10 ++----
 2 files changed, 44 insertions(+), 44 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 42ff436..9da02c0 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1048,10 +1048,8 @@ int blkcg_activate_policy(struct request_queue *q,
 			  const struct blkcg_policy *pol)
 {
 	LIST_HEAD(pds);
-	LIST_HEAD(cpds);
 	struct blkcg_gq *blkg;
 	struct blkg_policy_data *pd, *nd;
-	struct blkcg_policy_data *cpd, *cnd;
 	int cnt = 0, ret;
 
 	if (blkcg_policy_enabled(q, pol))
@@ -1064,10 +1062,7 @@ int blkcg_activate_policy(struct request_queue *q,
 		cnt++;
 	spin_unlock_irq(q->queue_lock);
 
-	/*
-	 * Allocate per-blkg and per-blkcg policy data
-	 * for all existing blkgs.
-	 */
+	/* allocate per-blkg policy data for all existing blkgs */
 	while (cnt--) {
 		pd = kzalloc_node(pol->pd_size, GFP_KERNEL, q->node);
 		if (!pd) {
@@ -1075,15 +1070,6 @@ int blkcg_activate_policy(struct request_queue *q,
 			goto out_free;
 		}
 		list_add_tail(&pd->alloc_node, &pds);
-
-		if (!pol->cpd_size)
-			continue;
-		cpd = kzalloc_node(pol->cpd_size, GFP_KERNEL, q->node);
-		if (!cpd) {
-			ret = -ENOMEM;
-			goto out_free;
-		}
-		list_add_tail(&cpd->alloc_node, &cpds);
 	}
 
 	/*
@@ -1093,32 +1079,17 @@ int blkcg_activate_policy(struct request_queue *q,
 	spin_lock_irq(q->queue_lock);
 
 	list_for_each_entry(blkg, &q->blkg_list, q_node) {
-		if (WARN_ON(list_empty(&pds)) ||
-		    WARN_ON(pol->cpd_size && list_empty(&cpds))) {
+		if (WARN_ON(list_empty(&pds))) {
 			/* umm... this shouldn't happen, just abort */
 			ret = -ENOMEM;
 			goto out_unlock;
 		}
-		cpd = list_first_entry(&cpds, struct blkcg_policy_data,
-				       alloc_node);
-		list_del_init(&cpd->alloc_node);
 		pd = list_first_entry(&pds, struct blkg_policy_data, alloc_node);
 		list_del_init(&pd->alloc_node);
 
 		/* grab blkcg lock too while installing @pd on @blkg */
 		spin_lock(&blkg->blkcg->lock);
 
-		if (!pol->cpd_size)
-			goto no_cpd;
-		if (!blkg->blkcg->pd[pol->plid]) {
-			/* Per-policy per-blkcg data */
-			blkg->blkcg->pd[pol->plid] = cpd;
-			cpd->plid = pol->plid;
-			pol->cpd_init_fn(blkg->blkcg);
-		} else { /* must free it as it has already been extracted */
-			kfree(cpd);
-		}
-no_cpd:
 		blkg->pd[pol->plid] = pd;
 		pd->blkg = blkg;
 		pd->plid = pol->plid;
@@ -1135,8 +1106,6 @@ int blkcg_activate_policy(struct request_queue *q,
 	blk_queue_bypass_end(q);
 	list_for_each_entry_safe(pd, nd, &pds, alloc_node)
 		kfree(pd);
-	list_for_each_entry_safe(cpd, cnd, &cpds, alloc_node)
-		kfree(cpd);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(blkcg_activate_policy);
@@ -1191,6 +1160,7 @@ EXPORT_SYMBOL_GPL(blkcg_deactivate_policy);
  */
 int blkcg_policy_register(struct blkcg_policy *pol)
 {
+	struct blkcg *blkcg;
 	int i, ret;
 
 	if (WARN_ON(pol->pd_size < sizeof(struct blkg_policy_data)))
@@ -1207,9 +1177,27 @@ int blkcg_policy_register(struct blkcg_policy *pol)
 	if (i >= BLKCG_MAX_POLS)
 		goto err_unlock;
 
-	/* register and update blkgs */
+	/* register @pol */
 	pol->plid = i;
-	blkcg_policy[i] = pol;
+	blkcg_policy[pol->plid] = pol;
+
+	/* allocate and install cpd's */
+	if (pol->cpd_size) {
+		list_for_each_entry(blkcg, &all_blkcgs, all_blkcgs_node) {
+			struct blkcg_policy_data *cpd;
+
+			cpd = kzalloc(pol->cpd_size, GFP_KERNEL);
+			if (!cpd) {
+				mutex_unlock(&blkcg_pol_mutex);
+				goto err_free_cpds;
+			}
+
+			blkcg->pd[pol->plid] = cpd;
+			cpd->plid = pol->plid;
+			pol->cpd_init_fn(blkcg);
+		}
+	}
+
 	mutex_unlock(&blkcg_pol_mutex);
 
 	/* everything is in place, add intf files for the new policy */
@@ -1219,6 +1207,14 @@ int blkcg_policy_register(struct blkcg_policy *pol)
 	mutex_unlock(&blkcg_pol_register_mutex);
 	return 0;
 
+err_free_cpds:
+	if (pol->cpd_size) {
+		list_for_each_entry(blkcg, &all_blkcgs, all_blkcgs_node) {
+			kfree(blkcg->pd[pol->plid]);
+			blkcg->pd[pol->plid] = NULL;
+		}
+	}
+	blkcg_policy[pol->plid] = NULL;
 err_unlock:
 	mutex_unlock(&blkcg_pol_mutex);
 	mutex_unlock(&blkcg_pol_register_mutex);
@@ -1234,6 +1230,8 @@ EXPORT_SYMBOL_GPL(blkcg_policy_register);
  */
 void blkcg_policy_unregister(struct blkcg_policy *pol)
 {
+	struct blkcg *blkcg;
+
 	mutex_lock(&blkcg_pol_register_mutex);
 
 	if (WARN_ON(blkcg_policy[pol->plid] != pol))
@@ -1243,9 +1241,17 @@ void blkcg_policy_unregister(struct blkcg_policy *pol)
 	if (pol->cftypes)
 		cgroup_rm_cftypes(pol->cftypes);
 
-	/* unregister and update blkgs */
+	/* remove cpds and unregister */
 	mutex_lock(&blkcg_pol_mutex);
+
+	if (pol->cpd_size) {
+		list_for_each_entry(blkcg, &all_blkcgs, all_blkcgs_node) {
+			kfree(blkcg->pd[pol->plid]);
+			blkcg->pd[pol->plid] = NULL;
+		}
+	}
 	blkcg_policy[pol->plid] = NULL;
+
 	mutex_unlock(&blkcg_pol_mutex);
 out_unlock:
 	mutex_unlock(&blkcg_pol_register_mutex);
diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h
index cf3e7bc..1b62d76 100644
--- a/include/linux/blk-cgroup.h
+++ b/include/linux/blk-cgroup.h
@@ -89,18 +89,12 @@ struct blkg_policy_data {
  * Policies that need to keep per-blkcg data which is independent
  * from any request_queue associated to it must specify its size
  * with the cpd_size field of the blkcg_policy structure and
- * embed a blkcg_policy_data in it. blkcg core allocates
- * policy-specific per-blkcg structures lazily the first time
- * they are actually needed, so it handles them together with
- * blkgs. cpd_init() is invoked to let each policy handle
- * per-blkcg data.
+ * embed a blkcg_policy_data in it.  cpd_init() is invoked to let
+ * each policy handle per-blkcg data.
  */
 struct blkcg_policy_data {
 	/* the policy id this per-policy data belongs to */
 	int				plid;
-
-	/* used during policy activation */
-	struct list_head		alloc_node;
 };
 
 /* association between a blk cgroup and a request queue */
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCHSET block/for-linus] blkcg: blkcg_policy_data fixes
  2015-07-09 20:39 [PATCHSET block/for-linus] blkcg: blkcg_policy_data fixes Tejun Heo
                   ` (3 preceding siblings ...)
  2015-07-09 20:39 ` [PATCH 4/4] blkcg: fix blkcg_policy_data allocation bug Tejun Heo
@ 2015-07-10 14:28 ` Jens Axboe
  4 siblings, 0 replies; 6+ messages in thread
From: Jens Axboe @ 2015-07-10 14:28 UTC (permalink / raw)
  To: Tejun Heo; +Cc: vgoyal, linux-kernel, avanzini.arianna, kernel-team

On 07/09/2015 02:39 PM, Tejun Heo wrote:
> During 4.2-rc1 merge window, e48453c386f3 ("block, cgroup: implement
> policy-specific per-blkcg data") got merged.  It had several bugs and
> a322baad1003 ("block/blk-cgroup.c: free per-blkcg data when freeing
> the blkcg") fixed one of them but there are still a couple remaining.
>
> * blkcg_policy[] iteration during css_alloc isn't protected.
>
> * Lazy allocation scheme is broken and policy data may be missing or
>    an old policy data may get reused by the wrong policy.
>
> This patchset contains the following four patches to fix the above
> issues.
>
>   0001-blkcg-allow-blkcg_pol_mutex-to-be-grabbed-from-cgrou.patch
>   0002-blkcg-blkcg_css_alloc-should-grab-blkcg_pol_mutex-wh.patch
>   0003-blkcg-implement-all_blkcgs-list.patch
>   0004-blkcg-fix-blkcg_policy_data-allocation-bug.patch
>
> 0001 and 0003 are prep patches.  0002 and 0004 are the fixes.
>
> This patchset is on top of block/for-linus a322baad1003
> ("block/blk-cgroup.c: free per-blkcg data when freeing the blkcg") and
> available in the following git branch.
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-blkcg-cpd-fixes
>
> diffstat follows, thanks.

Applied, thanks Tejun.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2015-07-10 14:28 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-07-09 20:39 [PATCHSET block/for-linus] blkcg: blkcg_policy_data fixes Tejun Heo
2015-07-09 20:39 ` [PATCH 1/4] blkcg: allow blkcg_pol_mutex to be grabbed from cgroup [file] methods Tejun Heo
2015-07-09 20:39 ` [PATCH 2/4] blkcg: blkcg_css_alloc() should grab blkcg_pol_mutex while iterating blkcg_policy[] Tejun Heo
2015-07-09 20:39 ` [PATCH 3/4] blkcg: implement all_blkcgs list Tejun Heo
2015-07-09 20:39 ` [PATCH 4/4] blkcg: fix blkcg_policy_data allocation bug Tejun Heo
2015-07-10 14:28 ` [PATCHSET block/for-linus] blkcg: blkcg_policy_data fixes Jens Axboe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.