- * [PATCH 01/13] blkcg: fix ref count issue with bio_blkcg() using task_css
  2018-11-26 21:19 [PATCH 00/13 v4] block: always associate blkg and refcount cleanup Dennis Zhou
@ 2018-11-26 21:19 ` Dennis Zhou
  2018-11-27 20:54   ` Josef Bacik
  2018-11-26 21:19 ` [PATCH 02/13] blkcg: update blkg_lookup_create() to do locking Dennis Zhou
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 29+ messages in thread
From: Dennis Zhou @ 2018-11-26 21:19 UTC (permalink / raw)
  To: Jens Axboe, Tejun Heo, Johannes Weiner, Josef Bacik
  Cc: kernel-team, linux-block, cgroups, linux-kernel, Dennis Zhou
The bio_blkcg() function turns out to be inconsistent and consequently
dangerous to use. The first part returns a blkcg where a reference is
owned by the bio meaning it does not need to be rcu protected. However,
the third case, the last line, is problematic:
	return css_to_blkcg(task_css(current, io_cgrp_id));
This can race against task migration and the cgroup dying. It is also
semantically different as it must be called rcu protected and is
susceptible to failure when trying to get a reference to it.
This patch adds association ahead of calling bio_blkcg() rather than
after. This makes association a required and explicit step along the
code paths for calling bio_blkcg(). In blk-iolatency, association is
moved above the bio_blkcg() call to ensure it will not return %NULL.
BFQ uses the old bio_blkcg() function, but I do not want to address it
in this series due to the complexity. I have created a private version
documenting the inconsistency and noting not to use it.
Signed-off-by: Dennis Zhou <dennis@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
---
 block/bfq-cgroup.c         |  4 +-
 block/bfq-iosched.c        |  2 +-
 block/bio.c                | 10 +++-
 block/blk-iolatency.c      |  2 +-
 include/linux/blk-cgroup.h | 98 ++++++++++++++++++++++++++++++++++----
 5 files changed, 102 insertions(+), 14 deletions(-)
diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
index a7a1712632b0..c6113af31960 100644
--- a/block/bfq-cgroup.c
+++ b/block/bfq-cgroup.c
@@ -642,7 +642,7 @@ void bfq_bic_update_cgroup(struct bfq_io_cq *bic, struct bio *bio)
 	uint64_t serial_nr;
 
 	rcu_read_lock();
-	serial_nr = bio_blkcg(bio)->css.serial_nr;
+	serial_nr = __bio_blkcg(bio)->css.serial_nr;
 
 	/*
 	 * Check whether blkcg has changed.  The condition may trigger
@@ -651,7 +651,7 @@ void bfq_bic_update_cgroup(struct bfq_io_cq *bic, struct bio *bio)
 	if (unlikely(!bfqd) || likely(bic->blkcg_serial_nr == serial_nr))
 		goto out;
 
-	bfqg = __bfq_bic_change_cgroup(bfqd, bic, bio_blkcg(bio));
+	bfqg = __bfq_bic_change_cgroup(bfqd, bic, __bio_blkcg(bio));
 	/*
 	 * Update blkg_path for bfq_log_* functions. We cache this
 	 * path, and update it here, for the following
diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 67b22c924aee..3d1f319fe977 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -4384,7 +4384,7 @@ static struct bfq_queue *bfq_get_queue(struct bfq_data *bfqd,
 
 	rcu_read_lock();
 
-	bfqg = bfq_find_set_group(bfqd, bio_blkcg(bio));
+	bfqg = bfq_find_set_group(bfqd, __bio_blkcg(bio));
 	if (!bfqg) {
 		bfqq = &bfqd->oom_bfqq;
 		goto out;
diff --git a/block/bio.c b/block/bio.c
index 03895cc0d74a..7528c2324319 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1990,13 +1990,19 @@ int bio_associate_blkcg_from_page(struct bio *bio, struct page *page)
  *
  * This function takes an extra reference of @blkcg_css which will be put
  * when @bio is released.  The caller must own @bio and is responsible for
- * synchronizing calls to this function.
+ * synchronizing calls to this function.  If @blkcg_css is NULL, a call to
+ * blkcg_get_css() finds the current css from the kthread or task.
  */
 int bio_associate_blkcg(struct bio *bio, struct cgroup_subsys_state *blkcg_css)
 {
 	if (unlikely(bio->bi_css))
 		return -EBUSY;
-	css_get(blkcg_css);
+
+	if (blkcg_css)
+		css_get(blkcg_css);
+	else
+		blkcg_css = blkcg_get_css();
+
 	bio->bi_css = blkcg_css;
 	return 0;
 }
diff --git a/block/blk-iolatency.c b/block/blk-iolatency.c
index 5f7f1773be61..fe0c4ca312ff 100644
--- a/block/blk-iolatency.c
+++ b/block/blk-iolatency.c
@@ -481,8 +481,8 @@ static void blkcg_iolatency_throttle(struct rq_qos *rqos, struct bio *bio)
 		return;
 
 	rcu_read_lock();
+	bio_associate_blkcg(bio, NULL);
 	blkcg = bio_blkcg(bio);
-	bio_associate_blkcg(bio, &blkcg->css);
 	blkg = blkg_lookup(blkcg, q);
 	if (unlikely(!blkg)) {
 		spin_lock_irq(&q->queue_lock);
diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h
index a9e2e2037129..db8214047486 100644
--- a/include/linux/blk-cgroup.h
+++ b/include/linux/blk-cgroup.h
@@ -227,22 +227,103 @@ int blkg_conf_prep(struct blkcg *blkcg, const struct blkcg_policy *pol,
 		   char *input, struct blkg_conf_ctx *ctx);
 void blkg_conf_finish(struct blkg_conf_ctx *ctx);
 
+/**
+ * blkcg_css - find the current css
+ *
+ * Find the css associated with either the kthread or the current task.
+ * This may return a dying css, so it is up to the caller to use tryget logic
+ * to confirm it is alive and well.
+ */
+static inline struct cgroup_subsys_state *blkcg_css(void)
+{
+	struct cgroup_subsys_state *css;
+
+	css = kthread_blkcg();
+	if (css)
+		return css;
+	return task_css(current, io_cgrp_id);
+}
+
+/**
+ * blkcg_get_css - find and get a reference to the css
+ *
+ * Find the css associated with either the kthread or the current task.
+ * This takes a reference on the blkcg which will need to be managed by the
+ * caller.
+ */
+static inline struct cgroup_subsys_state *blkcg_get_css(void)
+{
+	struct cgroup_subsys_state *css;
+
+	rcu_read_lock();
+
+	css = kthread_blkcg();
+	if (css) {
+		css_get(css);
+	} else {
+		/*
+		 * This is a bit complicated.  It is possible task_css is seeing
+		 * an old css pointer here.  This is caused by the current
+		 * thread migrating away from this cgroup and this cgroup dying.
+		 * css_tryget() will fail when trying to take a ref on a cgroup
+		 * that's ref count has hit 0.
+		 *
+		 * Therefore, if it does fail, this means current must have
+		 * been swapped away already and this is waiting for it to
+		 * propagate on the polling cpu.  Hence the use of cpu_relax().
+		 */
+		while (true) {
+			css = task_css(current, io_cgrp_id);
+			if (likely(css_tryget(css)))
+				break;
+			cpu_relax();
+		}
+	}
+
+	rcu_read_unlock();
+
+	return css;
+}
 
 static inline struct blkcg *css_to_blkcg(struct cgroup_subsys_state *css)
 {
 	return css ? container_of(css, struct blkcg, css) : NULL;
 }
 
-static inline struct blkcg *bio_blkcg(struct bio *bio)
+/**
+ * __bio_blkcg - internal, inconsistent version to get blkcg
+ *
+ * DO NOT USE.
+ * This function is inconsistent and consequently is dangerous to use.  The
+ * first part of the function returns a blkcg where a reference is owned by the
+ * bio.  This means it does not need to be rcu protected as it cannot go away
+ * with the bio owning a reference to it.  However, the latter potentially gets
+ * it from task_css().  This can race against task migration and the cgroup
+ * dying.  It is also semantically different as it must be called rcu protected
+ * and is susceptible to failure when trying to get a reference to it.
+ * Therefore, it is not ok to assume that *_get() will always succeed on the
+ * blkcg returned here.
+ */
+static inline struct blkcg *__bio_blkcg(struct bio *bio)
 {
-	struct cgroup_subsys_state *css;
+	if (bio && bio->bi_css)
+		return css_to_blkcg(bio->bi_css);
+	return css_to_blkcg(blkcg_css());
+}
 
+/**
+ * bio_blkcg - grab the blkcg associated with a bio
+ * @bio: target bio
+ *
+ * This returns the blkcg associated with a bio, %NULL if not associated.
+ * Callers are expected to either handle %NULL or know association has been
+ * done prior to calling this.
+ */
+static inline struct blkcg *bio_blkcg(struct bio *bio)
+{
 	if (bio && bio->bi_css)
 		return css_to_blkcg(bio->bi_css);
-	css = kthread_blkcg();
-	if (css)
-		return css_to_blkcg(css);
-	return css_to_blkcg(task_css(current, io_cgrp_id));
+	return NULL;
 }
 
 static inline bool blk_cgroup_congested(void)
@@ -710,10 +791,10 @@ static inline bool blkcg_bio_issue_check(struct request_queue *q,
 	bool throtl = false;
 
 	rcu_read_lock();
-	blkcg = bio_blkcg(bio);
 
 	/* associate blkcg if bio hasn't attached one */
-	bio_associate_blkcg(bio, &blkcg->css);
+	bio_associate_blkcg(bio, NULL);
+	blkcg = bio_blkcg(bio);
 
 	blkg = blkg_lookup(blkcg, q);
 	if (unlikely(!blkg)) {
@@ -835,6 +916,7 @@ static inline int blkcg_activate_policy(struct request_queue *q,
 static inline void blkcg_deactivate_policy(struct request_queue *q,
 					   const struct blkcg_policy *pol) { }
 
+static inline struct blkcg *__bio_blkcg(struct bio *bio) { return NULL; }
 static inline struct blkcg *bio_blkcg(struct bio *bio) { return NULL; }
 
 static inline struct blkg_policy_data *blkg_to_pd(struct blkcg_gq *blkg,
-- 
2.17.1
^ permalink raw reply related	[flat|nested] 29+ messages in thread
- * Re: [PATCH 01/13] blkcg: fix ref count issue with bio_blkcg() using task_css
  2018-11-26 21:19 ` [PATCH 01/13] blkcg: fix ref count issue with bio_blkcg() using task_css Dennis Zhou
@ 2018-11-27 20:54   ` Josef Bacik
  0 siblings, 0 replies; 29+ messages in thread
From: Josef Bacik @ 2018-11-27 20:54 UTC (permalink / raw)
  To: Dennis Zhou
  Cc: Jens Axboe, Tejun Heo, Johannes Weiner, Josef Bacik, kernel-team,
	linux-block, cgroups, linux-kernel
On Mon, Nov 26, 2018 at 04:19:34PM -0500, Dennis Zhou wrote:
> The bio_blkcg() function turns out to be inconsistent and consequently
> dangerous to use. The first part returns a blkcg where a reference is
> owned by the bio meaning it does not need to be rcu protected. However,
> the third case, the last line, is problematic:
> 
> 	return css_to_blkcg(task_css(current, io_cgrp_id));
> 
> This can race against task migration and the cgroup dying. It is also
> semantically different as it must be called rcu protected and is
> susceptible to failure when trying to get a reference to it.
> 
> This patch adds association ahead of calling bio_blkcg() rather than
> after. This makes association a required and explicit step along the
> code paths for calling bio_blkcg(). In blk-iolatency, association is
> moved above the bio_blkcg() call to ensure it will not return %NULL.
> 
> BFQ uses the old bio_blkcg() function, but I do not want to address it
> in this series due to the complexity. I have created a private version
> documenting the inconsistency and noting not to use it.
> 
> Signed-off-by: Dennis Zhou <dennis@kernel.org>
> Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Thanks,
Josef
^ permalink raw reply	[flat|nested] 29+ messages in thread 
 
- * [PATCH 02/13] blkcg: update blkg_lookup_create() to do locking
  2018-11-26 21:19 [PATCH 00/13 v4] block: always associate blkg and refcount cleanup Dennis Zhou
  2018-11-26 21:19 ` [PATCH 01/13] blkcg: fix ref count issue with bio_blkcg() using task_css Dennis Zhou
@ 2018-11-26 21:19 ` Dennis Zhou
  2018-11-27 20:56   ` Josef Bacik
  2018-11-26 21:19 ` [PATCH 03/13] blkcg: convert blkg_lookup_create() to find closest blkg Dennis Zhou
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 29+ messages in thread
From: Dennis Zhou @ 2018-11-26 21:19 UTC (permalink / raw)
  To: Jens Axboe, Tejun Heo, Johannes Weiner, Josef Bacik
  Cc: kernel-team, linux-block, cgroups, linux-kernel, Dennis Zhou
To know when to create a blkg, the general pattern is to do a
blkg_lookup() and if that fails, lock and do the lookup again, and if
that fails finally create. It doesn't make much sense for everyone who
wants to do creation to write this themselves.
This changes blkg_lookup_create() to do locking and implement this
pattern. The old blkg_lookup_create() is renamed to
__blkg_lookup_create().  If a call site wants to do its own error
handling or already owns the queue lock, they can use
__blkg_lookup_create(). This will be used in upcoming patches.
Signed-off-by: Dennis Zhou <dennis@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
---
 block/blk-cgroup.c         | 30 +++++++++++++++++++++++++++---
 block/blk-iolatency.c      |  2 +-
 include/linux/blk-cgroup.h |  4 +++-
 3 files changed, 31 insertions(+), 5 deletions(-)
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 63d226a084cd..b1ec98eeeae0 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -249,7 +249,7 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
 }
 
 /**
- * blkg_lookup_create - lookup blkg, try to create one if not there
+ * __blkg_lookup_create - lookup blkg, try to create one if not there
  * @blkcg: blkcg of interest
  * @q: request_queue of interest
  *
@@ -262,8 +262,8 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
  * value on error.  If @q is dead, returns ERR_PTR(-EINVAL).  If @q is not
  * dead and bypassing, returns ERR_PTR(-EBUSY).
  */
-struct blkcg_gq *blkg_lookup_create(struct blkcg *blkcg,
-				    struct request_queue *q)
+struct blkcg_gq *__blkg_lookup_create(struct blkcg *blkcg,
+				      struct request_queue *q)
 {
 	struct blkcg_gq *blkg;
 
@@ -293,6 +293,30 @@ struct blkcg_gq *blkg_lookup_create(struct blkcg *blkcg,
 	}
 }
 
+/**
+ * blkg_lookup_create - find or create a blkg
+ * @blkcg: target block cgroup
+ * @q: target request_queue
+ *
+ * This looks up or creates the blkg representing the unique pair
+ * of the blkcg and the request_queue.
+ */
+struct blkcg_gq *blkg_lookup_create(struct blkcg *blkcg,
+				    struct request_queue *q)
+{
+	struct blkcg_gq *blkg = blkg_lookup(blkcg, q);
+
+	if (unlikely(!blkg)) {
+		spin_lock_irq(&q->queue_lock);
+
+		blkg = __blkg_lookup_create(blkcg, q);
+
+		spin_unlock_irq(&q->queue_lock);
+	}
+
+	return blkg;
+}
+
 static void blkg_destroy(struct blkcg_gq *blkg)
 {
 	struct blkcg *blkcg = blkg->blkcg;
diff --git a/block/blk-iolatency.c b/block/blk-iolatency.c
index fe0c4ca312ff..e6f68f15dee9 100644
--- a/block/blk-iolatency.c
+++ b/block/blk-iolatency.c
@@ -486,7 +486,7 @@ static void blkcg_iolatency_throttle(struct rq_qos *rqos, struct bio *bio)
 	blkg = blkg_lookup(blkcg, q);
 	if (unlikely(!blkg)) {
 		spin_lock_irq(&q->queue_lock);
-		blkg = blkg_lookup_create(blkcg, q);
+		blkg = __blkg_lookup_create(blkcg, q);
 		if (IS_ERR(blkg))
 			blkg = NULL;
 		spin_unlock_irq(&q->queue_lock);
diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h
index db8214047486..49f399cfb0b6 100644
--- a/include/linux/blk-cgroup.h
+++ b/include/linux/blk-cgroup.h
@@ -181,6 +181,8 @@ extern struct cgroup_subsys_state * const blkcg_root_css;
 
 struct blkcg_gq *blkg_lookup_slowpath(struct blkcg *blkcg,
 				      struct request_queue *q, bool update_hint);
+struct blkcg_gq *__blkg_lookup_create(struct blkcg *blkcg,
+				      struct request_queue *q);
 struct blkcg_gq *blkg_lookup_create(struct blkcg *blkcg,
 				    struct request_queue *q);
 int blkcg_init_queue(struct request_queue *q);
@@ -799,7 +801,7 @@ static inline bool blkcg_bio_issue_check(struct request_queue *q,
 	blkg = blkg_lookup(blkcg, q);
 	if (unlikely(!blkg)) {
 		spin_lock_irq(&q->queue_lock);
-		blkg = blkg_lookup_create(blkcg, q);
+		blkg = __blkg_lookup_create(blkcg, q);
 		if (IS_ERR(blkg))
 			blkg = NULL;
 		spin_unlock_irq(&q->queue_lock);
-- 
2.17.1
^ permalink raw reply related	[flat|nested] 29+ messages in thread
- * Re: [PATCH 02/13] blkcg: update blkg_lookup_create() to do locking
  2018-11-26 21:19 ` [PATCH 02/13] blkcg: update blkg_lookup_create() to do locking Dennis Zhou
@ 2018-11-27 20:56   ` Josef Bacik
  0 siblings, 0 replies; 29+ messages in thread
From: Josef Bacik @ 2018-11-27 20:56 UTC (permalink / raw)
  To: Dennis Zhou
  Cc: Jens Axboe, Tejun Heo, Johannes Weiner, Josef Bacik, kernel-team,
	linux-block, cgroups, linux-kernel
On Mon, Nov 26, 2018 at 04:19:35PM -0500, Dennis Zhou wrote:
> To know when to create a blkg, the general pattern is to do a
> blkg_lookup() and if that fails, lock and do the lookup again, and if
> that fails finally create. It doesn't make much sense for everyone who
> wants to do creation to write this themselves.
> 
> This changes blkg_lookup_create() to do locking and implement this
> pattern. The old blkg_lookup_create() is renamed to
> __blkg_lookup_create().  If a call site wants to do its own error
> handling or already owns the queue lock, they can use
> __blkg_lookup_create(). This will be used in upcoming patches.
> 
> Signed-off-by: Dennis Zhou <dennis@kernel.org>
> Reviewed-by: Josef Bacik <josef@toxicpanda.com>
> Acked-by: Tejun Heo <tj@kernel.org>
> Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
> ---
>  block/blk-cgroup.c         | 30 +++++++++++++++++++++++++++---
>  block/blk-iolatency.c      |  2 +-
>  include/linux/blk-cgroup.h |  4 +++-
>  3 files changed, 31 insertions(+), 5 deletions(-)
> 
> diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
> index 63d226a084cd..b1ec98eeeae0 100644
> --- a/block/blk-cgroup.c
> +++ b/block/blk-cgroup.c
> @@ -249,7 +249,7 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
>  }
>  
>  /**
> - * blkg_lookup_create - lookup blkg, try to create one if not there
> + * __blkg_lookup_create - lookup blkg, try to create one if not there
>   * @blkcg: blkcg of interest
>   * @q: request_queue of interest
>   *
> @@ -262,8 +262,8 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
>   * value on error.  If @q is dead, returns ERR_PTR(-EINVAL).  If @q is not
>   * dead and bypassing, returns ERR_PTR(-EBUSY).
>   */
> -struct blkcg_gq *blkg_lookup_create(struct blkcg *blkcg,
> -				    struct request_queue *q)
> +struct blkcg_gq *__blkg_lookup_create(struct blkcg *blkcg,
> +				      struct request_queue *q)
>  {
>  	struct blkcg_gq *blkg;
>  
> @@ -293,6 +293,30 @@ struct blkcg_gq *blkg_lookup_create(struct blkcg *blkcg,
>  	}
>  }
>  
> +/**
> + * blkg_lookup_create - find or create a blkg
> + * @blkcg: target block cgroup
> + * @q: target request_queue
> + *
> + * This looks up or creates the blkg representing the unique pair
> + * of the blkcg and the request_queue.
> + */
> +struct blkcg_gq *blkg_lookup_create(struct blkcg *blkcg,
> +				    struct request_queue *q)
> +{
> +	struct blkcg_gq *blkg = blkg_lookup(blkcg, q);
> +
> +	if (unlikely(!blkg)) {
> +		spin_lock_irq(&q->queue_lock);
> +
> +		blkg = __blkg_lookup_create(blkcg, q);
> +
> +		spin_unlock_irq(&q->queue_lock);
No extra spaces here please, otherwise you can add
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Thanks,
Josef
^ permalink raw reply	[flat|nested] 29+ messages in thread
 
- * [PATCH 03/13] blkcg: convert blkg_lookup_create() to find closest blkg
  2018-11-26 21:19 [PATCH 00/13 v4] block: always associate blkg and refcount cleanup Dennis Zhou
  2018-11-26 21:19 ` [PATCH 01/13] blkcg: fix ref count issue with bio_blkcg() using task_css Dennis Zhou
  2018-11-26 21:19 ` [PATCH 02/13] blkcg: update blkg_lookup_create() to do locking Dennis Zhou
@ 2018-11-26 21:19 ` Dennis Zhou
  2018-11-27 21:01   ` Josef Bacik
  2018-11-26 21:19 ` [PATCH 04/13] blkcg: introduce common blkg association logic Dennis Zhou
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 29+ messages in thread
From: Dennis Zhou @ 2018-11-26 21:19 UTC (permalink / raw)
  To: Jens Axboe, Tejun Heo, Johannes Weiner, Josef Bacik
  Cc: kernel-team, linux-block, cgroups, linux-kernel, Dennis Zhou
There are several scenarios where blkg_lookup_create() can fail such as
the blkcg dying, request_queue is dying, or simply being OOM. Most
handle this by simply falling back to the q->root_blkg and calling it a
day.
This patch implements the notion of closest blkg. During
blkg_lookup_create(), if it fails to create, return the closest blkg
found or the q->root_blkg. blkg_try_get_closest() is introduced and used
during association so a bio is always attached to a blkg.
Signed-off-by: Dennis Zhou <dennis@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
---
 block/bio.c                | 17 ++++++++++-------
 block/blk-cgroup.c         | 23 ++++++++++++++++-------
 block/blk-iolatency.c      | 14 ++------------
 block/blk-throttle.c       |  4 +---
 include/linux/blk-cgroup.h | 24 +++++++++++++++---------
 5 files changed, 44 insertions(+), 38 deletions(-)
diff --git a/block/bio.c b/block/bio.c
index 7528c2324319..edc8a73b98d5 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -2009,21 +2009,24 @@ int bio_associate_blkcg(struct bio *bio, struct cgroup_subsys_state *blkcg_css)
 EXPORT_SYMBOL_GPL(bio_associate_blkcg);
 
 /**
- * bio_associate_blkg - associate a bio with the specified blkg
+ * bio_associate_blkg - associate a bio with the a blkg
  * @bio: target bio
  * @blkg: the blkg to associate
  *
- * Associate @bio with the blkg specified by @blkg.  This is the queue specific
- * blkcg information associated with the @bio, a reference will be taken on the
- * @blkg and will be freed when the bio is freed.
+ * This tries to associate @bio with the specified @blkg.  Association failure
+ * is handled by walking up the blkg tree.  Therefore, the blkg associated can
+ * be anything between @blkg and the root_blkg.  This situation only happens
+ * when a cgroup is dying and then the remaining bios will spill to the closest
+ * alive blkg.
+ *
+ * A reference will be taken on the @blkg and will be released when @bio is
+ * freed.
  */
 int bio_associate_blkg(struct bio *bio, struct blkcg_gq *blkg)
 {
 	if (unlikely(bio->bi_blkg))
 		return -EBUSY;
-	if (!blkg_try_get(blkg))
-		return -ENODEV;
-	bio->bi_blkg = blkg;
+	bio->bi_blkg = blkg_try_get_closest(blkg);
 	return 0;
 }
 
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index b1ec98eeeae0..dfd984bbed27 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -258,9 +258,8 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
  * that all non-root blkg's have access to the parent blkg.  This function
  * should be called under RCU read lock and @q->queue_lock.
  *
- * Returns pointer to the looked up or created blkg on success, ERR_PTR()
- * value on error.  If @q is dead, returns ERR_PTR(-EINVAL).  If @q is not
- * dead and bypassing, returns ERR_PTR(-EBUSY).
+ * Returns the blkg or the closest blkg if blkg_create() fails as it walks
+ * down from root.
  */
 struct blkcg_gq *__blkg_lookup_create(struct blkcg *blkcg,
 				      struct request_queue *q)
@@ -276,19 +275,29 @@ struct blkcg_gq *__blkg_lookup_create(struct blkcg *blkcg,
 
 	/*
 	 * Create blkgs walking down from blkcg_root to @blkcg, so that all
-	 * non-root blkgs have access to their parents.
+	 * non-root blkgs have access to their parents.  Returns the closest
+	 * blkg to the intended blkg should blkg_create() fail.
 	 */
 	while (true) {
 		struct blkcg *pos = blkcg;
 		struct blkcg *parent = blkcg_parent(blkcg);
-
-		while (parent && !__blkg_lookup(parent, q, false)) {
+		struct blkcg_gq *ret_blkg = q->root_blkg;
+
+		while (parent) {
+			blkg = __blkg_lookup(parent, q, false);
+			if (blkg) {
+				/* remember closest blkg */
+				ret_blkg = blkg;
+				break;
+			}
 			pos = parent;
 			parent = blkcg_parent(parent);
 		}
 
 		blkg = blkg_create(pos, q, NULL);
-		if (pos == blkcg || IS_ERR(blkg))
+		if (IS_ERR(blkg))
+			return ret_blkg;
+		if (pos == blkcg)
 			return blkg;
 	}
 }
diff --git a/block/blk-iolatency.c b/block/blk-iolatency.c
index e6f68f15dee9..46e86c34cf79 100644
--- a/block/blk-iolatency.c
+++ b/block/blk-iolatency.c
@@ -483,21 +483,11 @@ static void blkcg_iolatency_throttle(struct rq_qos *rqos, struct bio *bio)
 	rcu_read_lock();
 	bio_associate_blkcg(bio, NULL);
 	blkcg = bio_blkcg(bio);
-	blkg = blkg_lookup(blkcg, q);
-	if (unlikely(!blkg)) {
-		spin_lock_irq(&q->queue_lock);
-		blkg = __blkg_lookup_create(blkcg, q);
-		if (IS_ERR(blkg))
-			blkg = NULL;
-		spin_unlock_irq(&q->queue_lock);
-	}
-	if (!blkg)
-		goto out;
-
+	blkg = blkg_lookup_create(blkcg, q);
 	bio_issue_init(&bio->bi_issue, bio_sectors(bio));
 	bio_associate_blkg(bio, blkg);
-out:
 	rcu_read_unlock();
+
 	while (blkg && blkg->parent) {
 		struct iolatency_grp *iolat = blkg_to_lat(blkg);
 		if (!iolat) {
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 8f0a104770ee..d648d6720f46 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -2118,9 +2118,7 @@ static inline void throtl_update_latency_buckets(struct throtl_data *td)
 static void blk_throtl_assoc_bio(struct throtl_grp *tg, struct bio *bio)
 {
 #ifdef CONFIG_BLK_DEV_THROTTLING_LOW
-	/* fallback to root_blkg if we fail to get a blkg ref */
-	if (bio->bi_css && (bio_associate_blkg(bio, tg_to_blkg(tg)) == -ENODEV))
-		bio_associate_blkg(bio, bio->bi_disk->queue->root_blkg);
+	bio_associate_blkg(bio, tg_to_blkg(tg));
 	bio_issue_init(&bio->bi_issue, bio_sectors(bio));
 #endif
 }
diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h
index 49f399cfb0b6..9d796a4f8ef0 100644
--- a/include/linux/blk-cgroup.h
+++ b/include/linux/blk-cgroup.h
@@ -545,6 +545,20 @@ static inline struct blkcg_gq *blkg_try_get(struct blkcg_gq *blkg)
 	return NULL;
 }
 
+/**
+ * blkg_try_get_closest - try and get a blkg ref on the closet blkg
+ * @blkg: blkg to get
+ *
+ * This walks up the blkg tree to find the closest non-dying blkg and returns
+ * the blkg that it did association with as it may not be the passed in blkg.
+ */
+static inline struct blkcg_gq *blkg_try_get_closest(struct blkcg_gq *blkg)
+{
+	while (!atomic_inc_not_zero(&blkg->refcnt))
+		blkg = blkg->parent;
+
+	return blkg;
+}
 
 void __blkg_release_rcu(struct rcu_head *rcu);
 
@@ -797,15 +811,7 @@ static inline bool blkcg_bio_issue_check(struct request_queue *q,
 	/* associate blkcg if bio hasn't attached one */
 	bio_associate_blkcg(bio, NULL);
 	blkcg = bio_blkcg(bio);
-
-	blkg = blkg_lookup(blkcg, q);
-	if (unlikely(!blkg)) {
-		spin_lock_irq(&q->queue_lock);
-		blkg = __blkg_lookup_create(blkcg, q);
-		if (IS_ERR(blkg))
-			blkg = NULL;
-		spin_unlock_irq(&q->queue_lock);
-	}
+	blkg = blkg_lookup_create(blkcg, q);
 
 	throtl = blk_throtl_bio(q, blkg, bio);
 
-- 
2.17.1
^ permalink raw reply related	[flat|nested] 29+ messages in thread
- * Re: [PATCH 03/13] blkcg: convert blkg_lookup_create() to find closest blkg
  2018-11-26 21:19 ` [PATCH 03/13] blkcg: convert blkg_lookup_create() to find closest blkg Dennis Zhou
@ 2018-11-27 21:01   ` Josef Bacik
  0 siblings, 0 replies; 29+ messages in thread
From: Josef Bacik @ 2018-11-27 21:01 UTC (permalink / raw)
  To: Dennis Zhou
  Cc: Jens Axboe, Tejun Heo, Johannes Weiner, Josef Bacik, kernel-team,
	linux-block, cgroups, linux-kernel
On Mon, Nov 26, 2018 at 04:19:36PM -0500, Dennis Zhou wrote:
> There are several scenarios where blkg_lookup_create() can fail such as
> the blkcg dying, request_queue is dying, or simply being OOM. Most
> handle this by simply falling back to the q->root_blkg and calling it a
> day.
> 
> This patch implements the notion of closest blkg. During
> blkg_lookup_create(), if it fails to create, return the closest blkg
> found or the q->root_blkg. blkg_try_get_closest() is introduced and used
> during association so a bio is always attached to a blkg.
> 
> Signed-off-by: Dennis Zhou <dennis@kernel.org>
> Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Thanks,
Josef
^ permalink raw reply	[flat|nested] 29+ messages in thread 
 
- * [PATCH 04/13] blkcg: introduce common blkg association logic
  2018-11-26 21:19 [PATCH 00/13 v4] block: always associate blkg and refcount cleanup Dennis Zhou
                   ` (2 preceding siblings ...)
  2018-11-26 21:19 ` [PATCH 03/13] blkcg: convert blkg_lookup_create() to find closest blkg Dennis Zhou
@ 2018-11-26 21:19 ` Dennis Zhou
  2018-11-27 21:04   ` Josef Bacik
                     ` (2 more replies)
  2018-11-26 21:19 ` [PATCH 05/13] blkcg: associate blkg when associating a device Dennis Zhou
                   ` (9 subsequent siblings)
  13 siblings, 3 replies; 29+ messages in thread
From: Dennis Zhou @ 2018-11-26 21:19 UTC (permalink / raw)
  To: Jens Axboe, Tejun Heo, Johannes Weiner, Josef Bacik
  Cc: kernel-team, linux-block, cgroups, linux-kernel, Dennis Zhou
There are 3 ways blkg association can happen: association with the
current css, with the page css (swap), or from the wbc css (writeback).
This patch handles how association is done for the first case where we
are associating bsaed on the current css. If there is already a blkg
associated, the css will be reused and association will be redone as the
request_queue may have changed.
Signed-off-by: Dennis Zhou <dennis@kernel.org>
---
 block/bio.c           | 81 ++++++++++++++++++++++++++++++++++++++-----
 block/blk-iolatency.c | 10 ++----
 block/blk-throttle.c  |  6 ++--
 include/linux/bio.h   |  5 ++-
 4 files changed, 81 insertions(+), 21 deletions(-)
diff --git a/block/bio.c b/block/bio.c
index edc8a73b98d5..de0133329b71 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -2009,7 +2009,34 @@ int bio_associate_blkcg(struct bio *bio, struct cgroup_subsys_state *blkcg_css)
 EXPORT_SYMBOL_GPL(bio_associate_blkcg);
 
 /**
- * bio_associate_blkg - associate a bio with the a blkg
+ * bio_has_queue - required check for blkg association
+ * @bio: target bio
+ *
+ * A blkg represents the relationship between a blkcg and a request_queue.
+ * If there is no request_queue, there is no blkg and therefore nothing to
+ * associate with.
+ */
+static inline bool bio_has_queue(struct bio *bio)
+{
+	return bio->bi_disk && bio->bi_disk->queue;
+}
+
+/**
+ * bio_disassociate_blkg - puts back the blkg reference if associated
+ * @bio: target bio
+ *
+ * Helper to disassociate the blkg from @bio if a blkg is associated.
+ */
+void bio_disassociate_blkg(struct bio *bio)
+{
+	if (bio->bi_blkg) {
+		blkg_put(bio->bi_blkg);
+		bio->bi_blkg = NULL;
+	}
+}
+
+/**
+ * __bio_associate_blkg - associate a bio with the a blkg
  * @bio: target bio
  * @blkg: the blkg to associate
  *
@@ -2022,12 +2049,48 @@ EXPORT_SYMBOL_GPL(bio_associate_blkcg);
  * A reference will be taken on the @blkg and will be released when @bio is
  * freed.
  */
-int bio_associate_blkg(struct bio *bio, struct blkcg_gq *blkg)
+static void __bio_associate_blkg(struct bio *bio, struct blkcg_gq *blkg)
 {
-	if (unlikely(bio->bi_blkg))
-		return -EBUSY;
+	if (bio->bi_blkg)
+		bio_disassociate_blkg(bio);
+
 	bio->bi_blkg = blkg_try_get_closest(blkg);
-	return 0;
+}
+
+/**
+ * bio_associate_blkg - associate a bio with a blkg from q
+ * @bio: target bio
+ *
+ * Associate @bio with the blkg found from the bio's css and request_queue.
+ * If one is not found, bio_lookup_blkg() creates the blkg.  If a blkg is
+ * already associated, the css is reused and association redone as the
+ * request_queue may have changed.
+ */
+void bio_associate_blkg(struct bio *bio)
+{
+	struct request_queue *q;
+	struct blkcg *blkcg;
+	struct blkcg_gq *blkg;
+
+	if (!bio_has_queue(bio))
+		return;
+
+	q = bio->bi_disk->queue;
+
+	rcu_read_lock();
+
+	bio_associate_blkcg(bio, NULL);
+	blkcg = bio_blkcg(bio);
+
+	if (!blkcg->css.parent) {
+		__bio_associate_blkg(bio, q->root_blkg);
+	} else {
+		blkg = blkg_lookup_create(blkcg, q);
+
+		__bio_associate_blkg(bio, blkg);
+	}
+
+	rcu_read_unlock();
 }
 
 /**
@@ -2040,10 +2103,7 @@ void bio_disassociate_task(struct bio *bio)
 		css_put(bio->bi_css);
 		bio->bi_css = NULL;
 	}
-	if (bio->bi_blkg) {
-		blkg_put(bio->bi_blkg);
-		bio->bi_blkg = NULL;
-	}
+	bio_disassociate_blkg(bio);
 }
 
 /**
@@ -2055,6 +2115,9 @@ void bio_clone_blkcg_association(struct bio *dst, struct bio *src)
 {
 	if (src->bi_css)
 		WARN_ON(bio_associate_blkcg(dst, src->bi_css));
+
+	if (src->bi_blkg)
+		__bio_associate_blkg(dst, src->bi_blkg);
 }
 EXPORT_SYMBOL_GPL(bio_clone_blkcg_association);
 #endif /* CONFIG_BLK_CGROUP */
diff --git a/block/blk-iolatency.c b/block/blk-iolatency.c
index 46e86c34cf79..cdbd10564e66 100644
--- a/block/blk-iolatency.c
+++ b/block/blk-iolatency.c
@@ -472,21 +472,15 @@ static void check_scale_change(struct iolatency_grp *iolat)
 static void blkcg_iolatency_throttle(struct rq_qos *rqos, struct bio *bio)
 {
 	struct blk_iolatency *blkiolat = BLKIOLATENCY(rqos);
-	struct blkcg *blkcg;
 	struct blkcg_gq *blkg;
-	struct request_queue *q = rqos->q;
 	bool issue_as_root = bio_issue_as_root_blkg(bio);
 
 	if (!blk_iolatency_enabled(blkiolat))
 		return;
 
-	rcu_read_lock();
-	bio_associate_blkcg(bio, NULL);
-	blkcg = bio_blkcg(bio);
-	blkg = blkg_lookup_create(blkcg, q);
+	bio_associate_blkg(bio);
+	blkg = bio->bi_blkg;
 	bio_issue_init(&bio->bi_issue, bio_sectors(bio));
-	bio_associate_blkg(bio, blkg);
-	rcu_read_unlock();
 
 	while (blkg && blkg->parent) {
 		struct iolatency_grp *iolat = blkg_to_lat(blkg);
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index d648d6720f46..228c3a007ebc 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -2115,10 +2115,10 @@ static inline void throtl_update_latency_buckets(struct throtl_data *td)
 }
 #endif
 
-static void blk_throtl_assoc_bio(struct throtl_grp *tg, struct bio *bio)
+static void blk_throtl_assoc_bio(struct bio *bio)
 {
 #ifdef CONFIG_BLK_DEV_THROTTLING_LOW
-	bio_associate_blkg(bio, tg_to_blkg(tg));
+	bio_associate_blkg(bio);
 	bio_issue_init(&bio->bi_issue, bio_sectors(bio));
 #endif
 }
@@ -2143,7 +2143,7 @@ bool blk_throtl_bio(struct request_queue *q, struct blkcg_gq *blkg,
 
 	throtl_update_latency_buckets(td);
 
-	blk_throtl_assoc_bio(tg, bio);
+	blk_throtl_assoc_bio(bio);
 	blk_throtl_update_idletime(tg);
 
 	sq = &tg->service_queue;
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 056fb627edb3..62715a5a4f32 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -511,12 +511,15 @@ static inline int bio_associate_blkcg_from_page(struct bio *bio,
 
 #ifdef CONFIG_BLK_CGROUP
 int bio_associate_blkcg(struct bio *bio, struct cgroup_subsys_state *blkcg_css);
-int bio_associate_blkg(struct bio *bio, struct blkcg_gq *blkg);
+void bio_disassociate_blkg(struct bio *bio);
+void bio_associate_blkg(struct bio *bio);
 void bio_disassociate_task(struct bio *bio);
 void bio_clone_blkcg_association(struct bio *dst, struct bio *src);
 #else	/* CONFIG_BLK_CGROUP */
 static inline int bio_associate_blkcg(struct bio *bio,
 			struct cgroup_subsys_state *blkcg_css) { return 0; }
+static inline void bio_disassociate_blkg(struct bio *bio) { }
+static inline void bio_associate_blkg(struct bio *bio) { }
 static inline void bio_disassociate_task(struct bio *bio) { }
 static inline void bio_clone_blkcg_association(struct bio *dst,
 			struct bio *src) { }
-- 
2.17.1
^ permalink raw reply related	[flat|nested] 29+ messages in thread
- * Re: [PATCH 04/13] blkcg: introduce common blkg association logic
  2018-11-26 21:19 ` [PATCH 04/13] blkcg: introduce common blkg association logic Dennis Zhou
@ 2018-11-27 21:04   ` Josef Bacik
  2018-11-29 15:49   ` Tejun Heo
  2018-11-30  9:52   ` Christoph Hellwig
  2 siblings, 0 replies; 29+ messages in thread
From: Josef Bacik @ 2018-11-27 21:04 UTC (permalink / raw)
  To: Dennis Zhou
  Cc: Jens Axboe, Tejun Heo, Johannes Weiner, Josef Bacik, kernel-team,
	linux-block, cgroups, linux-kernel
On Mon, Nov 26, 2018 at 04:19:37PM -0500, Dennis Zhou wrote:
> There are 3 ways blkg association can happen: association with the
> current css, with the page css (swap), or from the wbc css (writeback).
> 
> This patch handles how association is done for the first case where we
> are associating bsaed on the current css. If there is already a blkg
> associated, the css will be reused and association will be redone as the
> request_queue may have changed.
> 
> Signed-off-by: Dennis Zhou <dennis@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Thanks,
Josef
^ permalink raw reply	[flat|nested] 29+ messages in thread 
- * Re: [PATCH 04/13] blkcg: introduce common blkg association logic
  2018-11-26 21:19 ` [PATCH 04/13] blkcg: introduce common blkg association logic Dennis Zhou
  2018-11-27 21:04   ` Josef Bacik
@ 2018-11-29 15:49   ` Tejun Heo
  2018-11-29 19:48     ` Dennis Zhou
  2018-11-30  9:52   ` Christoph Hellwig
  2 siblings, 1 reply; 29+ messages in thread
From: Tejun Heo @ 2018-11-29 15:49 UTC (permalink / raw)
  To: Dennis Zhou
  Cc: Jens Axboe, Johannes Weiner, Josef Bacik, kernel-team,
	linux-block, cgroups, linux-kernel
On Mon, Nov 26, 2018 at 04:19:37PM -0500, Dennis Zhou wrote:
> There are 3 ways blkg association can happen: association with the
> current css, with the page css (swap), or from the wbc css (writeback).
> 
> This patch handles how association is done for the first case where we
> are associating bsaed on the current css. If there is already a blkg
> associated, the css will be reused and association will be redone as the
> request_queue may have changed.
> 
> Signed-off-by: Dennis Zhou <dennis@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
A minor nit below.
> +/**
> + * bio_associate_blkg - associate a bio with a blkg from q
> + * @bio: target bio
> + *
> + * Associate @bio with the blkg found from the bio's css and request_queue.
> + * If one is not found, bio_lookup_blkg() creates the blkg.  If a blkg is
> + * already associated, the css is reused and association redone as the
> + * request_queue may have changed.
> + */
> +void bio_associate_blkg(struct bio *bio)
> +{
> +	struct request_queue *q;
> +	struct blkcg *blkcg;
> +	struct blkcg_gq *blkg;
> +
> +	if (!bio_has_queue(bio))
> +		return;
So, this isn't actually necessary cuz we don't stack with
request_queues but it might be more consistent to call disassociate
before returning above so that even if sth like that happens the
function always guarantees that the blkg and bio agree.
Thanks.
-- 
tejun
^ permalink raw reply	[flat|nested] 29+ messages in thread
- * Re: [PATCH 04/13] blkcg: introduce common blkg association logic
  2018-11-29 15:49   ` Tejun Heo
@ 2018-11-29 19:48     ` Dennis Zhou
  0 siblings, 0 replies; 29+ messages in thread
From: Dennis Zhou @ 2018-11-29 19:48 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Dennis Zhou, Jens Axboe, Johannes Weiner, Josef Bacik,
	kernel-team, linux-block, cgroups, linux-kernel
On Thu, Nov 29, 2018 at 07:49:17AM -0800, Tejun Heo wrote:
> On Mon, Nov 26, 2018 at 04:19:37PM -0500, Dennis Zhou wrote:
> > There are 3 ways blkg association can happen: association with the
> > current css, with the page css (swap), or from the wbc css (writeback).
> > 
> > This patch handles how association is done for the first case where we
> > are associating bsaed on the current css. If there is already a blkg
> > associated, the css will be reused and association will be redone as the
> > request_queue may have changed.
> > 
> > Signed-off-by: Dennis Zhou <dennis@kernel.org>
> 
> Acked-by: Tejun Heo <tj@kernel.org>
> 
> A minor nit below.
> 
> > +/**
> > + * bio_associate_blkg - associate a bio with a blkg from q
> > + * @bio: target bio
> > + *
> > + * Associate @bio with the blkg found from the bio's css and request_queue.
> > + * If one is not found, bio_lookup_blkg() creates the blkg.  If a blkg is
> > + * already associated, the css is reused and association redone as the
> > + * request_queue may have changed.
> > + */
> > +void bio_associate_blkg(struct bio *bio)
> > +{
> > +	struct request_queue *q;
> > +	struct blkcg *blkcg;
> > +	struct blkcg_gq *blkg;
> > +
> > +	if (!bio_has_queue(bio))
> > +		return;
> 
> So, this isn't actually necessary cuz we don't stack with
> request_queues but it might be more consistent to call disassociate
> before returning above so that even if sth like that happens the
> function always guarantees that the blkg and bio agree.
Ah. That makes sense. I've changed bio_has_queue() to be
bio_blkg_association_check(). This calls disassociate and should handle
the case a bio transfers from a request_queue device to a bio driven
device. It changes the future patches slightly as we have to remember
the css before doing the check in case we want to reassociate with the
same css.
Thanks,
Dennis
^ permalink raw reply	[flat|nested] 29+ messages in thread
 
- * Re: [PATCH 04/13] blkcg: introduce common blkg association logic
  2018-11-26 21:19 ` [PATCH 04/13] blkcg: introduce common blkg association logic Dennis Zhou
  2018-11-27 21:04   ` Josef Bacik
  2018-11-29 15:49   ` Tejun Heo
@ 2018-11-30  9:52   ` Christoph Hellwig
  2018-12-03 20:58     ` Dennis Zhou
  2 siblings, 1 reply; 29+ messages in thread
From: Christoph Hellwig @ 2018-11-30  9:52 UTC (permalink / raw)
  To: Dennis Zhou
  Cc: Jens Axboe, Tejun Heo, Johannes Weiner, Josef Bacik, kernel-team,
	linux-block, cgroups, linux-kernel
>  EXPORT_SYMBOL_GPL(bio_associate_blkcg);
>  
>  /**
> - * bio_associate_blkg - associate a bio with the a blkg
> + * bio_has_queue - required check for blkg association
> + * @bio: target bio
> + *
> + * A blkg represents the relationship between a blkcg and a request_queue.
> + * If there is no request_queue, there is no blkg and therefore nothing to
> + * associate with.
> + */
> +static inline bool bio_has_queue(struct bio *bio)
> +{
> +	return bio->bi_disk && bio->bi_disk->queue;
> +}
How do you ever see a bio without a queue?  We can't even do I/O in
that case.
^ permalink raw reply	[flat|nested] 29+ messages in thread
- * Re: [PATCH 04/13] blkcg: introduce common blkg association logic
  2018-11-30  9:52   ` Christoph Hellwig
@ 2018-12-03 20:58     ` Dennis Zhou
  0 siblings, 0 replies; 29+ messages in thread
From: Dennis Zhou @ 2018-12-03 20:58 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Dennis Zhou, Jens Axboe, Tejun Heo, Johannes Weiner, Josef Bacik,
	kernel-team, linux-block, cgroups, linux-kernel
Hi Christoph,
On Fri, Nov 30, 2018 at 01:52:09AM -0800, Christoph Hellwig wrote:
> >  EXPORT_SYMBOL_GPL(bio_associate_blkcg);
> >  
> >  /**
> > - * bio_associate_blkg - associate a bio with the a blkg
> > + * bio_has_queue - required check for blkg association
> > + * @bio: target bio
> > + *
> > + * A blkg represents the relationship between a blkcg and a request_queue.
> > + * If there is no request_queue, there is no blkg and therefore nothing to
> > + * associate with.
> > + */
> > +static inline bool bio_has_queue(struct bio *bio)
> > +{
> > +	return bio->bi_disk && bio->bi_disk->queue;
> > +}
> 
> How do you ever see a bio without a queue?  We can't even do I/O in
> that case.
The case I found was with the flush bio in dm which is statically
allocated in dm_alloc(). The issue issue is that bio_set_dev() is called
on a bdev that isn't opened. So, the bdev wasn't pointing to a genhd.
I've fixed the issue with the patch below, which will be added in v5.
I think I was being overly cautious with the change and have taken this
out in v5. It seems that this should be a one-off case which should work
with the patch below.
Thanks,
Dennis
---
From 3ee13402af369ee8618549b63593d68ffca574ca Mon Sep 17 00:00:00 2001
From: Dennis Zhou <dennis@kernel.org>
Date: Mon, 3 Dec 2018 10:56:34 -0800
Subject: [PATCH 05/14] dm: set flush bio device on demand
The next patch changes the macro bio_set_dev() to associate a bio with a
blkg based on the device set. However, dm creates a static bio to be
used as the basis for cloning empty flush bios on creation. This
association is with a not-opened bdev so bd_disk is %NULL. To easily get
around this, we will set the device on the static bio every time and use
that to copy to the other bios.
Signed-off-by: Dennis Zhou <dennis@kernel.org>
---
 drivers/md/dm.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index a733e4c920af..b5e996c5c709 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1417,10 +1417,14 @@ static int __send_empty_flush(struct clone_info *ci)
 	unsigned target_nr = 0;
 	struct dm_target *ti;
 
+	bio_set_dev(ci->bio, ci->io->md->bdev);
+
 	BUG_ON(bio_has_data(ci->bio));
 	while ((ti = dm_table_get_target(ci->map, target_nr++)))
 		__send_duplicate_bios(ci, ti, ti->num_flush_bios, NULL);
 
+	bio_disassociate_blkg(ci->bio);
+
 	return 0;
 }
 
@@ -1939,7 +1943,6 @@ static struct mapped_device *alloc_dev(int minor)
 		goto bad;
 
 	bio_init(&md->flush_bio, NULL, 0);
-	bio_set_dev(&md->flush_bio, md->bdev);
 	md->flush_bio.bi_opf = REQ_OP_WRITE | REQ_PREFLUSH | REQ_SYNC;
 
 	dm_stats_init(&md->stats);
-- 
2.17.1
^ permalink raw reply related	[flat|nested] 29+ messages in thread
 
 
- * [PATCH 05/13] blkcg: associate blkg when associating a device
  2018-11-26 21:19 [PATCH 00/13 v4] block: always associate blkg and refcount cleanup Dennis Zhou
                   ` (3 preceding siblings ...)
  2018-11-26 21:19 ` [PATCH 04/13] blkcg: introduce common blkg association logic Dennis Zhou
@ 2018-11-26 21:19 ` Dennis Zhou
  2018-11-29 15:53   ` Tejun Heo
  2018-11-30  9:54   ` Christoph Hellwig
  2018-11-26 21:19 ` [PATCH 06/13] blkcg: consolidate bio_issue_init() to be a part of core Dennis Zhou
                   ` (8 subsequent siblings)
  13 siblings, 2 replies; 29+ messages in thread
From: Dennis Zhou @ 2018-11-26 21:19 UTC (permalink / raw)
  To: Jens Axboe, Tejun Heo, Johannes Weiner, Josef Bacik
  Cc: kernel-team, linux-block, cgroups, linux-kernel, Dennis Zhou
Previously, blkg association was handled by controller specific code in
blk-throttle and blk-iolatency. However, because a blkg represents a
relationship between a blkcg and a request_queue, it makes sense to keep
the blkg->q and bio->bi_disk->queue consistent.
This patch moves association into the bio_set_dev macro. This should
cover the majority of cases where the device is set/changed keeping the
two pointers consistent.
Fallback code is added to blkcg_bio_issue_check() to catch any missing
paths. The check is safe here because bio->bi_disk->queue must exist for
a request to have made it to generic_make_request_checks().
Signed-off-by: Dennis Zhou <dennis@kernel.org>
---
 block/bio.c                |  1 +
 block/blk-iolatency.c      |  4 +---
 block/blk-throttle.c       |  1 -
 fs/buffer.c                |  2 +-
 fs/ext4/page-io.c          |  2 +-
 include/linux/bio.h        |  7 +++++++
 include/linux/blk-cgroup.h | 10 ++++------
 mm/page_io.c               |  2 +-
 8 files changed, 16 insertions(+), 13 deletions(-)
diff --git a/block/bio.c b/block/bio.c
index de0133329b71..929fd3692e4a 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -2092,6 +2092,7 @@ void bio_associate_blkg(struct bio *bio)
 
 	rcu_read_unlock();
 }
+EXPORT_SYMBOL_GPL(bio_associate_blkg);
 
 /**
  * bio_disassociate_task - undo bio_associate_current()
diff --git a/block/blk-iolatency.c b/block/blk-iolatency.c
index cdbd10564e66..e6b47c255521 100644
--- a/block/blk-iolatency.c
+++ b/block/blk-iolatency.c
@@ -472,14 +472,12 @@ static void check_scale_change(struct iolatency_grp *iolat)
 static void blkcg_iolatency_throttle(struct rq_qos *rqos, struct bio *bio)
 {
 	struct blk_iolatency *blkiolat = BLKIOLATENCY(rqos);
-	struct blkcg_gq *blkg;
+	struct blkcg_gq *blkg = bio->bi_blkg;
 	bool issue_as_root = bio_issue_as_root_blkg(bio);
 
 	if (!blk_iolatency_enabled(blkiolat))
 		return;
 
-	bio_associate_blkg(bio);
-	blkg = bio->bi_blkg;
 	bio_issue_init(&bio->bi_issue, bio_sectors(bio));
 
 	while (blkg && blkg->parent) {
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 228c3a007ebc..1c6529df2002 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -2118,7 +2118,6 @@ static inline void throtl_update_latency_buckets(struct throtl_data *td)
 static void blk_throtl_assoc_bio(struct bio *bio)
 {
 #ifdef CONFIG_BLK_DEV_THROTTLING_LOW
-	bio_associate_blkg(bio);
 	bio_issue_init(&bio->bi_issue, bio_sectors(bio));
 #endif
 }
diff --git a/fs/buffer.c b/fs/buffer.c
index 1286c2b95498..9661e5be87e2 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -3066,7 +3066,7 @@ static int submit_bh_wbc(int op, int op_flags, struct buffer_head *bh,
 	}
 
 	bio->bi_iter.bi_sector = bh->b_blocknr * (bh->b_size >> 9);
-	bio_set_dev(bio, bh->b_bdev);
+	bio_set_dev_only(bio, bh->b_bdev);
 	bio->bi_write_hint = write_hint;
 
 	bio_add_page(bio, bh->b_page, bh->b_size, bh_offset(bh));
diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
index db7590178dfc..cadf91a42fa5 100644
--- a/fs/ext4/page-io.c
+++ b/fs/ext4/page-io.c
@@ -376,7 +376,7 @@ static int io_submit_init_bio(struct ext4_io_submit *io,
 		return -ENOMEM;
 	wbc_init_bio(io->io_wbc, bio);
 	bio->bi_iter.bi_sector = bh->b_blocknr * (bh->b_size >> 9);
-	bio_set_dev(bio, bh->b_bdev);
+	bio_set_dev_only(bio, bh->b_bdev);
 	bio->bi_end_io = ext4_end_bio;
 	bio->bi_private = ext4_get_io_end(io->io_end);
 	io->io_bio = bio;
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 62715a5a4f32..8bc9d9b29fd3 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -486,6 +486,12 @@ extern unsigned int bvec_nr_vecs(unsigned short idx);
 extern const char *bio_devname(struct bio *bio, char *buffer);
 
 #define bio_set_dev(bio, bdev) 			\
+do {						\
+	bio_set_dev_only(bio, bdev);		\
+	bio_associate_blkg(bio);		\
+} while (0)
+
+#define bio_set_dev_only(bio, bdev)		\
 do {						\
 	if ((bio)->bi_disk != (bdev)->bd_disk)	\
 		bio_clear_flag(bio, BIO_THROTTLED);\
@@ -497,6 +503,7 @@ do {						\
 do {						\
 	(dst)->bi_disk = (src)->bi_disk;	\
 	(dst)->bi_partno = (src)->bi_partno;	\
+	bio_clone_blkcg_association(dst, src);	\
 } while (0)
 
 #define bio_dev(bio) \
diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h
index 9d796a4f8ef0..81b45943ce29 100644
--- a/include/linux/blk-cgroup.h
+++ b/include/linux/blk-cgroup.h
@@ -802,21 +802,19 @@ static inline bool blk_throtl_bio(struct request_queue *q, struct blkcg_gq *blkg
 static inline bool blkcg_bio_issue_check(struct request_queue *q,
 					 struct bio *bio)
 {
-	struct blkcg *blkcg;
 	struct blkcg_gq *blkg;
 	bool throtl = false;
 
 	rcu_read_lock();
 
-	/* associate blkcg if bio hasn't attached one */
-	bio_associate_blkcg(bio, NULL);
-	blkcg = bio_blkcg(bio);
-	blkg = blkg_lookup_create(blkcg, q);
+	if (!bio->bi_blkg)
+		bio_associate_blkg(bio);
+
+	blkg = bio->bi_blkg;
 
 	throtl = blk_throtl_bio(q, blkg, bio);
 
 	if (!throtl) {
-		blkg = blkg ?: q->root_blkg;
 		/*
 		 * If the bio is flagged with BIO_QUEUE_ENTERED it means this
 		 * is a split bio and we would have already accounted for the
diff --git a/mm/page_io.c b/mm/page_io.c
index 5bdfd21c1bd9..257fdd67308d 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -37,7 +37,7 @@ static struct bio *get_swap_bio(gfp_t gfp_flags,
 		struct block_device *bdev;
 
 		bio->bi_iter.bi_sector = map_swap_page(page, &bdev);
-		bio_set_dev(bio, bdev);
+		bio_set_dev_only(bio, bdev);
 		bio->bi_iter.bi_sector <<= PAGE_SHIFT - 9;
 		bio->bi_end_io = end_io;
 
-- 
2.17.1
^ permalink raw reply related	[flat|nested] 29+ messages in thread
- * Re: [PATCH 05/13] blkcg: associate blkg when associating a device
  2018-11-26 21:19 ` [PATCH 05/13] blkcg: associate blkg when associating a device Dennis Zhou
@ 2018-11-29 15:53   ` Tejun Heo
  2018-11-29 19:54     ` Dennis Zhou
  2018-11-30  9:54   ` Christoph Hellwig
  1 sibling, 1 reply; 29+ messages in thread
From: Tejun Heo @ 2018-11-29 15:53 UTC (permalink / raw)
  To: Dennis Zhou
  Cc: Jens Axboe, Johannes Weiner, Josef Bacik, kernel-team,
	linux-block, cgroups, linux-kernel
On Mon, Nov 26, 2018 at 04:19:38PM -0500, Dennis Zhou wrote:
> diff --git a/include/linux/bio.h b/include/linux/bio.h
> index 62715a5a4f32..8bc9d9b29fd3 100644
> --- a/include/linux/bio.h
> +++ b/include/linux/bio.h
> @@ -486,6 +486,12 @@ extern unsigned int bvec_nr_vecs(unsigned short idx);
>  extern const char *bio_devname(struct bio *bio, char *buffer);
>  
>  #define bio_set_dev(bio, bdev) 			\
> +do {						\
> +	bio_set_dev_only(bio, bdev);		\
> +	bio_associate_blkg(bio);		\
> +} while (0)
> +
> +#define bio_set_dev_only(bio, bdev)		\
>  do {						\
>  	if ((bio)->bi_disk != (bdev)->bd_disk)	\
>  		bio_clear_flag(bio, BIO_THROTTLED);\
Generally looks okay to me but I'm not sure about bio_set_dev_only().
Maybe sth more explicit like bio_set_dev_without_blkg()?  Also, can
you please add comments explaining which should be used when?
Thanks.
-- 
tejun
^ permalink raw reply	[flat|nested] 29+ messages in thread
- * Re: [PATCH 05/13] blkcg: associate blkg when associating a device
  2018-11-29 15:53   ` Tejun Heo
@ 2018-11-29 19:54     ` Dennis Zhou
  0 siblings, 0 replies; 29+ messages in thread
From: Dennis Zhou @ 2018-11-29 19:54 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Jens Axboe, Johannes Weiner, Josef Bacik, kernel-team,
	linux-block, cgroups, linux-kernel
On Thu, Nov 29, 2018 at 07:53:33AM -0800, Tejun Heo wrote:
> On Mon, Nov 26, 2018 at 04:19:38PM -0500, Dennis Zhou wrote:
> > diff --git a/include/linux/bio.h b/include/linux/bio.h
> > index 62715a5a4f32..8bc9d9b29fd3 100644
> > --- a/include/linux/bio.h
> > +++ b/include/linux/bio.h
> > @@ -486,6 +486,12 @@ extern unsigned int bvec_nr_vecs(unsigned short idx);
> >  extern const char *bio_devname(struct bio *bio, char *buffer);
> >  
> >  #define bio_set_dev(bio, bdev) 			\
> > +do {						\
> > +	bio_set_dev_only(bio, bdev);		\
> > +	bio_associate_blkg(bio);		\
> > +} while (0)
> > +
> > +#define bio_set_dev_only(bio, bdev)		\
> >  do {						\
> >  	if ((bio)->bi_disk != (bdev)->bd_disk)	\
> >  		bio_clear_flag(bio, BIO_THROTTLED);\
> 
> Generally looks okay to me but I'm not sure about bio_set_dev_only().
> Maybe sth more explicit like bio_set_dev_without_blkg()?  Also, can
> you please add comments explaining which should be used when?
> 
I've switched it to bio_set_dev_without_blkg() and added the comment
below and added a comment in the use cases. Let me know if the wording
is fine below.
+/*
+ * Bio device setters.  A blkg represents the relationship between a blkcg
+ * and a request_queue.  When the device is set for a bio, it becomes possible
+ * to do blkg association.  Importantly, a blkg also holds a pointer back to
+ * the request_queue.  In general, bio_set_dev() should be used to keep
+ * bio->bi_disk->queue and bio->bi_blkg->q consistent.  If blkg association
+ * will be done explicitly or is a bio based device, use
+ * bio_set_dev_without_blkg().
+ */
Thanks,
Dennis
^ permalink raw reply	[flat|nested] 29+ messages in thread
 
- * Re: [PATCH 05/13] blkcg: associate blkg when associating a device
  2018-11-26 21:19 ` [PATCH 05/13] blkcg: associate blkg when associating a device Dennis Zhou
  2018-11-29 15:53   ` Tejun Heo
@ 2018-11-30  9:54   ` Christoph Hellwig
  2018-12-03 22:52     ` Dennis Zhou
  1 sibling, 1 reply; 29+ messages in thread
From: Christoph Hellwig @ 2018-11-30  9:54 UTC (permalink / raw)
  To: Dennis Zhou
  Cc: Jens Axboe, Tejun Heo, Johannes Weiner, Josef Bacik, kernel-team,
	linux-block, cgroups, linux-kernel
> diff --git a/include/linux/bio.h b/include/linux/bio.h
> index 62715a5a4f32..8bc9d9b29fd3 100644
> --- a/include/linux/bio.h
> +++ b/include/linux/bio.h
> @@ -486,6 +486,12 @@ extern unsigned int bvec_nr_vecs(unsigned short idx);
>  extern const char *bio_devname(struct bio *bio, char *buffer);
>  
>  #define bio_set_dev(bio, bdev) 			\
> +do {						\
> +	bio_set_dev_only(bio, bdev);		\
> +	bio_associate_blkg(bio);		\
> +} while (0)
> +
> +#define bio_set_dev_only(bio, bdev)		\
This lacks any explanation on when you would use bio_set_dev_only or
bio_set_dev.  Please document why we need both and why you'd choose or
the other.
^ permalink raw reply	[flat|nested] 29+ messages in thread
- * Re: [PATCH 05/13] blkcg: associate blkg when associating a device
  2018-11-30  9:54   ` Christoph Hellwig
@ 2018-12-03 22:52     ` Dennis Zhou
  0 siblings, 0 replies; 29+ messages in thread
From: Dennis Zhou @ 2018-12-03 22:52 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Dennis Zhou, Jens Axboe, Tejun Heo, Johannes Weiner, Josef Bacik,
	kernel-team, linux-block, cgroups, linux-kernel
On Fri, Nov 30, 2018 at 01:54:26AM -0800, Christoph Hellwig wrote:
> > diff --git a/include/linux/bio.h b/include/linux/bio.h
> > index 62715a5a4f32..8bc9d9b29fd3 100644
> > --- a/include/linux/bio.h
> > +++ b/include/linux/bio.h
> > @@ -486,6 +486,12 @@ extern unsigned int bvec_nr_vecs(unsigned short idx);
> >  extern const char *bio_devname(struct bio *bio, char *buffer);
> >  
> >  #define bio_set_dev(bio, bdev) 			\
> > +do {						\
> > +	bio_set_dev_only(bio, bdev);		\
> > +	bio_associate_blkg(bio);		\
> > +} while (0)
> > +
> > +#define bio_set_dev_only(bio, bdev)		\
> 
> This lacks any explanation on when you would use bio_set_dev_only or
> bio_set_dev.  Please document why we need both and why you'd choose or
> the other.
I realized after thinking about this more and checking more use cases
that it isn't as simple as swapping macro uses because many of the
callers share common bio allocation paths. I think the simplest way
forward is to have writeback and swap do reassociation and split out bio
init code in a future series. So in v5, there is only bio_set_dev().
Thanks,
Dennis
^ permalink raw reply	[flat|nested] 29+ messages in thread
 
 
- * [PATCH 06/13] blkcg: consolidate bio_issue_init() to be a part of core
  2018-11-26 21:19 [PATCH 00/13 v4] block: always associate blkg and refcount cleanup Dennis Zhou
                   ` (4 preceding siblings ...)
  2018-11-26 21:19 ` [PATCH 05/13] blkcg: associate blkg when associating a device Dennis Zhou
@ 2018-11-26 21:19 ` Dennis Zhou
  2018-11-26 21:19 ` [PATCH 07/13] blkcg: associate a blkg for pages being evicted by swap Dennis Zhou
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 29+ messages in thread
From: Dennis Zhou @ 2018-11-26 21:19 UTC (permalink / raw)
  To: Jens Axboe, Tejun Heo, Johannes Weiner, Josef Bacik
  Cc: kernel-team, linux-block, cgroups, linux-kernel, Dennis Zhou
bio_issue_init among other things initializes the timestamp for an IO.
Rather than have this logic handled by policies, this consolidates it to
be on the init paths (normal, clone, bounce clone).
Signed-off-by: Dennis Zhou <dennis@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
---
 block/bio.c                | 2 ++
 block/blk-iolatency.c      | 2 --
 block/blk-throttle.c       | 8 --------
 block/bounce.c             | 2 ++
 include/linux/blk-cgroup.h | 9 +++++++++
 5 files changed, 13 insertions(+), 10 deletions(-)
diff --git a/block/bio.c b/block/bio.c
index 929fd3692e4a..efca5ae607a4 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -611,6 +611,8 @@ void __bio_clone_fast(struct bio *bio, struct bio *bio_src)
 	bio->bi_io_vec = bio_src->bi_io_vec;
 
 	bio_clone_blkcg_association(bio, bio_src);
+
+	blkcg_bio_issue_init(bio);
 }
 EXPORT_SYMBOL(__bio_clone_fast);
 
diff --git a/block/blk-iolatency.c b/block/blk-iolatency.c
index e6b47c255521..5a79f06a730d 100644
--- a/block/blk-iolatency.c
+++ b/block/blk-iolatency.c
@@ -478,8 +478,6 @@ static void blkcg_iolatency_throttle(struct rq_qos *rqos, struct bio *bio)
 	if (!blk_iolatency_enabled(blkiolat))
 		return;
 
-	bio_issue_init(&bio->bi_issue, bio_sectors(bio));
-
 	while (blkg && blkg->parent) {
 		struct iolatency_grp *iolat = blkg_to_lat(blkg);
 		if (!iolat) {
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 1c6529df2002..1b97a73d2fb1 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -2115,13 +2115,6 @@ static inline void throtl_update_latency_buckets(struct throtl_data *td)
 }
 #endif
 
-static void blk_throtl_assoc_bio(struct bio *bio)
-{
-#ifdef CONFIG_BLK_DEV_THROTTLING_LOW
-	bio_issue_init(&bio->bi_issue, bio_sectors(bio));
-#endif
-}
-
 bool blk_throtl_bio(struct request_queue *q, struct blkcg_gq *blkg,
 		    struct bio *bio)
 {
@@ -2142,7 +2135,6 @@ bool blk_throtl_bio(struct request_queue *q, struct blkcg_gq *blkg,
 
 	throtl_update_latency_buckets(td);
 
-	blk_throtl_assoc_bio(bio);
 	blk_throtl_update_idletime(tg);
 
 	sq = &tg->service_queue;
diff --git a/block/bounce.c b/block/bounce.c
index 559c55bda040..e77a44a80ac2 100644
--- a/block/bounce.c
+++ b/block/bounce.c
@@ -279,6 +279,8 @@ static struct bio *bounce_clone_bio(struct bio *bio_src, gfp_t gfp_mask,
 
 	bio_clone_blkcg_association(bio, bio_src);
 
+	blkcg_bio_issue_init(bio);
+
 	return bio;
 }
 
diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h
index 81b45943ce29..58eb91a9aa8d 100644
--- a/include/linux/blk-cgroup.h
+++ b/include/linux/blk-cgroup.h
@@ -799,6 +799,12 @@ static inline bool blk_throtl_bio(struct request_queue *q, struct blkcg_gq *blkg
 				  struct bio *bio) { return false; }
 #endif
 
+
+static inline void blkcg_bio_issue_init(struct bio *bio)
+{
+	bio_issue_init(&bio->bi_issue, bio_sectors(bio));
+}
+
 static inline bool blkcg_bio_issue_check(struct request_queue *q,
 					 struct bio *bio)
 {
@@ -826,6 +832,8 @@ static inline bool blkcg_bio_issue_check(struct request_queue *q,
 		blkg_rwstat_add(&blkg->stat_ios, bio->bi_opf, 1);
 	}
 
+	blkcg_bio_issue_init(bio);
+
 	rcu_read_unlock();
 	return !throtl;
 }
@@ -932,6 +940,7 @@ static inline char *blkg_path(struct blkcg_gq *blkg) { return NULL; }
 static inline void blkg_get(struct blkcg_gq *blkg) { }
 static inline void blkg_put(struct blkcg_gq *blkg) { }
 
+static inline void blkcg_bio_issue_init(struct bio *bio) { }
 static inline bool blkcg_bio_issue_check(struct request_queue *q,
 					 struct bio *bio) { return true; }
 
-- 
2.17.1
^ permalink raw reply related	[flat|nested] 29+ messages in thread
- * [PATCH 07/13] blkcg: associate a blkg for pages being evicted by swap
  2018-11-26 21:19 [PATCH 00/13 v4] block: always associate blkg and refcount cleanup Dennis Zhou
                   ` (5 preceding siblings ...)
  2018-11-26 21:19 ` [PATCH 06/13] blkcg: consolidate bio_issue_init() to be a part of core Dennis Zhou
@ 2018-11-26 21:19 ` Dennis Zhou
  2018-11-26 21:19 ` [PATCH 08/13] blkcg: associate writeback bios with a blkg Dennis Zhou
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 29+ messages in thread
From: Dennis Zhou @ 2018-11-26 21:19 UTC (permalink / raw)
  To: Jens Axboe, Tejun Heo, Johannes Weiner, Josef Bacik
  Cc: kernel-team, linux-block, cgroups, linux-kernel, Dennis Zhou
A prior patch in this series added blkg association to bios issued by
cgroups. There are two other paths that we want to attribute work back
to the appropriate cgroup: swap and writeback. Here we modify the way
swap tags bios to include the blkg. Writeback will be tackle in the next
patch.
Signed-off-by: Dennis Zhou <dennis@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Acked-by: Tejun Heo <tj@kernel.org>
---
 block/bio.c         | 62 +++++++++++++++++++++++++++------------------
 include/linux/bio.h |  6 ++---
 mm/page_io.c        |  2 +-
 3 files changed, 42 insertions(+), 28 deletions(-)
diff --git a/block/bio.c b/block/bio.c
index efca5ae607a4..4600808ac606 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1958,30 +1958,6 @@ EXPORT_SYMBOL(bioset_init_from_src);
 
 #ifdef CONFIG_BLK_CGROUP
 
-#ifdef CONFIG_MEMCG
-/**
- * bio_associate_blkcg_from_page - associate a bio with the page's blkcg
- * @bio: target bio
- * @page: the page to lookup the blkcg from
- *
- * Associate @bio with the blkcg from @page's owning memcg.  This works like
- * every other associate function wrt references.
- */
-int bio_associate_blkcg_from_page(struct bio *bio, struct page *page)
-{
-	struct cgroup_subsys_state *blkcg_css;
-
-	if (unlikely(bio->bi_css))
-		return -EBUSY;
-	if (!page->mem_cgroup)
-		return 0;
-	blkcg_css = cgroup_get_e_css(page->mem_cgroup->css.cgroup,
-				     &io_cgrp_subsys);
-	bio->bi_css = blkcg_css;
-	return 0;
-}
-#endif /* CONFIG_MEMCG */
-
 /**
  * bio_associate_blkcg - associate a bio with the specified blkcg
  * @bio: target bio
@@ -2059,6 +2035,44 @@ static void __bio_associate_blkg(struct bio *bio, struct blkcg_gq *blkg)
 	bio->bi_blkg = blkg_try_get_closest(blkg);
 }
 
+static void __bio_associate_blkg_from_css(struct bio *bio,
+					  struct cgroup_subsys_state *css)
+{
+	struct blkcg_gq *blkg;
+
+	rcu_read_lock();
+
+	blkg = blkg_lookup_create(css_to_blkcg(css), bio->bi_disk->queue);
+	__bio_associate_blkg(bio, blkg);
+
+	rcu_read_unlock();
+}
+
+#ifdef CONFIG_MEMCG
+/**
+ * bio_associate_blkg_from_page - associate a bio with the page's blkg
+ * @bio: target bio
+ * @page: the page to lookup the blkcg from
+ *
+ * Associate @bio with the blkg from @page's owning memcg and the respective
+ * request_queue.  This works like every other associate function wrt
+ * references.
+ */
+void bio_associate_blkg_from_page(struct bio *bio, struct page *page)
+{
+	struct cgroup_subsys_state *css;
+
+	if (unlikely(bio->bi_css))
+		return;
+	if (!bio_has_queue(bio) || !page->mem_cgroup)
+		return;
+
+	css = cgroup_get_e_css(page->mem_cgroup->css.cgroup, &io_cgrp_subsys);
+	bio->bi_css = css;
+	__bio_associate_blkg_from_css(bio, css);
+}
+#endif /* CONFIG_MEMCG */
+
 /**
  * bio_associate_blkg - associate a bio with a blkg from q
  * @bio: target bio
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 8bc9d9b29fd3..ad322232db3e 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -510,10 +510,10 @@ do {						\
 	disk_devt((bio)->bi_disk)
 
 #if defined(CONFIG_MEMCG) && defined(CONFIG_BLK_CGROUP)
-int bio_associate_blkcg_from_page(struct bio *bio, struct page *page);
+void bio_associate_blkg_from_page(struct bio *bio, struct page *page);
 #else
-static inline int bio_associate_blkcg_from_page(struct bio *bio,
-						struct page *page) {  return 0; }
+static inline void bio_associate_blkg_from_page(struct bio *bio,
+						struct page *page) { }
 #endif
 
 #ifdef CONFIG_BLK_CGROUP
diff --git a/mm/page_io.c b/mm/page_io.c
index 257fdd67308d..409dcd438488 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -339,7 +339,7 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc,
 		goto out;
 	}
 	bio->bi_opf = REQ_OP_WRITE | REQ_SWAP | wbc_to_write_flags(wbc);
-	bio_associate_blkcg_from_page(bio, page);
+	bio_associate_blkg_from_page(bio, page);
 	count_swpout_vm_event(page);
 	set_page_writeback(page);
 	unlock_page(page);
-- 
2.17.1
^ permalink raw reply related	[flat|nested] 29+ messages in thread
- * [PATCH 08/13] blkcg: associate writeback bios with a blkg
  2018-11-26 21:19 [PATCH 00/13 v4] block: always associate blkg and refcount cleanup Dennis Zhou
                   ` (6 preceding siblings ...)
  2018-11-26 21:19 ` [PATCH 07/13] blkcg: associate a blkg for pages being evicted by swap Dennis Zhou
@ 2018-11-26 21:19 ` Dennis Zhou
  2018-11-26 21:19 ` [PATCH 09/13] blkcg: remove bio->bi_css and instead use bio->bi_blkg Dennis Zhou
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 29+ messages in thread
From: Dennis Zhou @ 2018-11-26 21:19 UTC (permalink / raw)
  To: Jens Axboe, Tejun Heo, Johannes Weiner, Josef Bacik
  Cc: kernel-team, linux-block, cgroups, linux-kernel, Dennis Zhou
One of the goals of this series is to remove a separate reference to
the css of the bio. This can and should be accessed via bio_blkcg. In
this patch, wbc_init_bio() now requires a bio to have a device
associated with it.
Signed-off-by: Dennis Zhou <dennis@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Acked-by: Tejun Heo <tj@kernel.org>
---
 Documentation/admin-guide/cgroup-v2.rst |  8 +++++---
 block/bio.c                             | 21 +++++++++++++++++++++
 fs/buffer.c                             | 10 +++++-----
 fs/ext4/page-io.c                       |  2 +-
 include/linux/bio.h                     |  5 +++++
 include/linux/writeback.h               |  5 +++--
 6 files changed, 40 insertions(+), 11 deletions(-)
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 476722b7b636..baf19bf28385 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1879,8 +1879,10 @@ following two functions.
 
   wbc_init_bio(@wbc, @bio)
 	Should be called for each bio carrying writeback data and
-	associates the bio with the inode's owner cgroup.  Can be
-	called anytime between bio allocation and submission.
+	associates the bio with the inode's owner cgroup and the
+	corresponding request queue.  This must be called after
+	a queue (device) has been associated with the bio and
+	before submission.
 
   wbc_account_io(@wbc, @page, @bytes)
 	Should be called for each data segment being written out.
@@ -1899,7 +1901,7 @@ the configuration, the bio may be executed at a lower priority and if
 the writeback session is holding shared resources, e.g. a journal
 entry, may lead to priority inversion.  There is no one easy solution
 for the problem.  Filesystems can try to work around specific problem
-cases by skipping wbc_init_bio() or using bio_associate_blkcg()
+cases by skipping wbc_init_bio() and using bio_associate_blkg()
 directly.
 
 
diff --git a/block/bio.c b/block/bio.c
index 4600808ac606..e9e930b6a10c 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -2048,6 +2048,27 @@ static void __bio_associate_blkg_from_css(struct bio *bio,
 	rcu_read_unlock();
 }
 
+/**
+ * bio_associate_blkg_from_css - associate a bio with a specified css
+ * @bio: target bio
+ * @css: target css
+ *
+ * Associate @bio with the blkg found by combining the css's blkg and the
+ * request_queue of the @bio.  This takes a reference on the css that will
+ * be put upon freeing of @bio.
+ */
+void bio_associate_blkg_from_css(struct bio *bio,
+				 struct cgroup_subsys_state *css)
+{
+	if (!bio_has_queue(bio))
+		return;
+
+	css_get(css);
+	bio->bi_css = css;
+	__bio_associate_blkg_from_css(bio, css);
+}
+EXPORT_SYMBOL_GPL(bio_associate_blkg_from_css);
+
 #ifdef CONFIG_MEMCG
 /**
  * bio_associate_blkg_from_page - associate a bio with the page's blkg
diff --git a/fs/buffer.c b/fs/buffer.c
index 9661e5be87e2..45452bd55c8e 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -3060,11 +3060,6 @@ static int submit_bh_wbc(int op, int op_flags, struct buffer_head *bh,
 	 */
 	bio = bio_alloc(GFP_NOIO, 1);
 
-	if (wbc) {
-		wbc_init_bio(wbc, bio);
-		wbc_account_io(wbc, bh->b_page, bh->b_size);
-	}
-
 	bio->bi_iter.bi_sector = bh->b_blocknr * (bh->b_size >> 9);
 	bio_set_dev_only(bio, bh->b_bdev);
 	bio->bi_write_hint = write_hint;
@@ -3084,6 +3079,11 @@ static int submit_bh_wbc(int op, int op_flags, struct buffer_head *bh,
 		op_flags |= REQ_PRIO;
 	bio_set_op_attrs(bio, op, op_flags);
 
+	if (wbc) {
+		wbc_init_bio(wbc, bio);
+		wbc_account_io(wbc, bh->b_page, bh->b_size);
+	}
+
 	submit_bio(bio);
 	return 0;
 }
diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
index cadf91a42fa5..ab383085fb5e 100644
--- a/fs/ext4/page-io.c
+++ b/fs/ext4/page-io.c
@@ -374,13 +374,13 @@ static int io_submit_init_bio(struct ext4_io_submit *io,
 	bio = bio_alloc(GFP_NOIO, BIO_MAX_PAGES);
 	if (!bio)
 		return -ENOMEM;
-	wbc_init_bio(io->io_wbc, bio);
 	bio->bi_iter.bi_sector = bh->b_blocknr * (bh->b_size >> 9);
 	bio_set_dev_only(bio, bh->b_bdev);
 	bio->bi_end_io = ext4_end_bio;
 	bio->bi_private = ext4_get_io_end(io->io_end);
 	io->io_bio = bio;
 	io->io_next_block = bh->b_blocknr;
+	wbc_init_bio(io->io_wbc, bio);
 	return 0;
 }
 
diff --git a/include/linux/bio.h b/include/linux/bio.h
index ad322232db3e..a9958f2bcb48 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -520,6 +520,8 @@ static inline void bio_associate_blkg_from_page(struct bio *bio,
 int bio_associate_blkcg(struct bio *bio, struct cgroup_subsys_state *blkcg_css);
 void bio_disassociate_blkg(struct bio *bio);
 void bio_associate_blkg(struct bio *bio);
+void bio_associate_blkg_from_css(struct bio *bio,
+				 struct cgroup_subsys_state *css);
 void bio_disassociate_task(struct bio *bio);
 void bio_clone_blkcg_association(struct bio *dst, struct bio *src);
 #else	/* CONFIG_BLK_CGROUP */
@@ -527,6 +529,9 @@ static inline int bio_associate_blkcg(struct bio *bio,
 			struct cgroup_subsys_state *blkcg_css) { return 0; }
 static inline void bio_disassociate_blkg(struct bio *bio) { }
 static inline void bio_associate_blkg(struct bio *bio) { }
+static inline void bio_associate_blkg_from_css(struct bio *bio,
+					       struct cgroup_subsys_state *css)
+{ }
 static inline void bio_disassociate_task(struct bio *bio) { }
 static inline void bio_clone_blkcg_association(struct bio *dst,
 			struct bio *src) { }
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index fdfd04e348f6..738a0c24874f 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -246,7 +246,8 @@ static inline void wbc_attach_fdatawrite_inode(struct writeback_control *wbc,
  *
  * @bio is a part of the writeback in progress controlled by @wbc.  Perform
  * writeback specific initialization.  This is used to apply the cgroup
- * writeback context.
+ * writeback context.  Must be called after the bio has been associated with
+ * a device.
  */
 static inline void wbc_init_bio(struct writeback_control *wbc, struct bio *bio)
 {
@@ -257,7 +258,7 @@ static inline void wbc_init_bio(struct writeback_control *wbc, struct bio *bio)
 	 * regular writeback instead of writing things out itself.
 	 */
 	if (wbc->wb)
-		bio_associate_blkcg(bio, wbc->wb->blkcg_css);
+		bio_associate_blkg_from_css(bio, wbc->wb->blkcg_css);
 }
 
 #else	/* CONFIG_CGROUP_WRITEBACK */
-- 
2.17.1
^ permalink raw reply related	[flat|nested] 29+ messages in thread
- * [PATCH 09/13] blkcg: remove bio->bi_css and instead use bio->bi_blkg
  2018-11-26 21:19 [PATCH 00/13 v4] block: always associate blkg and refcount cleanup Dennis Zhou
                   ` (7 preceding siblings ...)
  2018-11-26 21:19 ` [PATCH 08/13] blkcg: associate writeback bios with a blkg Dennis Zhou
@ 2018-11-26 21:19 ` Dennis Zhou
  2018-11-26 21:19 ` [PATCH 10/13] blkcg: remove additional reference to the css Dennis Zhou
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 29+ messages in thread
From: Dennis Zhou @ 2018-11-26 21:19 UTC (permalink / raw)
  To: Jens Axboe, Tejun Heo, Johannes Weiner, Josef Bacik
  Cc: kernel-team, linux-block, cgroups, linux-kernel, Dennis Zhou
Prior patches ensured that any bio that interacts with a request_queue
is properly associated with a blkg. This makes bio->bi_css unnecessary
as blkg maintains a reference to blkcg already.
This removes the bio field bi_css and transfers corresponding uses to
access via bi_blkg.
Signed-off-by: Dennis Zhou <dennis@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Acked-by: Tejun Heo <tj@kernel.org>
---
 block/bio.c                | 56 +++++++-------------------------------
 block/bounce.c             |  2 +-
 drivers/block/loop.c       |  5 ++--
 drivers/md/raid0.c         |  2 +-
 include/linux/bio.h        | 11 +++-----
 include/linux/blk-cgroup.h |  8 +++---
 include/linux/blk_types.h  |  7 +++--
 kernel/trace/blktrace.c    |  4 +--
 8 files changed, 29 insertions(+), 66 deletions(-)
diff --git a/block/bio.c b/block/bio.c
index e9e930b6a10c..94922069c3d8 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -610,7 +610,7 @@ void __bio_clone_fast(struct bio *bio, struct bio *bio_src)
 	bio->bi_iter = bio_src->bi_iter;
 	bio->bi_io_vec = bio_src->bi_io_vec;
 
-	bio_clone_blkcg_association(bio, bio_src);
+	bio_clone_blkg_association(bio, bio_src);
 
 	blkcg_bio_issue_init(bio);
 }
@@ -1958,34 +1958,6 @@ EXPORT_SYMBOL(bioset_init_from_src);
 
 #ifdef CONFIG_BLK_CGROUP
 
-/**
- * bio_associate_blkcg - associate a bio with the specified blkcg
- * @bio: target bio
- * @blkcg_css: css of the blkcg to associate
- *
- * Associate @bio with the blkcg specified by @blkcg_css.  Block layer will
- * treat @bio as if it were issued by a task which belongs to the blkcg.
- *
- * This function takes an extra reference of @blkcg_css which will be put
- * when @bio is released.  The caller must own @bio and is responsible for
- * synchronizing calls to this function.  If @blkcg_css is NULL, a call to
- * blkcg_get_css() finds the current css from the kthread or task.
- */
-int bio_associate_blkcg(struct bio *bio, struct cgroup_subsys_state *blkcg_css)
-{
-	if (unlikely(bio->bi_css))
-		return -EBUSY;
-
-	if (blkcg_css)
-		css_get(blkcg_css);
-	else
-		blkcg_css = blkcg_get_css();
-
-	bio->bi_css = blkcg_css;
-	return 0;
-}
-EXPORT_SYMBOL_GPL(bio_associate_blkcg);
-
 /**
  * bio_has_queue - required check for blkg association
  * @bio: target bio
@@ -2008,6 +1980,8 @@ static inline bool bio_has_queue(struct bio *bio)
 void bio_disassociate_blkg(struct bio *bio)
 {
 	if (bio->bi_blkg) {
+		/* a ref is always taken on css */
+		css_put(&bio_blkcg(bio)->css);
 		blkg_put(bio->bi_blkg);
 		bio->bi_blkg = NULL;
 	}
@@ -2064,7 +2038,6 @@ void bio_associate_blkg_from_css(struct bio *bio,
 		return;
 
 	css_get(css);
-	bio->bi_css = css;
 	__bio_associate_blkg_from_css(bio, css);
 }
 EXPORT_SYMBOL_GPL(bio_associate_blkg_from_css);
@@ -2083,13 +2056,10 @@ void bio_associate_blkg_from_page(struct bio *bio, struct page *page)
 {
 	struct cgroup_subsys_state *css;
 
-	if (unlikely(bio->bi_css))
-		return;
 	if (!bio_has_queue(bio) || !page->mem_cgroup)
 		return;
 
 	css = cgroup_get_e_css(page->mem_cgroup->css.cgroup, &io_cgrp_subsys);
-	bio->bi_css = css;
 	__bio_associate_blkg_from_css(bio, css);
 }
 #endif /* CONFIG_MEMCG */
@@ -2116,8 +2086,7 @@ void bio_associate_blkg(struct bio *bio)
 
 	rcu_read_lock();
 
-	bio_associate_blkcg(bio, NULL);
-	blkcg = bio_blkcg(bio);
+	blkcg = css_to_blkcg(blkcg_get_css());
 
 	if (!blkcg->css.parent) {
 		__bio_associate_blkg(bio, q->root_blkg);
@@ -2137,27 +2106,22 @@ EXPORT_SYMBOL_GPL(bio_associate_blkg);
  */
 void bio_disassociate_task(struct bio *bio)
 {
-	if (bio->bi_css) {
-		css_put(bio->bi_css);
-		bio->bi_css = NULL;
-	}
 	bio_disassociate_blkg(bio);
 }
 
 /**
- * bio_clone_blkcg_association - clone blkcg association from src to dst bio
+ * bio_clone_blkg_association - clone blkg association from src to dst bio
  * @dst: destination bio
  * @src: source bio
  */
-void bio_clone_blkcg_association(struct bio *dst, struct bio *src)
+void bio_clone_blkg_association(struct bio *dst, struct bio *src)
 {
-	if (src->bi_css)
-		WARN_ON(bio_associate_blkcg(dst, src->bi_css));
-
-	if (src->bi_blkg)
+	if (src->bi_blkg) {
+		css_get(&bio_blkcg(src)->css);
 		__bio_associate_blkg(dst, src->bi_blkg);
+	}
 }
-EXPORT_SYMBOL_GPL(bio_clone_blkcg_association);
+EXPORT_SYMBOL_GPL(bio_clone_blkg_association);
 #endif /* CONFIG_BLK_CGROUP */
 
 static void __init biovec_init_slabs(void)
diff --git a/block/bounce.c b/block/bounce.c
index e77a44a80ac2..8f35c28a0cef 100644
--- a/block/bounce.c
+++ b/block/bounce.c
@@ -277,7 +277,7 @@ static struct bio *bounce_clone_bio(struct bio *bio_src, gfp_t gfp_mask,
 		}
 	}
 
-	bio_clone_blkcg_association(bio, bio_src);
+	bio_clone_blkg_association(bio, bio_src);
 
 	blkcg_bio_issue_init(bio);
 
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 176ab1f28eca..0770004616de 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -77,6 +77,7 @@
 #include <linux/falloc.h>
 #include <linux/uio.h>
 #include <linux/ioprio.h>
+#include <linux/blk-cgroup.h>
 
 #include "loop.h"
 
@@ -1820,8 +1821,8 @@ static blk_status_t loop_queue_rq(struct blk_mq_hw_ctx *hctx,
 
 	/* always use the first bio's css */
 #ifdef CONFIG_BLK_CGROUP
-	if (cmd->use_aio && rq->bio && rq->bio->bi_css) {
-		cmd->css = rq->bio->bi_css;
+	if (cmd->use_aio && rq->bio && rq->bio->bi_blkg) {
+		cmd->css = &bio_blkcg(rq->bio)->css;
 		css_get(cmd->css);
 	} else
 #endif
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index ac1cffd2a09b..f3fb5bb8c82a 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -542,7 +542,7 @@ static void raid0_handle_discard(struct mddev *mddev, struct bio *bio)
 		    !discard_bio)
 			continue;
 		bio_chain(discard_bio, bio);
-		bio_clone_blkcg_association(discard_bio, bio);
+		bio_clone_blkg_association(discard_bio, bio);
 		if (mddev->gendisk)
 			trace_block_bio_remap(bdev_get_queue(rdev->bdev),
 				discard_bio, disk_devt(mddev->gendisk),
diff --git a/include/linux/bio.h b/include/linux/bio.h
index a9958f2bcb48..7b8bb1365977 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -503,7 +503,7 @@ do {						\
 do {						\
 	(dst)->bi_disk = (src)->bi_disk;	\
 	(dst)->bi_partno = (src)->bi_partno;	\
-	bio_clone_blkcg_association(dst, src);	\
+	bio_clone_blkg_association(dst, src);	\
 } while (0)
 
 #define bio_dev(bio) \
@@ -517,24 +517,21 @@ static inline void bio_associate_blkg_from_page(struct bio *bio,
 #endif
 
 #ifdef CONFIG_BLK_CGROUP
-int bio_associate_blkcg(struct bio *bio, struct cgroup_subsys_state *blkcg_css);
 void bio_disassociate_blkg(struct bio *bio);
 void bio_associate_blkg(struct bio *bio);
 void bio_associate_blkg_from_css(struct bio *bio,
 				 struct cgroup_subsys_state *css);
 void bio_disassociate_task(struct bio *bio);
-void bio_clone_blkcg_association(struct bio *dst, struct bio *src);
+void bio_clone_blkg_association(struct bio *dst, struct bio *src);
 #else	/* CONFIG_BLK_CGROUP */
-static inline int bio_associate_blkcg(struct bio *bio,
-			struct cgroup_subsys_state *blkcg_css) { return 0; }
 static inline void bio_disassociate_blkg(struct bio *bio) { }
 static inline void bio_associate_blkg(struct bio *bio) { }
 static inline void bio_associate_blkg_from_css(struct bio *bio,
 					       struct cgroup_subsys_state *css)
 { }
 static inline void bio_disassociate_task(struct bio *bio) { }
-static inline void bio_clone_blkcg_association(struct bio *dst,
-			struct bio *src) { }
+static inline void bio_clone_blkg_association(struct bio *dst,
+					      struct bio *src) { }
 #endif	/* CONFIG_BLK_CGROUP */
 
 #ifdef CONFIG_HIGHMEM
diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h
index 58eb91a9aa8d..e777de0d45d1 100644
--- a/include/linux/blk-cgroup.h
+++ b/include/linux/blk-cgroup.h
@@ -308,8 +308,8 @@ static inline struct blkcg *css_to_blkcg(struct cgroup_subsys_state *css)
  */
 static inline struct blkcg *__bio_blkcg(struct bio *bio)
 {
-	if (bio && bio->bi_css)
-		return css_to_blkcg(bio->bi_css);
+	if (bio && bio->bi_blkg)
+		return bio->bi_blkg->blkcg;
 	return css_to_blkcg(blkcg_css());
 }
 
@@ -323,8 +323,8 @@ static inline struct blkcg *__bio_blkcg(struct bio *bio)
  */
 static inline struct blkcg *bio_blkcg(struct bio *bio)
 {
-	if (bio && bio->bi_css)
-		return css_to_blkcg(bio->bi_css);
+	if (bio && bio->bi_blkg)
+		return bio->bi_blkg->blkcg;
 	return NULL;
 }
 
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index c0ba1a038ff3..46c005d601ac 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -174,10 +174,11 @@ struct bio {
 	void			*bi_private;
 #ifdef CONFIG_BLK_CGROUP
 	/*
-	 * Optional css associated with this bio.  Put on bio
-	 * release.  Read comment on top of bio_associate_current().
+	 * Represents the association of the css and request_queue for the bio.
+	 * If a bio goes direct to device, it will not have a blkg as it will
+	 * not have a request_queue associated with it.  The reference is put
+	 * on release of the bio.
 	 */
-	struct cgroup_subsys_state *bi_css;
 	struct blkcg_gq		*bi_blkg;
 	struct bio_issue	bi_issue;
 #endif
diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
index 2868d85f1fb1..fac0ddf8a8e2 100644
--- a/kernel/trace/blktrace.c
+++ b/kernel/trace/blktrace.c
@@ -764,9 +764,9 @@ blk_trace_bio_get_cgid(struct request_queue *q, struct bio *bio)
 	if (!bt || !(blk_tracer_flags.val & TRACE_BLK_OPT_CGROUP))
 		return NULL;
 
-	if (!bio->bi_css)
+	if (!bio->bi_blkg)
 		return NULL;
-	return cgroup_get_kernfs_id(bio->bi_css->cgroup);
+	return cgroup_get_kernfs_id(bio_blkcg(bio)->css.cgroup);
 }
 #else
 static union kernfs_node_id *
-- 
2.17.1
^ permalink raw reply related	[flat|nested] 29+ messages in thread
- * [PATCH 10/13] blkcg: remove additional reference to the css
  2018-11-26 21:19 [PATCH 00/13 v4] block: always associate blkg and refcount cleanup Dennis Zhou
                   ` (8 preceding siblings ...)
  2018-11-26 21:19 ` [PATCH 09/13] blkcg: remove bio->bi_css and instead use bio->bi_blkg Dennis Zhou
@ 2018-11-26 21:19 ` Dennis Zhou
  2018-11-26 21:19 ` [PATCH 11/13] blkcg: remove bio_disassociate_task() Dennis Zhou
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 29+ messages in thread
From: Dennis Zhou @ 2018-11-26 21:19 UTC (permalink / raw)
  To: Jens Axboe, Tejun Heo, Johannes Weiner, Josef Bacik
  Cc: kernel-team, linux-block, cgroups, linux-kernel, Dennis Zhou
The previous patch in this series removed carrying around a pointer to
the css in blkg. However, the blkg association logic still relied on
taking a reference on the css to ensure we wouldn't fail in getting a
reference for the blkg.
Here the implicit dependency on the css is removed. The association
continues to rely on the tryget logic walking up the blkg tree. This
streamlines the three ways that association can happen: normal, swap,
and writeback.
Signed-off-by: Dennis Zhou <dennis@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
---
 block/bio.c                | 60 +++++++++++++++++++++-----------------
 include/linux/blk-cgroup.h | 41 --------------------------
 include/linux/cgroup.h     |  2 ++
 kernel/cgroup/cgroup.c     | 48 ++++++++++++++++++++++++------
 4 files changed, 74 insertions(+), 77 deletions(-)
diff --git a/block/bio.c b/block/bio.c
index 94922069c3d8..f713aa236ac5 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1980,8 +1980,6 @@ static inline bool bio_has_queue(struct bio *bio)
 void bio_disassociate_blkg(struct bio *bio)
 {
 	if (bio->bi_blkg) {
-		/* a ref is always taken on css */
-		css_put(&bio_blkcg(bio)->css);
 		blkg_put(bio->bi_blkg);
 		bio->bi_blkg = NULL;
 	}
@@ -2009,27 +2007,40 @@ static void __bio_associate_blkg(struct bio *bio, struct blkcg_gq *blkg)
 	bio->bi_blkg = blkg_try_get_closest(blkg);
 }
 
+/**
+ * __bio_associate_blkg_from_css - associate a bio with a specified css
+ * @bio: target bio
+ * @css: target css
+ *
+ * Associate @bio with the blkg found by combining the css's blkg and the
+ * request_queue of the @bio.  This falls back to the queue's root_blkg if
+ * the association fails with the css.
+ */
 static void __bio_associate_blkg_from_css(struct bio *bio,
 					  struct cgroup_subsys_state *css)
 {
+	struct request_queue *q = bio->bi_disk->queue;
 	struct blkcg_gq *blkg;
 
 	rcu_read_lock();
 
-	blkg = blkg_lookup_create(css_to_blkcg(css), bio->bi_disk->queue);
+	if (!css || !css->parent)
+		blkg = q->root_blkg;
+	else
+		blkg = blkg_lookup_create(css_to_blkcg(css), q);
+
 	__bio_associate_blkg(bio, blkg);
 
 	rcu_read_unlock();
 }
 
 /**
- * bio_associate_blkg_from_css - associate a bio with a specified css
+ * bio_associate_blkg_from_css - wrapper to call __bio_associate_blkg_from_css
  * @bio: target bio
  * @css: target css
  *
- * Associate @bio with the blkg found by combining the css's blkg and the
- * request_queue of the @bio.  This takes a reference on the css that will
- * be put upon freeing of @bio.
+ * Association can only happen if a request_queue exists, so check before
+ * preceding.
  */
 void bio_associate_blkg_from_css(struct bio *bio,
 				 struct cgroup_subsys_state *css)
@@ -2037,7 +2048,6 @@ void bio_associate_blkg_from_css(struct bio *bio,
 	if (!bio_has_queue(bio))
 		return;
 
-	css_get(css);
 	__bio_associate_blkg_from_css(bio, css);
 }
 EXPORT_SYMBOL_GPL(bio_associate_blkg_from_css);
@@ -2049,8 +2059,8 @@ EXPORT_SYMBOL_GPL(bio_associate_blkg_from_css);
  * @page: the page to lookup the blkcg from
  *
  * Associate @bio with the blkg from @page's owning memcg and the respective
- * request_queue.  This works like every other associate function wrt
- * references.
+ * request_queue.  If cgroup_e_css returns NULL, fall back to the queue's
+ * root_blkg.
  */
 void bio_associate_blkg_from_page(struct bio *bio, struct page *page)
 {
@@ -2059,8 +2069,12 @@ void bio_associate_blkg_from_page(struct bio *bio, struct page *page)
 	if (!bio_has_queue(bio) || !page->mem_cgroup)
 		return;
 
-	css = cgroup_get_e_css(page->mem_cgroup->css.cgroup, &io_cgrp_subsys);
+	rcu_read_lock();
+
+	css = cgroup_e_css(page->mem_cgroup->css.cgroup, &io_cgrp_subsys);
 	__bio_associate_blkg_from_css(bio, css);
+
+	rcu_read_unlock();
 }
 #endif /* CONFIG_MEMCG */
 
@@ -2075,26 +2089,20 @@ void bio_associate_blkg_from_page(struct bio *bio, struct page *page)
  */
 void bio_associate_blkg(struct bio *bio)
 {
-	struct request_queue *q;
-	struct blkcg *blkcg;
-	struct blkcg_gq *blkg;
+	struct cgroup_subsys_state *css;
 
 	if (!bio_has_queue(bio))
 		return;
 
-	q = bio->bi_disk->queue;
-
 	rcu_read_lock();
 
-	blkcg = css_to_blkcg(blkcg_get_css());
-
-	if (!blkcg->css.parent) {
-		__bio_associate_blkg(bio, q->root_blkg);
-	} else {
-		blkg = blkg_lookup_create(blkcg, q);
+	/* if a css has already been assigned, continue using it */
+	if (bio->bi_blkg)
+		css = &bio_blkcg(bio)->css;
+	else
+		css = blkcg_css();
 
-		__bio_associate_blkg(bio, blkg);
-	}
+	__bio_associate_blkg_from_css(bio, css);
 
 	rcu_read_unlock();
 }
@@ -2116,10 +2124,8 @@ void bio_disassociate_task(struct bio *bio)
  */
 void bio_clone_blkg_association(struct bio *dst, struct bio *src)
 {
-	if (src->bi_blkg) {
-		css_get(&bio_blkcg(src)->css);
+	if (src->bi_blkg)
 		__bio_associate_blkg(dst, src->bi_blkg);
-	}
 }
 EXPORT_SYMBOL_GPL(bio_clone_blkg_association);
 #endif /* CONFIG_BLK_CGROUP */
diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h
index e777de0d45d1..6f3bb5e82a57 100644
--- a/include/linux/blk-cgroup.h
+++ b/include/linux/blk-cgroup.h
@@ -246,47 +246,6 @@ static inline struct cgroup_subsys_state *blkcg_css(void)
 	return task_css(current, io_cgrp_id);
 }
 
-/**
- * blkcg_get_css - find and get a reference to the css
- *
- * Find the css associated with either the kthread or the current task.
- * This takes a reference on the blkcg which will need to be managed by the
- * caller.
- */
-static inline struct cgroup_subsys_state *blkcg_get_css(void)
-{
-	struct cgroup_subsys_state *css;
-
-	rcu_read_lock();
-
-	css = kthread_blkcg();
-	if (css) {
-		css_get(css);
-	} else {
-		/*
-		 * This is a bit complicated.  It is possible task_css is seeing
-		 * an old css pointer here.  This is caused by the current
-		 * thread migrating away from this cgroup and this cgroup dying.
-		 * css_tryget() will fail when trying to take a ref on a cgroup
-		 * that's ref count has hit 0.
-		 *
-		 * Therefore, if it does fail, this means current must have
-		 * been swapped away already and this is waiting for it to
-		 * propagate on the polling cpu.  Hence the use of cpu_relax().
-		 */
-		while (true) {
-			css = task_css(current, io_cgrp_id);
-			if (likely(css_tryget(css)))
-				break;
-			cpu_relax();
-		}
-	}
-
-	rcu_read_unlock();
-
-	return css;
-}
-
 static inline struct blkcg *css_to_blkcg(struct cgroup_subsys_state *css)
 {
 	return css ? container_of(css, struct blkcg, css) : NULL;
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 9d12757a65b0..9968332cceed 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -93,6 +93,8 @@ extern struct css_set init_css_set;
 
 bool css_has_online_children(struct cgroup_subsys_state *css);
 struct cgroup_subsys_state *css_from_id(int id, struct cgroup_subsys *ss);
+struct cgroup_subsys_state *cgroup_e_css(struct cgroup *cgroup,
+					 struct cgroup_subsys *ss);
 struct cgroup_subsys_state *cgroup_get_e_css(struct cgroup *cgroup,
 					     struct cgroup_subsys *ss);
 struct cgroup_subsys_state *css_tryget_online_from_dir(struct dentry *dentry,
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 6aaf5dd5383b..8b79318810ad 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -493,7 +493,7 @@ static struct cgroup_subsys_state *cgroup_tryget_css(struct cgroup *cgrp,
 }
 
 /**
- * cgroup_e_css - obtain a cgroup's effective css for the specified subsystem
+ * cgroup_e_css_by_mask - obtain a cgroup's effective css for the specified ss
  * @cgrp: the cgroup of interest
  * @ss: the subsystem of interest (%NULL returns @cgrp->self)
  *
@@ -502,8 +502,8 @@ static struct cgroup_subsys_state *cgroup_tryget_css(struct cgroup *cgrp,
  * enabled.  If @ss is associated with the hierarchy @cgrp is on, this
  * function is guaranteed to return non-NULL css.
  */
-static struct cgroup_subsys_state *cgroup_e_css(struct cgroup *cgrp,
-						struct cgroup_subsys *ss)
+static struct cgroup_subsys_state *cgroup_e_css_by_mask(struct cgroup *cgrp,
+							struct cgroup_subsys *ss)
 {
 	lockdep_assert_held(&cgroup_mutex);
 
@@ -523,6 +523,35 @@ static struct cgroup_subsys_state *cgroup_e_css(struct cgroup *cgrp,
 	return cgroup_css(cgrp, ss);
 }
 
+/**
+ * cgroup_e_css - obtain a cgroup's effective css for the specified subsystem
+ * @cgrp: the cgroup of interest
+ * @ss: the subsystem of interest
+ *
+ * Find and get the effective css of @cgrp for @ss.  The effective css is
+ * defined as the matching css of the nearest ancestor including self which
+ * has @ss enabled.  If @ss is not mounted on the hierarchy @cgrp is on,
+ * the root css is returned, so this function always returns a valid css.
+ *
+ * The returned css is not guaranteed to be online, and therefore it is the
+ * callers responsiblity to tryget a reference for it.
+ */
+struct cgroup_subsys_state *cgroup_e_css(struct cgroup *cgrp,
+					 struct cgroup_subsys *ss)
+{
+	struct cgroup_subsys_state *css;
+
+	do {
+		css = cgroup_css(cgrp, ss);
+
+		if (css)
+			return css;
+		cgrp = cgroup_parent(cgrp);
+	} while (cgrp);
+
+	return init_css_set.subsys[ss->id];
+}
+
 /**
  * cgroup_get_e_css - get a cgroup's effective css for the specified subsystem
  * @cgrp: the cgroup of interest
@@ -605,10 +634,11 @@ EXPORT_SYMBOL_GPL(of_css);
  *
  * Should be called under cgroup_[tree_]mutex.
  */
-#define for_each_e_css(css, ssid, cgrp)					\
-	for ((ssid) = 0; (ssid) < CGROUP_SUBSYS_COUNT; (ssid)++)	\
-		if (!((css) = cgroup_e_css(cgrp, cgroup_subsys[(ssid)]))) \
-			;						\
+#define for_each_e_css(css, ssid, cgrp)					    \
+	for ((ssid) = 0; (ssid) < CGROUP_SUBSYS_COUNT; (ssid)++)	    \
+		if (!((css) = cgroup_e_css_by_mask(cgrp,		    \
+						   cgroup_subsys[(ssid)]))) \
+			;						    \
 		else
 
 /**
@@ -1007,7 +1037,7 @@ static struct css_set *find_existing_css_set(struct css_set *old_cset,
 			 * @ss is in this hierarchy, so we want the
 			 * effective css from @cgrp.
 			 */
-			template[i] = cgroup_e_css(cgrp, ss);
+			template[i] = cgroup_e_css_by_mask(cgrp, ss);
 		} else {
 			/*
 			 * @ss is not in this hierarchy, so we don't want
@@ -3024,7 +3054,7 @@ static int cgroup_apply_control(struct cgroup *cgrp)
 		return ret;
 
 	/*
-	 * At this point, cgroup_e_css() results reflect the new csses
+	 * At this point, cgroup_e_css_by_mask() results reflect the new csses
 	 * making the following cgroup_update_dfl_csses() properly update
 	 * css associations of all tasks in the subtree.
 	 */
-- 
2.17.1
^ permalink raw reply related	[flat|nested] 29+ messages in thread
- * [PATCH 11/13] blkcg: remove bio_disassociate_task()
  2018-11-26 21:19 [PATCH 00/13 v4] block: always associate blkg and refcount cleanup Dennis Zhou
                   ` (9 preceding siblings ...)
  2018-11-26 21:19 ` [PATCH 10/13] blkcg: remove additional reference to the css Dennis Zhou
@ 2018-11-26 21:19 ` Dennis Zhou
  2018-11-29 15:54   ` Tejun Heo
  2018-11-26 21:19 ` [PATCH 12/13] blkcg: change blkg reference counting to use percpu_ref Dennis Zhou
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 29+ messages in thread
From: Dennis Zhou @ 2018-11-26 21:19 UTC (permalink / raw)
  To: Jens Axboe, Tejun Heo, Johannes Weiner, Josef Bacik
  Cc: kernel-team, linux-block, cgroups, linux-kernel, Dennis Zhou
Now that a bio only holds a blkg reference, so clean up is simply
putting back that reference. Remove bio_disassociate_task() as it just
calls bio_disassociate_blkg() and call the latter directly.
Signed-off-by: Dennis Zhou <dennis@kernel.org>
---
 block/bio.c         | 11 +----------
 include/linux/bio.h |  2 --
 2 files changed, 1 insertion(+), 12 deletions(-)
diff --git a/block/bio.c b/block/bio.c
index f713aa236ac5..41cc2ead39f5 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -244,7 +244,7 @@ struct bio_vec *bvec_alloc(gfp_t gfp_mask, int nr, unsigned long *idx,
 
 void bio_uninit(struct bio *bio)
 {
-	bio_disassociate_task(bio);
+	bio_disassociate_blkg(bio);
 }
 EXPORT_SYMBOL(bio_uninit);
 
@@ -2108,15 +2108,6 @@ void bio_associate_blkg(struct bio *bio)
 }
 EXPORT_SYMBOL_GPL(bio_associate_blkg);
 
-/**
- * bio_disassociate_task - undo bio_associate_current()
- * @bio: target bio
- */
-void bio_disassociate_task(struct bio *bio)
-{
-	bio_disassociate_blkg(bio);
-}
-
 /**
  * bio_clone_blkg_association - clone blkg association from src to dst bio
  * @dst: destination bio
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 7b8bb1365977..419fdf5f8ab6 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -521,7 +521,6 @@ void bio_disassociate_blkg(struct bio *bio);
 void bio_associate_blkg(struct bio *bio);
 void bio_associate_blkg_from_css(struct bio *bio,
 				 struct cgroup_subsys_state *css);
-void bio_disassociate_task(struct bio *bio);
 void bio_clone_blkg_association(struct bio *dst, struct bio *src);
 #else	/* CONFIG_BLK_CGROUP */
 static inline void bio_disassociate_blkg(struct bio *bio) { }
@@ -529,7 +528,6 @@ static inline void bio_associate_blkg(struct bio *bio) { }
 static inline void bio_associate_blkg_from_css(struct bio *bio,
 					       struct cgroup_subsys_state *css)
 { }
-static inline void bio_disassociate_task(struct bio *bio) { }
 static inline void bio_clone_blkg_association(struct bio *dst,
 					      struct bio *src) { }
 #endif	/* CONFIG_BLK_CGROUP */
-- 
2.17.1
^ permalink raw reply related	[flat|nested] 29+ messages in thread
- * Re: [PATCH 11/13] blkcg: remove bio_disassociate_task()
  2018-11-26 21:19 ` [PATCH 11/13] blkcg: remove bio_disassociate_task() Dennis Zhou
@ 2018-11-29 15:54   ` Tejun Heo
  0 siblings, 0 replies; 29+ messages in thread
From: Tejun Heo @ 2018-11-29 15:54 UTC (permalink / raw)
  To: Dennis Zhou
  Cc: Jens Axboe, Johannes Weiner, Josef Bacik, kernel-team,
	linux-block, cgroups, linux-kernel
On Mon, Nov 26, 2018 at 04:19:44PM -0500, Dennis Zhou wrote:
> Now that a bio only holds a blkg reference, so clean up is simply
> putting back that reference. Remove bio_disassociate_task() as it just
> calls bio_disassociate_blkg() and call the latter directly.
> 
> Signed-off-by: Dennis Zhou <dennis@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Thanks.
-- 
tejun
^ permalink raw reply	[flat|nested] 29+ messages in thread 
 
- * [PATCH 12/13] blkcg: change blkg reference counting to use percpu_ref
  2018-11-26 21:19 [PATCH 00/13 v4] block: always associate blkg and refcount cleanup Dennis Zhou
                   ` (10 preceding siblings ...)
  2018-11-26 21:19 ` [PATCH 11/13] blkcg: remove bio_disassociate_task() Dennis Zhou
@ 2018-11-26 21:19 ` Dennis Zhou
  2018-11-26 21:19 ` [PATCH 13/13] blkcg: rename blkg_try_get() to blkg_tryget() Dennis Zhou
  2018-11-27 21:10 ` [PATCH 00/13 v4] block: always associate blkg and refcount cleanup Josef Bacik
  13 siblings, 0 replies; 29+ messages in thread
From: Dennis Zhou @ 2018-11-26 21:19 UTC (permalink / raw)
  To: Jens Axboe, Tejun Heo, Johannes Weiner, Josef Bacik
  Cc: kernel-team, linux-block, cgroups, linux-kernel, Dennis Zhou
Every bio is now associated with a blkg putting blkg_get, blkg_try_get,
and blkg_put on the hot path. Switch over the refcnt in blkg to use
percpu_ref.
Signed-off-by: Dennis Zhou <dennis@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
---
 block/blk-cgroup.c         | 41 ++++++++++++++++++++++++++++++++++++--
 include/linux/blk-cgroup.h | 15 +++++---------
 2 files changed, 44 insertions(+), 12 deletions(-)
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index dfd984bbed27..64ce424a78fd 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -81,6 +81,37 @@ static void blkg_free(struct blkcg_gq *blkg)
 	kfree(blkg);
 }
 
+static void __blkg_release(struct rcu_head *rcu)
+{
+	struct blkcg_gq *blkg = container_of(rcu, struct blkcg_gq, rcu_head);
+
+	percpu_ref_exit(&blkg->refcnt);
+
+	/* release the blkcg and parent blkg refs this blkg has been holding */
+	css_put(&blkg->blkcg->css);
+	if (blkg->parent)
+		blkg_put(blkg->parent);
+
+	wb_congested_put(blkg->wb_congested);
+
+	blkg_free(blkg);
+}
+
+/*
+ * A group is RCU protected, but having an rcu lock does not mean that one
+ * can access all the fields of blkg and assume these are valid.  For
+ * example, don't try to follow throtl_data and request queue links.
+ *
+ * Having a reference to blkg under an rcu allows accesses to only values
+ * local to groups like group stats and group rate limits.
+ */
+static void blkg_release(struct percpu_ref *ref)
+{
+	struct blkcg_gq *blkg = container_of(ref, struct blkcg_gq, refcnt);
+
+	call_rcu(&blkg->rcu_head, __blkg_release);
+}
+
 /**
  * blkg_alloc - allocate a blkg
  * @blkcg: block cgroup the new blkg is associated with
@@ -107,7 +138,6 @@ static struct blkcg_gq *blkg_alloc(struct blkcg *blkcg, struct request_queue *q,
 	blkg->q = q;
 	INIT_LIST_HEAD(&blkg->q_node);
 	blkg->blkcg = blkcg;
-	atomic_set(&blkg->refcnt, 1);
 
 	for (i = 0; i < BLKCG_MAX_POLS; i++) {
 		struct blkcg_policy *pol = blkcg_policy[i];
@@ -207,6 +237,11 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
 		blkg_get(blkg->parent);
 	}
 
+	ret = percpu_ref_init(&blkg->refcnt, blkg_release, 0,
+			      GFP_NOWAIT | __GFP_NOWARN);
+	if (ret)
+		goto err_cancel_ref;
+
 	/* invoke per-policy init */
 	for (i = 0; i < BLKCG_MAX_POLS; i++) {
 		struct blkcg_policy *pol = blkcg_policy[i];
@@ -239,6 +274,8 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
 	blkg_put(blkg);
 	return ERR_PTR(ret);
 
+err_cancel_ref:
+	percpu_ref_exit(&blkg->refcnt);
 err_put_congested:
 	wb_congested_put(wb_congested);
 err_put_css:
@@ -369,7 +406,7 @@ static void blkg_destroy(struct blkcg_gq *blkg)
 	 * Put the reference taken at the time of creation so that when all
 	 * queues are gone, group can be destroyed.
 	 */
-	blkg_put(blkg);
+	percpu_ref_kill(&blkg->refcnt);
 }
 
 /**
diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h
index 6f3bb5e82a57..fc23bb758566 100644
--- a/include/linux/blk-cgroup.h
+++ b/include/linux/blk-cgroup.h
@@ -123,7 +123,7 @@ struct blkcg_gq {
 	struct blkcg_gq			*parent;
 
 	/* reference count */
-	atomic_t			refcnt;
+	struct percpu_ref		refcnt;
 
 	/* is this blkg online? protected by both blkcg and q locks */
 	bool				online;
@@ -486,8 +486,7 @@ static inline int blkg_path(struct blkcg_gq *blkg, char *buf, int buflen)
  */
 static inline void blkg_get(struct blkcg_gq *blkg)
 {
-	WARN_ON_ONCE(atomic_read(&blkg->refcnt) <= 0);
-	atomic_inc(&blkg->refcnt);
+	percpu_ref_get(&blkg->refcnt);
 }
 
 /**
@@ -499,7 +498,7 @@ static inline void blkg_get(struct blkcg_gq *blkg)
  */
 static inline struct blkcg_gq *blkg_try_get(struct blkcg_gq *blkg)
 {
-	if (atomic_inc_not_zero(&blkg->refcnt))
+	if (percpu_ref_tryget(&blkg->refcnt))
 		return blkg;
 	return NULL;
 }
@@ -513,23 +512,19 @@ static inline struct blkcg_gq *blkg_try_get(struct blkcg_gq *blkg)
  */
 static inline struct blkcg_gq *blkg_try_get_closest(struct blkcg_gq *blkg)
 {
-	while (!atomic_inc_not_zero(&blkg->refcnt))
+	while (!percpu_ref_tryget(&blkg->refcnt))
 		blkg = blkg->parent;
 
 	return blkg;
 }
 
-void __blkg_release_rcu(struct rcu_head *rcu);
-
 /**
  * blkg_put - put a blkg reference
  * @blkg: blkg to put
  */
 static inline void blkg_put(struct blkcg_gq *blkg)
 {
-	WARN_ON_ONCE(atomic_read(&blkg->refcnt) <= 0);
-	if (atomic_dec_and_test(&blkg->refcnt))
-		call_rcu(&blkg->rcu_head, __blkg_release_rcu);
+	percpu_ref_put(&blkg->refcnt);
 }
 
 /**
-- 
2.17.1
^ permalink raw reply related	[flat|nested] 29+ messages in thread
- * [PATCH 13/13] blkcg: rename blkg_try_get() to blkg_tryget()
  2018-11-26 21:19 [PATCH 00/13 v4] block: always associate blkg and refcount cleanup Dennis Zhou
                   ` (11 preceding siblings ...)
  2018-11-26 21:19 ` [PATCH 12/13] blkcg: change blkg reference counting to use percpu_ref Dennis Zhou
@ 2018-11-26 21:19 ` Dennis Zhou
  2018-11-27 21:10 ` [PATCH 00/13 v4] block: always associate blkg and refcount cleanup Josef Bacik
  13 siblings, 0 replies; 29+ messages in thread
From: Dennis Zhou @ 2018-11-26 21:19 UTC (permalink / raw)
  To: Jens Axboe, Tejun Heo, Johannes Weiner, Josef Bacik
  Cc: kernel-team, linux-block, cgroups, linux-kernel, Dennis Zhou
blkg reference counting now uses percpu_ref rather than atomic_t. Let's
make this consistent with css_tryget. This renames blkg_try_get to
blkg_tryget and now returns a bool rather than the blkg or %NULL.
Signed-off-by: Dennis Zhou <dennis@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Acked-by: Tejun Heo <tj@kernel.org>
---
 block/bio.c                |  2 +-
 block/blk-cgroup.c         |  3 +--
 block/blk-iolatency.c      |  2 +-
 include/linux/blk-cgroup.h | 12 +++++-------
 4 files changed, 8 insertions(+), 11 deletions(-)
diff --git a/block/bio.c b/block/bio.c
index 41cc2ead39f5..434a55bc310e 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -2004,7 +2004,7 @@ static void __bio_associate_blkg(struct bio *bio, struct blkcg_gq *blkg)
 	if (bio->bi_blkg)
 		bio_disassociate_blkg(bio);
 
-	bio->bi_blkg = blkg_try_get_closest(blkg);
+	bio->bi_blkg = blkg_tryget_closest(blkg);
 }
 
 /**
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 64ce424a78fd..8fc588754861 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1738,8 +1738,7 @@ void blkcg_maybe_throttle_current(void)
 	blkg = blkg_lookup(blkcg, q);
 	if (!blkg)
 		goto out;
-	blkg = blkg_try_get(blkg);
-	if (!blkg)
+	if (!blkg_tryget(blkg))
 		goto out;
 	rcu_read_unlock();
 
diff --git a/block/blk-iolatency.c b/block/blk-iolatency.c
index 5a79f06a730d..0b14c3d57769 100644
--- a/block/blk-iolatency.c
+++ b/block/blk-iolatency.c
@@ -698,7 +698,7 @@ static void blkiolatency_timer_fn(struct timer_list *t)
 		 * We could be exiting, don't access the pd unless we have a
 		 * ref on the blkg.
 		 */
-		if (!blkg_try_get(blkg))
+		if (!blkg_tryget(blkg))
 			continue;
 
 		iolat = blkg_to_lat(blkg);
diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h
index fc23bb758566..3741f5b78ffd 100644
--- a/include/linux/blk-cgroup.h
+++ b/include/linux/blk-cgroup.h
@@ -490,27 +490,25 @@ static inline void blkg_get(struct blkcg_gq *blkg)
 }
 
 /**
- * blkg_try_get - try and get a blkg reference
+ * blkg_tryget - try and get a blkg reference
  * @blkg: blkg to get
  *
  * This is for use when doing an RCU lookup of the blkg.  We may be in the midst
  * of freeing this blkg, so we can only use it if the refcnt is not zero.
  */
-static inline struct blkcg_gq *blkg_try_get(struct blkcg_gq *blkg)
+static inline bool blkg_tryget(struct blkcg_gq *blkg)
 {
-	if (percpu_ref_tryget(&blkg->refcnt))
-		return blkg;
-	return NULL;
+	return percpu_ref_tryget(&blkg->refcnt);
 }
 
 /**
- * blkg_try_get_closest - try and get a blkg ref on the closet blkg
+ * blkg_tryget_closest - try and get a blkg ref on the closet blkg
  * @blkg: blkg to get
  *
  * This walks up the blkg tree to find the closest non-dying blkg and returns
  * the blkg that it did association with as it may not be the passed in blkg.
  */
-static inline struct blkcg_gq *blkg_try_get_closest(struct blkcg_gq *blkg)
+static inline struct blkcg_gq *blkg_tryget_closest(struct blkcg_gq *blkg)
 {
 	while (!percpu_ref_tryget(&blkg->refcnt))
 		blkg = blkg->parent;
-- 
2.17.1
^ permalink raw reply related	[flat|nested] 29+ messages in thread
- * Re: [PATCH 00/13 v4] block: always associate blkg and refcount cleanup
  2018-11-26 21:19 [PATCH 00/13 v4] block: always associate blkg and refcount cleanup Dennis Zhou
                   ` (12 preceding siblings ...)
  2018-11-26 21:19 ` [PATCH 13/13] blkcg: rename blkg_try_get() to blkg_tryget() Dennis Zhou
@ 2018-11-27 21:10 ` Josef Bacik
  2018-11-27 22:15   ` Dennis Zhou
  13 siblings, 1 reply; 29+ messages in thread
From: Josef Bacik @ 2018-11-27 21:10 UTC (permalink / raw)
  To: Dennis Zhou
  Cc: Jens Axboe, Tejun Heo, Johannes Weiner, Josef Bacik, kernel-team,
	linux-block, cgroups, linux-kernel
On Mon, Nov 26, 2018 at 04:19:33PM -0500, Dennis Zhou wrote:
> Hi everyone,
> 
> This is respin of v3 [1] with fixes for the errors reported in [2] and
> [3]. v3 was reverted in [4].
> 
> The issue in [3] was that bio->bi_disk->queue and blkg->q were out
> of sync. So when I changed blk_get_rl() to use blkg->q, the wrong queue
> was returned and elevator from q->elevator->type threw a NPE. Note, with
> v4.21, the old block stack was removed and so this patch was dropped. I
> did backport this to v4.20 and verified this series does not encounter
> the error.
> 
> The biggest changes in v4 are when association occurs and clearly
> defining the cases where association should happen.
>   1. Association is now done when the device is set to keep blkg->q and
>      bio->bi_disk->queue in sync.
>   2. When a bio is submitted directly to the device, it will not be
>      associated with a blkg. This is because a blkg represents the
>      relationship between a blkcg and a request_queue. Going directly to
>      the device means the request_queue may not exist meaning no blkg
>      will exist.
> 
> The patch updating blk_get_rl() was dropped (v3 10/12). The patch to
> always associate a blkg from v3 (v3 04/12) was fixed and split into
> patches 0004 and 0005. 0011 is new removing bio_disassociate_task().
> 
> Summarizing the ideas of this series:
>   1. Gracefully handle blkg failure to create by walking up the blkg
>      tree rather than fall through to root.
>   2. Associate a bio with a blkg in core logic rather than per
>      controller logic.
>   3. Rather than have a css and blkg reference, hold just a blkg ref
>      as it also holds a css ref.
>   4. Switch to percpu ref counting for blkg.
> 
Hmm so reading through this series it's made me realize that iolatency is sort
of broken.  It relies on knowing if it needs to do something with the bio if
there is a blkg associated with it.  Before this patchset there wouldn't be a
blkg on the bio unless it was specifically associated.  I'm going to need to
figure out a different way to tag bio's to indicate that blk-iolatency should
care about it.  Probably add a bio flag or something.  Thanks,
Josef
^ permalink raw reply	[flat|nested] 29+ messages in thread
- * Re: [PATCH 00/13 v4] block: always associate blkg and refcount cleanup
  2018-11-27 21:10 ` [PATCH 00/13 v4] block: always associate blkg and refcount cleanup Josef Bacik
@ 2018-11-27 22:15   ` Dennis Zhou
  0 siblings, 0 replies; 29+ messages in thread
From: Dennis Zhou @ 2018-11-27 22:15 UTC (permalink / raw)
  To: Josef Bacik
  Cc: Dennis Zhou, Jens Axboe, Tejun Heo, Johannes Weiner, kernel-team,
	linux-block, cgroups, linux-kernel
On Tue, Nov 27, 2018 at 04:10:01PM -0500, Josef Bacik wrote:
> On Mon, Nov 26, 2018 at 04:19:33PM -0500, Dennis Zhou wrote:
> > Hi everyone,
> > 
> > This is respin of v3 [1] with fixes for the errors reported in [2] and
> > [3]. v3 was reverted in [4].
> > 
> > The issue in [3] was that bio->bi_disk->queue and blkg->q were out
> > of sync. So when I changed blk_get_rl() to use blkg->q, the wrong queue
> > was returned and elevator from q->elevator->type threw a NPE. Note, with
> > v4.21, the old block stack was removed and so this patch was dropped. I
> > did backport this to v4.20 and verified this series does not encounter
> > the error.
> > 
> > The biggest changes in v4 are when association occurs and clearly
> > defining the cases where association should happen.
> >   1. Association is now done when the device is set to keep blkg->q and
> >      bio->bi_disk->queue in sync.
> >   2. When a bio is submitted directly to the device, it will not be
> >      associated with a blkg. This is because a blkg represents the
> >      relationship between a blkcg and a request_queue. Going directly to
> >      the device means the request_queue may not exist meaning no blkg
> >      will exist.
> > 
> > The patch updating blk_get_rl() was dropped (v3 10/12). The patch to
> > always associate a blkg from v3 (v3 04/12) was fixed and split into
> > patches 0004 and 0005. 0011 is new removing bio_disassociate_task().
> > 
> > Summarizing the ideas of this series:
> >   1. Gracefully handle blkg failure to create by walking up the blkg
> >      tree rather than fall through to root.
> >   2. Associate a bio with a blkg in core logic rather than per
> >      controller logic.
> >   3. Rather than have a css and blkg reference, hold just a blkg ref
> >      as it also holds a css ref.
> >   4. Switch to percpu ref counting for blkg.
> > 
> 
> Hmm so reading through this series it's made me realize that iolatency is sort
> of broken.  It relies on knowing if it needs to do something with the bio if
> there is a blkg associated with it.  Before this patchset there wouldn't be a
I don't think there is anything wrong with blk-iolatency. blk-iolatency
piggybacks off of the rq_qos hooks which when throttle gets called means
submit has happened on the bio. As a byproduct, all bios that
blk-iolatency sees has a blkcg associated with it from
blkcg_bio_issue_check(). This lets blk-iolatency associate a blkg in the
throttle hook - blkcg_iolatency_throttle().
Order of functions called:
  generic_make_request()
    generic_make_request_checks() <- associates blkcg
    make_request_fn() (eventually blk_mq_make_request)
      rq_qos_throttle()
        blkcg_iolatency_throttle() <- associate blkg based on blkcg
blk-throttle is another story, it kind of does association really high
up, but lets the block layer manage the request_queue and attribute it
all to the first blkg seen. I believe this is just the top level blkg
(same css, first request_queue). Disclaimer, I haven't read through much
of the blk-throttle code.
> blkg on the bio unless it was specifically associated.  I'm going to need to
> figure out a different way to tag bio's to indicate that blk-iolatency should
> care about it.  Probably add a bio flag or something.  Thanks,
I think the interaction gets a little confusing with stacked block
devices, but regardless, shouldn't blk-iolatency care about every bio
that comes through anyway? I think this would just translate to
throttling at each request_queue and not just the entrance queue.
If a bio has a device with no request_queue, I don't think it will ever
reach blk-iolatency because the bio goes straight to device and a
requirement to get to call rq_qos_throttle is to have a request_queue.
Thanks,
Dennis
^ permalink raw reply	[flat|nested] 29+ messages in thread