From: Waiman Long <longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
To: Jens Axboe <axboe-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org>,
Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
Josef Bacik <josef-DigfWCa+lFGyeJad7bwFQA@public.gmane.org>,
Zefan Li <lizefan.x-EC8Uxl6Npydl57MIdRCFDg@public.gmane.org>,
Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>,
Andrew Morton
<akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
Cc: cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-block-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org,
"Michal Koutný" <mkoutny-IBi9RG/b67k@public.gmane.org>,
"Dennis Zhou (Facebook)"
<dennisszhou-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
"Waiman Long" <longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Subject: [PATCH-block v3 2/2] blk-cgroup: Flush stats at blkgs destruction path
Date: Tue, 13 Dec 2022 13:44:46 -0500 [thread overview]
Message-ID: <20221213184446.50181-3-longman@redhat.com> (raw)
In-Reply-To: <20221213184446.50181-1-longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
As noted by Michal, the blkg_iostat_set's in the lockless list
hold reference to blkg's to protect against their removal. Those
blkg's hold reference to blkcg. When a cgroup is being destroyed,
cgroup_rstat_flush() is only called at css_release_work_fn() which is
called when the blkcg reference count reaches 0. This circular dependency
will prevent blkcg from being freed until some other events cause
cgroup_rstat_flush() to be called to flush out the pending blkcg stats.
To prevent this delayed blkcg removal, add a new cgroup_rstat_css_flush()
function to flush stats for a given css and cpu and call it at the blkgs
destruction path, blkcg_destroy_blkgs(), whenever there are still some
pending stats to be flushed. This will ensure that blkcg reference
count can reach 0 ASAP.
Signed-off-by: Waiman Long <longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Acked-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
block/blk-cgroup.c | 15 +++++++++++++++
include/linux/cgroup.h | 1 +
kernel/cgroup/rstat.c | 18 ++++++++++++++++++
3 files changed, 34 insertions(+)
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index ca28306aa1b1..ddd27a714d3e 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1084,6 +1084,8 @@ struct list_head *blkcg_get_cgwb_list(struct cgroup_subsys_state *css)
*/
static void blkcg_destroy_blkgs(struct blkcg *blkcg)
{
+ int cpu;
+
/*
* blkcg_destroy_blkgs() shouldn't be called with all the blkcg
* references gone.
@@ -1093,6 +1095,19 @@ static void blkcg_destroy_blkgs(struct blkcg *blkcg)
might_sleep();
+ /*
+ * Flush all the non-empty percpu lockless lists so as to release
+ * the blkg references held by those lists which, in turn, may
+ * allow the blkgs to be freed and release their references to
+ * blkcg speeding up its freeing.
+ */
+ for_each_possible_cpu(cpu) {
+ struct llist_head *lhead = per_cpu_ptr(blkcg->lhead, cpu);
+
+ if (!llist_empty(lhead))
+ cgroup_rstat_css_cpu_flush(&blkcg->css, cpu);
+ }
+
spin_lock_irq(&blkcg->lock);
while (!hlist_empty(&blkcg->blkg_list)) {
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 528bd44b59e2..6c4e66b3fa84 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -766,6 +766,7 @@ void cgroup_rstat_flush(struct cgroup *cgrp);
void cgroup_rstat_flush_irqsafe(struct cgroup *cgrp);
void cgroup_rstat_flush_hold(struct cgroup *cgrp);
void cgroup_rstat_flush_release(void);
+void cgroup_rstat_css_cpu_flush(struct cgroup_subsys_state *css, int cpu);
/*
* Basic resource stats.
diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
index 793ecff29038..2e44be44351f 100644
--- a/kernel/cgroup/rstat.c
+++ b/kernel/cgroup/rstat.c
@@ -281,6 +281,24 @@ void cgroup_rstat_flush_release(void)
spin_unlock_irq(&cgroup_rstat_lock);
}
+/**
+ * cgroup_rstat_css_cpu_flush - flush stats for the given css and cpu
+ * @css: target css to be flush
+ * @cpu: the cpu that holds the stats to be flush
+ *
+ * A lightweight rstat flush operation for a given css and cpu.
+ * Only the cpu_lock is being held for mutual exclusion, the cgroup_rstat_lock
+ * isn't used.
+ */
+void cgroup_rstat_css_cpu_flush(struct cgroup_subsys_state *css, int cpu)
+{
+ raw_spinlock_t *cpu_lock = per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu);
+
+ raw_spin_lock_irq(cpu_lock);
+ css->ss->css_rstat_flush(css, cpu);
+ raw_spin_unlock_irq(cpu_lock);
+}
+
int cgroup_rstat_init(struct cgroup *cgrp)
{
int cpu;
--
2.31.1
WARNING: multiple messages have this Message-ID (diff)
From: Waiman Long <longman@redhat.com>
To: Jens Axboe <axboe@kernel.dk>, Tejun Heo <tj@kernel.org>,
Josef Bacik <josef@toxicpanda.com>,
Zefan Li <lizefan.x@bytedance.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Andrew Morton <akpm@linux-foundation.org>
Cc: cgroups@vger.kernel.org, linux-block@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
"Michal Koutný" <mkoutny@suse.com>,
"Dennis Zhou (Facebook)" <dennisszhou@gmail.com>,
"Waiman Long" <longman@redhat.com>
Subject: [PATCH-block v3 2/2] blk-cgroup: Flush stats at blkgs destruction path
Date: Tue, 13 Dec 2022 13:44:46 -0500 [thread overview]
Message-ID: <20221213184446.50181-3-longman@redhat.com> (raw)
In-Reply-To: <20221213184446.50181-1-longman@redhat.com>
As noted by Michal, the blkg_iostat_set's in the lockless list
hold reference to blkg's to protect against their removal. Those
blkg's hold reference to blkcg. When a cgroup is being destroyed,
cgroup_rstat_flush() is only called at css_release_work_fn() which is
called when the blkcg reference count reaches 0. This circular dependency
will prevent blkcg from being freed until some other events cause
cgroup_rstat_flush() to be called to flush out the pending blkcg stats.
To prevent this delayed blkcg removal, add a new cgroup_rstat_css_flush()
function to flush stats for a given css and cpu and call it at the blkgs
destruction path, blkcg_destroy_blkgs(), whenever there are still some
pending stats to be flushed. This will ensure that blkcg reference
count can reach 0 ASAP.
Signed-off-by: Waiman Long <longman@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
---
block/blk-cgroup.c | 15 +++++++++++++++
include/linux/cgroup.h | 1 +
kernel/cgroup/rstat.c | 18 ++++++++++++++++++
3 files changed, 34 insertions(+)
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index ca28306aa1b1..ddd27a714d3e 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1084,6 +1084,8 @@ struct list_head *blkcg_get_cgwb_list(struct cgroup_subsys_state *css)
*/
static void blkcg_destroy_blkgs(struct blkcg *blkcg)
{
+ int cpu;
+
/*
* blkcg_destroy_blkgs() shouldn't be called with all the blkcg
* references gone.
@@ -1093,6 +1095,19 @@ static void blkcg_destroy_blkgs(struct blkcg *blkcg)
might_sleep();
+ /*
+ * Flush all the non-empty percpu lockless lists so as to release
+ * the blkg references held by those lists which, in turn, may
+ * allow the blkgs to be freed and release their references to
+ * blkcg speeding up its freeing.
+ */
+ for_each_possible_cpu(cpu) {
+ struct llist_head *lhead = per_cpu_ptr(blkcg->lhead, cpu);
+
+ if (!llist_empty(lhead))
+ cgroup_rstat_css_cpu_flush(&blkcg->css, cpu);
+ }
+
spin_lock_irq(&blkcg->lock);
while (!hlist_empty(&blkcg->blkg_list)) {
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 528bd44b59e2..6c4e66b3fa84 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -766,6 +766,7 @@ void cgroup_rstat_flush(struct cgroup *cgrp);
void cgroup_rstat_flush_irqsafe(struct cgroup *cgrp);
void cgroup_rstat_flush_hold(struct cgroup *cgrp);
void cgroup_rstat_flush_release(void);
+void cgroup_rstat_css_cpu_flush(struct cgroup_subsys_state *css, int cpu);
/*
* Basic resource stats.
diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
index 793ecff29038..2e44be44351f 100644
--- a/kernel/cgroup/rstat.c
+++ b/kernel/cgroup/rstat.c
@@ -281,6 +281,24 @@ void cgroup_rstat_flush_release(void)
spin_unlock_irq(&cgroup_rstat_lock);
}
+/**
+ * cgroup_rstat_css_cpu_flush - flush stats for the given css and cpu
+ * @css: target css to be flush
+ * @cpu: the cpu that holds the stats to be flush
+ *
+ * A lightweight rstat flush operation for a given css and cpu.
+ * Only the cpu_lock is being held for mutual exclusion, the cgroup_rstat_lock
+ * isn't used.
+ */
+void cgroup_rstat_css_cpu_flush(struct cgroup_subsys_state *css, int cpu)
+{
+ raw_spinlock_t *cpu_lock = per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu);
+
+ raw_spin_lock_irq(cpu_lock);
+ css->ss->css_rstat_flush(css, cpu);
+ raw_spin_unlock_irq(cpu_lock);
+}
+
int cgroup_rstat_init(struct cgroup *cgrp)
{
int cpu;
--
2.31.1
next prev parent reply other threads:[~2022-12-13 18:44 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-12-13 18:44 [PATCH-block v3 0/2] blk-cgroup: Fix potential UAF & flush rstat at blkgs destruction path Waiman Long
2022-12-13 18:44 ` Waiman Long
2022-12-13 18:44 ` [PATCH-block v3 1/2] bdi, blk-cgroup: Fix potential UAF of blkcg Waiman Long
2022-12-13 18:44 ` Waiman Long
[not found] ` <20221213184446.50181-2-longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2022-12-13 19:29 ` Tejun Heo
2022-12-13 19:29 ` Tejun Heo
[not found] ` <Y5jSllwwBdmQ1jQz-NiLfg/pYEd1N0TnZuCh8vA@public.gmane.org>
2022-12-13 19:53 ` Waiman Long
2022-12-13 19:53 ` Waiman Long
[not found] ` <34a8c4a7-a58d-63fc-4599-accf1cbb6aae-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2022-12-14 16:54 ` Jens Axboe
2022-12-14 16:54 ` Jens Axboe
[not found] ` <5fbaea42-14a7-27a8-cea1-3a59161ceba0-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org>
2022-12-14 16:55 ` Waiman Long
2022-12-14 16:55 ` Waiman Long
[not found] ` <20221213184446.50181-1-longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2022-12-13 18:44 ` Waiman Long [this message]
2022-12-13 18:44 ` [PATCH-block v3 2/2] blk-cgroup: Flush stats at blkgs destruction path Waiman Long
2022-12-13 19:30 ` Tejun Heo
2022-12-14 1:58 ` Waiman Long
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20221213184446.50181-3-longman@redhat.com \
--to=longman-h+wxahxf7alqt0dzr+alfa@public.gmane.org \
--cc=akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org \
--cc=axboe-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org \
--cc=cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=dennisszhou-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org \
--cc=hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org \
--cc=josef-DigfWCa+lFGyeJad7bwFQA@public.gmane.org \
--cc=linux-block-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org \
--cc=lizefan.x-EC8Uxl6Npydl57MIdRCFDg@public.gmane.org \
--cc=mkoutny-IBi9RG/b67k@public.gmane.org \
--cc=tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.