* [PATCH-block v2 0/3] blk-cgroup: Fix potential UAF & miscellaneous cleanup
@ 2022-12-11 22:20 Waiman Long
2022-12-11 22:20 ` [PATCH-block v2 1/3] bdi, blk-cgroup: Fix potential UAF of blkcg Waiman Long
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: Waiman Long @ 2022-12-11 22:20 UTC (permalink / raw)
To: Jens Axboe, Tejun Heo, Josef Bacik, Zefan Li, Johannes Weiner,
Andrew Morton
Cc: cgroups, linux-block, linux-kernel, linux-mm, Michal Koutný,
Dennis Zhou (Facebook), Waiman Long
v2:
- Remove unnecessary rcu_read_{lock|unlock} from
cgroup_rstat_css_cpu_flush() in patch 3.
It was found that blkcg_destroy_blkgs() may be called with all blkcg
references gone. This may potentially cause user-after-free and so
should be fixed. The last 2 patches are miscellaneous cleanups of
commit 3b8cc6298724 ("blk-cgroup: Optimize blkcg_rstat_flush()").
Waiman Long (3):
bdi, blk-cgroup: Fix potential UAF of blkcg
blk-cgroup: Don't flush a blkg if destroyed
blk-cgroup: Flush stats at blkgs destruction path
block/blk-cgroup.c | 26 ++++++++++++++++++++++++++
include/linux/cgroup.h | 1 +
kernel/cgroup/rstat.c | 18 ++++++++++++++++++
mm/backing-dev.c | 8 ++++++--
4 files changed, 51 insertions(+), 2 deletions(-)
--
2.31.1
^ permalink raw reply [flat|nested] 12+ messages in thread* [PATCH-block v2 1/3] bdi, blk-cgroup: Fix potential UAF of blkcg 2022-12-11 22:20 [PATCH-block v2 0/3] blk-cgroup: Fix potential UAF & miscellaneous cleanup Waiman Long @ 2022-12-11 22:20 ` Waiman Long 2022-12-12 22:13 ` Tejun Heo [not found] ` <20221211222058.2946830-1-longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 2022-12-11 22:20 ` [PATCH-block v2 3/3] blk-cgroup: Flush stats at blkgs destruction path Waiman Long 2 siblings, 1 reply; 12+ messages in thread From: Waiman Long @ 2022-12-11 22:20 UTC (permalink / raw) To: Jens Axboe, Tejun Heo, Josef Bacik, Zefan Li, Johannes Weiner, Andrew Morton Cc: cgroups, linux-block, linux-kernel, linux-mm, Michal Koutný, Dennis Zhou (Facebook), Waiman Long, Yi Zhang Commit 59b57717fff8 ("blkcg: delay blkg destruction until after writeback has finished") delayed call to blkcg_destroy_blkgs() to cgwb_release_workfn(). However, it is done after a css_put() of blkcg which may be the final put that causes the blkcg to be freed as RCU read lock isn't held. Another place where blkcg_destroy_blkgs() can be called indirectly via blkcg_unpin_online() is from the offline_css() function called from css_killed_work_fn(). Over there, the potentially final css_put() call is issued after offline_css(). By adding a css_tryget() into blkcg_destroy_blkgs() and warning its failure, the following stack trace was produced in a test system on bootup. [ 34.254240] RIP: 0010:blkcg_destroy_blkgs+0x16a/0x1a0 : [ 34.339943] Call Trace: [ 34.344510] blkcg_unpin_online+0x38/0x60 [ 34.348523] cgwb_release_workfn+0x6a/0x200 [ 34.352708] process_one_work+0x1e5/0x3b0 [ 34.360758] worker_thread+0x50/0x3a0 [ 34.368447] kthread+0xd9/0x100 [ 34.376386] ret_from_fork+0x22/0x30 This confirms that a potential UAF situation can really happen in cgwb_release_workfn(). Fix that by delaying the css_put() until after the blkcg_unpin_online() call. Also use css_tryget() in blkcg_destroy_blkgs() and issue a warning if css_tryget() fails. The reproducing system can no longer produce a warning with this patch. All the runnable block/0* tests including block/027 were run successfully without failure. Fixes: 59b57717fff8 ("blkcg: delay blkg destruction until after writeback has finished") Suggested-by: Michal Koutn√Ω <mkoutny@suse.com> Reported-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Waiman Long <longman@redhat.com> --- block/blk-cgroup.c | 8 ++++++++ mm/backing-dev.c | 8 ++++++-- 2 files changed, 14 insertions(+), 2 deletions(-) diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c index 1bb939d3b793..21cc88349f21 100644 --- a/block/blk-cgroup.c +++ b/block/blk-cgroup.c @@ -1084,6 +1084,13 @@ struct list_head *blkcg_get_cgwb_list(struct cgroup_subsys_state *css) */ static void blkcg_destroy_blkgs(struct blkcg *blkcg) { + /* + * blkcg_destroy_blkgs() shouldn't be called with all the blkcg + * references gone. + */ + if (WARN_ON_ONCE(!css_tryget(&blkcg->css))) + return; + might_sleep(); spin_lock_irq(&blkcg->lock); @@ -1110,6 +1117,7 @@ static void blkcg_destroy_blkgs(struct blkcg *blkcg) } spin_unlock_irq(&blkcg->lock); + css_put(&blkcg->css); } /** diff --git a/mm/backing-dev.c b/mm/backing-dev.c index c30419a5e119..36f75b072325 100644 --- a/mm/backing-dev.c +++ b/mm/backing-dev.c @@ -390,11 +390,15 @@ static void cgwb_release_workfn(struct work_struct *work) wb_shutdown(wb); css_put(wb->memcg_css); - css_put(wb->blkcg_css); mutex_unlock(&wb->bdi->cgwb_release_mutex); - /* triggers blkg destruction if no online users left */ + /* + * Triggers blkg destruction if no online users left + * The final blkcg css_put() has to be done after blkcg_unpin_online() + * to avoid use-after-free. + */ blkcg_unpin_online(wb->blkcg_css); + css_put(wb->blkcg_css); fprop_local_destroy_percpu(&wb->memcg_completions); -- 2.31.1 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH-block v2 1/3] bdi, blk-cgroup: Fix potential UAF of blkcg 2022-12-11 22:20 ` [PATCH-block v2 1/3] bdi, blk-cgroup: Fix potential UAF of blkcg Waiman Long @ 2022-12-12 22:13 ` Tejun Heo [not found] ` <Y5enmzQM7BIiEv9n-NiLfg/pYEd1N0TnZuCh8vA@public.gmane.org> 0 siblings, 1 reply; 12+ messages in thread From: Tejun Heo @ 2022-12-12 22:13 UTC (permalink / raw) To: Waiman Long Cc: Jens Axboe, Josef Bacik, Zefan Li, Johannes Weiner, Andrew Morton, cgroups, linux-block, linux-kernel, linux-mm, Michal Koutný, Dennis Zhou (Facebook), Yi Zhang On Sun, Dec 11, 2022 at 05:20:56PM -0500, Waiman Long wrote: > static void blkcg_destroy_blkgs(struct blkcg *blkcg) > { > + /* > + * blkcg_destroy_blkgs() shouldn't be called with all the blkcg > + * references gone. > + */ > + if (WARN_ON_ONCE(!css_tryget(&blkcg->css))) > + return; Wouldn't it make more sense to use percpu_ref_is_zero()? It's not like the obtained extra reference does anything, right? Thanks. -- tejun ^ permalink raw reply [flat|nested] 12+ messages in thread
[parent not found: <Y5enmzQM7BIiEv9n-NiLfg/pYEd1N0TnZuCh8vA@public.gmane.org>]
* Re: [PATCH-block v2 1/3] bdi, blk-cgroup: Fix potential UAF of blkcg [not found] ` <Y5enmzQM7BIiEv9n-NiLfg/pYEd1N0TnZuCh8vA@public.gmane.org> @ 2022-12-12 22:16 ` Waiman Long 0 siblings, 0 replies; 12+ messages in thread From: Waiman Long @ 2022-12-12 22:16 UTC (permalink / raw) To: Tejun Heo Cc: Jens Axboe, Josef Bacik, Zefan Li, Johannes Weiner, Andrew Morton, cgroups-u79uwXL29TY76Z2rM5mHXA, linux-block-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg, Michal Koutný, Dennis Zhou (Facebook), Yi Zhang On 12/12/22 17:13, Tejun Heo wrote: > On Sun, Dec 11, 2022 at 05:20:56PM -0500, Waiman Long wrote: >> static void blkcg_destroy_blkgs(struct blkcg *blkcg) >> { >> + /* >> + * blkcg_destroy_blkgs() shouldn't be called with all the blkcg >> + * references gone. >> + */ >> + if (WARN_ON_ONCE(!css_tryget(&blkcg->css))) >> + return; > Wouldn't it make more sense to use percpu_ref_is_zero()? It's not like the > obtained extra reference does anything, right? Yes, that makes sense. Will incorporate the change in the next version. Thanks, Longman ^ permalink raw reply [flat|nested] 12+ messages in thread
[parent not found: <20221211222058.2946830-1-longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>]
* [PATCH-block v2 2/3] blk-cgroup: Don't flush a blkg if destroyed [not found] ` <20221211222058.2946830-1-longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> @ 2022-12-11 22:20 ` Waiman Long 2022-12-12 12:59 ` Michal Koutný 0 siblings, 1 reply; 12+ messages in thread From: Waiman Long @ 2022-12-11 22:20 UTC (permalink / raw) To: Jens Axboe, Tejun Heo, Josef Bacik, Zefan Li, Johannes Weiner, Andrew Morton Cc: cgroups-u79uwXL29TY76Z2rM5mHXA, linux-block-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg, Michal Koutný, Dennis Zhou (Facebook), Waiman Long Before commit 3b8cc6298724 ("blk-cgroup: Optimize blkcg_rstat_flush()"), blkg's stats is only flushed if they are online. In addition, the stat flushing of blkgs in blkcg_rstat_flush() includes propagating the rstat data to its parent. However, if a blkg has been destroyed (offline), the validity of its parent may be questionable. For safety, revert back to the old behavior by ignoring offline blkg's. Signed-off-by: Waiman Long <longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> --- block/blk-cgroup.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c index 21cc88349f21..c466aef0d467 100644 --- a/block/blk-cgroup.c +++ b/block/blk-cgroup.c @@ -885,6 +885,12 @@ static void blkcg_rstat_flush(struct cgroup_subsys_state *css, int cpu) WRITE_ONCE(bisc->lqueued, false); + /* Don't flush its stats if blkg is offline */ + if (unlikely(!blkg->online)) { + percpu_ref_put(&blkg->refcnt); + continue; + } + /* fetch the current per-cpu values */ do { seq = u64_stats_fetch_begin(&bisc->sync); -- 2.31.1 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH-block v2 2/3] blk-cgroup: Don't flush a blkg if destroyed 2022-12-11 22:20 ` [PATCH-block v2 2/3] blk-cgroup: Don't flush a blkg if destroyed Waiman Long @ 2022-12-12 12:59 ` Michal Koutný 2022-12-12 14:58 ` Waiman Long 2022-12-12 22:16 ` Tejun Heo 0 siblings, 2 replies; 12+ messages in thread From: Michal Koutný @ 2022-12-12 12:59 UTC (permalink / raw) To: Waiman Long Cc: Jens Axboe, Tejun Heo, Josef Bacik, Zefan Li, Johannes Weiner, Andrew Morton, cgroups, linux-block, linux-kernel, linux-mm, Dennis Zhou (Facebook) [-- Attachment #1: Type: text/plain, Size: 1262 bytes --] Hello. On Sun, Dec 11, 2022 at 05:20:57PM -0500, Waiman Long <longman@redhat.com> wrote: > Before commit 3b8cc6298724 ("blk-cgroup: Optimize blkcg_rstat_flush()"), > blkg's stats is only flushed if they are online. I'm not sure I follow -- css_release_work_fn/cgroup_rstat_flush may be called on an offlined blkcg (offlined!=released). There's no invariant ensuring offlined blkcg won't be flushed. (There is only current situation when there is no reader of io data that'd need them flushed [1].) > In addition, the stat flushing of blkgs in blkcg_rstat_flush() > includes propagating the rstat data to its parent. However, if a blkg > has been destroyed (offline), the validity of its parent may be > questionable. Parents won't be freed (neither offlined) before children (see css_killed_work_fn). It should be regularly OK to pass data into a parent of an offlined blkcg. > For safety, revert back to the old behavior by ignoring offline > blkg's. I don't know if this is a good reasoning. If you argue that offlined children needn't be taken into parent's account, then I think it's more efficient to exclude the offlined blkcgs from update. (With the caveat I have in [1].) Regards, Michal [1] https://lore.kernel.org/r/YqEfNgUc8jxlAq8D@blackbook/ [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 228 bytes --] ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH-block v2 2/3] blk-cgroup: Don't flush a blkg if destroyed 2022-12-12 12:59 ` Michal Koutný @ 2022-12-12 14:58 ` Waiman Long 2022-12-12 22:16 ` Tejun Heo 1 sibling, 0 replies; 12+ messages in thread From: Waiman Long @ 2022-12-12 14:58 UTC (permalink / raw) To: Michal Koutný Cc: Jens Axboe, Tejun Heo, Josef Bacik, Zefan Li, Johannes Weiner, Andrew Morton, cgroups, linux-block, linux-kernel, linux-mm, Dennis Zhou (Facebook) On 12/12/22 07:59, Michal Koutný wrote: > Hello. > > On Sun, Dec 11, 2022 at 05:20:57PM -0500, Waiman Long <longman@redhat.com> wrote: >> Before commit 3b8cc6298724 ("blk-cgroup: Optimize blkcg_rstat_flush()"), >> blkg's stats is only flushed if they are online. > I'm not sure I follow -- css_release_work_fn/cgroup_rstat_flush may be > called on an offlined blkcg (offlined!=released). There's no invariant > ensuring offlined blkcg won't be flushed. (There is only current > situation when there is no reader of io data that'd need them flushed > [1].) The original cgroup_rstat_flush() iterates the list of blkg's in the blkg_list. A blkg will be removed from the list when it is offlined. This patch just reverts its behavior to that previous behavior. > >> In addition, the stat flushing of blkgs in blkcg_rstat_flush() >> includes propagating the rstat data to its parent. However, if a blkg >> has been destroyed (offline), the validity of its parent may be >> questionable. > Parents won't be freed (neither offlined) before children (see > css_killed_work_fn). It should be regularly OK to pass data into a > parent of an offlined blkcg. I guess it is likely to be safe to flush an offline blkg. I am just being conservative in case there is any corner case where it may be a problem which I haven't foreseen. > >> For safety, revert back to the old behavior by ignoring offline >> blkg's. > I don't know if this is a good reasoning. If you argue that offlined > children needn't be taken into parent's account, then I think it's more > efficient to exclude the offlined blkcgs from update. (With the caveat I > have in [1].) It is possible that a blkg may be updated before it becomes offline, but the flush isn't done in time before that happens. The next patch will catch some of that. Cheers, Longman ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH-block v2 2/3] blk-cgroup: Don't flush a blkg if destroyed 2022-12-12 12:59 ` Michal Koutný 2022-12-12 14:58 ` Waiman Long @ 2022-12-12 22:16 ` Tejun Heo [not found] ` <Y5eoQ8UBN8Xpc+Wp-NiLfg/pYEd1N0TnZuCh8vA@public.gmane.org> 1 sibling, 1 reply; 12+ messages in thread From: Tejun Heo @ 2022-12-12 22:16 UTC (permalink / raw) To: Michal Koutný Cc: Waiman Long, Jens Axboe, Josef Bacik, Zefan Li, Johannes Weiner, Andrew Morton, cgroups, linux-block, linux-kernel, linux-mm, Dennis Zhou (Facebook) On Mon, Dec 12, 2022 at 01:59:53PM +0100, Michal Koutný wrote: > Hello. > > On Sun, Dec 11, 2022 at 05:20:57PM -0500, Waiman Long <longman@redhat.com> wrote: > > Before commit 3b8cc6298724 ("blk-cgroup: Optimize blkcg_rstat_flush()"), > > blkg's stats is only flushed if they are online. > > I'm not sure I follow -- css_release_work_fn/cgroup_rstat_flush may be > called on an offlined blkcg (offlined!=released). There's no invariant > ensuring offlined blkcg won't be flushed. (There is only current > situation when there is no reader of io data that'd need them flushed > [1].) > > > In addition, the stat flushing of blkgs in blkcg_rstat_flush() > > includes propagating the rstat data to its parent. However, if a blkg > > has been destroyed (offline), the validity of its parent may be > > questionable. > > Parents won't be freed (neither offlined) before children (see > css_killed_work_fn). It should be regularly OK to pass data into a > parent of an offlined blkcg. > > > For safety, revert back to the old behavior by ignoring offline > > blkg's. > > I don't know if this is a good reasoning. If you argue that offlined > children needn't be taken into parent's account, then I think it's more > efficient to exclude the offlined blkcgs from update. (With the caveat I > have in [1].) Yeah, I'm not sure about this patch either. Doesn't seem necessary. Thanks. -- tejun ^ permalink raw reply [flat|nested] 12+ messages in thread
[parent not found: <Y5eoQ8UBN8Xpc+Wp-NiLfg/pYEd1N0TnZuCh8vA@public.gmane.org>]
* Re: [PATCH-block v2 2/3] blk-cgroup: Don't flush a blkg if destroyed [not found] ` <Y5eoQ8UBN8Xpc+Wp-NiLfg/pYEd1N0TnZuCh8vA@public.gmane.org> @ 2022-12-13 0:21 ` Waiman Long 0 siblings, 0 replies; 12+ messages in thread From: Waiman Long @ 2022-12-13 0:21 UTC (permalink / raw) To: Tejun Heo, Michal Koutný Cc: Jens Axboe, Josef Bacik, Zefan Li, Johannes Weiner, Andrew Morton, cgroups-u79uwXL29TY76Z2rM5mHXA, linux-block-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg, Dennis Zhou (Facebook) On 12/12/22 17:16, Tejun Heo wrote: > On Mon, Dec 12, 2022 at 01:59:53PM +0100, Michal Koutný wrote: >> Hello. >> >> On Sun, Dec 11, 2022 at 05:20:57PM -0500, Waiman Long <longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote: >>> Before commit 3b8cc6298724 ("blk-cgroup: Optimize blkcg_rstat_flush()"), >>> blkg's stats is only flushed if they are online. >> I'm not sure I follow -- css_release_work_fn/cgroup_rstat_flush may be >> called on an offlined blkcg (offlined!=released). There's no invariant >> ensuring offlined blkcg won't be flushed. (There is only current >> situation when there is no reader of io data that'd need them flushed >> [1].) >> >>> In addition, the stat flushing of blkgs in blkcg_rstat_flush() >>> includes propagating the rstat data to its parent. However, if a blkg >>> has been destroyed (offline), the validity of its parent may be >>> questionable. >> Parents won't be freed (neither offlined) before children (see >> css_killed_work_fn). It should be regularly OK to pass data into a >> parent of an offlined blkcg. >> >>> For safety, revert back to the old behavior by ignoring offline >>> blkg's. >> I don't know if this is a good reasoning. If you argue that offlined >> children needn't be taken into parent's account, then I think it's more >> efficient to exclude the offlined blkcgs from update. (With the caveat I >> have in [1].) > Yeah, I'm not sure about this patch either. Doesn't seem necessary. I wrote this patch because I am not totally sure it is safe to flush offline blkgs. Since both you and Michal don't see any problem with it. I am fine taking it out. Cheers, Longman ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH-block v2 3/3] blk-cgroup: Flush stats at blkgs destruction path 2022-12-11 22:20 [PATCH-block v2 0/3] blk-cgroup: Fix potential UAF & miscellaneous cleanup Waiman Long 2022-12-11 22:20 ` [PATCH-block v2 1/3] bdi, blk-cgroup: Fix potential UAF of blkcg Waiman Long [not found] ` <20221211222058.2946830-1-longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> @ 2022-12-11 22:20 ` Waiman Long [not found] ` <20221211222058.2946830-4-longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> 2 siblings, 1 reply; 12+ messages in thread From: Waiman Long @ 2022-12-11 22:20 UTC (permalink / raw) To: Jens Axboe, Tejun Heo, Josef Bacik, Zefan Li, Johannes Weiner, Andrew Morton Cc: cgroups, linux-block, linux-kernel, linux-mm, Michal Koutný, Dennis Zhou (Facebook), Waiman Long As noted by Michal, the blkg_iostat_set's in the lockless list hold reference to blkg's to protect against their removal. Those blkg's hold reference to blkcg. When a cgroup is being destroyed, cgroup_rstat_flush() is only called at css_release_work_fn() which is called when the blkcg reference count reaches 0. This circular dependency will prevent blkcg from being freed until some other events cause cgroup_rstat_flush() to be called to flush out the pending blkcg stats. To prevent this delayed blkcg removal, add a new cgroup_rstat_css_flush() function to flush stats for a given css and cpu and call it at the blkgs destruction path, blkcg_destroy_blkgs(), whenever there are still some pending stats to be flushed. This will ensure that blkcg reference count can reach 0 ASAP. Signed-off-by: Waiman Long <longman@redhat.com> --- block/blk-cgroup.c | 12 ++++++++++++ include/linux/cgroup.h | 1 + kernel/cgroup/rstat.c | 18 ++++++++++++++++++ 3 files changed, 31 insertions(+) diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c index c466aef0d467..534f3baeb84a 100644 --- a/block/blk-cgroup.c +++ b/block/blk-cgroup.c @@ -1090,6 +1090,8 @@ struct list_head *blkcg_get_cgwb_list(struct cgroup_subsys_state *css) */ static void blkcg_destroy_blkgs(struct blkcg *blkcg) { + int cpu; + /* * blkcg_destroy_blkgs() shouldn't be called with all the blkcg * references gone. @@ -1099,6 +1101,16 @@ static void blkcg_destroy_blkgs(struct blkcg *blkcg) might_sleep(); + /* + * Flush all the non-empty percpu lockless lists. + */ + for_each_possible_cpu(cpu) { + struct llist_head *lhead = per_cpu_ptr(blkcg->lhead, cpu); + + if (!llist_empty(lhead)) + cgroup_rstat_css_cpu_flush(&blkcg->css, cpu); + } + spin_lock_irq(&blkcg->lock); while (!hlist_empty(&blkcg->blkg_list)) { diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index 528bd44b59e2..6c4e66b3fa84 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -766,6 +766,7 @@ void cgroup_rstat_flush(struct cgroup *cgrp); void cgroup_rstat_flush_irqsafe(struct cgroup *cgrp); void cgroup_rstat_flush_hold(struct cgroup *cgrp); void cgroup_rstat_flush_release(void); +void cgroup_rstat_css_cpu_flush(struct cgroup_subsys_state *css, int cpu); /* * Basic resource stats. diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c index 793ecff29038..2e44be44351f 100644 --- a/kernel/cgroup/rstat.c +++ b/kernel/cgroup/rstat.c @@ -281,6 +281,24 @@ void cgroup_rstat_flush_release(void) spin_unlock_irq(&cgroup_rstat_lock); } +/** + * cgroup_rstat_css_cpu_flush - flush stats for the given css and cpu + * @css: target css to be flush + * @cpu: the cpu that holds the stats to be flush + * + * A lightweight rstat flush operation for a given css and cpu. + * Only the cpu_lock is being held for mutual exclusion, the cgroup_rstat_lock + * isn't used. + */ +void cgroup_rstat_css_cpu_flush(struct cgroup_subsys_state *css, int cpu) +{ + raw_spinlock_t *cpu_lock = per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu); + + raw_spin_lock_irq(cpu_lock); + css->ss->css_rstat_flush(css, cpu); + raw_spin_unlock_irq(cpu_lock); +} + int cgroup_rstat_init(struct cgroup *cgrp) { int cpu; -- 2.31.1 ^ permalink raw reply related [flat|nested] 12+ messages in thread
[parent not found: <20221211222058.2946830-4-longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>]
* Re: [PATCH-block v2 3/3] blk-cgroup: Flush stats at blkgs destruction path [not found] ` <20221211222058.2946830-4-longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> @ 2022-12-12 22:24 ` Tejun Heo 2022-12-13 0:19 ` Waiman Long 0 siblings, 1 reply; 12+ messages in thread From: Tejun Heo @ 2022-12-12 22:24 UTC (permalink / raw) To: Waiman Long Cc: Jens Axboe, Josef Bacik, Zefan Li, Johannes Weiner, Andrew Morton, cgroups-u79uwXL29TY76Z2rM5mHXA, linux-block-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg, Michal Koutný, Dennis Zhou (Facebook) On Sun, Dec 11, 2022 at 05:20:58PM -0500, Waiman Long wrote: > As noted by Michal, the blkg_iostat_set's in the lockless list > hold reference to blkg's to protect against their removal. Those > blkg's hold reference to blkcg. When a cgroup is being destroyed, > cgroup_rstat_flush() is only called at css_release_work_fn() which is > called when the blkcg reference count reaches 0. This circular dependency > will prevent blkcg from being freed until some other events cause > cgroup_rstat_flush() to be called to flush out the pending blkcg stats. > > To prevent this delayed blkcg removal, add a new cgroup_rstat_css_flush() > function to flush stats for a given css and cpu and call it at the blkgs > destruction path, blkcg_destroy_blkgs(), whenever there are still some > pending stats to be flushed. This will ensure that blkcg reference > count can reach 0 ASAP. > > Signed-off-by: Waiman Long <longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> Acked-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> But a nit below > + /* > + * Flush all the non-empty percpu lockless lists. > + */ Can you please explain the deadlock that's being avoided in the above comment? ie. it should say why this flush is necessary. > + for_each_possible_cpu(cpu) { > + struct llist_head *lhead = per_cpu_ptr(blkcg->lhead, cpu); > + > + if (!llist_empty(lhead)) > + cgroup_rstat_css_cpu_flush(&blkcg->css, cpu); > + } Thanks. -- tejun ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH-block v2 3/3] blk-cgroup: Flush stats at blkgs destruction path 2022-12-12 22:24 ` Tejun Heo @ 2022-12-13 0:19 ` Waiman Long 0 siblings, 0 replies; 12+ messages in thread From: Waiman Long @ 2022-12-13 0:19 UTC (permalink / raw) To: Tejun Heo Cc: Jens Axboe, Josef Bacik, Zefan Li, Johannes Weiner, Andrew Morton, cgroups, linux-block, linux-kernel, linux-mm, Michal Koutný, Dennis Zhou (Facebook) On 12/12/22 17:24, Tejun Heo wrote: > On Sun, Dec 11, 2022 at 05:20:58PM -0500, Waiman Long wrote: >> As noted by Michal, the blkg_iostat_set's in the lockless list >> hold reference to blkg's to protect against their removal. Those >> blkg's hold reference to blkcg. When a cgroup is being destroyed, >> cgroup_rstat_flush() is only called at css_release_work_fn() which is >> called when the blkcg reference count reaches 0. This circular dependency >> will prevent blkcg from being freed until some other events cause >> cgroup_rstat_flush() to be called to flush out the pending blkcg stats. >> >> To prevent this delayed blkcg removal, add a new cgroup_rstat_css_flush() >> function to flush stats for a given css and cpu and call it at the blkgs >> destruction path, blkcg_destroy_blkgs(), whenever there are still some >> pending stats to be flushed. This will ensure that blkcg reference >> count can reach 0 ASAP. >> >> Signed-off-by: Waiman Long <longman@redhat.com> > Acked-by: Tejun Heo <tj@kernel.org> > > But a nit below > >> + /* >> + * Flush all the non-empty percpu lockless lists. >> + */ > Can you please explain the deadlock that's being avoided in the above > comment? ie. it should say why this flush is necessary. Sure. I will expand the comment to elaborate a bit more. Cheers, Longman > >> + for_each_possible_cpu(cpu) { >> + struct llist_head *lhead = per_cpu_ptr(blkcg->lhead, cpu); >> + >> + if (!llist_empty(lhead)) >> + cgroup_rstat_css_cpu_flush(&blkcg->css, cpu); >> + } ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2022-12-13 0:21 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-12-11 22:20 [PATCH-block v2 0/3] blk-cgroup: Fix potential UAF & miscellaneous cleanup Waiman Long
2022-12-11 22:20 ` [PATCH-block v2 1/3] bdi, blk-cgroup: Fix potential UAF of blkcg Waiman Long
2022-12-12 22:13 ` Tejun Heo
[not found] ` <Y5enmzQM7BIiEv9n-NiLfg/pYEd1N0TnZuCh8vA@public.gmane.org>
2022-12-12 22:16 ` Waiman Long
[not found] ` <20221211222058.2946830-1-longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2022-12-11 22:20 ` [PATCH-block v2 2/3] blk-cgroup: Don't flush a blkg if destroyed Waiman Long
2022-12-12 12:59 ` Michal Koutný
2022-12-12 14:58 ` Waiman Long
2022-12-12 22:16 ` Tejun Heo
[not found] ` <Y5eoQ8UBN8Xpc+Wp-NiLfg/pYEd1N0TnZuCh8vA@public.gmane.org>
2022-12-13 0:21 ` Waiman Long
2022-12-11 22:20 ` [PATCH-block v2 3/3] blk-cgroup: Flush stats at blkgs destruction path Waiman Long
[not found] ` <20221211222058.2946830-4-longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2022-12-12 22:24 ` Tejun Heo
2022-12-13 0:19 ` Waiman Long
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox