* [Bug] race condition at rebind_subsystems()
@ 2022-06-30 10:52 Jing-Ting Wu
2022-07-15 11:59 ` Michal Koutný
0 siblings, 1 reply; 7+ messages in thread
From: Jing-Ting Wu @ 2022-06-30 10:52 UTC (permalink / raw)
To: Johannes Weiner
Cc: Tejun Heo, Zefan Li, Matthias Brugger, cgroups, linux-kernel,
linux-arm-kernel, linux-mediatek, Shakeel Butt, wsd_upstream,
lixiong.liu, wenju.xu, jonathan.jmchen
Hi Johannes
We find the KE(kernel exception) happened when test the reboot test
case in T SW version with kernel-5.15.
The issue is unable to handle kernel paging request at virtual address.
Root cause:
The rebind_subsystems() is no lock held when move css object from A
list to B list,then let B's head be treated as css node at
list_for_each_entry_rcu().
Use the wrong css to get css->ss->css_rstat_flush should get a wrong
address.
The call stack as following:
kthread
-> worker_thread
-> process_one_work
-> flush_memcg_stats_dwork
-> __mem_cgroup_flush_stats
-> cgroup_rstat_flush_irqsafe
-> cgroup_rstat_flush_locked
Detail:
static void cgroup_rstat_flush_locked(struct cgroup *cgrp, bool
may_sleep)
{
...
rcu_read_lock();
list_for_each_entry_rcu(css, &pos->rstat_css_list,
rstat_css_node)
css->ss->css_rstat_flush(css, cpu);
rcu_read_unlock();
...
}
Because the content of css->ss is not a function address,
once the css_rstat_flush is called, kernel exception will happen.
When the issue happened, the list of pos->rstat_css_list at group A is
empty.
There must be racing conditions.
From reviewing code, we find rebind_subsystems() is no lock held when
move css object to other list.
int rebind_subsystems(struct cgroup_root *dst_root, u16 ss_mask)
{
...
if (ss->css_rstat_flush) {
list_del_rcu(&css->rstat_css_node);
list_add_rcu(&css->rstat_css_node,
&dcgrp->rstat_css_list);
}
...
}
If we held a css object from A group list, at same time this css object
was moved to B group list.
Because current pos is in B’s list, link list was link the pos->next to
B’s head, so the pos->member will never equal A’s head, then the B’s
head(cgroup_root->cgroup->rstat_css_list) will be treated as css
node(css->rstat_css_node).
list_for_each_entry_rcu() use the container_of() to get css address,
and it treated the address of [cgroup_root->cgroup->rstat_css_list -
rstat_css_node] to be a css address.
cgroup_rstat_flush_locked() use the wrong css address to do css->ss-
>css_rstat_flush, then the wrong function address will be jump.
#define list_for_each_entry_rcu(pos, head, member,
cond...) \
for (__list_check_rcu(dummy, ## cond, 0), \
pos = list_entry_rcu((head)->next, typeof(*pos), member); \
&pos->member != (head); \
pos = list_entry_rcu(pos->member.next, typeof(*pos), member))
We look the patch of move css object from A list to B list is merged by
following link:
https://android.googlesource.com/kernel/common/+/a7df69b81aac5bdeb5c5aef9addd680ce22feebf%5E%21/#F0
Do you have any suggestion for this issue?
Thank you.
Best Regards,
Jing-Ting Wu
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: [Bug] race condition at rebind_subsystems() 2022-06-30 10:52 [Bug] race condition at rebind_subsystems() Jing-Ting Wu @ 2022-07-15 11:59 ` Michal Koutný [not found] ` <20220715115938.GA8646-9OudH3eul5jcvrawFnH+a6VXKuFTiq87@public.gmane.org> 0 siblings, 1 reply; 7+ messages in thread From: Michal Koutný @ 2022-07-15 11:59 UTC (permalink / raw) To: Jing-Ting Wu Cc: Johannes Weiner, Tejun Heo, Zefan Li, Matthias Brugger, cgroups, linux-kernel, linux-arm-kernel, linux-mediatek, Shakeel Butt, wsd_upstream, lixiong.liu, wenju.xu, jonathan.jmchen Hello Jing-Ting. On Thu, Jun 30, 2022 at 06:52:10PM +0800, Jing-Ting Wu <jing-ting.wu@mediatek.com> wrote: > Root cause: > The rebind_subsystems() is no lock held when move css object from A > list to B list,then let B's head be treated as css node at > list_for_each_entry_rcu(). Nice outcome of the analysis. > The call stack as following: > kthread > -> worker_thread > -> process_one_work > -> flush_memcg_stats_dwork > -> __mem_cgroup_flush_stats > -> cgroup_rstat_flush_irqsafe > -> cgroup_rstat_flush_locked > > Detail: > static void cgroup_rstat_flush_locked(struct cgroup *cgrp, bool > may_sleep) > { > ... > rcu_read_lock(); > list_for_each_entry_rcu(css, &pos->rstat_css_list, > rstat_css_node) > css->ss->css_rstat_flush(css, cpu); > rcu_read_unlock(); > ... > } > > Because the content of css->ss is not a function address, > once the css_rstat_flush is called, kernel exception will happen. I agree with your analysis. The list_for_each_entry_rcu() is not accounting for the move from A to B and the head's rstat_css_list is invalid in the cgroup_rstat_flush() context. > When the issue happened, the list of pos->rstat_css_list at group A is > empty. > There must be racing conditions. > > From reviewing code, we find rebind_subsystems() is no lock held when > move css object to other list. > > int rebind_subsystems(struct cgroup_root *dst_root, u16 ss_mask) > { > ... > if (ss->css_rstat_flush) { > list_del_rcu(&css->rstat_css_node); > list_add_rcu(&css->rstat_css_node, > &dcgrp->rstat_css_list); > } > ... > } > > If we held a css object from A group list, at same time this css object > was moved to B group list. > [...] > Do you have any suggestion for this issue? The css->rstat_css_node should not be modified if there are possible RCU readers elsewhere. One way to fix this would be to insert synchronize_rcu() after list_del_rcu() and before list_add_rcu(). (A further alternative (I've heard about) would be to utilize 'nulls' RCU lists [1] to make the move between lists detectable.) But as I'm looking at it from distance, it may be simpler and sufficient to just take cgroup_rstat_lock around the list migration (the nesting under cgroup_mutex that's held with rebind_subsystems() is fine). HTH, Michal [1] https://www.kernel.org/doc/html/latest/RCU/rculist_nulls.html ^ permalink raw reply [flat|nested] 7+ messages in thread
[parent not found: <20220715115938.GA8646-9OudH3eul5jcvrawFnH+a6VXKuFTiq87@public.gmane.org>]
* Re: [Bug] race condition at rebind_subsystems() [not found] ` <20220715115938.GA8646-9OudH3eul5jcvrawFnH+a6VXKuFTiq87@public.gmane.org> @ 2022-07-15 16:47 ` Tejun Heo [not found] ` <YtGaP+e35DZYSQf0-NiLfg/pYEd1N0TnZuCh8vA@public.gmane.org> 0 siblings, 1 reply; 7+ messages in thread From: Tejun Heo @ 2022-07-15 16:47 UTC (permalink / raw) To: Michal Koutný Cc: Jing-Ting Wu, Johannes Weiner, Zefan Li, Matthias Brugger, cgroups-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, linux-mediatek-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Shakeel Butt, wsd_upstream-NuS5LvNUpcJWk0Htik3J/w, lixiong.liu-NuS5LvNUpcJWk0Htik3J/w, wenju.xu-NuS5LvNUpcJWk0Htik3J/w, jonathan.jmchen-NuS5LvNUpcJWk0Htik3J/w (resending, I messed up the message header, sorry) Hello, On Fri, Jul 15, 2022 at 01:59:38PM +0200, Michal Koutný wrote: > The css->rstat_css_node should not be modified if there are possible RCU > readers elsewhere. > One way to fix this would be to insert synchronize_rcu() after > list_del_rcu() and before list_add_rcu(). > (A further alternative (I've heard about) would be to utilize 'nulls' > RCU lists [1] to make the move between lists detectable.) > > But as I'm looking at it from distance, it may be simpler and sufficient > to just take cgroup_rstat_lock around the list migration (the nesting > under cgroup_mutex that's held with rebind_subsystems() is fine). synchronize_rcu() prolly is the better fit here given how that list_node's usage, but yeah, great find. Thanks. -- tejun ^ permalink raw reply [flat|nested] 7+ messages in thread
[parent not found: <YtGaP+e35DZYSQf0-NiLfg/pYEd1N0TnZuCh8vA@public.gmane.org>]
* Re: [Bug] race condition at rebind_subsystems() [not found] ` <YtGaP+e35DZYSQf0-NiLfg/pYEd1N0TnZuCh8vA@public.gmane.org> @ 2022-07-18 7:44 ` Jing-Ting Wu 2022-07-19 13:35 ` Michal Koutný 0 siblings, 1 reply; 7+ messages in thread From: Jing-Ting Wu @ 2022-07-18 7:44 UTC (permalink / raw) To: Tejun Heo, Michal Koutný Cc: Johannes Weiner, Zefan Li, Matthias Brugger, cgroups-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, linux-mediatek-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Shakeel Butt, wsd_upstream-NuS5LvNUpcJWk0Htik3J/w, lixiong.liu-NuS5LvNUpcJWk0Htik3J/w, wenju.xu-NuS5LvNUpcJWk0Htik3J/w, jonathan.jmchen-NuS5LvNUpcJWk0Htik3J/w On Fri, 2022-07-15 at 06:47 -1000, Tejun Heo wrote: > (resending, I messed up the message header, sorry) > > Hello, > > On Fri, Jul 15, 2022 at 01:59:38PM +0200, Michal Koutný wrote: > > The css->rstat_css_node should not be modified if there are > > possible RCU > > readers elsewhere. > > One way to fix this would be to insert synchronize_rcu() after > > list_del_rcu() and before list_add_rcu(). > > (A further alternative (I've heard about) would be to utilize > > 'nulls' > > RCU lists [1] to make the move between lists detectable.) > > > > But as I'm looking at it from distance, it may be simpler and > > sufficient > > to just take cgroup_rstat_lock around the list migration (the > > nesting > > under cgroup_mutex that's held with rebind_subsystems() is fine). > > synchronize_rcu() prolly is the better fit here given how that > list_node's > usage, but yeah, great find. > > Thanks. > Hi Michal and Tejun, Thanks for your suggestion. Accroding your description, is the following patch corrent? --- a/kernel/cgroup/cgroup.c +++ b/kernel/cgroup/cgroup.c @@ -1813,6 +1813,7 @@ if (ss->css_rstat_flush) { list_del_rcu(&css->rstat_css_node); + synchronize_rcu(); list_add_rcu(&css->rstat_css_node, &dcgrp->rstat_css_list); } If the patch is correct, we will add this patch to our stability test. And we will continue to observe whether the problem is solved. Thank you. Best regards, Jing-Ting Wu ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Bug] race condition at rebind_subsystems() 2022-07-18 7:44 ` Jing-Ting Wu @ 2022-07-19 13:35 ` Michal Koutný [not found] ` <20220719133512.GD897-9OudH3eul5jcvrawFnH+a6VXKuFTiq87@public.gmane.org> 0 siblings, 1 reply; 7+ messages in thread From: Michal Koutný @ 2022-07-19 13:35 UTC (permalink / raw) To: Jing-Ting Wu Cc: Tejun Heo, Johannes Weiner, Zefan Li, Matthias Brugger, cgroups, linux-kernel, linux-arm-kernel, linux-mediatek, Shakeel Butt, wsd_upstream, lixiong.liu, wenju.xu, jonathan.jmchen On Mon, Jul 18, 2022 at 03:44:21PM +0800, Jing-Ting Wu <jing-ting.wu@mediatek.com> wrote: > Accroding your description, is the following patch corrent? Yes, this is what I meant (i.e. grace period before invalidating the removed rstat_css_node). > If the patch is correct, we will add this patch to our stability test. > And we will continue to observe whether the problem is solved. It'd be great to have such confirmation. Thanks, Michal ^ permalink raw reply [flat|nested] 7+ messages in thread
[parent not found: <20220719133512.GD897-9OudH3eul5jcvrawFnH+a6VXKuFTiq87@public.gmane.org>]
* Re: [Bug] race condition at rebind_subsystems() [not found] ` <20220719133512.GD897-9OudH3eul5jcvrawFnH+a6VXKuFTiq87@public.gmane.org> @ 2022-07-21 6:46 ` Jing-Ting Wu [not found] ` <98d738ec5d7d32441f6e62278fff32201fd948de.camel-NuS5LvNUpcJWk0Htik3J/w@public.gmane.org> 0 siblings, 1 reply; 7+ messages in thread From: Jing-Ting Wu @ 2022-07-21 6:46 UTC (permalink / raw) To: Michal Koutný Cc: Tejun Heo, Johannes Weiner, Zefan Li, Matthias Brugger, cgroups-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, linux-mediatek-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Shakeel Butt, wsd_upstream-NuS5LvNUpcJWk0Htik3J/w, lixiong.liu-NuS5LvNUpcJWk0Htik3J/w, wenju.xu-NuS5LvNUpcJWk0Htik3J/w, jonathan.jmchen-NuS5LvNUpcJWk0Htik3J/w On Tue, 2022-07-19 at 15:35 +0200, Michal Koutný wrote: > On Mon, Jul 18, 2022 at 03:44:21PM +0800, Jing-Ting Wu < > jing-ting.wu-NuS5LvNUpcJWk0Htik3J/w@public.gmane.org> wrote: > > Accroding your description, is the following patch corrent? > > Yes, this is what I meant (i.e. grace period before invalidating the > removed rstat_css_node). > > > If the patch is correct, we will add this patch to our stability > > test. > > And we will continue to observe whether the problem is solved. > > It'd be great to have such confirmation. Thanks for confirm the patch. We already add patch to our stability test. We will reply mail if we have any update. > > Thanks, > Michal Best regards, Jing-Ting Wu ^ permalink raw reply [flat|nested] 7+ messages in thread
[parent not found: <98d738ec5d7d32441f6e62278fff32201fd948de.camel-NuS5LvNUpcJWk0Htik3J/w@public.gmane.org>]
* Re: [Bug] race condition at rebind_subsystems() [not found] ` <98d738ec5d7d32441f6e62278fff32201fd948de.camel-NuS5LvNUpcJWk0Htik3J/w@public.gmane.org> @ 2022-08-07 6:58 ` Jing-Ting Wu 0 siblings, 0 replies; 7+ messages in thread From: Jing-Ting Wu @ 2022-08-07 6:58 UTC (permalink / raw) To: Michal Koutný Cc: Tejun Heo, Johannes Weiner, Zefan Li, Matthias Brugger, cgroups-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, linux-mediatek-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Shakeel Butt, wsd_upstream-NuS5LvNUpcJWk0Htik3J/w, lixiong.liu-NuS5LvNUpcJWk0Htik3J/w, wenju.xu-NuS5LvNUpcJWk0Htik3J/w, jonathan.jmchen-NuS5LvNUpcJWk0Htik3J/w On Thu, 2022-07-21 at 14:46 +0800, Jing-Ting Wu wrote: > On Tue, 2022-07-19 at 15:35 +0200, Michal Koutný wrote: > > On Mon, Jul 18, 2022 at 03:44:21PM +0800, Jing-Ting Wu < > > jing-ting.wu-NuS5LvNUpcJWk0Htik3J/w@public.gmane.org> wrote: > > > Accroding your description, is the following patch corrent? > > > > Yes, this is what I meant (i.e. grace period before invalidating > > the > > removed rstat_css_node). > > > > > If the patch is correct, we will add this patch to our stability > > > test. > > > And we will continue to observe whether the problem is solved. > > > > It'd be great to have such confirmation. > > Thanks for confirm the patch. > We already add patch to our stability test. > We will reply mail if we have any update. > > > > > Thanks, > > Michal > Hi, Michal Use the patch of previous mail, we pass the stability test at previous two week and still testing. I think the patch is helpful to the syndrome. It passed stability test over two week so far. (as-is: < one week failed) Do you help to upstream this patch to mainline? Thank you. Best regards, Jing-Ting Wu > > Best regards, > Jing-Ting Wu ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2022-08-07 6:58 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-06-30 10:52 [Bug] race condition at rebind_subsystems() Jing-Ting Wu
2022-07-15 11:59 ` Michal Koutný
[not found] ` <20220715115938.GA8646-9OudH3eul5jcvrawFnH+a6VXKuFTiq87@public.gmane.org>
2022-07-15 16:47 ` Tejun Heo
[not found] ` <YtGaP+e35DZYSQf0-NiLfg/pYEd1N0TnZuCh8vA@public.gmane.org>
2022-07-18 7:44 ` Jing-Ting Wu
2022-07-19 13:35 ` Michal Koutný
[not found] ` <20220719133512.GD897-9OudH3eul5jcvrawFnH+a6VXKuFTiq87@public.gmane.org>
2022-07-21 6:46 ` Jing-Ting Wu
[not found] ` <98d738ec5d7d32441f6e62278fff32201fd948de.camel-NuS5LvNUpcJWk0Htik3J/w@public.gmane.org>
2022-08-07 6:58 ` Jing-Ting Wu
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox