* [PATCH 0/2] Fix broken cpuset affinity handling on heterogeneous systems @ 2023-01-31 22:17 Will Deacon 2023-01-31 22:17 ` [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs Will Deacon 2023-01-31 22:17 ` [PATCH 2/2] cpuset: Call set_cpus_allowed_ptr() with appropriate mask for task Will Deacon 0 siblings, 2 replies; 31+ messages in thread From: Will Deacon @ 2023-01-31 22:17 UTC (permalink / raw) To: linux-kernel Cc: kernel-team, Will Deacon, Peter Zijlstra, Waiman Long, Zefan Li, Tejun Heo, Johannes Weiner, cgroups Hi folks, These two patches fix a couple of CPU affinity issues involving cpusets on heterogeneous systems. A concrete example of this is running 32-bit tasks on recent arm64 SoCs, where some of the cores are only capable of 64-bit execution. The first patch (from Peter) fixes a regression introduced during the recent merge window which is causing test failures in Android where the problematic patches have been backported. The second patch fixes a longer-standing issue, which I noticed while testing fixes for the initial regression. Ideally, both of these would land together, but fixing the regression for 6.2 is my main concern. Anyway, I don't think either Peter or I would call ourselves cpuset experts (far from it!), so please have a look. Cheers, Will Cc: Peter Zijlstra <peterz@infradead.org> Cc: Waiman Long <longman@redhat.com> Cc: Zefan Li <lizefan.x@bytedance.com> Cc: Tejun Heo <tj@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: cgroups@vger.kernel.org --->8 Peter Zijlstra (1): cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs Will Deacon (1): cpuset: Call set_cpus_allowed_ptr() with appropriate mask for task kernel/cgroup/cpuset.c | 57 +++++++++++++++++++++++++++++++++--------- 1 file changed, 45 insertions(+), 12 deletions(-) -- 2.39.1.456.gfc5497dd1b-goog ^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs 2023-01-31 22:17 [PATCH 0/2] Fix broken cpuset affinity handling on heterogeneous systems Will Deacon @ 2023-01-31 22:17 ` Will Deacon 2023-02-01 4:14 ` Waiman Long 2023-01-31 22:17 ` [PATCH 2/2] cpuset: Call set_cpus_allowed_ptr() with appropriate mask for task Will Deacon 1 sibling, 1 reply; 31+ messages in thread From: Will Deacon @ 2023-01-31 22:17 UTC (permalink / raw) To: linux-kernel Cc: kernel-team, Will Deacon, Peter Zijlstra, Waiman Long, Zefan Li, Tejun Heo, Johannes Weiner, cgroups From: Peter Zijlstra <peterz@infradead.org> There is a difference in behaviour between CPUSET={y,n} that is now wrecking havoc with {relax,force}_compatible_cpus_allowed_ptr(). Specifically, since commit 8f9ea86fdf99 ("sched: Always preserve the user requested cpumask") relax_compatible_cpus_allowed_ptr() is calling __sched_setaffinity() unconditionally. But the underlying problem goes back a lot further, possibly to commit: ae1c802382f7 ("cpuset: apply cs->effective_{cpus,mems}") which switched cpuset_cpus_allowed() from cs->cpus_allowed to cs->effective_cpus. The problem is that for CPUSET=y cpuset_cpus_allowed() will filter out all offline CPUs. For tasks that are part of a (!root) cpuset this is then later fixed up by the cpuset hotplug notifiers that re-evaluate and re-apply cs->effective_cpus, but for (normal) tasks in the root cpuset this does not happen and they will forever after be excluded from CPUs onlined later. As such, rewrite cpuset_cpus_allowed() to return a wider mask, including the offline CPUs. Fixes: 8f9ea86fdf99 ("sched: Always preserve the user requested cpumask") Reported-by: Will Deacon <will@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20230117160825.GA17756@willie-the-truck Signed-off-by: Will Deacon <will@kernel.org> --- kernel/cgroup/cpuset.c | 39 ++++++++++++++++++++++++++++++++++----- 1 file changed, 34 insertions(+), 5 deletions(-) diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index a29c0b13706b..8552cc2c586a 100644 --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -3683,23 +3683,52 @@ void __init cpuset_init_smp(void) BUG_ON(!cpuset_migrate_mm_wq); } +static const struct cpumask *__cs_cpus_allowed(struct cpuset *cs) +{ + const struct cpumask *cs_mask = cs->cpus_allowed; + if (!parent_cs(cs)) + cs_mask = cpu_possible_mask; + return cs_mask; +} + +static void cs_cpus_allowed(struct cpuset *cs, struct cpumask *pmask) +{ + do { + cpumask_and(pmask, pmask, __cs_cpus_allowed(cs)); + cs = parent_cs(cs); + } while (cs); +} + /** * cpuset_cpus_allowed - return cpus_allowed mask from a tasks cpuset. * @tsk: pointer to task_struct from which to obtain cpuset->cpus_allowed. * @pmask: pointer to struct cpumask variable to receive cpus_allowed set. * - * Description: Returns the cpumask_var_t cpus_allowed of the cpuset - * attached to the specified @tsk. Guaranteed to return some non-empty - * subset of cpu_online_mask, even if this means going outside the - * tasks cpuset. + * Description: Returns the cpumask_var_t cpus_allowed of the cpuset attached + * to the specified @tsk. Guaranteed to return some non-empty intersection + * with cpu_online_mask, even if this means going outside the tasks cpuset. **/ void cpuset_cpus_allowed(struct task_struct *tsk, struct cpumask *pmask) { unsigned long flags; + struct cpuset *cs; spin_lock_irqsave(&callback_lock, flags); - guarantee_online_cpus(tsk, pmask); + rcu_read_lock(); + + cs = task_cs(tsk); + do { + cpumask_copy(pmask, task_cpu_possible_mask(tsk)); + cs_cpus_allowed(cs, pmask); + + if (cpumask_intersects(pmask, cpu_online_mask)) + break; + + cs = parent_cs(cs); + } while (cs); + + rcu_read_unlock(); spin_unlock_irqrestore(&callback_lock, flags); } -- 2.39.1.456.gfc5497dd1b-goog ^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs 2023-01-31 22:17 ` [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs Will Deacon @ 2023-02-01 4:14 ` Waiman Long 2023-02-01 9:14 ` Peter Zijlstra 2023-02-02 8:34 ` Peter Zijlstra 0 siblings, 2 replies; 31+ messages in thread From: Waiman Long @ 2023-02-01 4:14 UTC (permalink / raw) To: Will Deacon, linux-kernel Cc: kernel-team, Peter Zijlstra, Zefan Li, Tejun Heo, Johannes Weiner, cgroups On 1/31/23 17:17, Will Deacon wrote: > From: Peter Zijlstra <peterz@infradead.org> > > There is a difference in behaviour between CPUSET={y,n} that is now > wrecking havoc with {relax,force}_compatible_cpus_allowed_ptr(). > > Specifically, since commit 8f9ea86fdf99 ("sched: Always preserve the > user requested cpumask") relax_compatible_cpus_allowed_ptr() is > calling __sched_setaffinity() unconditionally. > > But the underlying problem goes back a lot further, possibly to > commit: ae1c802382f7 ("cpuset: apply cs->effective_{cpus,mems}") which > switched cpuset_cpus_allowed() from cs->cpus_allowed to > cs->effective_cpus. > > The problem is that for CPUSET=y cpuset_cpus_allowed() will filter out > all offline CPUs. For tasks that are part of a (!root) cpuset this is > then later fixed up by the cpuset hotplug notifiers that re-evaluate > and re-apply cs->effective_cpus, but for (normal) tasks in the root > cpuset this does not happen and they will forever after be excluded > from CPUs onlined later. > > As such, rewrite cpuset_cpus_allowed() to return a wider mask, > including the offline CPUs. > > Fixes: 8f9ea86fdf99 ("sched: Always preserve the user requested cpumask") > Reported-by: Will Deacon <will@kernel.org> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> > Link: https://lkml.kernel.org/r/20230117160825.GA17756@willie-the-truck > Signed-off-by: Will Deacon <will@kernel.org> Before cgroup v2, cpuset had only one cpumask - cpus_allowed. It only tracked online cpus and ignored the offline ones. It behaves more like effective_cpus in cpuset v2. With v2, we have 2 cpumasks - cpus_allowed and effective_cpus. When cpuset v1 is mounted, cpus_allowed and effective_cpus are effectively the same and track online cpus. With cpuset v2, cpus_allowed contains what the user has written into and it won't be changed until another write happen. However, what the user written may not be what the system can give it and effective_cpus is what the system decides a cpuset can use. Cpuset v2 is able to handle hotplug correctly and update the task's cpumask accordingly. So missing previously offline cpus won't happen with v2. Since v1 keeps the old behavior, previously offlined cpus are lost in the cpuset's cpus_allowed. However tasks in the root cpuset will still be fine with cpu hotplug as its cpus_allowed should track cpu_online_mask. IOW, only tasks in a non-root cpuset suffer this problem. It was a known issue in v1 and I believe is one of the major reasons of the cpuset v2 redesign. A major concern I have is the overhead of creating a poor man version of v2 cpus_allowed. This issue can be worked around even for cpuset v1 if it is mounted with the cpuset_v2_mode option to behave more like v2 in its cpumask handling. Alternatively we may be able to provide a config option to make this the default for v1 without the special mount option, if necessary. Cheers, Longman ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs 2023-02-01 4:14 ` Waiman Long @ 2023-02-01 9:14 ` Peter Zijlstra 2023-02-01 15:16 ` Waiman Long 2023-02-02 8:34 ` Peter Zijlstra 1 sibling, 1 reply; 31+ messages in thread From: Peter Zijlstra @ 2023-02-01 9:14 UTC (permalink / raw) To: Waiman Long Cc: Will Deacon, linux-kernel, kernel-team, Zefan Li, Tejun Heo, Johannes Weiner, cgroups On Tue, Jan 31, 2023 at 11:14:27PM -0500, Waiman Long wrote: > On 1/31/23 17:17, Will Deacon wrote: > > From: Peter Zijlstra <peterz@infradead.org> > > > > There is a difference in behaviour between CPUSET={y,n} that is now > > wrecking havoc with {relax,force}_compatible_cpus_allowed_ptr(). > > > > Specifically, since commit 8f9ea86fdf99 ("sched: Always preserve the > > user requested cpumask") relax_compatible_cpus_allowed_ptr() is > > calling __sched_setaffinity() unconditionally. > > > > But the underlying problem goes back a lot further, possibly to > > commit: ae1c802382f7 ("cpuset: apply cs->effective_{cpus,mems}") which > > switched cpuset_cpus_allowed() from cs->cpus_allowed to > > cs->effective_cpus. > > > > The problem is that for CPUSET=y cpuset_cpus_allowed() will filter out > > all offline CPUs. For tasks that are part of a (!root) cpuset this is > > then later fixed up by the cpuset hotplug notifiers that re-evaluate > > and re-apply cs->effective_cpus, but for (normal) tasks in the root > > cpuset this does not happen and they will forever after be excluded > > from CPUs onlined later. > > > > As such, rewrite cpuset_cpus_allowed() to return a wider mask, > > including the offline CPUs. > > > > Fixes: 8f9ea86fdf99 ("sched: Always preserve the user requested cpumask") > > Reported-by: Will Deacon <will@kernel.org> > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> > > Link: https://lkml.kernel.org/r/20230117160825.GA17756@willie-the-truck > > Signed-off-by: Will Deacon <will@kernel.org> > > Before cgroup v2, cpuset had only one cpumask - cpus_allowed. It only > tracked online cpus and ignored the offline ones. It behaves more like > effective_cpus in cpuset v2. With v2, we have 2 cpumasks - cpus_allowed and > effective_cpus. When cpuset v1 is mounted, cpus_allowed and effective_cpus > are effectively the same and track online cpus. With cpuset v2, cpus_allowed > contains what the user has written into and it won't be changed until > another write happen. However, what the user written may not be what the > system can give it and effective_cpus is what the system decides a cpuset > can use. > > Cpuset v2 is able to handle hotplug correctly and update the task's cpumask > accordingly. So missing previously offline cpus won't happen with v2. > > Since v1 keeps the old behavior, previously offlined cpus are lost in the > cpuset's cpus_allowed. However tasks in the root cpuset will still be fine > with cpu hotplug as its cpus_allowed should track cpu_online_mask. IOW, only > tasks in a non-root cpuset suffer this problem. > > It was a known issue in v1 and I believe is one of the major reasons of the > cpuset v2 redesign. > > A major concern I have is the overhead of creating a poor man version of v2 > cpus_allowed. This issue can be worked around even for cpuset v1 if it is > mounted with the cpuset_v2_mode option to behave more like v2 in its cpumask > handling. Alternatively we may be able to provide a config option to make > this the default for v1 without the special mount option, if necessary. You're still not getting it -- even cpuset (be it v1 or v2) *MUST* *NOT* mask offline cpus for root cgroup tasks, ever. (And the only reason it gets away with masking offline for !root is that it re-applies the mask every time it changes.) Yes it did that for a fair while -- but it is wrong and broken and a very big behavioural difference between CONFIG_CPUSET={y,n}. This must not be. Arguably cpuset-v2 is still wrong for masking offline cpus in it's effective_cpus mask, but I really didn't want to go rewrite cpuset.c for something that needs to go into /urgent *now*. Hence this minimal patch that at least lets sched_setaffinity() work as intended. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs 2023-02-01 9:14 ` Peter Zijlstra @ 2023-02-01 15:16 ` Waiman Long 2023-02-01 18:46 ` Waiman Long 0 siblings, 1 reply; 31+ messages in thread From: Waiman Long @ 2023-02-01 15:16 UTC (permalink / raw) To: Peter Zijlstra Cc: Will Deacon, linux-kernel, kernel-team, Zefan Li, Tejun Heo, Johannes Weiner, cgroups On 2/1/23 04:14, Peter Zijlstra wrote: > On Tue, Jan 31, 2023 at 11:14:27PM -0500, Waiman Long wrote: >> On 1/31/23 17:17, Will Deacon wrote: >>> From: Peter Zijlstra <peterz@infradead.org> >>> >>> There is a difference in behaviour between CPUSET={y,n} that is now >>> wrecking havoc with {relax,force}_compatible_cpus_allowed_ptr(). >>> >>> Specifically, since commit 8f9ea86fdf99 ("sched: Always preserve the >>> user requested cpumask") relax_compatible_cpus_allowed_ptr() is >>> calling __sched_setaffinity() unconditionally. >>> >>> But the underlying problem goes back a lot further, possibly to >>> commit: ae1c802382f7 ("cpuset: apply cs->effective_{cpus,mems}") which >>> switched cpuset_cpus_allowed() from cs->cpus_allowed to >>> cs->effective_cpus. >>> >>> The problem is that for CPUSET=y cpuset_cpus_allowed() will filter out >>> all offline CPUs. For tasks that are part of a (!root) cpuset this is >>> then later fixed up by the cpuset hotplug notifiers that re-evaluate >>> and re-apply cs->effective_cpus, but for (normal) tasks in the root >>> cpuset this does not happen and they will forever after be excluded >>> from CPUs onlined later. >>> >>> As such, rewrite cpuset_cpus_allowed() to return a wider mask, >>> including the offline CPUs. >>> >>> Fixes: 8f9ea86fdf99 ("sched: Always preserve the user requested cpumask") >>> Reported-by: Will Deacon <will@kernel.org> >>> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> >>> Link: https://lkml.kernel.org/r/20230117160825.GA17756@willie-the-truck >>> Signed-off-by: Will Deacon <will@kernel.org> >> Before cgroup v2, cpuset had only one cpumask - cpus_allowed. It only >> tracked online cpus and ignored the offline ones. It behaves more like >> effective_cpus in cpuset v2. With v2, we have 2 cpumasks - cpus_allowed and >> effective_cpus. When cpuset v1 is mounted, cpus_allowed and effective_cpus >> are effectively the same and track online cpus. With cpuset v2, cpus_allowed >> contains what the user has written into and it won't be changed until >> another write happen. However, what the user written may not be what the >> system can give it and effective_cpus is what the system decides a cpuset >> can use. >> >> Cpuset v2 is able to handle hotplug correctly and update the task's cpumask >> accordingly. So missing previously offline cpus won't happen with v2. >> >> Since v1 keeps the old behavior, previously offlined cpus are lost in the >> cpuset's cpus_allowed. However tasks in the root cpuset will still be fine >> with cpu hotplug as its cpus_allowed should track cpu_online_mask. IOW, only >> tasks in a non-root cpuset suffer this problem. >> >> It was a known issue in v1 and I believe is one of the major reasons of the >> cpuset v2 redesign. >> >> A major concern I have is the overhead of creating a poor man version of v2 >> cpus_allowed. This issue can be worked around even for cpuset v1 if it is >> mounted with the cpuset_v2_mode option to behave more like v2 in its cpumask >> handling. Alternatively we may be able to provide a config option to make >> this the default for v1 without the special mount option, if necessary. > You're still not getting it -- even cpuset (be it v1 or v2) *MUST* *NOT* > mask offline cpus for root cgroup tasks, ever. (And the only reason it > gets away with masking offline for !root is that it re-applies the mask > every time it changes.) > > Yes it did that for a fair while -- but it is wrong and broken and a > very big behavioural difference between CONFIG_CPUSET={y,n}. This must > not be. > > Arguably cpuset-v2 is still wrong for masking offline cpus in it's > effective_cpus mask, but I really didn't want to go rewrite cpuset.c for > something that needs to go into /urgent *now*. > > Hence this minimal patch that at least lets sched_setaffinity() work as > intended. I don't object to the general idea of keeping offline cpus in a task's cpu affinity. In the case of cpu offline event, we can skip removing that offline cpu from the task's cpu affinity. That will partially solve the problem here and is also simpler. I believe a main reason why effective_cpus holds only online cpus is because of the need to detect when there is no online cpus available in a given cpuset. In this case, it will fall back to the nearest ancestors with online cpus. This offline cpu problem with cpuset v1 is a known problem for a long time. It is not a recent regression. Cheers, Longman ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs 2023-02-01 15:16 ` Waiman Long @ 2023-02-01 18:46 ` Waiman Long 2023-02-01 19:14 ` Waiman Long 2023-02-01 21:10 ` Peter Zijlstra 0 siblings, 2 replies; 31+ messages in thread From: Waiman Long @ 2023-02-01 18:46 UTC (permalink / raw) To: Peter Zijlstra Cc: Will Deacon, linux-kernel, kernel-team, Zefan Li, Tejun Heo, Johannes Weiner, cgroups On 2/1/23 10:16, Waiman Long wrote: > On 2/1/23 04:14, Peter Zijlstra wrote: >> On Tue, Jan 31, 2023 at 11:14:27PM -0500, Waiman Long wrote: >>> On 1/31/23 17:17, Will Deacon wrote: >>>> From: Peter Zijlstra <peterz@infradead.org> >>>> >>>> There is a difference in behaviour between CPUSET={y,n} that is now >>>> wrecking havoc with {relax,force}_compatible_cpus_allowed_ptr(). >>>> >>>> Specifically, since commit 8f9ea86fdf99 ("sched: Always preserve the >>>> user requested cpumask") relax_compatible_cpus_allowed_ptr() is >>>> calling __sched_setaffinity() unconditionally. >>>> >>>> But the underlying problem goes back a lot further, possibly to >>>> commit: ae1c802382f7 ("cpuset: apply cs->effective_{cpus,mems}") which >>>> switched cpuset_cpus_allowed() from cs->cpus_allowed to >>>> cs->effective_cpus. >>>> >>>> The problem is that for CPUSET=y cpuset_cpus_allowed() will filter out >>>> all offline CPUs. For tasks that are part of a (!root) cpuset this is >>>> then later fixed up by the cpuset hotplug notifiers that re-evaluate >>>> and re-apply cs->effective_cpus, but for (normal) tasks in the root >>>> cpuset this does not happen and they will forever after be excluded >>>> from CPUs onlined later. >>>> >>>> As such, rewrite cpuset_cpus_allowed() to return a wider mask, >>>> including the offline CPUs. >>>> >>>> Fixes: 8f9ea86fdf99 ("sched: Always preserve the user requested >>>> cpumask") >>>> Reported-by: Will Deacon <will@kernel.org> >>>> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> >>>> Link: >>>> https://lkml.kernel.org/r/20230117160825.GA17756@willie-the-truck >>>> Signed-off-by: Will Deacon <will@kernel.org> >>> Before cgroup v2, cpuset had only one cpumask - cpus_allowed. It only >>> tracked online cpus and ignored the offline ones. It behaves more like >>> effective_cpus in cpuset v2. With v2, we have 2 cpumasks - >>> cpus_allowed and >>> effective_cpus. When cpuset v1 is mounted, cpus_allowed and >>> effective_cpus >>> are effectively the same and track online cpus. With cpuset v2, >>> cpus_allowed >>> contains what the user has written into and it won't be changed until >>> another write happen. However, what the user written may not be what >>> the >>> system can give it and effective_cpus is what the system decides a >>> cpuset >>> can use. >>> >>> Cpuset v2 is able to handle hotplug correctly and update the task's >>> cpumask >>> accordingly. So missing previously offline cpus won't happen with v2. >>> >>> Since v1 keeps the old behavior, previously offlined cpus are lost >>> in the >>> cpuset's cpus_allowed. However tasks in the root cpuset will still >>> be fine >>> with cpu hotplug as its cpus_allowed should track cpu_online_mask. >>> IOW, only >>> tasks in a non-root cpuset suffer this problem. >>> >>> It was a known issue in v1 and I believe is one of the major reasons >>> of the >>> cpuset v2 redesign. >>> >>> A major concern I have is the overhead of creating a poor man >>> version of v2 >>> cpus_allowed. This issue can be worked around even for cpuset v1 if >>> it is >>> mounted with the cpuset_v2_mode option to behave more like v2 in its >>> cpumask >>> handling. Alternatively we may be able to provide a config option to >>> make >>> this the default for v1 without the special mount option, if necessary. >> You're still not getting it -- even cpuset (be it v1 or v2) *MUST* *NOT* >> mask offline cpus for root cgroup tasks, ever. (And the only reason it >> gets away with masking offline for !root is that it re-applies the mask >> every time it changes.) >> >> Yes it did that for a fair while -- but it is wrong and broken and a >> very big behavioural difference between CONFIG_CPUSET={y,n}. This must >> not be. >> >> Arguably cpuset-v2 is still wrong for masking offline cpus in it's >> effective_cpus mask, but I really didn't want to go rewrite cpuset.c for >> something that needs to go into /urgent *now*. >> >> Hence this minimal patch that at least lets sched_setaffinity() work as >> intended. > > I don't object to the general idea of keeping offline cpus in a task's > cpu affinity. In the case of cpu offline event, we can skip removing > that offline cpu from the task's cpu affinity. That will partially > solve the problem here and is also simpler. > > I believe a main reason why effective_cpus holds only online cpus is > because of the need to detect when there is no online cpus available > in a given cpuset. In this case, it will fall back to the nearest > ancestors with online cpus. > > This offline cpu problem with cpuset v1 is a known problem for a long > time. It is not a recent regression. Note that using cpus_allowed directly in cgroup v2 may not be right because cpus_allowed may have no relationship to effective_cpus at all in some cases, e.g. root | V A (cpus_allowed = 1-4, effective_cpus = 1-4) | V B (cpus_allowed = 5-8, effective_cpus = 1-4) In the case of cpuset B, passing back cpus 5-8 as the allowed_cpus is wrong. I wonder how often is cpu hotplug happening in those arm64 cpu systems that only have a subset of cpus that can run 32-bit programs. Cheers, Longman ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs 2023-02-01 18:46 ` Waiman Long @ 2023-02-01 19:14 ` Waiman Long 2023-02-01 19:17 ` Waiman Long 2023-02-01 21:10 ` Peter Zijlstra 1 sibling, 1 reply; 31+ messages in thread From: Waiman Long @ 2023-02-01 19:14 UTC (permalink / raw) To: Peter Zijlstra Cc: Will Deacon, linux-kernel, kernel-team, Zefan Li, Tejun Heo, Johannes Weiner, cgroups On 2/1/23 13:46, Waiman Long wrote: > On 2/1/23 10:16, Waiman Long wrote: >> On 2/1/23 04:14, Peter Zijlstra wrote: >>> On Tue, Jan 31, 2023 at 11:14:27PM -0500, Waiman Long wrote: >>>> On 1/31/23 17:17, Will Deacon wrote: >>>>> From: Peter Zijlstra <peterz@infradead.org> >>>>> >>>>> There is a difference in behaviour between CPUSET={y,n} that is now >>>>> wrecking havoc with {relax,force}_compatible_cpus_allowed_ptr(). >>>>> >>>>> Specifically, since commit 8f9ea86fdf99 ("sched: Always preserve the >>>>> user requested cpumask") relax_compatible_cpus_allowed_ptr() is >>>>> calling __sched_setaffinity() unconditionally. >>>>> >>>>> But the underlying problem goes back a lot further, possibly to >>>>> commit: ae1c802382f7 ("cpuset: apply cs->effective_{cpus,mems}") >>>>> which >>>>> switched cpuset_cpus_allowed() from cs->cpus_allowed to >>>>> cs->effective_cpus. >>>>> >>>>> The problem is that for CPUSET=y cpuset_cpus_allowed() will filter >>>>> out >>>>> all offline CPUs. For tasks that are part of a (!root) cpuset this is >>>>> then later fixed up by the cpuset hotplug notifiers that re-evaluate >>>>> and re-apply cs->effective_cpus, but for (normal) tasks in the root >>>>> cpuset this does not happen and they will forever after be excluded >>>>> from CPUs onlined later. >>>>> >>>>> As such, rewrite cpuset_cpus_allowed() to return a wider mask, >>>>> including the offline CPUs. >>>>> >>>>> Fixes: 8f9ea86fdf99 ("sched: Always preserve the user requested >>>>> cpumask") >>>>> Reported-by: Will Deacon <will@kernel.org> >>>>> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> >>>>> Link: >>>>> https://lkml.kernel.org/r/20230117160825.GA17756@willie-the-truck >>>>> Signed-off-by: Will Deacon <will@kernel.org> >>>> Before cgroup v2, cpuset had only one cpumask - cpus_allowed. It only >>>> tracked online cpus and ignored the offline ones. It behaves more like >>>> effective_cpus in cpuset v2. With v2, we have 2 cpumasks - >>>> cpus_allowed and >>>> effective_cpus. When cpuset v1 is mounted, cpus_allowed and >>>> effective_cpus >>>> are effectively the same and track online cpus. With cpuset v2, >>>> cpus_allowed >>>> contains what the user has written into and it won't be changed until >>>> another write happen. However, what the user written may not be >>>> what the >>>> system can give it and effective_cpus is what the system decides a >>>> cpuset >>>> can use. >>>> >>>> Cpuset v2 is able to handle hotplug correctly and update the task's >>>> cpumask >>>> accordingly. So missing previously offline cpus won't happen with v2. >>>> >>>> Since v1 keeps the old behavior, previously offlined cpus are lost >>>> in the >>>> cpuset's cpus_allowed. However tasks in the root cpuset will still >>>> be fine >>>> with cpu hotplug as its cpus_allowed should track cpu_online_mask. >>>> IOW, only >>>> tasks in a non-root cpuset suffer this problem. >>>> >>>> It was a known issue in v1 and I believe is one of the major >>>> reasons of the >>>> cpuset v2 redesign. >>>> >>>> A major concern I have is the overhead of creating a poor man >>>> version of v2 >>>> cpus_allowed. This issue can be worked around even for cpuset v1 if >>>> it is >>>> mounted with the cpuset_v2_mode option to behave more like v2 in >>>> its cpumask >>>> handling. Alternatively we may be able to provide a config option >>>> to make >>>> this the default for v1 without the special mount option, if >>>> necessary. >>> You're still not getting it -- even cpuset (be it v1 or v2) *MUST* >>> *NOT* >>> mask offline cpus for root cgroup tasks, ever. (And the only reason it >>> gets away with masking offline for !root is that it re-applies the mask >>> every time it changes.) >>> >>> Yes it did that for a fair while -- but it is wrong and broken and a >>> very big behavioural difference between CONFIG_CPUSET={y,n}. This must >>> not be. >>> >>> Arguably cpuset-v2 is still wrong for masking offline cpus in it's >>> effective_cpus mask, but I really didn't want to go rewrite cpuset.c >>> for >>> something that needs to go into /urgent *now*. >>> >>> Hence this minimal patch that at least lets sched_setaffinity() work as >>> intended. >> >> I don't object to the general idea of keeping offline cpus in a >> task's cpu affinity. In the case of cpu offline event, we can skip >> removing that offline cpu from the task's cpu affinity. That will >> partially solve the problem here and is also simpler. >> >> I believe a main reason why effective_cpus holds only online cpus is >> because of the need to detect when there is no online cpus available >> in a given cpuset. In this case, it will fall back to the nearest >> ancestors with online cpus. >> >> This offline cpu problem with cpuset v1 is a known problem for a long >> time. It is not a recent regression. > > Note that using cpus_allowed directly in cgroup v2 may not be right > because cpus_allowed may have no relationship to effective_cpus at all > in some cases, e.g. > > root > | > V > A (cpus_allowed = 1-4, effective_cpus = 1-4) > | > V > B (cpus_allowed = 5-8, effective_cpus = 1-4) > > In the case of cpuset B, passing back cpus 5-8 as the allowed_cpus is > wrong. > > I wonder how often is cpu hotplug happening in those arm64 cpu systems > that only have a subset of cpus that can run 32-bit programs. One possible solution is to use cpuset_cpus_allowed_fallback() in case none of the cpus in the current cpuset is allowed to be used to run a given task. Cheers, Longman ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs 2023-02-01 19:14 ` Waiman Long @ 2023-02-01 19:17 ` Waiman Long 0 siblings, 0 replies; 31+ messages in thread From: Waiman Long @ 2023-02-01 19:17 UTC (permalink / raw) To: Peter Zijlstra Cc: Will Deacon, linux-kernel, kernel-team, Zefan Li, Tejun Heo, Johannes Weiner, cgroups On 2/1/23 14:14, Waiman Long wrote: > One possible solution is to use cpuset_cpus_allowed_fallback() in case > none of the cpus in the current cpuset is allowed to be used to run a > given task. It looks like we will need to enhance cpuset_cpus_allowed_fallback() to walk up cpuset tree to find one that have useable cpus. Cheers, Longman ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs 2023-02-01 18:46 ` Waiman Long 2023-02-01 19:14 ` Waiman Long @ 2023-02-01 21:10 ` Peter Zijlstra 2023-02-02 3:34 ` Waiman Long 1 sibling, 1 reply; 31+ messages in thread From: Peter Zijlstra @ 2023-02-01 21:10 UTC (permalink / raw) To: Waiman Long Cc: Will Deacon, linux-kernel, kernel-team, Zefan Li, Tejun Heo, Johannes Weiner, cgroups On Wed, Feb 01, 2023 at 01:46:11PM -0500, Waiman Long wrote: > Note that using cpus_allowed directly in cgroup v2 may not be right because > cpus_allowed may have no relationship to effective_cpus at all in some > cases, e.g. > > root > | > V > A (cpus_allowed = 1-4, effective_cpus = 1-4) > | > V > B (cpus_allowed = 5-8, effective_cpus = 1-4) > > In the case of cpuset B, passing back cpus 5-8 as the allowed_cpus is wrong. I think my patch as written does the right thing here. Since the intersection of (1-4) and (5-8) is empty it will move up the hierarchy and we'll end up with (1-4) from the cgroup side of things. So the purpose of __cs_cpus_allowed() is to override the cpus_allowed of the root set and force it to cpu_possible_mask. Then cs_cpus_allowed() computes the intersection of cs->cpus_allowed and all it's parents. This will, in the case of B above, result in the empty mask. Then cpuset_cpus_allowed() has a loop that starts with task_cpu_possible_mask(), intersects that with cs_cpus_allowed() and if the intersection of that and cpu_online_mask is empty, moves up the hierarchy. Given cs_cpus_allowed(B) is the empty mask, we'll move to A. Note that since we force the mask of root to cpu_possible_mask, cs_cpus_allowed(root) will be a no-op and if we guarantee (in arch code) that cpu_online_mask always has a non-empty intersection with task_cpu_possible_mask(), this loop is guaranteed to terminate with a viable mask. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs 2023-02-01 21:10 ` Peter Zijlstra @ 2023-02-02 3:34 ` Waiman Long 2023-02-03 11:50 ` Will Deacon 0 siblings, 1 reply; 31+ messages in thread From: Waiman Long @ 2023-02-02 3:34 UTC (permalink / raw) To: Peter Zijlstra Cc: Will Deacon, linux-kernel, kernel-team, Zefan Li, Tejun Heo, Johannes Weiner, cgroups On 2/1/23 16:10, Peter Zijlstra wrote: > On Wed, Feb 01, 2023 at 01:46:11PM -0500, Waiman Long wrote: > >> Note that using cpus_allowed directly in cgroup v2 may not be right because >> cpus_allowed may have no relationship to effective_cpus at all in some >> cases, e.g. >> >> root >> | >> V >> A (cpus_allowed = 1-4, effective_cpus = 1-4) >> | >> V >> B (cpus_allowed = 5-8, effective_cpus = 1-4) >> >> In the case of cpuset B, passing back cpus 5-8 as the allowed_cpus is wrong. > I think my patch as written does the right thing here. Since the > intersection of (1-4) and (5-8) is empty it will move up the hierarchy > and we'll end up with (1-4) from the cgroup side of things. > > So the purpose of __cs_cpus_allowed() is to override the cpus_allowed of > the root set and force it to cpu_possible_mask. > > Then cs_cpus_allowed() computes the intersection of cs->cpus_allowed and > all it's parents. This will, in the case of B above, result in the empty > mask. > > Then cpuset_cpus_allowed() has a loop that starts with > task_cpu_possible_mask(), intersects that with cs_cpus_allowed() and if > the intersection of that and cpu_online_mask is empty, moves up the > hierarchy. Given cs_cpus_allowed(B) is the empty mask, we'll move to A. > > Note that since we force the mask of root to cpu_possible_mask, > cs_cpus_allowed(root) will be a no-op and if we guarantee (in arch code) > that cpu_online_mask always has a non-empty intersection with > task_cpu_possible_mask(), this loop is guaranteed to terminate with a > viable mask. I will take a closer look at that tomorrow. I will be more comfortable ack'ing that if this is specific to v1 cpuset instead of applying this in both v1 and v2 since it is only v1 that is problematic. Cheers, Longman ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs 2023-02-02 3:34 ` Waiman Long @ 2023-02-03 11:50 ` Will Deacon 2023-02-03 15:13 ` Waiman Long 0 siblings, 1 reply; 31+ messages in thread From: Will Deacon @ 2023-02-03 11:50 UTC (permalink / raw) To: Waiman Long Cc: Peter Zijlstra, linux-kernel, kernel-team, Zefan Li, Tejun Heo, Johannes Weiner, cgroups On Wed, Feb 01, 2023 at 10:34:00PM -0500, Waiman Long wrote: > On 2/1/23 16:10, Peter Zijlstra wrote: > > On Wed, Feb 01, 2023 at 01:46:11PM -0500, Waiman Long wrote: > > > > > Note that using cpus_allowed directly in cgroup v2 may not be right because > > > cpus_allowed may have no relationship to effective_cpus at all in some > > > cases, e.g. > > > > > > root > > > | > > > V > > > A (cpus_allowed = 1-4, effective_cpus = 1-4) > > > | > > > V > > > B (cpus_allowed = 5-8, effective_cpus = 1-4) > > > > > > In the case of cpuset B, passing back cpus 5-8 as the allowed_cpus is wrong. > > I think my patch as written does the right thing here. Since the > > intersection of (1-4) and (5-8) is empty it will move up the hierarchy > > and we'll end up with (1-4) from the cgroup side of things. > > > > So the purpose of __cs_cpus_allowed() is to override the cpus_allowed of > > the root set and force it to cpu_possible_mask. > > > > Then cs_cpus_allowed() computes the intersection of cs->cpus_allowed and > > all it's parents. This will, in the case of B above, result in the empty > > mask. > > > > Then cpuset_cpus_allowed() has a loop that starts with > > task_cpu_possible_mask(), intersects that with cs_cpus_allowed() and if > > the intersection of that and cpu_online_mask is empty, moves up the > > hierarchy. Given cs_cpus_allowed(B) is the empty mask, we'll move to A. > > > > Note that since we force the mask of root to cpu_possible_mask, > > cs_cpus_allowed(root) will be a no-op and if we guarantee (in arch code) > > that cpu_online_mask always has a non-empty intersection with > > task_cpu_possible_mask(), this loop is guaranteed to terminate with a > > viable mask. > > I will take a closer look at that tomorrow. I will be more comfortable > ack'ing that if this is specific to v1 cpuset instead of applying this in > both v1 and v2 since it is only v1 that is problematic. fwiw, the regression I'm seeing is with cgroup2. I haven't tried v1. WIll ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs 2023-02-03 11:50 ` Will Deacon @ 2023-02-03 15:13 ` Waiman Long 2023-02-03 15:26 ` Peter Zijlstra 0 siblings, 1 reply; 31+ messages in thread From: Waiman Long @ 2023-02-03 15:13 UTC (permalink / raw) To: Will Deacon Cc: Peter Zijlstra, linux-kernel, kernel-team, Zefan Li, Tejun Heo, Johannes Weiner, cgroups On 2/3/23 06:50, Will Deacon wrote: > On Wed, Feb 01, 2023 at 10:34:00PM -0500, Waiman Long wrote: >> On 2/1/23 16:10, Peter Zijlstra wrote: >>> On Wed, Feb 01, 2023 at 01:46:11PM -0500, Waiman Long wrote: >>> >>>> Note that using cpus_allowed directly in cgroup v2 may not be right because >>>> cpus_allowed may have no relationship to effective_cpus at all in some >>>> cases, e.g. >>>> >>>> root >>>> | >>>> V >>>> A (cpus_allowed = 1-4, effective_cpus = 1-4) >>>> | >>>> V >>>> B (cpus_allowed = 5-8, effective_cpus = 1-4) >>>> >>>> In the case of cpuset B, passing back cpus 5-8 as the allowed_cpus is wrong. >>> I think my patch as written does the right thing here. Since the >>> intersection of (1-4) and (5-8) is empty it will move up the hierarchy >>> and we'll end up with (1-4) from the cgroup side of things. >>> >>> So the purpose of __cs_cpus_allowed() is to override the cpus_allowed of >>> the root set and force it to cpu_possible_mask. >>> >>> Then cs_cpus_allowed() computes the intersection of cs->cpus_allowed and >>> all it's parents. This will, in the case of B above, result in the empty >>> mask. >>> >>> Then cpuset_cpus_allowed() has a loop that starts with >>> task_cpu_possible_mask(), intersects that with cs_cpus_allowed() and if >>> the intersection of that and cpu_online_mask is empty, moves up the >>> hierarchy. Given cs_cpus_allowed(B) is the empty mask, we'll move to A. >>> >>> Note that since we force the mask of root to cpu_possible_mask, >>> cs_cpus_allowed(root) will be a no-op and if we guarantee (in arch code) >>> that cpu_online_mask always has a non-empty intersection with >>> task_cpu_possible_mask(), this loop is guaranteed to terminate with a >>> viable mask. >> I will take a closer look at that tomorrow. I will be more comfortable >> ack'ing that if this is specific to v1 cpuset instead of applying this in >> both v1 and v2 since it is only v1 that is problematic. > fwiw, the regression I'm seeing is with cgroup2. I haven't tried v1. I think I know where the problem is. It is due to the fact the cpuset hotplug code doesn't update cpumasks of the tasks in the top cpuset (root) at all when there is a cpu offline or online event. It is probably because for some of the tasks in the top cpuset, especially the percpu kthread, changing their cpumasks can be catastrophic. The hotplug code does update the cpumasks of the tasks that are not in the top cpuset. This problem is irrespective of whether v1 or v2 is in use. The partition code does try to update the cpumasks of the tasks in the top cpuset, but skip the percpu kthreads. My testing show that is probably OK. For safety, I agree that is better to extend the allowed cpu list to all the possible cpus (including offline ones) for now until more testings are done to find a safe way to do that. That special case should apply only to tasks in the top cpuset. For the rests, the current code should be OK and is less risky than adopting code in this patch. Cheers, Longman ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs 2023-02-03 15:13 ` Waiman Long @ 2023-02-03 15:26 ` Peter Zijlstra 2023-02-03 15:35 ` Waiman Long 0 siblings, 1 reply; 31+ messages in thread From: Peter Zijlstra @ 2023-02-03 15:26 UTC (permalink / raw) To: Waiman Long Cc: Will Deacon, linux-kernel, kernel-team, Zefan Li, Tejun Heo, Johannes Weiner, cgroups On Fri, Feb 03, 2023 at 10:13:10AM -0500, Waiman Long wrote: > I think I know where the problem is. It is due to the fact the cpuset > hotplug code doesn't update cpumasks of the tasks in the top cpuset (root) > at all when there is a cpu offline or online event. It is probably because > for some of the tasks in the top cpuset, especially the percpu kthread, > changing their cpumasks can be catastrophic. The hotplug code does update > the cpumasks of the tasks that are not in the top cpuset. This problem is > irrespective of whether v1 or v2 is in use. I've been saying this exact thing for how many mails now? ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs 2023-02-03 15:26 ` Peter Zijlstra @ 2023-02-03 15:35 ` Waiman Long 0 siblings, 0 replies; 31+ messages in thread From: Waiman Long @ 2023-02-03 15:35 UTC (permalink / raw) To: Peter Zijlstra Cc: Will Deacon, linux-kernel, kernel-team, Zefan Li, Tejun Heo, Johannes Weiner, cgroups On 2/3/23 10:26, Peter Zijlstra wrote: > On Fri, Feb 03, 2023 at 10:13:10AM -0500, Waiman Long wrote: > >> I think I know where the problem is. It is due to the fact the cpuset >> hotplug code doesn't update cpumasks of the tasks in the top cpuset (root) >> at all when there is a cpu offline or online event. It is probably because >> for some of the tasks in the top cpuset, especially the percpu kthread, >> changing their cpumasks can be catastrophic. The hotplug code does update >> the cpumasks of the tasks that are not in the top cpuset. This problem is >> irrespective of whether v1 or v2 is in use. > I've been saying this exact thing for how many mails now? My bad. The fact that sched_getaffinity() masks off the offline cpus makes me thought incorrectly that tasks in the top cpuset were also updated by the hotplug code. Further testing indicates this is the case. Thanks, Longman ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs 2023-02-01 4:14 ` Waiman Long 2023-02-01 9:14 ` Peter Zijlstra @ 2023-02-02 8:34 ` Peter Zijlstra 2023-02-02 16:06 ` Waiman Long 1 sibling, 1 reply; 31+ messages in thread From: Peter Zijlstra @ 2023-02-02 8:34 UTC (permalink / raw) To: Waiman Long Cc: Will Deacon, linux-kernel, kernel-team, Zefan Li, Tejun Heo, Johannes Weiner, cgroups On Tue, Jan 31, 2023 at 11:14:27PM -0500, Waiman Long wrote: > A major concern I have is the overhead of creating a poor man version of v2 > cpus_allowed. This issue can be worked around even for cpuset v1 if it is > mounted with the cpuset_v2_mode option to behave more like v2 in its cpumask > handling. Alternatively we may be able to provide a config option to make > this the default for v1 without the special mount option, if necessary. It is equally broken for v2, it masks against effective_cpus. Not to mention it explicitly starts with cpu_online_mask. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs 2023-02-02 8:34 ` Peter Zijlstra @ 2023-02-02 16:06 ` Waiman Long 2023-02-02 19:42 ` Peter Zijlstra 0 siblings, 1 reply; 31+ messages in thread From: Waiman Long @ 2023-02-02 16:06 UTC (permalink / raw) To: Peter Zijlstra Cc: Will Deacon, linux-kernel, kernel-team, Zefan Li, Tejun Heo, Johannes Weiner, cgroups On 2/2/23 03:34, Peter Zijlstra wrote: > On Tue, Jan 31, 2023 at 11:14:27PM -0500, Waiman Long wrote: > >> A major concern I have is the overhead of creating a poor man version of v2 >> cpus_allowed. This issue can be worked around even for cpuset v1 if it is >> mounted with the cpuset_v2_mode option to behave more like v2 in its cpumask >> handling. Alternatively we may be able to provide a config option to make >> this the default for v1 without the special mount option, if necessary. > It is equally broken for v2, it masks against effective_cpus. Not to > mention it explicitly starts with cpu_online_mask. After taking a close look at the patch, my understanding of what it is doing is as follows: v2: cpus_allowed will not be affected by hotplug. So the new cpuset_cpus_allowed() will return effective_cpus + offline cpus that should have been part of effective_cpus if online before masking it with allowable cpus and then go up the cpuset hierarchy if necessary. v1: cpus_allowed is equivalent to v2 effective_cpus. It starts at the current cpuset and move up the hierarchy if necessary to find a cpuset that have at least one allowable cpu. First of all, it does not take into account of the v2 partition feature that may cause it to produce incorrect result if partition is enabled somewhere. Secondly, I don't see any benefit other than having some additional offline cpu available in a task's cpumask which the scheduler will ignore anyway. v2 is able to recover a previously offlined cpu. So we don't gain any net benefit other than the going up the cpuset hierarchy part. For v1, I agree we should go up the cpuset hierarchy to find a usable cpuset. Instead of introducing such a complexity in cpuset_cpus_allowed(), my current preference is to do the hierarchy climbing part in an enhanced cpuset_cpus_allowed_fallback() after an initial failure of cpuset_cpus_allowed(). That will be easier to understand than having such complexity and overhead in cpuset_cpus_allowed() alone. I will work on a patchset to do that as a counter offer. Cheers, Longman ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs 2023-02-02 16:06 ` Waiman Long @ 2023-02-02 19:42 ` Peter Zijlstra 2023-02-02 20:46 ` Waiman Long 0 siblings, 1 reply; 31+ messages in thread From: Peter Zijlstra @ 2023-02-02 19:42 UTC (permalink / raw) To: Waiman Long Cc: Will Deacon, linux-kernel, kernel-team, Zefan Li, Tejun Heo, Johannes Weiner, cgroups On Thu, Feb 02, 2023 at 11:06:51AM -0500, Waiman Long wrote: > After taking a close look at the patch, my understanding of what it is doing > is as follows: > > v2: cpus_allowed will not be affected by hotplug. So the new > cpuset_cpus_allowed() will return effective_cpus + offline cpus that should > have been part of effective_cpus if online before masking it with allowable > cpus and then go up the cpuset hierarchy if necessary. > > v1: cpus_allowed is equivalent to v2 effective_cpus. It starts at the > current cpuset and move up the hierarchy if necessary to find a cpuset that > have at least one allowable cpu. > > First of all, it does not take into account of the v2 partition feature that > may cause it to produce incorrect result if partition is enabled somewhere. How so? For a partition the cpus_allowed mask should be the parition CPUs. The only magical bit about partitions is that any one CPU cannot belong to two partitions and load-balancing is split. > Secondly, I don't see any benefit other than having some additional offline > cpu available in a task's cpumask which the scheduler will ignore anyway. Those CPUs can come online again -- you're *again* dismissing the true bug :/ If you filter out the offline CPUs at sched_setaffinity() time, you forever lose those CPUs, the task will never again move to those CPUs, even if they do come online after. It is really simple to reproduce this: - boot machine - offline all CPUs except one - taskset -p ffffffff $$ - online all CPUs and observe your shell (and all its decendants) being stuck to the one CPU. Do the same thing on a CPUSET=n build and note the difference (you retain the full mask). > v2 is able to recover a previously offlined cpu. So we don't gain any > net benefit other than the going up the cpuset hierarchy part. Only for !root tasks. Not even v2 will re-set the affinity of root tasks afaict. > For v1, I agree we should go up the cpuset hierarchy to find a usable > cpuset. Instead of introducing such a complexity in cpuset_cpus_allowed(), > my current preference is to do the hierarchy climbing part in an enhanced > cpuset_cpus_allowed_fallback() after an initial failure of > cpuset_cpus_allowed(). That will be easier to understand than having such > complexity and overhead in cpuset_cpus_allowed() alone. > > I will work on a patchset to do that as a counter offer. We will need a small and simple patch for /urgent, or I will need to revert all your patches -- your call. I also don't tihnk you fully appreciate the ramifications of task_cpu_possible_mask(), cpuset currently gets that quite wrong. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs 2023-02-02 19:42 ` Peter Zijlstra @ 2023-02-02 20:46 ` Waiman Long 2023-02-02 20:48 ` Tejun Heo 0 siblings, 1 reply; 31+ messages in thread From: Waiman Long @ 2023-02-02 20:46 UTC (permalink / raw) To: Peter Zijlstra, Tejun Heo Cc: Will Deacon, linux-kernel, kernel-team, Zefan Li, Johannes Weiner, cgroups On 2/2/23 14:42, Peter Zijlstra wrote: > On Thu, Feb 02, 2023 at 11:06:51AM -0500, Waiman Long wrote: > >> After taking a close look at the patch, my understanding of what it is doing >> is as follows: >> >> v2: cpus_allowed will not be affected by hotplug. So the new >> cpuset_cpus_allowed() will return effective_cpus + offline cpus that should >> have been part of effective_cpus if online before masking it with allowable >> cpus and then go up the cpuset hierarchy if necessary. >> >> v1: cpus_allowed is equivalent to v2 effective_cpus. It starts at the >> current cpuset and move up the hierarchy if necessary to find a cpuset that >> have at least one allowable cpu. >> >> First of all, it does not take into account of the v2 partition feature that >> may cause it to produce incorrect result if partition is enabled somewhere. > How so? For a partition the cpus_allowed mask should be the parition > CPUs. The only magical bit about partitions is that any one CPU cannot > belong to two partitions and load-balancing is split. There can be a child partition underneath it that uses up some of the cpus in cpus_allowed mask. So if you cascading up the cpuset tree from another child, your code will wrongly include those cpus that are dedicated to the other child partition. > >> Secondly, I don't see any benefit other than having some additional offline >> cpu available in a task's cpumask which the scheduler will ignore anyway. > Those CPUs can come online again -- you're *again* dismissing the true > bug :/ > > If you filter out the offline CPUs at sched_setaffinity() time, you > forever lose those CPUs, the task will never again move to those CPUs, > even if they do come online after. > > It is really simple to reproduce this: > > - boot machine > - offline all CPUs except one > - taskset -p ffffffff $$ > - online all CPUs > > and observe your shell (and all its decendants) being stuck to the one > CPU. Do the same thing on a CPUSET=n build and note the difference (you > retain the full mask). With the new sched_setaffinity(), the original mask should be stored in user_cpus_ptr. The cpuset code is supposed to call set_cpus_allowed_ptr() which then set the mask correctly. I will run your test and figure out why it does not work as I would have expected. > >> v2 is able to recover a previously offlined cpu. So we don't gain any >> net benefit other than the going up the cpuset hierarchy part. > Only for !root tasks. Not even v2 will re-set the affinity of root tasks > afaict. > >> For v1, I agree we should go up the cpuset hierarchy to find a usable >> cpuset. Instead of introducing such a complexity in cpuset_cpus_allowed(), >> my current preference is to do the hierarchy climbing part in an enhanced >> cpuset_cpus_allowed_fallback() after an initial failure of >> cpuset_cpus_allowed(). That will be easier to understand than having such >> complexity and overhead in cpuset_cpus_allowed() alone. >> >> I will work on a patchset to do that as a counter offer. > We will need a small and simple patch for /urgent, or I will need to > revert all your patches -- your call. > > I also don't tihnk you fully appreciate the ramifications of > task_cpu_possible_mask(), cpuset currently gets that quite wrong. OK, I don't realize the urgency of that. If it is that urgent, I will have no objection to get it in for now. We can improve it later on. So are you planning to get it into the current 6.2 rc or 6.3? Tejun, are you OK with that as you are the cgroup maintainer? Cheers, Longman ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs 2023-02-02 20:46 ` Waiman Long @ 2023-02-02 20:48 ` Tejun Heo 2023-02-02 20:53 ` Waiman Long 0 siblings, 1 reply; 31+ messages in thread From: Tejun Heo @ 2023-02-02 20:48 UTC (permalink / raw) To: Waiman Long Cc: Peter Zijlstra, Will Deacon, linux-kernel, kernel-team, Zefan Li, Johannes Weiner, cgroups On Thu, Feb 02, 2023 at 03:46:02PM -0500, Waiman Long wrote: > > > I will work on a patchset to do that as a counter offer. > > We will need a small and simple patch for /urgent, or I will need to > > revert all your patches -- your call. > > > > I also don't tihnk you fully appreciate the ramifications of > > task_cpu_possible_mask(), cpuset currently gets that quite wrong. > > OK, I don't realize the urgency of that. If it is that urgent, I will have > no objection to get it in for now. We can improve it later on. So are you > planning to get it into the current 6.2 rc or 6.3? > > Tejun, are you OK with that as you are the cgroup maintainer? Yeah, gotta fix the regression but is there currently a solution which fixes the regression but doesn't further break other stuff? Thanks. -- tejun ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs 2023-02-02 20:48 ` Tejun Heo @ 2023-02-02 20:53 ` Waiman Long 2023-02-02 21:05 ` Waiman Long 0 siblings, 1 reply; 31+ messages in thread From: Waiman Long @ 2023-02-02 20:53 UTC (permalink / raw) To: Tejun Heo Cc: Peter Zijlstra, Will Deacon, linux-kernel, kernel-team, Zefan Li, Johannes Weiner, cgroups On 2/2/23 15:48, Tejun Heo wrote: > On Thu, Feb 02, 2023 at 03:46:02PM -0500, Waiman Long wrote: >>>> I will work on a patchset to do that as a counter offer. >>> We will need a small and simple patch for /urgent, or I will need to >>> revert all your patches -- your call. >>> >>> I also don't tihnk you fully appreciate the ramifications of >>> task_cpu_possible_mask(), cpuset currently gets that quite wrong. >> OK, I don't realize the urgency of that. If it is that urgent, I will have >> no objection to get it in for now. We can improve it later on. So are you >> planning to get it into the current 6.2 rc or 6.3? >> >> Tejun, are you OK with that as you are the cgroup maintainer? > Yeah, gotta fix the regression but is there currently a solution which fixes > the regression but doesn't further break other stuff? I believe there is a better way to do that, but it will need more time to flex out. Since cpuset_cpus_allowed() is only used by kernel/sched/core.c, Peter will be responsible if it somehow breaks other stuff. Thanks, Longman ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs 2023-02-02 20:53 ` Waiman Long @ 2023-02-02 21:05 ` Waiman Long 2023-02-02 21:50 ` Tejun Heo 0 siblings, 1 reply; 31+ messages in thread From: Waiman Long @ 2023-02-02 21:05 UTC (permalink / raw) To: Tejun Heo Cc: Peter Zijlstra, Will Deacon, linux-kernel, kernel-team, Zefan Li, Johannes Weiner, cgroups On 2/2/23 15:53, Waiman Long wrote: > > On 2/2/23 15:48, Tejun Heo wrote: >> On Thu, Feb 02, 2023 at 03:46:02PM -0500, Waiman Long wrote: >>>>> I will work on a patchset to do that as a counter offer. >>>> We will need a small and simple patch for /urgent, or I will need to >>>> revert all your patches -- your call. >>>> >>>> I also don't tihnk you fully appreciate the ramifications of >>>> task_cpu_possible_mask(), cpuset currently gets that quite wrong. >>> OK, I don't realize the urgency of that. If it is that urgent, I >>> will have >>> no objection to get it in for now. We can improve it later on. So >>> are you >>> planning to get it into the current 6.2 rc or 6.3? >>> >>> Tejun, are you OK with that as you are the cgroup maintainer? >> Yeah, gotta fix the regression but is there currently a solution >> which fixes >> the regression but doesn't further break other stuff? > > I believe there is a better way to do that, but it will need more time > to flex out. Since cpuset_cpus_allowed() is only used by > kernel/sched/core.c, Peter will be responsible if it somehow breaks > other stuff. Maybe my cpuset patch that don't update task's cpumask on cpu offline event can help. However, I don't know the exact scenario where the regression happen, so it may not. Cheers, Longman ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs 2023-02-02 21:05 ` Waiman Long @ 2023-02-02 21:50 ` Tejun Heo 2023-02-03 0:54 ` Waiman Long 2023-02-03 16:31 ` Will Deacon 0 siblings, 2 replies; 31+ messages in thread From: Tejun Heo @ 2023-02-02 21:50 UTC (permalink / raw) To: Waiman Long Cc: Peter Zijlstra, Will Deacon, linux-kernel, kernel-team, Zefan Li, Johannes Weiner, cgroups On Thu, Feb 02, 2023 at 04:05:14PM -0500, Waiman Long wrote: > > On 2/2/23 15:53, Waiman Long wrote: > > > > On 2/2/23 15:48, Tejun Heo wrote: > > > On Thu, Feb 02, 2023 at 03:46:02PM -0500, Waiman Long wrote: > > > > > > I will work on a patchset to do that as a counter offer. > > > > > We will need a small and simple patch for /urgent, or I will need to > > > > > revert all your patches -- your call. > > > > > > > > > > I also don't tihnk you fully appreciate the ramifications of > > > > > task_cpu_possible_mask(), cpuset currently gets that quite wrong. > > > > OK, I don't realize the urgency of that. If it is that urgent, I > > > > will have > > > > no objection to get it in for now. We can improve it later on. > > > > So are you > > > > planning to get it into the current 6.2 rc or 6.3? > > > > > > > > Tejun, are you OK with that as you are the cgroup maintainer? > > > Yeah, gotta fix the regression but is there currently a solution > > > which fixes > > > the regression but doesn't further break other stuff? > > > > I believe there is a better way to do that, but it will need more time > > to flex out. Since cpuset_cpus_allowed() is only used by > > kernel/sched/core.c, Peter will be responsible if it somehow breaks > > other stuff. > > Maybe my cpuset patch that don't update task's cpumask on cpu offline event > can help. However, I don't know the exact scenario where the regression > happen, so it may not. Neither patch looks like they would break anything. That said, the patches aren't trivial and we're really close to the merge window, so I'd really appreciate if you can take a look and test a bit before we send these Linus's way. We can replace it with a better solution afterwards. Thanks. -- tejun ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs 2023-02-02 21:50 ` Tejun Heo @ 2023-02-03 0:54 ` Waiman Long 2023-02-03 16:31 ` Will Deacon 1 sibling, 0 replies; 31+ messages in thread From: Waiman Long @ 2023-02-03 0:54 UTC (permalink / raw) To: Tejun Heo Cc: Peter Zijlstra, Will Deacon, linux-kernel, kernel-team, Zefan Li, Johannes Weiner, cgroups On 2/2/23 16:50, Tejun Heo wrote: > On Thu, Feb 02, 2023 at 04:05:14PM -0500, Waiman Long wrote: >> On 2/2/23 15:53, Waiman Long wrote: >>> On 2/2/23 15:48, Tejun Heo wrote: >>>> On Thu, Feb 02, 2023 at 03:46:02PM -0500, Waiman Long wrote: >>>>>>> I will work on a patchset to do that as a counter offer. >>>>>> We will need a small and simple patch for /urgent, or I will need to >>>>>> revert all your patches -- your call. >>>>>> >>>>>> I also don't tihnk you fully appreciate the ramifications of >>>>>> task_cpu_possible_mask(), cpuset currently gets that quite wrong. >>>>> OK, I don't realize the urgency of that. If it is that urgent, I >>>>> will have >>>>> no objection to get it in for now. We can improve it later on. >>>>> So are you >>>>> planning to get it into the current 6.2 rc or 6.3? >>>>> >>>>> Tejun, are you OK with that as you are the cgroup maintainer? >>>> Yeah, gotta fix the regression but is there currently a solution >>>> which fixes >>>> the regression but doesn't further break other stuff? >>> I believe there is a better way to do that, but it will need more time >>> to flex out. Since cpuset_cpus_allowed() is only used by >>> kernel/sched/core.c, Peter will be responsible if it somehow breaks >>> other stuff. >> Maybe my cpuset patch that don't update task's cpumask on cpu offline event >> can help. However, I don't know the exact scenario where the regression >> happen, so it may not. > Neither patch looks like they would break anything. That said, the patches > aren't trivial and we're really close to the merge window, so I'd really > appreciate if you can take a look and test a bit before we send these > Linus's way. We can replace it with a better solution afterwards. OK, will do. Cheers, Longman ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs 2023-02-02 21:50 ` Tejun Heo 2023-02-03 0:54 ` Waiman Long @ 2023-02-03 16:31 ` Will Deacon 1 sibling, 0 replies; 31+ messages in thread From: Will Deacon @ 2023-02-03 16:31 UTC (permalink / raw) To: Tejun Heo Cc: Waiman Long, Peter Zijlstra, linux-kernel, kernel-team, Zefan Li, Johannes Weiner, cgroups On Thu, Feb 02, 2023 at 11:50:55AM -1000, Tejun Heo wrote: > On Thu, Feb 02, 2023 at 04:05:14PM -0500, Waiman Long wrote: > > > > On 2/2/23 15:53, Waiman Long wrote: > > > > > > On 2/2/23 15:48, Tejun Heo wrote: > > > > On Thu, Feb 02, 2023 at 03:46:02PM -0500, Waiman Long wrote: > > > > > > > I will work on a patchset to do that as a counter offer. > > > > > > We will need a small and simple patch for /urgent, or I will need to > > > > > > revert all your patches -- your call. > > > > > > > > > > > > I also don't tihnk you fully appreciate the ramifications of > > > > > > task_cpu_possible_mask(), cpuset currently gets that quite wrong. > > > > > OK, I don't realize the urgency of that. If it is that urgent, I > > > > > will have > > > > > no objection to get it in for now. We can improve it later on. > > > > > So are you > > > > > planning to get it into the current 6.2 rc or 6.3? > > > > > > > > > > Tejun, are you OK with that as you are the cgroup maintainer? > > > > Yeah, gotta fix the regression but is there currently a solution > > > > which fixes > > > > the regression but doesn't further break other stuff? > > > > > > I believe there is a better way to do that, but it will need more time > > > to flex out. Since cpuset_cpus_allowed() is only used by > > > kernel/sched/core.c, Peter will be responsible if it somehow breaks > > > other stuff. > > > > Maybe my cpuset patch that don't update task's cpumask on cpu offline event > > can help. However, I don't know the exact scenario where the regression > > happen, so it may not. > > Neither patch looks like they would break anything. That said, the patches > aren't trivial and we're really close to the merge window, so I'd really > appreciate if you can take a look and test a bit before we send these > Linus's way. We can replace it with a better solution afterwards. FWIW, I tested this series in an arm64 heterogeneous setup with things like hotplug and exec()ing between 32-bit and 64-bit tasks and it all seems good. The alternative would be to revert Waiman's setaffinity changes, which I've had a go at here: https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=ssa-reverts and I _think_ I've rescued the UAF fix too. What do people prefer? Will ^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH 2/2] cpuset: Call set_cpus_allowed_ptr() with appropriate mask for task 2023-01-31 22:17 [PATCH 0/2] Fix broken cpuset affinity handling on heterogeneous systems Will Deacon 2023-01-31 22:17 ` [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs Will Deacon @ 2023-01-31 22:17 ` Will Deacon 2023-02-01 2:22 ` Waiman Long ` (3 more replies) 1 sibling, 4 replies; 31+ messages in thread From: Will Deacon @ 2023-01-31 22:17 UTC (permalink / raw) To: linux-kernel Cc: kernel-team, Will Deacon, Peter Zijlstra, Waiman Long, Zefan Li, Tejun Heo, Johannes Weiner, cgroups set_cpus_allowed_ptr() will fail with -EINVAL if the requested affinity mask is not a subset of the task_cpu_possible_mask() for the task being updated. Consequently, on a heterogeneous system with cpusets spanning the different CPU types, updates to the cgroup hierarchy can silently fail to update task affinities when the effective affinity mask for the cpuset is expanded. For example, consider an arm64 system with 4 CPUs, where CPUs 2-3 are the only cores capable of executing 32-bit tasks. Attaching a 32-bit task to a cpuset containing CPUs 0-2 will correctly affine the task to CPU 2. Extending the cpuset to CPUs 0-3, however, will fail to extend the affinity mask of the 32-bit task because update_tasks_cpumask() will pass the full 0-3 mask to set_cpus_allowed_ptr(). Extend update_tasks_cpumask() to take a temporary 'cpumask' paramater and use it to mask the 'effective_cpus' mask with the possible mask for each task being updated. Fixes: 431c69fac05b ("cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus()") Signed-off-by: Will Deacon <will@kernel.org> --- Note: We wondered whether it was worth calling guarantee_online_cpus() if the cpumask_and() returns 0 in update_tasks_cpumask(), but given that this path is only called when the effective mask changes, it didn't seem appropriate. Ultimately, if you have 32-bit tasks attached to a cpuset containing only 64-bit cpus, then the affinity is going to be forced. kernel/cgroup/cpuset.c | 18 +++++++++++------- 1 file changed, 11 insertions(+), 7 deletions(-) diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index 8552cc2c586a..f15fb0426707 100644 --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -1205,12 +1205,13 @@ void rebuild_sched_domains(void) /** * update_tasks_cpumask - Update the cpumasks of tasks in the cpuset. * @cs: the cpuset in which each task's cpus_allowed mask needs to be changed + * @new_cpus: the temp variable for the new effective_cpus mask * * Iterate through each task of @cs updating its cpus_allowed to the * effective cpuset's. As this function is called with cpuset_rwsem held, * cpuset membership stays stable. */ -static void update_tasks_cpumask(struct cpuset *cs) +static void update_tasks_cpumask(struct cpuset *cs, struct cpumask *new_cpus) { struct css_task_iter it; struct task_struct *task; @@ -1224,7 +1225,10 @@ static void update_tasks_cpumask(struct cpuset *cs) if (top_cs && (task->flags & PF_KTHREAD) && kthread_is_per_cpu(task)) continue; - set_cpus_allowed_ptr(task, cs->effective_cpus); + + cpumask_and(new_cpus, cs->effective_cpus, + task_cpu_possible_mask(task)); + set_cpus_allowed_ptr(task, new_cpus); } css_task_iter_end(&it); } @@ -1509,7 +1513,7 @@ static int update_parent_subparts_cpumask(struct cpuset *cs, int cmd, spin_unlock_irq(&callback_lock); if (adding || deleting) - update_tasks_cpumask(parent); + update_tasks_cpumask(parent, tmp->new_cpus); /* * Set or clear CS_SCHED_LOAD_BALANCE when partcmd_update, if necessary. @@ -1661,7 +1665,7 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp, WARN_ON(!is_in_v2_mode() && !cpumask_equal(cp->cpus_allowed, cp->effective_cpus)); - update_tasks_cpumask(cp); + update_tasks_cpumask(cp, tmp->new_cpus); /* * On legacy hierarchy, if the effective cpumask of any non- @@ -2309,7 +2313,7 @@ static int update_prstate(struct cpuset *cs, int new_prs) } } - update_tasks_cpumask(parent); + update_tasks_cpumask(parent, tmpmask.new_cpus); if (parent->child_ecpus_count) update_sibling_cpumasks(parent, cs, &tmpmask); @@ -3347,7 +3351,7 @@ hotplug_update_tasks_legacy(struct cpuset *cs, * as the tasks will be migrated to an ancestor. */ if (cpus_updated && !cpumask_empty(cs->cpus_allowed)) - update_tasks_cpumask(cs); + update_tasks_cpumask(cs, new_cpus); if (mems_updated && !nodes_empty(cs->mems_allowed)) update_tasks_nodemask(cs); @@ -3384,7 +3388,7 @@ hotplug_update_tasks(struct cpuset *cs, spin_unlock_irq(&callback_lock); if (cpus_updated) - update_tasks_cpumask(cs); + update_tasks_cpumask(cs, new_cpus); if (mems_updated) update_tasks_nodemask(cs); } -- 2.39.1.456.gfc5497dd1b-goog ^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [PATCH 2/2] cpuset: Call set_cpus_allowed_ptr() with appropriate mask for task 2023-01-31 22:17 ` [PATCH 2/2] cpuset: Call set_cpus_allowed_ptr() with appropriate mask for task Will Deacon @ 2023-02-01 2:22 ` Waiman Long 2023-02-01 9:15 ` Peter Zijlstra 2023-02-01 9:27 ` Peter Zijlstra ` (2 subsequent siblings) 3 siblings, 1 reply; 31+ messages in thread From: Waiman Long @ 2023-02-01 2:22 UTC (permalink / raw) To: Will Deacon, linux-kernel Cc: kernel-team, Peter Zijlstra, Zefan Li, Tejun Heo, Johannes Weiner, cgroups On 1/31/23 17:17, Will Deacon wrote: > set_cpus_allowed_ptr() will fail with -EINVAL if the requested > affinity mask is not a subset of the task_cpu_possible_mask() for the > task being updated. Consequently, on a heterogeneous system with cpusets > spanning the different CPU types, updates to the cgroup hierarchy can > silently fail to update task affinities when the effective affinity > mask for the cpuset is expanded. > > For example, consider an arm64 system with 4 CPUs, where CPUs 2-3 are > the only cores capable of executing 32-bit tasks. Attaching a 32-bit > task to a cpuset containing CPUs 0-2 will correctly affine the task to > CPU 2. Extending the cpuset to CPUs 0-3, however, will fail to extend > the affinity mask of the 32-bit task because update_tasks_cpumask() will > pass the full 0-3 mask to set_cpus_allowed_ptr(). > > Extend update_tasks_cpumask() to take a temporary 'cpumask' paramater > and use it to mask the 'effective_cpus' mask with the possible mask for > each task being updated. > > Fixes: 431c69fac05b ("cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus()") > Signed-off-by: Will Deacon <will@kernel.org> > --- > > Note: We wondered whether it was worth calling guarantee_online_cpus() > if the cpumask_and() returns 0 in update_tasks_cpumask(), but given that > this path is only called when the effective mask changes, it didn't > seem appropriate. Ultimately, if you have 32-bit tasks attached to a > cpuset containing only 64-bit cpus, then the affinity is going to be > forced. Now I see how the sched_setaffinity() change is impacting arm64. Instead of putting in the bandage in cpuset. I would suggest doing another cpu masking in __set_cpus_allowed_ptr() similar to what is now done for user_cpus_ptr. Another suggestion that I have is to add a helper like has_task_cpu_possible_mask() that returns a true/false value. In this way, we only need to do an additional masking if we have some mismatched 32-bit only cpus available in the system running 32-bit binaries so that we can skip this step in most cases compiling them out in non-arm64 systems. By doing that, we may be able to remove some of the existing usages of task_cpu_possible_mask(). Thought? Cheers, Longman ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 2/2] cpuset: Call set_cpus_allowed_ptr() with appropriate mask for task 2023-02-01 2:22 ` Waiman Long @ 2023-02-01 9:15 ` Peter Zijlstra 2023-02-01 15:03 ` Waiman Long 0 siblings, 1 reply; 31+ messages in thread From: Peter Zijlstra @ 2023-02-01 9:15 UTC (permalink / raw) To: Waiman Long Cc: Will Deacon, linux-kernel, kernel-team, Zefan Li, Tejun Heo, Johannes Weiner, cgroups On Tue, Jan 31, 2023 at 09:22:44PM -0500, Waiman Long wrote: > On 1/31/23 17:17, Will Deacon wrote: > > set_cpus_allowed_ptr() will fail with -EINVAL if the requested > > affinity mask is not a subset of the task_cpu_possible_mask() for the > > task being updated. Consequently, on a heterogeneous system with cpusets > > spanning the different CPU types, updates to the cgroup hierarchy can > > silently fail to update task affinities when the effective affinity > > mask for the cpuset is expanded. > > > > For example, consider an arm64 system with 4 CPUs, where CPUs 2-3 are > > the only cores capable of executing 32-bit tasks. Attaching a 32-bit > > task to a cpuset containing CPUs 0-2 will correctly affine the task to > > CPU 2. Extending the cpuset to CPUs 0-3, however, will fail to extend > > the affinity mask of the 32-bit task because update_tasks_cpumask() will > > pass the full 0-3 mask to set_cpus_allowed_ptr(). > > > > Extend update_tasks_cpumask() to take a temporary 'cpumask' paramater > > and use it to mask the 'effective_cpus' mask with the possible mask for > > each task being updated. > > > > Fixes: 431c69fac05b ("cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus()") > > Signed-off-by: Will Deacon <will@kernel.org> > > --- > > > > Note: We wondered whether it was worth calling guarantee_online_cpus() > > if the cpumask_and() returns 0 in update_tasks_cpumask(), but given that > > this path is only called when the effective mask changes, it didn't > > seem appropriate. Ultimately, if you have 32-bit tasks attached to a > > cpuset containing only 64-bit cpus, then the affinity is going to be > > forced. > > Now I see how the sched_setaffinity() change is impacting arm64. Instead of > putting in the bandage in cpuset. I would suggest doing another cpu masking > in __set_cpus_allowed_ptr() similar to what is now done for user_cpus_ptr. NO! cpuset is *BROKEN* it has been for a while, it needs to get fixed. Masking the offline CPUs is *WRONG*. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 2/2] cpuset: Call set_cpus_allowed_ptr() with appropriate mask for task 2023-02-01 9:15 ` Peter Zijlstra @ 2023-02-01 15:03 ` Waiman Long 0 siblings, 0 replies; 31+ messages in thread From: Waiman Long @ 2023-02-01 15:03 UTC (permalink / raw) To: Peter Zijlstra Cc: Will Deacon, linux-kernel, kernel-team, Zefan Li, Tejun Heo, Johannes Weiner, cgroups On 2/1/23 04:15, Peter Zijlstra wrote: > On Tue, Jan 31, 2023 at 09:22:44PM -0500, Waiman Long wrote: >> On 1/31/23 17:17, Will Deacon wrote: >>> set_cpus_allowed_ptr() will fail with -EINVAL if the requested >>> affinity mask is not a subset of the task_cpu_possible_mask() for the >>> task being updated. Consequently, on a heterogeneous system with cpusets >>> spanning the different CPU types, updates to the cgroup hierarchy can >>> silently fail to update task affinities when the effective affinity >>> mask for the cpuset is expanded. >>> >>> For example, consider an arm64 system with 4 CPUs, where CPUs 2-3 are >>> the only cores capable of executing 32-bit tasks. Attaching a 32-bit >>> task to a cpuset containing CPUs 0-2 will correctly affine the task to >>> CPU 2. Extending the cpuset to CPUs 0-3, however, will fail to extend >>> the affinity mask of the 32-bit task because update_tasks_cpumask() will >>> pass the full 0-3 mask to set_cpus_allowed_ptr(). >>> >>> Extend update_tasks_cpumask() to take a temporary 'cpumask' paramater >>> and use it to mask the 'effective_cpus' mask with the possible mask for >>> each task being updated. >>> >>> Fixes: 431c69fac05b ("cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus()") >>> Signed-off-by: Will Deacon <will@kernel.org> >>> --- >>> >>> Note: We wondered whether it was worth calling guarantee_online_cpus() >>> if the cpumask_and() returns 0 in update_tasks_cpumask(), but given that >>> this path is only called when the effective mask changes, it didn't >>> seem appropriate. Ultimately, if you have 32-bit tasks attached to a >>> cpuset containing only 64-bit cpus, then the affinity is going to be >>> forced. >> Now I see how the sched_setaffinity() change is impacting arm64. Instead of >> putting in the bandage in cpuset. I would suggest doing another cpu masking >> in __set_cpus_allowed_ptr() similar to what is now done for user_cpus_ptr. > NO! cpuset is *BROKEN* it has been for a while, it needs to get fixed. > > Masking the offline CPUs is *WRONG*. > This patch is not related to offline cpus at all. It is all about the 32-bit misfit cpus in some arm64 system. Cheers, Longman ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 2/2] cpuset: Call set_cpus_allowed_ptr() with appropriate mask for task 2023-01-31 22:17 ` [PATCH 2/2] cpuset: Call set_cpus_allowed_ptr() with appropriate mask for task Will Deacon 2023-02-01 2:22 ` Waiman Long @ 2023-02-01 9:27 ` Peter Zijlstra 2023-02-03 17:55 ` Waiman Long 2023-02-06 20:21 ` Tejun Heo 3 siblings, 0 replies; 31+ messages in thread From: Peter Zijlstra @ 2023-02-01 9:27 UTC (permalink / raw) To: Will Deacon Cc: linux-kernel, kernel-team, Waiman Long, Zefan Li, Tejun Heo, Johannes Weiner, cgroups On Tue, Jan 31, 2023 at 10:17:19PM +0000, Will Deacon wrote: > set_cpus_allowed_ptr() will fail with -EINVAL if the requested > affinity mask is not a subset of the task_cpu_possible_mask() for the > task being updated. Consequently, on a heterogeneous system with cpusets > spanning the different CPU types, updates to the cgroup hierarchy can > silently fail to update task affinities when the effective affinity > mask for the cpuset is expanded. > > For example, consider an arm64 system with 4 CPUs, where CPUs 2-3 are > the only cores capable of executing 32-bit tasks. Attaching a 32-bit > task to a cpuset containing CPUs 0-2 will correctly affine the task to > CPU 2. Extending the cpuset to CPUs 0-3, however, will fail to extend > the affinity mask of the 32-bit task because update_tasks_cpumask() will > pass the full 0-3 mask to set_cpus_allowed_ptr(). > > Extend update_tasks_cpumask() to take a temporary 'cpumask' paramater > and use it to mask the 'effective_cpus' mask with the possible mask for > each task being updated. > > Fixes: 431c69fac05b ("cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus()") > Signed-off-by: Will Deacon <will@kernel.org> > --- > > Note: We wondered whether it was worth calling guarantee_online_cpus() > if the cpumask_and() returns 0 in update_tasks_cpumask(), but given that > this path is only called when the effective mask changes, it didn't > seem appropriate. Ultimately, if you have 32-bit tasks attached to a > cpuset containing only 64-bit cpus, then the affinity is going to be > forced. Right, so the case above where the cpuset is shrunk to 0-1, then the intersection between the cpuset and task_cpu_possible_mask() is empty and it currently simply fails to update mask. I argued it was probably desired to walk up the tree to find a wider parent until the intersection of {cpuset, task_cpu_possible_mask, online} becomes non-empty. But I suppose that can wait until we have more time. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 2/2] cpuset: Call set_cpus_allowed_ptr() with appropriate mask for task 2023-01-31 22:17 ` [PATCH 2/2] cpuset: Call set_cpus_allowed_ptr() with appropriate mask for task Will Deacon 2023-02-01 2:22 ` Waiman Long 2023-02-01 9:27 ` Peter Zijlstra @ 2023-02-03 17:55 ` Waiman Long 2023-02-06 20:21 ` Tejun Heo 3 siblings, 0 replies; 31+ messages in thread From: Waiman Long @ 2023-02-03 17:55 UTC (permalink / raw) To: Will Deacon, linux-kernel Cc: kernel-team, Peter Zijlstra, Zefan Li, Tejun Heo, Johannes Weiner, cgroups On 1/31/23 17:17, Will Deacon wrote: > set_cpus_allowed_ptr() will fail with -EINVAL if the requested > affinity mask is not a subset of the task_cpu_possible_mask() for the > task being updated. Consequently, on a heterogeneous system with cpusets > spanning the different CPU types, updates to the cgroup hierarchy can > silently fail to update task affinities when the effective affinity > mask for the cpuset is expanded. > > For example, consider an arm64 system with 4 CPUs, where CPUs 2-3 are > the only cores capable of executing 32-bit tasks. Attaching a 32-bit > task to a cpuset containing CPUs 0-2 will correctly affine the task to > CPU 2. Extending the cpuset to CPUs 0-3, however, will fail to extend > the affinity mask of the 32-bit task because update_tasks_cpumask() will > pass the full 0-3 mask to set_cpus_allowed_ptr(). > > Extend update_tasks_cpumask() to take a temporary 'cpumask' paramater > and use it to mask the 'effective_cpus' mask with the possible mask for > each task being updated. > > Fixes: 431c69fac05b ("cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus()") > Signed-off-by: Will Deacon <will@kernel.org> > --- > > Note: We wondered whether it was worth calling guarantee_online_cpus() > if the cpumask_and() returns 0 in update_tasks_cpumask(), but given that > this path is only called when the effective mask changes, it didn't > seem appropriate. Ultimately, if you have 32-bit tasks attached to a > cpuset containing only 64-bit cpus, then the affinity is going to be > forced. > > kernel/cgroup/cpuset.c | 18 +++++++++++------- > 1 file changed, 11 insertions(+), 7 deletions(-) > > diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c > index 8552cc2c586a..f15fb0426707 100644 > --- a/kernel/cgroup/cpuset.c > +++ b/kernel/cgroup/cpuset.c > @@ -1205,12 +1205,13 @@ void rebuild_sched_domains(void) > /** > * update_tasks_cpumask - Update the cpumasks of tasks in the cpuset. > * @cs: the cpuset in which each task's cpus_allowed mask needs to be changed > + * @new_cpus: the temp variable for the new effective_cpus mask > * > * Iterate through each task of @cs updating its cpus_allowed to the > * effective cpuset's. As this function is called with cpuset_rwsem held, > * cpuset membership stays stable. > */ > -static void update_tasks_cpumask(struct cpuset *cs) > +static void update_tasks_cpumask(struct cpuset *cs, struct cpumask *new_cpus) > { > struct css_task_iter it; > struct task_struct *task; > @@ -1224,7 +1225,10 @@ static void update_tasks_cpumask(struct cpuset *cs) > if (top_cs && (task->flags & PF_KTHREAD) && > kthread_is_per_cpu(task)) > continue; > - set_cpus_allowed_ptr(task, cs->effective_cpus); > + > + cpumask_and(new_cpus, cs->effective_cpus, > + task_cpu_possible_mask(task)); > + set_cpus_allowed_ptr(task, new_cpus); > } > css_task_iter_end(&it); > } > @@ -1509,7 +1513,7 @@ static int update_parent_subparts_cpumask(struct cpuset *cs, int cmd, > spin_unlock_irq(&callback_lock); > > if (adding || deleting) > - update_tasks_cpumask(parent); > + update_tasks_cpumask(parent, tmp->new_cpus); > > /* > * Set or clear CS_SCHED_LOAD_BALANCE when partcmd_update, if necessary. > @@ -1661,7 +1665,7 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp, > WARN_ON(!is_in_v2_mode() && > !cpumask_equal(cp->cpus_allowed, cp->effective_cpus)); > > - update_tasks_cpumask(cp); > + update_tasks_cpumask(cp, tmp->new_cpus); > > /* > * On legacy hierarchy, if the effective cpumask of any non- > @@ -2309,7 +2313,7 @@ static int update_prstate(struct cpuset *cs, int new_prs) > } > } > > - update_tasks_cpumask(parent); > + update_tasks_cpumask(parent, tmpmask.new_cpus); > > if (parent->child_ecpus_count) > update_sibling_cpumasks(parent, cs, &tmpmask); > @@ -3347,7 +3351,7 @@ hotplug_update_tasks_legacy(struct cpuset *cs, > * as the tasks will be migrated to an ancestor. > */ > if (cpus_updated && !cpumask_empty(cs->cpus_allowed)) > - update_tasks_cpumask(cs); > + update_tasks_cpumask(cs, new_cpus); > if (mems_updated && !nodes_empty(cs->mems_allowed)) > update_tasks_nodemask(cs); > > @@ -3384,7 +3388,7 @@ hotplug_update_tasks(struct cpuset *cs, > spin_unlock_irq(&callback_lock); > > if (cpus_updated) > - update_tasks_cpumask(cs); > + update_tasks_cpumask(cs, new_cpus); > if (mems_updated) > update_tasks_nodemask(cs); > } Acked-by: Waiman Long <longman@redhat.com> This change is good for backporting to stable releases. For the latest kernel, I will prefer to centralize the masking in __set_cpus_allowed_ptr() where a scratch_mask is already being used for masking purpose. Let get this patch merged now, I will send a patch to move off the masking afterward. Cheers, Longman ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 2/2] cpuset: Call set_cpus_allowed_ptr() with appropriate mask for task 2023-01-31 22:17 ` [PATCH 2/2] cpuset: Call set_cpus_allowed_ptr() with appropriate mask for task Will Deacon ` (2 preceding siblings ...) 2023-02-03 17:55 ` Waiman Long @ 2023-02-06 20:21 ` Tejun Heo 3 siblings, 0 replies; 31+ messages in thread From: Tejun Heo @ 2023-02-06 20:21 UTC (permalink / raw) To: Will Deacon Cc: linux-kernel, kernel-team, Peter Zijlstra, Waiman Long, Zefan Li, Johannes Weiner, cgroups On Tue, Jan 31, 2023 at 10:17:19PM +0000, Will Deacon wrote: > set_cpus_allowed_ptr() will fail with -EINVAL if the requested > affinity mask is not a subset of the task_cpu_possible_mask() for the > task being updated. Consequently, on a heterogeneous system with cpusets > spanning the different CPU types, updates to the cgroup hierarchy can > silently fail to update task affinities when the effective affinity > mask for the cpuset is expanded. > > For example, consider an arm64 system with 4 CPUs, where CPUs 2-3 are > the only cores capable of executing 32-bit tasks. Attaching a 32-bit > task to a cpuset containing CPUs 0-2 will correctly affine the task to > CPU 2. Extending the cpuset to CPUs 0-3, however, will fail to extend > the affinity mask of the 32-bit task because update_tasks_cpumask() will > pass the full 0-3 mask to set_cpus_allowed_ptr(). > > Extend update_tasks_cpumask() to take a temporary 'cpumask' paramater > and use it to mask the 'effective_cpus' mask with the possible mask for > each task being updated. > > Fixes: 431c69fac05b ("cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus()") > Signed-off-by: Will Deacon <will@kernel.org> Applied to cgroup/for-6.2-fixes. Thanks. -- tejun ^ permalink raw reply [flat|nested] 31+ messages in thread
end of thread, other threads:[~2023-02-06 20:21 UTC | newest] Thread overview: 31+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-01-31 22:17 [PATCH 0/2] Fix broken cpuset affinity handling on heterogeneous systems Will Deacon 2023-01-31 22:17 ` [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs Will Deacon 2023-02-01 4:14 ` Waiman Long 2023-02-01 9:14 ` Peter Zijlstra 2023-02-01 15:16 ` Waiman Long 2023-02-01 18:46 ` Waiman Long 2023-02-01 19:14 ` Waiman Long 2023-02-01 19:17 ` Waiman Long 2023-02-01 21:10 ` Peter Zijlstra 2023-02-02 3:34 ` Waiman Long 2023-02-03 11:50 ` Will Deacon 2023-02-03 15:13 ` Waiman Long 2023-02-03 15:26 ` Peter Zijlstra 2023-02-03 15:35 ` Waiman Long 2023-02-02 8:34 ` Peter Zijlstra 2023-02-02 16:06 ` Waiman Long 2023-02-02 19:42 ` Peter Zijlstra 2023-02-02 20:46 ` Waiman Long 2023-02-02 20:48 ` Tejun Heo 2023-02-02 20:53 ` Waiman Long 2023-02-02 21:05 ` Waiman Long 2023-02-02 21:50 ` Tejun Heo 2023-02-03 0:54 ` Waiman Long 2023-02-03 16:31 ` Will Deacon 2023-01-31 22:17 ` [PATCH 2/2] cpuset: Call set_cpus_allowed_ptr() with appropriate mask for task Will Deacon 2023-02-01 2:22 ` Waiman Long 2023-02-01 9:15 ` Peter Zijlstra 2023-02-01 15:03 ` Waiman Long 2023-02-01 9:27 ` Peter Zijlstra 2023-02-03 17:55 ` Waiman Long 2023-02-06 20:21 ` Tejun Heo
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).