* Additional issue with cpuset isolated partitions?
@ 2024-11-15 16:30 Juri Lelli
2024-11-15 17:47 ` Waiman Long
From: Juri Lelli @ 2024-11-15 16:30 UTC (permalink / raw)
To: Waiman Long, Tejun Heo, Johannes Weiner, Michal Koutny
Cc: linux-kernel, cgroups
Hello,
While working on the recent cpuset/deadline fixes [1], I encountered
what looks like an issue to me. What I'm doing is (based on one of the
tests of test_cpuset_prs.sh):
# echo Y >/sys/kernel/debug/sched/verbose
# echo +cpuset >cgroup/cgroup.subtree_control
# mkdir cgroup/A1
# echo 0-3 >cgroup/A1/cpuset.cpus
# echo +cpuset >cgroup/A1/cgroup.subtree_control
# mkdir cgroup/A1/A2
# echo 1-3 >cgroup/A1/A2/cpuset.cpus
# echo +cpuset >cgroup/A1/A2/cgroup.subtree_control
# mkdir cgroup/A1/A2/A3
# echo 2-3 >cgroup/A1/A2/A3/cpuset.cpus
# echo 2-3 >cgroup/A1/cpuset.cpus.exclusive
# echo 2-3 >cgroup/A1/A2/cpuset.cpus.exclusive
# echo 2-3 >cgroup/A1/A2/A3/cpuset.cpus.exclusive
# echo isolated >cgroup/A1/A2/A3/cpuset.cpus.partition
With this, on my 8-CPU system, I correctly get a root domain for
0-1,4-7, while CPUs 2,3 are left isolated (attached to the default root domain).
I now put the shell into the A1/A2/A3 cpuset
# echo $$ >cgroup/A1/A2/A3/cgroup.procs
and hot-unplug CPUs 2,3
# echo 0 >/sys/devices/system/cpu/cpu2/online
# echo 0 >/sys/devices/system/cpu/cpu3/online
I guess the shell is moved to the non-isolated domain. So far so good,
except that when I turn CPUs 2,3 back on, they are attached to the root
domain containing the non-isolated CPUs
# echo 1 >/sys/devices/system/cpu/cpu2/online
...
[ 990.133593] root domain span: 0-2,4-7
[ 990.134480] rd 0-2,4-7
# echo 1 >/sys/devices/system/cpu/cpu3/online
...
[ 1082.858992] root domain span: 0-7
[ 1082.859530] rd 0-7
And now the A1/A2/A3 partition is not valid anymore
# cat cgroup/A1/A2/A3/cpuset.cpus.partition
isolated invalid (Invalid cpu list in cpuset.cpus.exclusive)
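For scripting around this check, here is a minimal sketch (not taken from test_cpuset_prs.sh) that classifies what cpuset.cpus.partition reads back. It assumes the read-back format shown above, where an invalid partition appears as the type followed by "invalid" and a reason in parentheses. The function name partition_status and its output labels are made up for illustration.

```shell
# Classify the contents of a cpuset.cpus.partition file.
# $1: path to the file, e.g. cgroup/A1/A2/A3/cpuset.cpus.partition
# Prints one of: valid, invalid, member, unknown (labels are hypothetical).
partition_status() {
    state=$(cat "$1") || return 1
    case "$state" in
        *" invalid"*)  echo invalid ;;  # e.g. "isolated invalid (<reason>)"
        root|isolated) echo valid ;;    # partition root in good standing
        member)        echo member ;;   # not a partition root
        *)             echo unknown ;;
    esac
}
```

With the state above, `partition_status cgroup/A1/A2/A3/cpuset.cpus.partition` would print "invalid".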
Is this expected? It looks like one needs to put at least one process in
the partition before hotplugging its CPUs for the above to reproduce
(hotplugging w/o processes involved leaves CPUs 2,3 in the default domain
and isolated).
Thanks,
Juri
1 - https://lore.kernel.org/lkml/20241114142810.794657-1-juri.lelli@redhat.com/
https://lore.kernel.org/lkml/20241110025023.664487-1-longman@redhat.com/
* Re: Additional issue with cpuset isolated partitions?
2024-11-15 16:30 Additional issue with cpuset isolated partitions? Juri Lelli
@ 2024-11-15 17:47 ` Waiman Long
2024-11-18 9:11 ` Juri Lelli
From: Waiman Long @ 2024-11-15 17:47 UTC (permalink / raw)
To: Juri Lelli, Tejun Heo, Johannes Weiner, Michal Koutny
Cc: linux-kernel, cgroups
On 11/15/24 11:30 AM, Juri Lelli wrote:
> Hello,
>
> While working on the recent cpuset/deadline fixes [1], I encountered
> what looks like an issue to me. What I'm doing is (based on one of the
> tests of test_cpuset_prs.sh):
>
> # echo Y >/sys/kernel/debug/sched/verbose
> # echo +cpuset >cgroup/cgroup.subtree_control
> # mkdir cgroup/A1
> # echo 0-3 >cgroup/A1/cpuset.cpus
> # echo +cpuset >cgroup/A1/cgroup.subtree_control
> # mkdir cgroup/A1/A2
> # echo 1-3 >cgroup/A1/A2/cpuset.cpus
> # echo +cpuset >cgroup/A1/A2/cgroup.subtree_control
> # mkdir cgroup/A1/A2/A3
> # echo 2-3 >cgroup/A1/A2/A3/cpuset.cpus
> # echo 2-3 >cgroup/A1/cpuset.cpus.exclusive
> # echo 2-3 >cgroup/A1/A2/cpuset.cpus.exclusive
> # echo 2-3 >cgroup/A1/A2/A3/cpuset.cpus.exclusive
> # echo isolated >cgroup/A1/A2/A3/cpuset.cpus.partition
>
> and with this, on my 8 CPUs system, I correctly get a root domain for
> 0-1,4-7 and 2,3 are left isolated (attached to default root domain).
>
> I now put the shell into the A1/A2/A3 cpuset
>
> # echo $$ >cgroup/A1/A2/A3/cgroup.procs
>
> and hotplug CPU 2,3
>
> # echo 0 >/sys/devices/system/cpu/cpu2/online
> # echo 0 >/sys/devices/system/cpu/cpu3/online
>
> I guess the shell is moved to the non-isolated domain. So far so good,
> except that when I turn CPUs 2,3 back on, they are attached to the root
> domain containing the non-isolated CPUs
A valid partition must have CPUs associated with it. If no CPU is
available, it becomes invalid and falls back to using the CPUs from the
parent cgroup.
>
> # echo 1 >/sys/devices/system/cpu/cpu2/online
> ...
> [ 990.133593] root domain span: 0-2,4-7
> [ 990.134480] rd 0-2,4-7
>
> # echo 1 >/sys/devices/system/cpu/cpu3/online
> ...
> [ 1082.858992] root domain span: 0-7
> [ 1082.859530] rd 0-7
>
> And now the A1/A2/A3 partition is not valid anymore
>
> # cat cgroup/A1/A2/A3/cpuset.cpus.partition
> isolated invalid (Invalid cpu list in cpuset.cpus.exclusive)
>
> Is this expected? It looks like one needs to put at least one process in
> the partition before hotplugging its CPUs for the above to reproduce
> (hotplugging w/o processes involved leaves CPUs 2,3 in the default domain
> and isolated).
Once a partition becomes invalid, there is no self-recovery when the CPUs
come online again. Users have to explicitly re-enable it. It is really
a very rare case, so we haven't spent effort on handling it.
If only one of the 2 CPUs is offlined and then onlined again, the full 2-CPU
isolated partition can be recovered.
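As a rough sketch of what explicit re-enabling could look like from a script: the thread does not spell out the exact step, so this assumes that writing the partition type to cpuset.cpus.partition again is sufficient once the CPUs are back online. The helper name reenable_isolated is hypothetical.

```shell
# Re-request the "isolated" partition type for a cgroup whose partition
# currently reads back as invalid.
# $1: cgroup directory, e.g. cgroup/A1/A2/A3
reenable_isolated() {
    part="$1/cpuset.cpus.partition"
    case "$(cat "$part")" in
        *" invalid"*)
            # Assumption: rewriting the type is the explicit re-enable step.
            echo isolated >"$part" ;;
    esac
    # Show the resulting state so the caller can confirm it.
    cat "$part"
}
```

A partition that already reads as valid is left untouched; only the current state is printed.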
Please let me know if you have further questions.
Cheers,
Longman
* Re: Additional issue with cpuset isolated partitions?
2024-11-15 17:47 ` Waiman Long
@ 2024-11-18 9:11 ` Juri Lelli
From: Juri Lelli @ 2024-11-18 9:11 UTC (permalink / raw)
To: Waiman Long
Cc: Tejun Heo, Johannes Weiner, Michal Koutny, linux-kernel, cgroups
On 15/11/24 12:47, Waiman Long wrote:
> On 11/15/24 11:30 AM, Juri Lelli wrote:
> > Hello,
> >
> > While working on the recent cpuset/deadline fixes [1], I encountered
> > what looks like an issue to me. What I'm doing is (based on one of the
> > tests of test_cpuset_prs.sh):
> >
> > # echo Y >/sys/kernel/debug/sched/verbose
> > # echo +cpuset >cgroup/cgroup.subtree_control
> > # mkdir cgroup/A1
> > # echo 0-3 >cgroup/A1/cpuset.cpus
> > # echo +cpuset >cgroup/A1/cgroup.subtree_control
> > # mkdir cgroup/A1/A2
> > # echo 1-3 >cgroup/A1/A2/cpuset.cpus
> > # echo +cpuset >cgroup/A1/A2/cgroup.subtree_control
> > # mkdir cgroup/A1/A2/A3
> > # echo 2-3 >cgroup/A1/A2/A3/cpuset.cpus
> > # echo 2-3 >cgroup/A1/cpuset.cpus.exclusive
> > # echo 2-3 >cgroup/A1/A2/cpuset.cpus.exclusive
> > # echo 2-3 >cgroup/A1/A2/A3/cpuset.cpus.exclusive
> > # echo isolated >cgroup/A1/A2/A3/cpuset.cpus.partition
> >
> > and with this, on my 8 CPUs system, I correctly get a root domain for
> > 0-1,4-7 and 2,3 are left isolated (attached to default root domain).
> >
> > I now put the shell into the A1/A2/A3 cpuset
> >
> > # echo $$ >cgroup/A1/A2/A3/cgroup.procs
> >
> > and hotplug CPU 2,3
> >
> > # echo 0 >/sys/devices/system/cpu/cpu2/online
> > # echo 0 >/sys/devices/system/cpu/cpu3/online
> >
> > I guess the shell is moved to the non-isolated domain. So far so good,
> > except that when I turn CPUs 2,3 back on, they are attached to the root
> > domain containing the non-isolated CPUs
> A valid partition must have CPUs associated with it. If no CPU is available,
> it becomes invalid and falls back to using the CPUs from the parent cgroup.
Hmm, OK. But if I don't put any process in the partition, the behavior
is different: the partition still reads as correctly isolated
and the CPUs are not moved to the root domain after hotplug, i.e.,
# echo 0 >/sys/devices/system/cpu/cpu2/online
# echo 0 >/sys/devices/system/cpu/cpu3/online
# cat cgroup/A1/A2/A3/cpuset.cpus.partition
isolated
# echo 1 >/sys/devices/system/cpu/cpu2/online
# echo 1 >/sys/devices/system/cpu/cpu3/online
# cat cgroup/A1/A2/A3/cpuset.cpus.partition
isolated
This is what puzzled me: the difference in behavior with or without a
process in the cgroup.
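To compare the two cases side by side, a small helper along these lines can record how many processes the cgroup holds together with the partition state. This is only a sketch: the name cgroup_snapshot and its output format are invented here, and it relies on nothing beyond the cgroup.procs and cpuset.cpus.partition files used above.

```shell
# Print the number of processes in a cgroup and its partition state.
# $1: cgroup directory, e.g. cgroup/A1/A2/A3
cgroup_snapshot() {
    # cgroup.procs lists one PID per line; strip any padding from wc.
    n=$(wc -l <"$1/cgroup.procs" | tr -d ' ')
    printf 'procs=%s partition=%s\n' "$n" "$(cat "$1/cpuset.cpus.partition")"
}
```

Calling it before offlining and after onlining CPUs 2,3, once with the shell in cgroup.procs and once without, makes the difference visible in two lines of output per run.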
> > # echo 1 >/sys/devices/system/cpu/cpu2/online
> > ...
> > [ 990.133593] root domain span: 0-2,4-7
> > [ 990.134480] rd 0-2,4-7
> >
> > # echo 1 >/sys/devices/system/cpu/cpu3/online
> > ...
> > [ 1082.858992] root domain span: 0-7
> > [ 1082.859530] rd 0-7
> >
> > And now the A1/A2/A3 partition is not valid anymore
> >
> > # cat cgroup/A1/A2/A3/cpuset.cpus.partition
> > isolated invalid (Invalid cpu list in cpuset.cpus.exclusive)
> >
> > Is this expected? It looks like one needs to put at least one process in
> > the partition before hotplugging its CPUs for the above to reproduce
> > (hotplugging w/o processes involved leaves CPUs 2,3 in the default domain
> > and isolated).
>
> Once a partition becomes invalid, there is no self-recovery when the CPUs
> come online again. Users have to explicitly re-enable it. It is really a
> very rare case, so we haven't spent effort on handling it.
>
> If only one of the 2 CPUs is offlined and then onlined again, the full 2-CPU
> isolated partition can be recovered.
>
> Please let me know if you have further questions.
I see the point, but please see above for my only remaining question. :)
Thanks,
Juri