From mboxrd@z Thu Jan 1 00:00:00 1970 From: Dietmar Eggemann Subject: Re: [PATCH v3] sched: cpuset: Don't rebuild root domains on suspend-resume Date: Tue, 14 Mar 2023 12:41:36 +0100 Message-ID: References: <20230228174627.vja5aejq27dsta2u@airbuntu> <20230301122852.zgzreby42lh2zf6w@airbuntu> <20230301170322.xthlso7jfkixlyex@airbuntu> <20230311185150.stvtcbdkoofgn3wd@airbuntu> <7070da53-a5a7-6965-5604-abee3cae9d46@arm.com> Mime-Version: 1.0 Content-Transfer-Encoding: 8bit Return-path: Content-Language: en-US In-Reply-To: <7070da53-a5a7-6965-5604-abee3cae9d46-5wv7dgnIgG8@public.gmane.org> List-ID: Content-Type: text/plain; charset="utf-8" To: Juri Lelli , Qais Yousef Cc: Hao Luo , Peter Zijlstra , Ingo Molnar , Waiman Long , Steven Rostedt , tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, luca.abeni-5rdYK369eBLQB0XuIGIEkQ@public.gmane.org, claudio-YOzL5CV4y4YG1A2ADO40+w@public.gmane.org, tommaso.cucinotta-5rdYK369eBLQB0XuIGIEkQ@public.gmane.org, bristot-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, mathieu.poirier-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Vincent Guittot , Wei Wang , Rick Yiu , Quentin Perret , Heiko Carstens , Vasily Gorbik , Alexander Gordeev , Sudeep Holla , Zefan Li , linux-s390-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, x86-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org On 13/03/2023 18:10, Dietmar Eggemann wrote: > On 13/03/2023 17:37, Juri Lelli wrote: >> On 11/03/23 18:51, Qais Yousef wrote: >>> On 03/09/23 14:23, Hao Luo wrote: >>>> On Wed, Mar 8, 2023 at 10:55 PM Juri Lelli wrote: >>>>> >>>>> On 08/03/23 10:01, Hao Luo wrote: >>>>>> On Wed, Mar 8, 2023 at 2:20 AM Juri Lelli wrote: >>>>>>> >>>>>>> On 01/03/23 17:03, Qais Yousef wrote: >>>>>>>> On 03/01/23 15:26, Juri Lelli wrote: > > [...] > >>> Yeah I am working on 5.10 too (this will need to be backported to 5.10 and 5.15 >>> ultimately) and had the same crash because task is NULL. >>> >>> Fixed it this way which I think what you intended to do Juri? It moves the >>> check for dl_task(task) inside cgroup_taskset_for_each() loop. >>> >>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c >>> index 83a8943467fb..06d6bb68d86b 100644 >>> --- a/kernel/cgroup/cpuset.c >>> +++ b/kernel/cgroup/cpuset.c >>> @@ -2495,11 +2495,11 @@ static int cpuset_can_attach(struct cgroup_taskset *tset) >>> ret = security_task_setscheduler(task); >>> if (ret) >>> goto out_unlock; >>> - } >>> >>> - if (dl_task(task)) { >>> - cs->deadline_tasks++; >>> - cpuset_attach_old_cs->deadline_tasks--; >>> + if (dl_task(task)) { >>> + cs->deadline_tasks++; >>> + cpuset_attach_old_cs->deadline_tasks--; >>> + } >>> } >>> >>> /* >> >> Duh, indeed. >> >>> Like Hao I don't have any deadline tasks in the system. With the fix above >>> I don't notice the delay on suspend resume using your patches. >> >> OK, cool. >> >>> If you want any debug; please feel free to add them into your branch so I can >>> run with that and give you the log. >> >> Will need to find time to run some tests with DEADLINE tasks, yeah. >> Maybe Dietmar, since you reported as well the issue above with your >> testing, you could help with testing DEADLINE? > > Ah, now I see! It's the same issue I saw. And it's not specifically > related to DL tasks. Any tasks which you move into a cpuset will trigger > this. > Yeah, can do some DL tests later on this fix. This fix also works for my DL test. root@juno:~# ps2 | grep DLN 83 83 140 0 - DLN sugov:0 84 84 140 0 - DLN sugov:1 1601 1602 140 0 - DLN thread0-0 1601 1603 140 0 - DLN thread0-1 1601 1604 140 0 - DLN thread0-2 1601 1605 140 0 - DLN thread0-3 1601 1606 140 0 - DLN thread0-4 1601 1607 140 0 - DLN thread0-5 1601 1608 140 0 - DLN thread0-6 1601 1609 140 0 - DLN thread0-7 1601 1610 140 0 - DLN thread0-8 1601 1611 140 0 - DLN thread0-9 1601 1612 140 0 - DLN thread0-10 1601 1613 140 0 - DLN thread0-11 cgroupv1 root@juno:# cd /sys/fs/cgroup/cpuset root@juno:# mkdir cs1 root@juno:# echo 0 > cs1/cpuset.mems root@juno:# echo 0,3-5 > cs1/cpuset.cpus root@juno:# cgclassify -g cpuset:cs1 1602 1603 1604 $$ One remaining doubt: `cgclassify` will still move one task at a time so cpuset_can_attach() has to deal with one task per call. But cgroup_taskset_for_each() says that tset can contain multiple tasks. In this case we would have to think about only changing cs->deadline_tasks if all tasks can be moved. Don't know which test would trigger a tset with multiple tasks in cpuset_can_attach().