public inbox for linux-kernel@vger.kernel.org
From: luca abeni <luca.abeni@santannapisa.it>
To: Marcel Ziswiler <marcel.ziswiler@codethink.co.uk>
Cc: Juri Lelli <juri.lelli@redhat.com>,
	linux-kernel@vger.kernel.org, Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Vineeth Pillai	 <vineeth@bitbyteword.org>
Subject: Re: SCHED_DEADLINE tasks missing their deadline with SCHED_FLAG_RECLAIM jobs in the mix (using GRUB)
Date: Fri, 30 May 2025 11:21:08 +0200	[thread overview]
Message-ID: <20250530112108.63a24cde@luca64> (raw)
In-Reply-To: <c91a117401225290fbf0390f2ce78c3e0fb3b2d5.camel@codethink.co.uk>

Hi Marcel,

On Sun, 25 May 2025 21:29:05 +0200
Marcel Ziswiler <marcel.ziswiler@codethink.co.uk> wrote:
[...]
> > How do you configure systemd? I am having trouble reproducing
> > your AllowedCPUs configuration... This is an example of what I am
> > trying:
> > sudo systemctl set-property --runtime custom-workload.slice AllowedCPUs=1
> > sudo systemctl set-property --runtime init.scope AllowedCPUs=0,2,3
> > sudo systemctl set-property --runtime system.slice AllowedCPUs=0,2,3
> > sudo systemctl set-property --runtime user.slice AllowedCPUs=0,2,3
> > and then I try to run a SCHED_DEADLINE application with
> > sudo systemd-run --scope -p Slice=custom-workload.slice <application>  
> 
> We just use a bunch of systemd configuration files as follows:
> 
> [root@localhost ~]# cat /lib/systemd/system/monitor.slice
> # Copyright (C) 2024 Codethink Limited
> # SPDX-License-Identifier: GPL-2.0-only
[...]

So, I copied your *.slice files into /lib/systemd/system (and added
them to the "Wants=" entry of /lib/systemd/system/slices.target,
otherwise the slices are not created), but I am still unable to run
SCHED_DEADLINE applications in these slices.

This is because the kernel does not create a new root domain for
these cpusets (probably because the cpusets' CPUs are not exclusive
and the cpusets are not "isolated": for example,
/sys/fs/cgroup/safety1.slice/cpuset.cpus.partition is set to "member",
not to "isolated"). So, "cpumask_subset(span, p->cpus_ptr)" in
__sched_setscheduler() is still false and the syscall returns -EPERM.
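For reference, this is roughly what flipping the partition type looks
like by hand (a sketch, assuming cgroup v2 is mounted at /sys/fs/cgroup
and a safety1.slice cgroup exists as in this thread; as far as I know
systemd does not expose this knob, and the write only takes effect when
the cpuset's CPUs are exclusive with respect to its siblings):

```shell
# Hypothetical sketch: turn the safety1.slice cpuset into an isolated
# partition, so the kernel carves out a separate root domain for it.
part=/sys/fs/cgroup/safety1.slice/cpuset.cpus.partition

if [ -w "$part" ]; then
    echo isolated > "$part"
    # Read back: "isolated" on success, or "isolated invalid" if e.g.
    # the CPUs are not exclusive w.r.t. sibling cpusets.
    cat "$part"
else
    echo "skipped: $part not writable (cgroup v2 cpuset not set up here)"
fi
```

Writing "root" instead of "isolated" also creates a new root domain,
but keeps load balancing enabled inside the partition.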


Since I do not know how to obtain an isolated cpuset with cgroup v2 and
systemd, I tried using the old cgroup v1 interface, as described in the
SCHED_DEADLINE documentation.
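For completeness, the recipe I used is roughly the one from
Documentation/scheduler/sched-deadline.rst (a sketch; it needs root,
and the "cpu1" name and CPU number just mirror the safety1.slice
configuration quoted below):

```shell
# Hypothetical sketch of the cgroup v1 cpuset partitioning described in
# Documentation/scheduler/sched-deadline.rst (needs root).
setup_v1_partition() {
    mkdir -p /dev/cpuset || return 1
    mount -t cgroup -o cpuset cpuset /dev/cpuset || return 1
    cd /dev/cpuset || return 1
    echo 0 > cpuset.sched_load_balance   # split the root domain
    mkdir cpu1
    echo 1 > cpu1/cpuset.cpus            # CPU 1 only, as in safety1.slice
    echo 0 > cpu1/cpuset.mems
    echo 1 > cpu1/cpuset.cpu_exclusive
    echo $$ > cpu1/tasks                 # move this shell into the partition
    echo "partition ready"
}
setup_v1_partition || echo "skipped: cgroup v1 cpuset setup needs root"
```

Disabling cpuset.sched_load_balance in the root cpuset is what actually
makes the kernel build separate root domains for the exclusive children.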

This worked fine, and enabling SCHED_FLAG_RECLAIM actually reduced the
number of missed deadlines (I tried with a set of periodic tasks having
the same parameters as the ones you described). So, it looks like
reclaiming is working correctly (at least, as far as I can see) when
using cgroup v1 to configure the CPU partitions... Maybe there is some
bug triggered by cgroup v2, or maybe I am misunderstanding your setup.
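As a quick way to reproduce this kind of test, a plain (non-reclaiming)
SCHED_DEADLINE task can be started from such a partition with chrt(1)
from util-linux; note that SCHED_FLAG_RECLAIM itself cannot be set via
chrt, only via sched_setattr(). A sketch, with made-up runtime/period
values (not necessarily the parameters used in this thread):

```shell
# Hypothetical parameters: 5 ms runtime every ~16.7 ms period.
runtime_ns=5000000
period_ns=16666666

if command -v chrt >/dev/null 2>&1; then
    # chrt -d takes nanoseconds; the deadline defaults to the period.
    chrt -d --sched-runtime "$runtime_ns" --sched-period "$period_ns" 0 true \
        && echo "SCHED_DEADLINE task ran" \
        || echo "skipped: chrt -d failed (needs root and admission control to pass)"
else
    echo "skipped: chrt not installed"
fi
```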

I think the experiment suggested by Juri can help in understanding
where the issue lies.


			Thanks,
				Luca


> [Unit]
> Description=Prioritized slice for the safety monitor.
> Before=slices.target
> 
> [Slice]
> CPUWeight=1000
> AllowedCPUs=0
> MemoryAccounting=true
> MemoryMin=10%
> ManagedOOMPreference=omit
> 
> [Install]
> WantedBy=slices.target
> 
> [root@localhost ~]# cat /lib/systemd/system/safety1.slice
> # Copyright (C) 2024 Codethink Limited
> # SPDX-License-Identifier: GPL-2.0-only
> [Unit]
> Description=Slice for Safety case processes.
> Before=slices.target
> 
> [Slice]
> CPUWeight=1000
> AllowedCPUs=1
> MemoryAccounting=true
> MemoryMin=10%
> ManagedOOMPreference=omit
> 
> [Install]
> WantedBy=slices.target
> 
> [root@localhost ~]# cat /lib/systemd/system/safety2.slice
> # Copyright (C) 2024 Codethink Limited
> # SPDX-License-Identifier: GPL-2.0-only
> [Unit]
> Description=Slice for Safety case processes.
> Before=slices.target
> 
> [Slice]
> CPUWeight=1000
> AllowedCPUs=2
> MemoryAccounting=true
> MemoryMin=10%
> ManagedOOMPreference=omit
> 
> [Install]
> WantedBy=slices.target
> 
> [root@localhost ~]# cat /lib/systemd/system/safety3.slice
> # Copyright (C) 2024 Codethink Limited
> # SPDX-License-Identifier: GPL-2.0-only
> [Unit]
> Description=Slice for Safety case processes.
> Before=slices.target
> 
> [Slice]
> CPUWeight=1000
> AllowedCPUs=3
> MemoryAccounting=true
> MemoryMin=10%
> ManagedOOMPreference=omit
> 
> [Install]
> WantedBy=slices.target
> 
> [root@localhost ~]# cat /lib/systemd/system/system.slice 
> # Copyright (C) 2024 Codethink Limited
> # SPDX-License-Identifier: GPL-2.0-only
> 
> #
> # This slice will control all processes started by systemd by
> # default.
> #
> 
> [Unit]
> Description=System Slice
> Documentation=man:systemd.special(7)
> Before=slices.target
> 
> [Slice]
> CPUQuota=150%
> AllowedCPUs=0
> MemoryAccounting=true
> MemoryMax=80%
> ManagedOOMSwap=kill
> ManagedOOMMemoryPressure=kill
> 
> [root@localhost ~]# cat /lib/systemd/system/user.slice 
> # Copyright (C) 2024 Codethink Limited
> # SPDX-License-Identifier: GPL-2.0-only
> 
> #
> # This slice will control all processes started by systemd-logind
> #
> 
> [Unit]
> Description=User and Session Slice
> Documentation=man:systemd.special(7)
> Before=slices.target
> 
> [Slice]
> CPUQuota=25%
> AllowedCPUs=0
> MemoryAccounting=true
> MemoryMax=80%
> ManagedOOMSwap=kill
> ManagedOOMMemoryPressure=kill
> 
> > However, this does not work because systemd is not creating an
> > isolated cpuset... So, the root domain still contains CPUs 0-3, and
> > the "custom-workload.slice" cpuset only has CPU 1. Hence, the check
> >                         /*
> >                          * Don't allow tasks with an affinity mask smaller than
> >                          * the entire root_domain to become SCHED_DEADLINE. We
> >                          * will also fail if there's no bandwidth available.
> >                          */
> >                         if (!cpumask_subset(span, p->cpus_ptr) ||
> >                             rq->rd->dl_bw.bw == 0) {
> >                                 retval = -EPERM;
> >                                 goto unlock;
> >                         }
> > in __sched_setscheduler() fails.
> > 
> > 
> > How are you configuring the cpusets?  
> 
> See above.
> 
> > Also, which kernel version are you using?
> > (sorry if you already posted this information in previous emails
> > and I am missing something obvious)  
> 
> I am not even sure whether I explicitly mentioned it, but other than
> that we are always running the latest stable.
> 
> Two months ago, when we last ran some extensive tests on this, it was
> actually v6.13.6.
> 
> > 			Thanks,  
> 
> Thank you!
> 
> > 				Luca  
> 
> Cheers
> 
> Marcel


Thread overview: 35+ messages
2025-04-28 18:04 SCHED_DEADLINE tasks missing their deadline with SCHED_FLAG_RECLAIM jobs in the mix (using GRUB) Marcel Ziswiler
2025-05-02 13:55 ` Juri Lelli
2025-05-02 14:10   ` luca abeni
2025-05-03 13:14     ` Marcel Ziswiler
2025-05-05 15:53       ` luca abeni
2025-05-03 11:14   ` Marcel Ziswiler
2025-05-07 20:25     ` luca abeni
2025-05-19 13:32       ` Marcel Ziswiler
2025-05-20 16:09         ` luca abeni
2025-05-21  9:59           ` Marcel Ziswiler
2025-05-23 19:46         ` luca abeni
2025-05-25 19:29           ` Marcel Ziswiler
2025-05-29  9:39             ` Juri Lelli
2025-06-02 14:59               ` Marcel Ziswiler
2025-06-17 12:21                 ` Juri Lelli
2025-06-18 11:24                   ` Marcel Ziswiler
2025-06-20  9:29                     ` Juri Lelli
2025-06-20  9:37                       ` luca abeni
2025-06-20  9:58                         ` Juri Lelli
2025-06-20 14:16                         ` luca abeni
2025-06-20 15:28                           ` Juri Lelli
2025-06-20 16:52                             ` luca abeni
2025-06-24  7:49                               ` Juri Lelli
2025-06-24 12:59                                 ` Juri Lelli
2025-06-24 15:00                                   ` luca abeni
2025-06-25  9:30                                     ` Juri Lelli
2025-06-25 10:11                                       ` Juri Lelli
2025-06-25 12:50                                         ` luca abeni
2025-06-26 10:59                                           ` Marcel Ziswiler
2025-06-26 11:45                                             ` Juri Lelli
2025-06-25 15:55                                   ` Marcel Ziswiler
2025-06-24 13:36                               ` luca abeni
2025-05-30  9:21             ` luca abeni [this message]
2025-06-03 11:18               ` Marcel Ziswiler
2025-06-06 13:16                 ` luca abeni
