From: Tejun Heo <tj@kernel.org>
To: torvalds@linux-foundation.org, mingo@redhat.com,
peterz@infradead.org, juri.lelli@redhat.com,
vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
bristot@redhat.com, vschneid@redhat.com, ast@kernel.org,
daniel@iogearbox.net, andrii@kernel.org, martin.lau@kernel.org,
joshdon@google.com, brho@google.com, pjt@google.com,
derkling@google.com, haoluo@google.com, dvernet@meta.com,
dschatzberg@meta.com, dskarlat@cs.cmu.edu, riel@surriel.com,
changwoo@igalia.com, himadrics@inria.fr, memxor@gmail.com
Cc: linux-kernel@vger.kernel.org, bpf@vger.kernel.org,
kernel-team@meta.com, Tejun Heo <tj@kernel.org>
Subject: [PATCH 03/36] sched: Allow sched_cgroup_fork() to fail and introduce sched_cancel_fork()
Date: Fri, 10 Nov 2023 16:47:29 -1000 [thread overview]
Message-ID: <20231111024835.2164816-4-tj@kernel.org> (raw)
In-Reply-To: <20231111024835.2164816-1-tj@kernel.org>
A new BPF extensible sched_class will need more control over the forking
process. It wants to be able to fail from sched_cgroup_fork() after the new
task's sched_task_group is initialized so that the loaded BPF program can
prepare the task with its cgroup association is established and reject fork
if e.g. allocation fails.
Allow sched_cgroup_fork() to fail by making it return int instead of void
and adding sched_cancel_fork() to undo sched_fork() in the error path.
sched_cgroup_fork() doesn't fail yet and this patch shouldn't cause any
behavior changes.
v2: Patch description updated to detail the expected use.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: David Vernet <dvernet@meta.com>
Acked-by: Josh Don <joshdon@google.com>
Acked-by: Hao Luo <haoluo@google.com>
Acked-by: Barret Rhoden <brho@google.com>
---
include/linux/sched/task.h | 3 ++-
kernel/fork.c | 15 ++++++++++-----
kernel/sched/core.c | 8 +++++++-
3 files changed, 19 insertions(+), 7 deletions(-)
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index a23af225c898..03d35e3eda3c 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -61,7 +61,8 @@ extern asmlinkage void schedule_tail(struct task_struct *prev);
extern void init_idle(struct task_struct *idle, int cpu);
extern int sched_fork(unsigned long clone_flags, struct task_struct *p);
-extern void sched_cgroup_fork(struct task_struct *p, struct kernel_clone_args *kargs);
+extern int sched_cgroup_fork(struct task_struct *p, struct kernel_clone_args *kargs);
+extern void sched_cancel_fork(struct task_struct *p);
extern void sched_post_fork(struct task_struct *p);
extern void sched_dead(struct task_struct *p);
diff --git a/kernel/fork.c b/kernel/fork.c
index 204bc1c33533..38c5d142c691 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2472,7 +2472,7 @@ __latent_entropy struct task_struct *copy_process(
retval = perf_event_init_task(p, clone_flags);
if (retval)
- goto bad_fork_cleanup_policy;
+ goto bad_fork_sched_cancel_fork;
retval = audit_alloc(p);
if (retval)
goto bad_fork_cleanup_perf;
@@ -2604,7 +2604,9 @@ __latent_entropy struct task_struct *copy_process(
* cgroup specific, it unconditionally needs to place the task on a
* runqueue.
*/
- sched_cgroup_fork(p, args);
+ retval = sched_cgroup_fork(p, args);
+ if (retval)
+ goto bad_fork_cancel_cgroup;
/*
* From this point on we must avoid any synchronous user-space
@@ -2650,13 +2652,13 @@ __latent_entropy struct task_struct *copy_process(
/* Don't start children in a dying pid namespace */
if (unlikely(!(ns_of_pid(pid)->pid_allocated & PIDNS_ADDING))) {
retval = -ENOMEM;
- goto bad_fork_cancel_cgroup;
+ goto bad_fork_core_free;
}
/* Let kill terminate clone/fork in the middle */
if (fatal_signal_pending(current)) {
retval = -EINTR;
- goto bad_fork_cancel_cgroup;
+ goto bad_fork_core_free;
}
/* No more failure paths after this point. */
@@ -2732,10 +2734,11 @@ __latent_entropy struct task_struct *copy_process(
return p;
-bad_fork_cancel_cgroup:
+bad_fork_core_free:
sched_core_free(p);
spin_unlock(¤t->sighand->siglock);
write_unlock_irq(&tasklist_lock);
+bad_fork_cancel_cgroup:
cgroup_cancel_fork(p, args);
bad_fork_put_pidfd:
if (clone_flags & CLONE_PIDFD) {
@@ -2774,6 +2777,8 @@ __latent_entropy struct task_struct *copy_process(
audit_free(p);
bad_fork_cleanup_perf:
perf_event_free_task(p);
+bad_fork_sched_cancel_fork:
+ sched_cancel_fork(p);
bad_fork_cleanup_policy:
lockdep_free_task(p);
#ifdef CONFIG_NUMA
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 5c0ef271a31a..b5fb4b894477 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4784,7 +4784,7 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p)
return 0;
}
-void sched_cgroup_fork(struct task_struct *p, struct kernel_clone_args *kargs)
+int sched_cgroup_fork(struct task_struct *p, struct kernel_clone_args *kargs)
{
unsigned long flags;
@@ -4811,6 +4811,12 @@ void sched_cgroup_fork(struct task_struct *p, struct kernel_clone_args *kargs)
if (p->sched_class->task_fork)
p->sched_class->task_fork(p);
raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+
+ return 0;
+}
+
+void sched_cancel_fork(struct task_struct *p)
+{
}
void sched_post_fork(struct task_struct *p)
--
2.42.0
next prev parent reply other threads:[~2023-11-11 2:49 UTC|newest]
Thread overview: 54+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-11-11 2:47 [PATCHSET v5] sched: Implement BPF extensible scheduler class Tejun Heo
2023-11-11 2:47 ` [PATCH 01/36] cgroup: Implement cgroup_show_cftypes() Tejun Heo
2023-11-11 2:47 ` [PATCH 02/36] sched: Restructure sched_class order sanity checks in sched_init() Tejun Heo
2023-11-11 2:47 ` Tejun Heo [this message]
2023-11-11 2:47 ` [PATCH 04/36] sched: Add sched_class->reweight_task() Tejun Heo
2023-11-11 2:47 ` [PATCH 05/36] sched: Add sched_class->switching_to() and expose check_class_changing/changed() Tejun Heo
2023-11-11 2:47 ` [PATCH 06/36] sched: Factor out cgroup weight conversion functions Tejun Heo
2023-11-11 2:47 ` [PATCH 07/36] sched: Expose css_tg() and __setscheduler_prio() Tejun Heo
2023-11-11 2:47 ` [PATCH 08/36] sched: Enumerate CPU cgroup file types Tejun Heo
2023-11-11 2:47 ` [PATCH 09/36] sched: Add @reason to sched_class->rq_{on|off}line() Tejun Heo
2023-11-11 2:47 ` [PATCH 10/36] sched: Add normal_policy() Tejun Heo
2023-11-11 2:47 ` [PATCH 11/36] sched_ext: Add boilerplate for extensible scheduler class Tejun Heo
2023-11-11 2:47 ` [PATCH 12/36] sched_ext: Implement BPF " Tejun Heo
2023-11-13 13:34 ` Changwoo Min
2023-11-14 19:07 ` Tejun Heo
2023-11-13 20:04 ` Andrea Righi
2023-11-14 19:07 ` Tejun Heo
2023-11-23 8:07 ` Andrea Righi
2023-11-25 19:59 ` Tejun Heo
2023-11-26 9:05 ` Andrea Righi
2023-12-07 2:04 ` [PATCH] scx: set p->scx.ops_state using atomic_long_set_release Changwoo Min
2023-12-08 0:16 ` Tejun Heo
2024-03-23 2:37 ` [PATCH 12/36] sched_ext: Implement BPF extensible scheduler class Joel Fernandes
2024-03-23 22:12 ` Tejun Heo
2024-04-25 21:28 ` Joel Fernandes
2024-04-26 16:57 ` Barret Rhoden
2024-04-26 21:58 ` Tejun Heo
2023-11-11 2:47 ` [PATCH 13/36] sched_ext: Add scx_simple and scx_example_qmap example schedulers Tejun Heo
2023-11-12 4:17 ` kernel test robot
2023-11-12 18:06 ` Tejun Heo
2023-11-11 2:47 ` [PATCH 14/36] sched_ext: Add sysrq-S which disables the BPF scheduler Tejun Heo
2023-11-11 2:47 ` [PATCH 15/36] sched_ext: Implement runnable task stall watchdog Tejun Heo
2023-11-11 2:47 ` [PATCH 16/36] sched_ext: Allow BPF schedulers to disallow specific tasks from joining SCHED_EXT Tejun Heo
2023-11-11 2:47 ` [PATCH 17/36] sched_ext: Allow BPF schedulers to switch all eligible tasks into sched_ext Tejun Heo
2023-11-11 2:47 ` [PATCH 18/36] sched_ext: Print sched_ext info when dumping stack Tejun Heo
2023-11-14 19:23 ` [PATCH v2 " Tejun Heo
2023-11-11 2:47 ` [PATCH 19/36] sched_ext: Implement scx_bpf_kick_cpu() and task preemption support Tejun Heo
2023-11-11 2:47 ` [PATCH 20/36] sched_ext: Add a central scheduler which makes all scheduling decisions on one CPU Tejun Heo
2023-11-11 2:47 ` [PATCH 21/36] sched_ext: Make watchdog handle ops.dispatch() looping stall Tejun Heo
2023-11-11 2:47 ` [PATCH 22/36] sched_ext: Add task state tracking operations Tejun Heo
2023-11-11 2:47 ` [PATCH 23/36] sched_ext: Implement tickless support Tejun Heo
2023-11-11 2:47 ` [PATCH 24/36] sched_ext: Track tasks that are subjects of the in-flight SCX operation Tejun Heo
2023-11-11 2:47 ` [PATCH 25/36] sched_ext: Add cgroup support Tejun Heo
2023-11-11 2:47 ` [PATCH 26/36] sched_ext: Add a cgroup-based core-scheduling scheduler Tejun Heo
2023-11-11 2:47 ` [PATCH 27/36] sched_ext: Add a cgroup scheduler which uses flattened hierarchy Tejun Heo
2023-11-11 2:47 ` [PATCH 28/36] sched_ext: Implement SCX_KICK_WAIT Tejun Heo
2023-11-11 2:47 ` [PATCH 29/36] sched_ext: Implement sched_ext_ops.cpu_acquire/release() Tejun Heo
2023-11-11 2:47 ` [PATCH 30/36] sched_ext: Implement sched_ext_ops.cpu_online/offline() Tejun Heo
2023-11-11 2:47 ` [PATCH 31/36] sched_ext: Implement core-sched support Tejun Heo
2023-11-11 2:47 ` [PATCH 32/36] sched_ext: Add vtime-ordered priority queue to dispatch_q's Tejun Heo
2023-11-11 2:47 ` [PATCH 33/36] sched_ext: Documentation: scheduler: Document extensible scheduler class Tejun Heo
2023-11-11 2:48 ` [PATCH 34/36] sched_ext: Add a basic, userland vruntime scheduler Tejun Heo
2023-11-11 2:48 ` [PATCH 35/36] sched_ext: Add scx_rusty, a rust userspace hybrid scheduler Tejun Heo
2023-11-11 2:48 ` [PATCH 36/36] sched_ext: Add scx_layered, a highly configurable multi-layer scheduler Tejun Heo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20231111024835.2164816-4-tj@kernel.org \
--to=tj@kernel.org \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=brho@google.com \
--cc=bristot@redhat.com \
--cc=bsegall@google.com \
--cc=changwoo@igalia.com \
--cc=daniel@iogearbox.net \
--cc=derkling@google.com \
--cc=dietmar.eggemann@arm.com \
--cc=dschatzberg@meta.com \
--cc=dskarlat@cs.cmu.edu \
--cc=dvernet@meta.com \
--cc=haoluo@google.com \
--cc=himadrics@inria.fr \
--cc=joshdon@google.com \
--cc=juri.lelli@redhat.com \
--cc=kernel-team@meta.com \
--cc=linux-kernel@vger.kernel.org \
--cc=martin.lau@kernel.org \
--cc=memxor@gmail.com \
--cc=mgorman@suse.de \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
--cc=pjt@google.com \
--cc=riel@surriel.com \
--cc=rostedt@goodmis.org \
--cc=torvalds@linux-foundation.org \
--cc=vincent.guittot@linaro.org \
--cc=vschneid@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.