* [RFC PATCH v1 00/13] exec: add spawn templates for repeated executable startup
From: Li Chen @ 2026-05-28 9:52 UTC
To: Christian Brauner, Kees Cook, Alexander Viro
Cc: linux-fsdevel, linux-api, linux-kernel, linux-mm, linux-arch,
linux-doc, linux-kselftest, x86, Arnd Bergmann, Andy Lutomirski,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin, Jan Kara, Jonathan Corbet, Shuah Khan
Hi,
This is an early RFC for an idea that is probably still rough in both the
UAPI and implementation details. Sorry for the rough edges; I am sending
it now to check whether this direction is worth pursuing and to get
feedback on the kernel/userspace boundary.
The series is based on linux-next version 20260518.
This RFC adds spawn_template, a userspace-controlled exec acceleration
mechanism for runtimes that repeatedly start the same executable with
different argv, envp, and per-spawn file descriptor setup.
The main target is agent runtimes. Modern coding agents repeatedly start
short-lived helper tools such as rg, git, sed, awk, python, node, and
shell wrappers while they inspect and edit a workspace. Those runtimes
already know which tools are hot, and they are also the right place to
decide policy. The kernel does not choose names such as rg, git, or sed.
Userspace opts in by creating a template fd for one executable, then uses
that fd for later spawns. Launchers, shells, and build systems have a
similar repeated-startup shape and could use the same primitive, but the
agent runtime case is the main motivation for this RFC.
The mechanism applies to the executable that userspace asks the kernel to
start. If an agent runtime directly starts /usr/bin/rg, the rg executable
is the template target. If the runtime starts /usr/bin/bash -c "rg ... |
head", the shell is the template target unless the shell itself opts in
when it starts rg and head. The kernel does not parse the shell command
string or rewrite inner commands into template spawns. Userspace has to
call spawn_template for those inner commands explicitly:
  direct exec                     shell wrapper
  -----------                     -------------
  agent                           agent
    template("/usr/bin/rg")         template("/usr/bin/bash")
    spawn rg argv                   spawn bash -c "rg ... | head"
    kernel target: rg               kernel target: bash
    rg startup benefits             rg/head need shell opt-in
Several agent runtime discussions are moving toward direct argv-style
exec tools for both security and policy clarity. For example, opencode
issue #2206 proposes an exec tool as a safer alternative to a shell-only
bash tool:
https://github.com/anomalyco/opencode/issues/2206
spawn_template is meant to support both models. Direct exec users can
cache the actual hot tool. Shell-wrapper users can cache the shell and
still reduce shell startup cost. If a shell or an agent runtime later
uses the same API for commands started inside a shell command, those
inner tools can benefit too.
Each spawn still goes through the normal exec path. The template reuses
only metadata that can be revalidated before use. Credential preparation,
permission checks, binary handler checks, secure-exec handling, and LSM
hooks remain on the normal execve path.
The UAPI has two operations. spawn_template_create() creates an
anonymous-inode template fd from either an executable fd or an absolute
executable path. spawn_template_spawn() starts one child from that
template, applies per-spawn fd, cwd, and signal actions, and returns both
pid and pidfd.
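As a rough illustration of the create side, the sketch below mirrors struct spawn_template_create_args from patch 04 in a standalone C file and fills it for a path-created template. The helper name fill_create_args_path() is hypothetical, and setting execfd to -1 for the path case is an assumption, since this posting does not show how the syscall tells the two sources apart. No syscall is issued because the syscall numbers are not part of this excerpt.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the kernel's __aligned_u64 (GCC/Clang attribute). */
typedef uint64_t aligned_u64 __attribute__((aligned(8)));

#define SPAWN_TEMPLATE_CREATE_CLOEXEC		(1ULL << 0)
#define SPAWN_TEMPLATE_CREATE_ARGS_SIZE_VER0	72

/* Local mirror of the layout proposed in patch 04/13. */
struct spawn_template_create_args {
	aligned_u64 flags;
	int32_t execfd;
	uint32_t exec_flags;
	aligned_u64 filename;
	aligned_u64 actions;
	aligned_u64 actions_len;
	aligned_u64 reserved[4];
};

_Static_assert(sizeof(struct spawn_template_create_args) ==
	       SPAWN_TEMPLATE_CREATE_ARGS_SIZE_VER0,
	       "layout must match the VER0 size");

/*
 * Hypothetical helper: prepare args for a path-created template.
 * Returns 0 on success, -1 if the path is missing or not absolute.
 */
static int fill_create_args_path(struct spawn_template_create_args *args,
				 const char *path)
{
	if (!path || path[0] != '/')
		return -1;

	/* Create-time actions are reserved in this RFC, so they stay zero. */
	memset(args, 0, sizeof(*args));
	args->flags = SPAWN_TEMPLATE_CREATE_CLOEXEC;
	args->execfd = -1;	/* assumption: means "no executable fd" */
	args->filename = (uint64_t)(uintptr_t)path;
	return 0;
}
```

The static assertion is the useful part: it checks that the field list adds up to the 72 bytes advertised by SPAWN_TEMPLATE_CREATE_ARGS_SIZE_VER0.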
fd inheritance is deliberately conservative. By default, after the
requested per-spawn actions have run, the child closes fds above stderr.
An agent runtime can still request traditional inheritance explicitly,
but helper tools do not inherit unrelated secret files or sockets by
accident. The create-time actions fields are reserved and rejected in
this RFC because fd numbers are per-process state, not stable reusable
objects. The caller supplies fd actions for each spawn instead.
A typical agent runtime would keep one template per hot executable and
still build argv, envp, cwd, and pipe wiring for each tool call:
  rg_tmpl = spawn_template_create("/usr/bin/rg");

  for each search request:
      out_r, out_w = pipe_cloexec();
      err_r, err_w = pipe_cloexec();

      actions = [
          FCHDIR(worktree_fd),
          DUP2(out_w, STDOUT_FILENO),
          DUP2(err_w, STDERR_FILENO),
      ];

      child = spawn_template_spawn(rg_tmpl, rg_argv, envp, actions);
      close(out_w);
      close(err_w);

      read out_r and err_r;
      waitid(P_PIDFD, child.pidfd, ...);
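The per-spawn part of that loop could be expressed against the structures from patch 04 roughly as follows. This is a sketch under assumptions rather than the series' documented usage: the helper names are hypothetical, the mapping of DUP2 onto the fd/newfd fields and of FCHDIR onto fd is inferred from the field names, actions_len is assumed to be an element count, and the pidfd field is assumed to carry a user pointer to an int. The structures are mirrored locally so the block compiles without the proposed header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

typedef uint64_t aligned_u64 __attribute__((aligned(8)));

/* Values copied from enum spawn_template_action_type in patch 04/13. */
enum {
	SPAWN_TEMPLATE_ACTION_DUP2 = 1,
	SPAWN_TEMPLATE_ACTION_FCHDIR = 2,
};

struct spawn_template_action {
	uint32_t type;
	uint32_t flags;
	int32_t fd;
	int32_t newfd;
	aligned_u64 arg;
};

struct spawn_template_spawn_args {
	aligned_u64 flags;
	aligned_u64 pidfd;
	aligned_u64 argv;
	aligned_u64 envp;
	aligned_u64 actions;
	aligned_u64 actions_len;
	aligned_u64 reserved[4];
};

/* Hypothetical helper: FCHDIR + two DUP2 actions, as in the loop above. */
static size_t build_rg_actions(struct spawn_template_action acts[3],
			       int worktree_fd, int out_w, int err_w)
{
	memset(acts, 0, 3 * sizeof(acts[0]));
	acts[0].type = SPAWN_TEMPLATE_ACTION_FCHDIR;
	acts[0].fd = worktree_fd;
	acts[1].type = SPAWN_TEMPLATE_ACTION_DUP2;
	acts[1].fd = out_w;
	acts[1].newfd = STDOUT_FILENO;
	acts[2].type = SPAWN_TEMPLATE_ACTION_DUP2;
	acts[2].fd = err_w;
	acts[2].newfd = STDERR_FILENO;
	return 3;
}

/* Hypothetical helper: pack pointers into the u64 fields of the args. */
static void fill_spawn_args(struct spawn_template_spawn_args *args,
			    char *const argv[], char *const envp[],
			    const struct spawn_template_action *acts,
			    size_t nr_acts, int *pidfd_out)
{
	memset(args, 0, sizeof(*args));	/* flags 0: default fd closing */
	args->pidfd = (uint64_t)(uintptr_t)pidfd_out;	/* assumed pointer */
	args->argv = (uint64_t)(uintptr_t)argv;
	args->envp = (uint64_t)(uintptr_t)envp;
	args->actions = (uint64_t)(uintptr_t)acts;
	args->actions_len = nr_acts;	/* assumed to be an element count */
}
```

Leaving flags at zero keeps the conservative default described earlier; a caller that wants traditional inheritance would set SPAWN_TEMPLATE_SPAWN_INHERIT_FDS instead.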
A shell-wrapper runtime would use the same shape with a template for
/usr/bin/bash and argv such as ["/usr/bin/bash", "-c", command]. That
reduces shell startup cost, but it does not cache rg or head inside that
command unless the shell also opts into spawn_template for commands it
starts internally.
The template pins the executable and denies writes to that file while the
template fd is alive, so cached executable metadata cannot race with a
writer changing the same inode. This means direct in-place writes to the
executable can fail while a runtime keeps a template open. It does not
block the common package-manager update pattern where a new inode is
written and then atomically renamed over the old path. In that case the
old path-created template becomes stale, spawn_template_spawn() rejects
it with ESTALE, and the runtime should close and recreate the template
for the new executable.
  in-place write               package-manager update
  --------------               ----------------------
  template pins old inode      write new inode
  write(old inode) denied      rename(new, "/usr/bin/rg")
  cached metadata safe         old template sees path mismatch
                               spawn_template_spawn() = -ESTALE
                               recreate template for new inode
Each spawn revalidates executable identity before cached metadata is
used. Path-created templates only accept absolute paths: a relative path
such as ./tool depends on cwd, and the same string can name a different
file after chdir. For an absolute path template, each spawn reopens the
path and checks that it still resolves to the executable recorded when
the template was created. If the path now names a replaced file, the
template is stale and userspace should close and recreate it.
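A userspace analogue of that check can be sketched with stat(2). It is only an illustration of the idea, not the kernel implementation from patch 07: the struct and function names are hypothetical, and the compared fields follow the identity key described in this cover letter (device, inode, size, mode, owner, ctime, mtime). The demo function replaces a temporary file by rename(), the package-manager pattern, and reports what revalidation returns.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical userspace mirror of the executable identity key. */
struct exe_identity {
	dev_t dev;
	ino_t ino;
	off_t size;
	mode_t mode;
	uid_t uid;
	gid_t gid;
	struct timespec change_time;
	struct timespec modify_time;
};

static int record_identity(const char *path, struct exe_identity *id)
{
	struct stat st;

	if (stat(path, &st) < 0)
		return -errno;
	id->dev = st.st_dev;
	id->ino = st.st_ino;
	id->size = st.st_size;
	id->mode = st.st_mode;
	id->uid = st.st_uid;
	id->gid = st.st_gid;
	id->change_time = st.st_ctim;
	id->modify_time = st.st_mtim;
	return 0;
}

static bool timespec_equal(const struct timespec *a, const struct timespec *b)
{
	return a->tv_sec == b->tv_sec && a->tv_nsec == b->tv_nsec;
}

static bool identity_equal(const struct exe_identity *a,
			   const struct exe_identity *b)
{
	return a->dev == b->dev && a->ino == b->ino && a->size == b->size &&
	       a->mode == b->mode && a->uid == b->uid && a->gid == b->gid &&
	       timespec_equal(&a->change_time, &b->change_time) &&
	       timespec_equal(&a->modify_time, &b->modify_time);
}

/* 0 if path still names the recorded file, -ESTALE if it was replaced. */
static int revalidate_path(const char *path, const struct exe_identity *rec)
{
	struct exe_identity now;
	int ret = record_identity(path, &now);

	if (ret)
		return ret;
	return identity_equal(rec, &now) ? 0 : -ESTALE;
}

/* Demo: record a file, rename a new inode over it, then revalidate. */
static int demo_rename_replacement(void)
{
	char old_path[] = "/tmp/spawn_tmpl_old_XXXXXX";
	char new_path[] = "/tmp/spawn_tmpl_new_XXXXXX";
	struct exe_identity recorded;
	int old_fd, new_fd, ret;

	old_fd = mkstemp(old_path);
	if (old_fd < 0)
		return -errno;
	new_fd = mkstemp(new_path);
	if (new_fd < 0) {
		ret = -errno;
		close(old_fd);
		unlink(old_path);
		return ret;
	}

	ret = record_identity(old_path, &recorded);
	if (!ret)
		ret = revalidate_path(old_path, &recorded);
	if (!ret && rename(new_path, old_path) < 0)
		ret = -errno;
	if (!ret)
		ret = revalidate_path(old_path, &recorded);

	close(old_fd);
	close(new_fd);
	unlink(old_path);
	unlink(new_path);	/* harmless if the rename already consumed it */
	return ret;
}
```

Because both temporary files exist at the same time, they have distinct inode numbers, so the second revalidation sees a different identity even though the path string is unchanged.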
A template fd can be passed over SCM_RIGHTS like any other fd, but this
RFC does not treat that as delegation. spawn_template_spawn() only works
while the caller still has the same struct cred object that created the
template. If another task, or the same task after a credential change,
receives the fd, spawn fails instead of running the executable using the
creator's launch authority:
  ordinary fd                  spawn_template fd
  -----------                  -----------------
  A: open log                  A: create rg template
  A -> B: SCM_RIGHTS(fd)       A -> B: SCM_RIGHTS(tfd)
  B: read(fd) = ok             B: spawn(tfd) = -EACCES
                               B: create own rg template
                               B: spawn(own_tfd) = ok
  open-file use is delegated   spawn authority is not delegated
The cached state is intentionally small. The template fd keeps the opened
main executable file, an optional absolute path string, the creator
credential pointer, and the deny-write state. The executable identity key
records device, inode, size, mode, owner, ctime, and mtime, and is
rechecked before cached metadata is used. The ELF cache keeps only the
main executable's ELF header, program header table, and program header
count.
  cached in this RFC           not cached in this RFC
  ------------------           ----------------------
  opened main executable       PT_INTERP metadata
  executable identity key      shared-library graph
  main ELF header              VMA layout metadata
  main ELF program headers     cross-process metadata sharing
  creator cred pointer
  deny-write state
This RFC does not cache ELF interpreter metadata, shared-library
dependency state, or derived mapping-layout state. Shared-library
resolution is dynamic linker policy and depends on LD_LIBRARY_PATH,
RPATH, RUNPATH, /etc/ld.so.cache, mount namespaces, and secure-exec
state. It also does not share cached executable metadata between template
fds created by different processes. Each template owns its small cached
metadata object in this RFC.
Performance
===========
The numbers below come from my separate local autogen-bench project.
autogen-bench uses AutoGen [1] Core as the agent harness: RoutedAgent
instances run under SingleThreadedAgentRuntime, and RPC-style dispatch
fans out concurrent tool-call requests to worker agents. The workload
definitions, generated test files, and subprocess/spawn_template backends
are local to autogen-bench.
The agent-tools preset includes direct tool calls and shell-wrapper forms
for:
rg, grep, sed, awk, cat, head, tail, find, stat, ls, git-status, git-diff,
python-small, node-small, sh-c, and bash-c.
The benchmark is launch-heavy but not a no-op: it searches generated
Python-like source files, reads sample files, runs small Python and
Node.js programs, and runs git status and git diff in a small repository.
It does not include model inference or long-running tool work, so the
numbers mainly describe the short-tool regime.
The subprocess column starts each tool call through the existing
userspace launch path. The spawn_template column creates templates for
hot executables and uses spawn_template_spawn() for later calls.
Total in-flight tool calls stay at 16; only the worker-process split
changes. For example, 4x4 means 4 worker processes with 4 in-flight tool
calls each. The two time_s values are subprocess/spawn_template wall
times.
  Workload   Calls  subprocess  spawn_template  time_s       Delta
  (workers)  calls  calls/s     calls/s         seconds
  1x16       6144   411.04      420.32          14.95/14.62  +2.26%
  2x8        6144   666.78      690.08          9.21/8.90    +3.49%
  4x4        6144   955.61      1003.25         6.43/6.12    +4.99%
  8x2        6144   1048.25     1069.18         5.86/5.75    +2.00%
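The Delta column appears to be the relative change in calls/s from the subprocess column to the spawn_template column. That reading is an inference from the numbers, not something the benchmark description states, and the small C sketch below (hypothetical names, values copied from the table) can be used to recompute it.

```c
#include <assert.h>

/* One row of the table above: throughput of both backends in calls/s. */
struct bench_row {
	const char *split;
	double subprocess_cps;
	double template_cps;
};

static const struct bench_row bench_rows[] = {
	{ "1x16",  411.04,  420.32 },
	{ "2x8",   666.78,  690.08 },
	{ "4x4",   955.61, 1003.25 },
	{ "8x2",  1048.25, 1069.18 },
};

/* Relative throughput change of candidate over baseline, in percent. */
static double delta_percent(double baseline, double candidate)
{
	return (candidate - baseline) / baseline * 100.0;
}
```

For the 1x16 row this gives (420.32 - 411.04) / 411.04, or about 2.26%, which matches the table after rounding.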
The table measures the whole mixed workload, including both process
startup and the short tool work done after exec. Since this workload is
launch-heavy, the possible launch-side savings include:
- the template fd keeps an opened executable, avoiding repeated ordinary
open/path setup for that executable;
- the kernel can reuse cached main-executable ELF header and program
header metadata after revalidation;
- the fork-and-exec-style launch is submitted as one
spawn_template_spawn() operation;
- fd, cwd, and signal actions run in the child kernel path instead of
being driven one syscall at a time by userspace child glue;
- pid and pidfd are returned by the same operation, reducing some
runtime-side bookkeeping.
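For contrast, the "userspace child glue" that the list refers to looks roughly like the fork-and-exec sketch below, where the cwd change, the fd wiring, and the exec are separate syscalls issued by the child. This is an illustrative baseline in C with hypothetical helper names. The benchmark's actual subprocess backend is not shown in this posting and may differ.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* pipe() plus FD_CLOEXEC on both ends, so exec does not leak them. */
static int pipe_cloexec(int p[2])
{
	if (pipe(p) < 0)
		return -1;
	if (fcntl(p[0], F_SETFD, FD_CLOEXEC) < 0 ||
	    fcntl(p[1], F_SETFD, FD_CLOEXEC) < 0) {
		close(p[0]);
		close(p[1]);
		return -1;
	}
	return 0;
}

/*
 * Run argv[0] in the directory named by cwd_fd and capture its stdout.
 * Returns the number of bytes stored in buf (NUL-terminated), or -1.
 */
static ssize_t fork_exec_capture(int cwd_fd, char *const argv[],
				 char *buf, size_t cap)
{
	int out[2], status;
	size_t total = 0;
	ssize_t n;
	pid_t pid;

	if (cap == 0 || pipe_cloexec(out) < 0)
		return -1;

	pid = fork();
	if (pid < 0) {
		close(out[0]);
		close(out[1]);
		return -1;
	}
	if (pid == 0) {
		/* Child glue: one syscall per setup step, then exec. */
		if (fchdir(cwd_fd) < 0 || dup2(out[1], STDOUT_FILENO) < 0)
			_exit(127);
		execv(argv[0], argv);
		_exit(127);
	}

	close(out[1]);
	while (total < cap - 1 &&
	       (n = read(out[0], buf + total, cap - 1 - total)) > 0)
		total += (size_t)n;
	buf[total] = '\0';
	close(out[0]);

	if (waitpid(pid, &status, 0) < 0)
		return -1;
	if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
		return -1;
	return (ssize_t)total;
}
```

A caller would pass an open directory fd for the worktree and an argv whose first element is an absolute path, for example /bin/sh with "-c" and a command string.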
In local experiments before this RFC, I also tried caching ELF
interpreter metadata and derived ELF mapping-layout metadata. A focused
repeated-exec benchmark did not show a stable standalone throughput gain
for those two optimizations, so this RFC leaves them out and keeps only
the main executable metadata cache.
I also tried sharing main-executable ELF metadata across template fds
created by different processes for the same executable identity. That can
reduce duplicated metadata memory when many agent worker processes create
their own templates for /usr/bin/rg, /usr/bin/git, and similar tools, but
it did not show a stable throughput win in local multi-agent tests. It
also adds cache keying, lifetime, invalidation, credential, and namespace
questions to the RFC. This version therefore keeps per-template metadata
ownership and leaves cross-process sharing out.
Sorry again for the rough edges in this RFC. I would appreciate feedback
on whether this direction is useful and what the right API boundary
should be.
Thanks,
Li
[1]: https://github.com/microsoft/autogen
Li Chen (13):
exec: factor argument setup out of do_execveat_common()
exec: add an internal helper for opened executables
file: expose helpers for in-kernel fd actions
exec: add spawn template UAPI definitions
exec: add spawn template file descriptors
exec: add spawn_template_spawn()
exec: validate spawn template executable identity
binfmt_elf: cache ELF metadata for spawn templates
Documentation: describe spawn templates
exec: require absolute paths for path-created templates
exec: let close-range actions target the max fd
syscalls: add generic spawn template entries
selftests/exec: cover spawn template basics
Documentation/userspace-api/index.rst | 1 +
.../userspace-api/spawn_template.rst | 153 +++
MAINTAINERS | 6 +
arch/x86/entry/syscalls/syscall_64.tbl | 3 +-
fs/Makefile | 2 +-
fs/binfmt_elf.c | 104 +-
fs/exec.c | 162 ++-
fs/file.c | 11 +-
fs/spawn_template.c | 619 +++++++++++
include/linux/binfmts.h | 10 +
include/linux/fdtable.h | 2 +
include/linux/spawn_template.h | 72 ++
include/linux/syscalls.h | 7 +
include/uapi/asm-generic/unistd.h | 7 +-
include/uapi/linux/spawn_template.h | 62 ++
scripts/syscall.tbl | 2 +
tools/testing/selftests/exec/Makefile | 1 +
tools/testing/selftests/exec/spawn_template.c | 997 ++++++++++++++++++
18 files changed, 2179 insertions(+), 42 deletions(-)
create mode 100644 Documentation/userspace-api/spawn_template.rst
create mode 100644 fs/spawn_template.c
create mode 100644 include/linux/spawn_template.h
create mode 100644 include/uapi/linux/spawn_template.h
create mode 100644 tools/testing/selftests/exec/spawn_template.c
--
2.52.0
* [RFC PATCH v1 01/13] exec: factor argument setup out of do_execveat_common()
From: Li Chen @ 2026-05-28 9:52 UTC
Move the common userspace argv and envp counting and stack setup code
into do_execveat_common_bprm(). Keep do_execveat_common() responsible
for the existing RLIMIT_NPROC check, bprm allocation, and error path.
This is a mechanical refactor for later opened-file exec users. It
does not change execve or execveat behavior.
Signed-off-by: Li Chen <me@linux.beauty>
---
fs/exec.c | 53 +++++++++++++++++++++++++++++++----------------------
1 file changed, 31 insertions(+), 22 deletions(-)
diff --git a/fs/exec.c b/fs/exec.c
index 2889b7cf808d7..53f7b18d2b1ea 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1775,31 +1775,12 @@ static int bprm_execve(struct linux_binprm *bprm)
return retval;
}
-static int do_execveat_common(int fd, struct filename *filename,
- struct user_arg_ptr argv,
- struct user_arg_ptr envp,
- int flags)
+static int do_execveat_common_bprm(struct linux_binprm *bprm,
+ struct user_arg_ptr argv,
+ struct user_arg_ptr envp)
{
int retval;
- /*
- * We move the actual failure in case of RLIMIT_NPROC excess from
- * set*uid() to execve() because too many poorly written programs
- * don't check setuid() return code. Here we additionally recheck
- * whether NPROC limit is still exceeded.
- */
- if ((current->flags & PF_NPROC_EXCEEDED) &&
- is_rlimit_overlimit(current_ucounts(), UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC)))
- return -EAGAIN;
-
- /* We're below the limit (still or again), so we don't want to make
- * further execve() calls fail. */
- current->flags &= ~PF_NPROC_EXCEEDED;
-
- CLASS(bprm, bprm)(fd, filename, flags);
- if (IS_ERR(bprm))
- return PTR_ERR(bprm);
-
retval = count(argv, MAX_ARG_STRINGS);
if (retval < 0)
return retval;
@@ -1846,6 +1827,34 @@ static int do_execveat_common(int fd, struct filename *filename,
return bprm_execve(bprm);
}
+static int do_execveat_common(int fd, struct filename *filename,
+ struct user_arg_ptr argv,
+ struct user_arg_ptr envp,
+ int flags)
+{
+ /*
+ * We move the actual failure in case of RLIMIT_NPROC excess from
+ * set*uid() to execve() because too many poorly written programs
+ * don't check setuid() return code. Here we additionally recheck
+ * whether NPROC limit is still exceeded.
+ */
+ if ((current->flags & PF_NPROC_EXCEEDED) &&
+ is_rlimit_overlimit(current_ucounts(), UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC)))
+ return -EAGAIN;
+
+ /*
+ * We're below the limit (still or again), so we don't want to make
+ * further execve() calls fail.
+ */
+ current->flags &= ~PF_NPROC_EXCEEDED;
+
+ CLASS(bprm, bprm)(fd, filename, flags);
+ if (IS_ERR(bprm))
+ return PTR_ERR(bprm);
+
+ return do_execveat_common_bprm(bprm, argv, envp);
+}
+
int kernel_execve(const char *kernel_filename,
const char *const *argv, const char *const *envp)
{
--
2.52.0
* [RFC PATCH v1 02/13] exec: add an internal helper for opened executables
From: Li Chen @ 2026-05-28 9:52 UTC
Split alloc_bprm_file() from alloc_bprm() so internal callers can build
a linux_binprm from an executable file that they already opened.
Add kernel_execveat_file() for in-kernel users that need to execute an
opened file while still using the normal execve credential, LSM, and
binary-format path.
Signed-off-by: Li Chen <me@linux.beauty>
---
fs/exec.c | 78 +++++++++++++++++++++++++++++++++++------
include/linux/binfmts.h | 4 +++
2 files changed, 71 insertions(+), 11 deletions(-)
diff --git a/fs/exec.c b/fs/exec.c
index 53f7b18d2b1ea..5b91a9b208a77 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1392,16 +1392,13 @@ static void free_bprm(struct linux_binprm *bprm)
kfree(bprm);
}
-static struct linux_binprm *alloc_bprm(int fd, struct filename *filename, int flags)
+static struct linux_binprm *alloc_bprm_file(struct file *file,
+ struct filename *filename,
+ int fd, int flags)
{
struct linux_binprm *bprm;
- struct file *file;
int retval = -ENOMEM;
- file = do_open_execat(fd, filename, flags);
- if (IS_ERR(file))
- return ERR_CAST(file);
-
bprm = kzalloc_obj(*bprm);
if (!bprm) {
do_close_execat(file);
@@ -1463,6 +1460,17 @@ static struct linux_binprm *alloc_bprm(int fd, struct filename *filename, int fl
return ERR_PTR(retval);
}
+static struct linux_binprm *alloc_bprm(int fd, struct filename *filename, int flags)
+{
+ struct file *file;
+
+ file = do_open_execat(fd, filename, flags);
+ if (IS_ERR(file))
+ return ERR_CAST(file);
+
+ return alloc_bprm_file(file, filename, fd, flags);
+}
+
DEFINE_CLASS(bprm, struct linux_binprm *, if (!IS_ERR(_T)) free_bprm(_T),
alloc_bprm(fd, name, flags), int fd, struct filename *name, int flags)
@@ -1901,6 +1909,59 @@ int kernel_execve(const char *kernel_filename,
return bprm_execve(bprm);
}
+static inline struct user_arg_ptr native_arg(const char __user *const __user *p)
+{
+ return (struct user_arg_ptr){.ptr.native = p};
+}
+
+static int do_execveat_file_common(struct file *file, struct filename *filename,
+ struct user_arg_ptr argv,
+ struct user_arg_ptr envp, int flags)
+{
+ struct linux_binprm *bprm;
+ struct file *exec_file;
+ int retval;
+
+ if (flags & ~AT_EMPTY_PATH)
+ return -EINVAL;
+
+ if ((current->flags & PF_NPROC_EXCEEDED) &&
+ is_rlimit_overlimit(current_ucounts(), UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC)))
+ return -EAGAIN;
+
+ current->flags &= ~PF_NPROC_EXCEEDED;
+
+ retval = exe_file_deny_write_access(file);
+ if (retval)
+ return retval;
+ exec_file = get_file(file);
+
+ bprm = alloc_bprm_file(exec_file, filename, AT_FDCWD, flags);
+ if (IS_ERR(bprm))
+ return PTR_ERR(bprm);
+
+ retval = do_execveat_common_bprm(bprm, argv, envp);
+ free_bprm(bprm);
+ return retval;
+}
+
+int kernel_execveat_file(struct file *file, const char *filename,
+ const void __user *argv,
+ const void __user *envp,
+ int flags)
+{
+ const char __user *const __user *user_argv;
+ const char __user *const __user *user_envp;
+
+ CLASS(filename_kernel, name)(filename);
+
+ user_argv = (const char __user *const __user *)argv;
+ user_envp = (const char __user *const __user *)envp;
+
+ return do_execveat_file_common(file, name, native_arg(user_argv),
+ native_arg(user_envp), flags);
+}
+
void set_binfmt(struct linux_binfmt *new)
{
struct mm_struct *mm = current->mm;
@@ -1925,11 +1986,6 @@ void set_dumpable(struct mm_struct *mm, int value)
__mm_flags_set_mask_dumpable(mm, value);
}
-static inline struct user_arg_ptr native_arg(const char __user *const __user *p)
-{
- return (struct user_arg_ptr){.ptr.native = p};
-}
-
SYSCALL_DEFINE3(execve,
const char __user *, filename,
const char __user *const __user *, argv,
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index 65abd5ab8836c..c0715678c9a06 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -141,6 +141,10 @@ extern int transfer_args_to_stack(struct linux_binprm *bprm,
unsigned long *sp_location);
extern int bprm_change_interp(const char *interp, struct linux_binprm *bprm);
int copy_string_kernel(const char *arg, struct linux_binprm *bprm);
+int kernel_execveat_file(struct file *file, const char *filename,
+ const void __user *argv,
+ const void __user *envp,
+ int flags);
extern void set_binfmt(struct linux_binfmt *new);
extern ssize_t read_code(struct file *, unsigned long, loff_t, size_t);
--
2.52.0
* [RFC PATCH v1 03/13] file: expose helpers for in-kernel fd actions
From: Li Chen @ 2026-05-28 9:52 UTC
Split do_close_range() from the close_range syscall wrapper and make
ksys_dup3() available to in-kernel callers. Later spawn-template fd
actions use these helpers instead of duplicating close and dup logic.
Signed-off-by: Li Chen <me@linux.beauty>
---
fs/file.c | 11 ++++++++---
include/linux/fdtable.h | 2 ++
2 files changed, 10 insertions(+), 3 deletions(-)
diff --git a/fs/file.c b/fs/file.c
index e5c75b22e0c7c..a9f4b4e2dcd45 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -815,8 +815,7 @@ static inline void __range_close(struct files_struct *files, unsigned int fd,
* from @fd up to and including @max_fd are closed.
* Currently, errors to close a given file descriptor are ignored.
*/
-SYSCALL_DEFINE3(close_range, unsigned int, fd, unsigned int, max_fd,
- unsigned int, flags)
+int do_close_range(unsigned int fd, unsigned int max_fd, unsigned int flags)
{
struct task_struct *me = current;
struct files_struct *cur_fds = me->files, *fds = NULL;
@@ -867,6 +866,12 @@ SYSCALL_DEFINE3(close_range, unsigned int, fd, unsigned int, max_fd,
return 0;
}
+SYSCALL_DEFINE3(close_range, unsigned int, fd, unsigned int, max_fd,
+ unsigned int, flags)
+{
+ return do_close_range(fd, max_fd, flags);
+}
+
/**
* file_close_fd - return file associated with fd
* @fd: file descriptor to retrieve file for
@@ -1421,7 +1426,7 @@ int receive_fd_replace(int new_fd, struct file *file, unsigned int o_flags)
return new_fd;
}
-static int ksys_dup3(unsigned int oldfd, unsigned int newfd, int flags)
+int ksys_dup3(unsigned int oldfd, unsigned int newfd, int flags)
{
int err = -EBADF;
struct file *file;
diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h
index c45306a9f0072..7f852fcc082a4 100644
--- a/include/linux/fdtable.h
+++ b/include/linux/fdtable.h
@@ -112,6 +112,8 @@ int iterate_fd(struct files_struct *, unsigned,
extern int close_fd(unsigned int fd);
extern struct file *file_close_fd(unsigned int fd);
+int do_close_range(unsigned int fd, unsigned int max_fd, unsigned int flags);
+int ksys_dup3(unsigned int oldfd, unsigned int newfd, int flags);
extern struct kmem_cache *files_cachep;
--
2.52.0
* [RFC PATCH v1 04/13] exec: add spawn template UAPI definitions
From: Li Chen @ 2026-05-28 9:52 UTC
Add the userspace ABI structures and flags for creating a spawn
template and spawning a process from it. The ABI carries argv, envp,
and per-spawn fd actions while leaving policy decisions in userspace.
Signed-off-by: Li Chen <me@linux.beauty>
---
MAINTAINERS | 1 +
include/uapi/linux/spawn_template.h | 62 +++++++++++++++++++++++++++++
2 files changed, 63 insertions(+)
create mode 100644 include/uapi/linux/spawn_template.h
diff --git a/MAINTAINERS b/MAINTAINERS
index 3dd58a16f06a9..d7b1191e33ca0 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9739,6 +9739,7 @@ F: include/linux/elf.h
F: include/uapi/linux/auxvec.h
F: include/uapi/linux/binfmts.h
F: include/uapi/linux/elf.h
+F: include/uapi/linux/spawn_template.h
F: kernel/fork.c
F: mm/vma_exec.c
F: tools/testing/selftests/exec/
diff --git a/include/uapi/linux/spawn_template.h b/include/uapi/linux/spawn_template.h
new file mode 100644
index 0000000000000..84f026fdf9090
--- /dev/null
+++ b/include/uapi/linux/spawn_template.h
@@ -0,0 +1,62 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _UAPI_LINUX_SPAWN_TEMPLATE_H
+#define _UAPI_LINUX_SPAWN_TEMPLATE_H
+
+#include <linux/openat2.h>
+#include <linux/types.h>
+
+#define SPAWN_TEMPLATE_CREATE_CLOEXEC (1ULL << 0)
+#define SPAWN_TEMPLATE_SPAWN_INHERIT_FDS (1ULL << 0)
+
+enum spawn_template_action_type {
+ SPAWN_TEMPLATE_ACTION_CLOSE = 0,
+ SPAWN_TEMPLATE_ACTION_DUP2 = 1,
+ SPAWN_TEMPLATE_ACTION_FCHDIR = 2,
+ SPAWN_TEMPLATE_ACTION_OPEN = 3,
+ SPAWN_TEMPLATE_ACTION_CLOSE_RANGE = 4,
+ SPAWN_TEMPLATE_ACTION_SIGMASK = 5,
+ SPAWN_TEMPLATE_ACTION_SIGDEFAULT = 6,
+};
+
+struct spawn_template_action {
+ __u32 type;
+ __u32 flags;
+ __s32 fd;
+ __s32 newfd;
+ __aligned_u64 arg;
+};
+
+struct spawn_template_open {
+ __aligned_u64 path;
+ struct open_how how;
+};
+
+struct spawn_template_sigset {
+ __aligned_u64 sigset;
+ __u64 sigsetsize;
+};
+
+struct spawn_template_create_args {
+ __aligned_u64 flags;
+ __s32 execfd;
+ __u32 exec_flags;
+ __aligned_u64 filename;
+ __aligned_u64 actions;
+ __aligned_u64 actions_len;
+ __aligned_u64 reserved[4];
+};
+
+struct spawn_template_spawn_args {
+ __aligned_u64 flags;
+ __aligned_u64 pidfd;
+ __aligned_u64 argv;
+ __aligned_u64 envp;
+ __aligned_u64 actions;
+ __aligned_u64 actions_len;
+ __aligned_u64 reserved[4];
+};
+
+#define SPAWN_TEMPLATE_CREATE_ARGS_SIZE_VER0 72
+#define SPAWN_TEMPLATE_SPAWN_ARGS_SIZE_VER0 80
+
+#endif /* _UAPI_LINUX_SPAWN_TEMPLATE_H */
--
2.52.0
* [RFC PATCH v1 05/13] exec: add spawn template file descriptors
From: Li Chen @ 2026-05-28 9:52 UTC
Add spawn_template_create() and back each template with an anon-inode fd.
Creation records the per-template state that later spawns reuse: the opened
executable file, optional absolute path, creator credential, and deny-write
state. Keep write access denied until the template fd is released so cached
state cannot race with writers.
This patch only creates and releases template fds.
Spawning and ELF metadata caching are added separately.
Signed-off-by: Li Chen <me@linux.beauty>
---
MAINTAINERS | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 -
fs/Makefile | 2 +-
fs/spawn_template.c | 180 +++++++++++++++++++++++++
include/linux/syscalls.h | 3 +
5 files changed, 185 insertions(+), 2 deletions(-)
create mode 100644 fs/spawn_template.c
diff --git a/MAINTAINERS b/MAINTAINERS
index d7b1191e33ca0..d5441812825c3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9732,6 +9732,7 @@ F: Documentation/userspace-api/ELF.rst
F: fs/*binfmt_*.c
F: fs/Kconfig.binfmt
F: fs/exec.c
+F: fs/spawn_template.c
F: fs/tests/binfmt_*_kunit.c
F: fs/tests/exec_kunit.c
F: include/linux/binfmts.h
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 524155d655da1..d6c1667e8f3b8 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -396,7 +396,6 @@
469 common file_setattr sys_file_setattr
470 common listns sys_listns
471 common rseq_slice_yield sys_rseq_slice_yield
-
#
# Due to a historical design error, certain syscalls are numbered differently
# in x32 as compared to native x86_64. These syscalls have numbers 512-547.
diff --git a/fs/Makefile b/fs/Makefile
index ae1b07f9c6a0c..796eb4ae143e5 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -8,7 +8,7 @@
obj-y := open.o read_write.o file_table.o super.o \
- char_dev.o stat.o exec.o pipe.o namei.o fcntl.o \
+ char_dev.o stat.o exec.o spawn_template.o pipe.o namei.o fcntl.o \
ioctl.o readdir.o select.o dcache.o inode.o \
attr.o bad_inode.o file.o filesystems.o namespace.o \
seq_file.o xattr.o libfs.o fs-writeback.o \
diff --git a/fs/spawn_template.c b/fs/spawn_template.c
new file mode 100644
index 0000000000000..280a1038cc45e
--- /dev/null
+++ b/fs/spawn_template.c
@@ -0,0 +1,180 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <linux/anon_inodes.h>
+#include <linux/cred.h>
+#include <linux/err.h>
+#include <linux/fcntl.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <linux/syscalls.h>
+#include <linux/uaccess.h>
+#include <uapi/linux/spawn_template.h>
+
+#include "internal.h"
+
+#define SPAWN_TEMPLATE_MAX_ACTIONS 256
+
+struct spawn_template {
+ struct file *exec_file;
+ const struct cred *creator_cred;
+ char *filename;
+ bool deny_write;
+};
+
+static const struct file_operations spawn_template_fops;
+
+static bool spawn_template_file_exec_allowed(struct file *file)
+{
+ if (!S_ISREG(file_inode(file)->i_mode))
+ return false;
+ if (path_noexec(&file->f_path))
+ return false;
+ if (file_permission(file, MAY_EXEC))
+ return false;
+ return can_mmap_file(file);
+}
+
+static int spawn_template_release(struct inode *inode, struct file *file)
+{
+ struct spawn_template *tmpl = file->private_data;
+
+ if (tmpl->deny_write)
+ exe_file_allow_write_access(tmpl->exec_file);
+ fput(tmpl->exec_file);
+ put_cred(tmpl->creator_cred);
+ kfree(tmpl->filename);
+ kfree(tmpl);
+ return 0;
+}
+
+static const struct file_operations spawn_template_fops = {
+ .release = spawn_template_release,
+ .llseek = noop_llseek,
+};
+
+static int spawn_template_open_execfd(int execfd, struct file **file,
+ bool *deny_write)
+{
+ int ret;
+
+ if (execfd < 0)
+ return -EINVAL;
+
+ CLASS(fd, f)(execfd);
+ if (fd_empty(f))
+ return -EBADF;
+
+ if (!spawn_template_file_exec_allowed(fd_file(f)))
+ return -EACCES;
+
+ ret = exe_file_deny_write_access(fd_file(f));
+ if (ret)
+ return ret;
+
+ *file = get_file(fd_file(f));
+ *deny_write = true;
+ return 0;
+}
+
+static int spawn_template_open_filename(u64 filename, struct file **file,
+ char **path,
+ bool *deny_write)
+{
+ char *kfilename __free(kfree) = NULL;
+ struct file *exec __free(fput) = NULL;
+ struct file *tmp_file;
+ char *tmp;
+
+ if (!filename)
+ return -EINVAL;
+
+ tmp = strndup_user(u64_to_user_ptr(filename), PATH_MAX);
+ if (IS_ERR(tmp))
+ return PTR_ERR(tmp);
+ kfilename = tmp;
+
+ tmp_file = open_exec(kfilename);
+ if (IS_ERR(tmp_file))
+ return PTR_ERR(tmp_file);
+ exec = tmp_file;
+ if (!spawn_template_file_exec_allowed(exec)) {
+ exe_file_allow_write_access(exec);
+ return -EACCES;
+ }
+
+ *file = no_free_ptr(exec);
+ *path = no_free_ptr(kfilename);
+ *deny_write = true;
+ return 0;
+}
+
+SYSCALL_DEFINE2(spawn_template_create,
+ struct spawn_template_create_args __user *, uargs,
+ size_t, usize)
+{
+ struct spawn_template_create_args args;
+ struct spawn_template *tmpl;
+ int fd_flags = 0;
+ int ret;
+
+ BUILD_BUG_ON(sizeof(struct spawn_template_create_args) !=
+ SPAWN_TEMPLATE_CREATE_ARGS_SIZE_VER0);
+
+ if (usize < SPAWN_TEMPLATE_CREATE_ARGS_SIZE_VER0)
+ return -EINVAL;
+ if (usize > PAGE_SIZE)
+ return -E2BIG;
+
+ ret = copy_struct_from_user(&args, sizeof(args), uargs, usize);
+ if (ret)
+ return ret;
+
+ if (args.flags & ~SPAWN_TEMPLATE_CREATE_CLOEXEC)
+ return -EINVAL;
+ if (args.exec_flags || args.reserved[0] || args.reserved[1] ||
+ args.reserved[2] || args.reserved[3])
+ return -EINVAL;
+ if (args.actions || args.actions_len)
+ return -EINVAL;
+ if ((args.execfd < 0 && !args.filename) ||
+ (args.execfd >= 0 && args.filename))
+ return -EINVAL;
+
+ tmpl = kzalloc_obj(*tmpl, GFP_KERNEL);
+ if (!tmpl)
+ return -ENOMEM;
+ tmpl->creator_cred = get_current_cred();
+
+ if (args.filename)
+ ret = spawn_template_open_filename(args.filename,
+ &tmpl->exec_file,
+ &tmpl->filename,
+ &tmpl->deny_write);
+ else
+ ret = spawn_template_open_execfd(args.execfd,
+ &tmpl->exec_file,
+ &tmpl->deny_write);
+ if (ret)
+ goto out_free_tmpl;
+
+ if (args.flags & SPAWN_TEMPLATE_CREATE_CLOEXEC)
+ fd_flags |= O_CLOEXEC;
+
+ ret = anon_inode_getfd("spawn_template", &spawn_template_fops, tmpl,
+ fd_flags);
+ if (ret < 0)
+ goto out_put_exec;
+
+ return ret;
+
+out_put_exec:
+ if (tmpl->deny_write)
+ exe_file_allow_write_access(tmpl->exec_file);
+ fput(tmpl->exec_file);
+out_free_tmpl:
+ put_cred(tmpl->creator_cred);
+ kfree(tmpl->filename);
+ kfree(tmpl);
+ return ret;
+}
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index f3dfc3269188a..4b41950488bd6 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -67,6 +67,7 @@ struct rseq;
union bpf_attr;
struct io_uring_params;
struct clone_args;
+struct spawn_template_create_args;
struct open_how;
struct mount_attr;
struct landlock_ruleset_attr;
@@ -821,6 +822,8 @@ asmlinkage long sys_clone(unsigned long, unsigned long, int __user *,
#endif
asmlinkage long sys_clone3(struct clone_args __user *uargs, size_t size);
+asmlinkage long sys_spawn_template_create(struct spawn_template_create_args __user *uargs,
+ size_t size);
asmlinkage long sys_execve(const char __user *filename,
const char __user *const __user *argv,
--
2.52.0
* [RFC PATCH v1 06/13] exec: add spawn_template_spawn()
2026-05-28 9:52 [RFC PATCH v1 00/13] exec: add spawn templates for repeated executable startup Li Chen
` (4 preceding siblings ...)
2026-05-28 9:52 ` [RFC PATCH v1 05/13] exec: add spawn template file descriptors Li Chen
@ 2026-05-28 9:52 ` Li Chen
2026-05-28 9:52 ` [RFC PATCH v1 07/13] exec: validate spawn template executable identity Li Chen
` (9 subsequent siblings)
15 siblings, 0 replies; 20+ messages in thread
From: Li Chen @ 2026-05-28 9:52 UTC (permalink / raw)
To: Christian Brauner, Kees Cook, Alexander Viro
Cc: linux-fsdevel, linux-api, linux-kernel, linux-mm, linux-arch,
linux-doc, linux-kselftest, x86, Arnd Bergmann, Andy Lutomirski,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin, Jan Kara, Jonathan Corbet, Shuah Khan, Li Chen
Add spawn_template_spawn() to start a child from a template fd. The child
uses the template's pinned executable file, runs per-spawn fd, cwd, and
signal actions, closes non-stdio fds by default, and then executes through
the normal opened-file exec path.
Return a pidfd for the child so userspace can wait for it or signal it
without racing against pid reuse. Keep fd inheritance opt-in with
SPAWN_TEMPLATE_SPAWN_INHERIT_FDS.
This patch only consumes the state already stored in the template (the
pinned executable file and the creator's credentials). Executable identity
checks and ELF metadata caching are added by later patches in this series.
Signed-off-by: Li Chen <me@linux.beauty>
---
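Note for reviewers, not part of the commit: when an action or the exec
itself fails, the child exits with a status derived from the errno by
spawn_template_exit_status(), which follows the usual shell convention.
The sketch below mirrors that mapping in userspace so the statuses a parent
would see through the pidfd are easy to check. exit_status_for_errno() is a
hypothetical name used only for illustration; it is not an interface added
by this series.

```c
#include <assert.h>
#include <errno.h>

/*
 * Userspace mirror of the mapping in spawn_template_exit_status().
 * Hypothetical helper for illustration only. 'err' is a negative errno
 * value, as produced by the child-side action and exec helpers.
 */
static int exit_status_for_errno(int err)
{
	assert(err < 0);

	switch (err) {
	case -ENOENT:
		return 127;	/* executable not found */
	case -EACCES:
	case -ENOEXEC:
		return 126;	/* found, but cannot be executed */
	default:
		return 1;	/* any other action or exec failure */
	}
}
```

A parent that reads the exit status after waiting on the pidfd can treat
126 and 127 the same way it would for a shell-launched command; every other
failure in the child collapses to 1.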
fs/spawn_template.c | 346 +++++++++++++++++++++++++++++++++++++++
include/linux/syscalls.h | 4 +
2 files changed, 350 insertions(+)
diff --git a/fs/spawn_template.c b/fs/spawn_template.c
index 280a1038cc45e..8c3711929cffb 100644
--- a/fs/spawn_template.c
+++ b/fs/spawn_template.c
@@ -1,14 +1,24 @@
// SPDX-License-Identifier: GPL-2.0-only
#include <linux/anon_inodes.h>
+#include <linux/binfmts.h>
+#include <linux/close_range.h>
#include <linux/cred.h>
#include <linux/err.h>
#include <linux/fcntl.h>
+#include <linux/fdtable.h>
#include <linux/file.h>
#include <linux/fs.h>
+#include <linux/fs_struct.h>
#include <linux/kernel.h>
+#include <linux/namei.h>
+#include <linux/sched/signal.h>
+#include <linux/sched/task.h>
+#include <linux/signal.h>
#include <linux/slab.h>
+#include <linux/string.h>
#include <linux/syscalls.h>
#include <linux/uaccess.h>
+#include <uapi/linux/openat2.h>
#include <uapi/linux/spawn_template.h>
#include "internal.h"
@@ -22,8 +32,262 @@ struct spawn_template {
bool deny_write;
};
+struct spawn_template_spawn_context {
+ struct spawn_template *tmpl;
+ struct spawn_template_spawn_args args;
+ struct spawn_template_action *actions;
+};
+
static const struct file_operations spawn_template_fops;
+static int spawn_template_exit_status(int err)
+{
+ switch (err) {
+ case -ENOENT:
+ return 127;
+ case -EACCES:
+ case -ENOEXEC:
+ return 126;
+ default:
+ return 1;
+ }
+}
+
+static bool spawn_template_cred_matches(struct spawn_template *tmpl)
+{
+ return current_cred() == tmpl->creator_cred;
+}
+
+static int spawn_template_copy_signal_set(const struct spawn_template_action *action,
+ sigset_t *mask)
+{
+ struct spawn_template_sigset sigset;
+
+ if (!action->arg)
+ return -EINVAL;
+ if (copy_from_user(&sigset, u64_to_user_ptr(action->arg),
+ sizeof(sigset)))
+ return -EFAULT;
+ if (sigset.sigsetsize != sizeof(sigset_t))
+ return -EINVAL;
+ if (copy_from_user(mask, u64_to_user_ptr(sigset.sigset), sizeof(*mask)))
+ return -EFAULT;
+ sigdelsetmask(mask, sigmask(SIGKILL) | sigmask(SIGSTOP));
+
+ return 0;
+}
+
+static int spawn_template_apply_open(const struct spawn_template_action *action)
+{
+ struct spawn_template_open open;
+ struct file *file __free(fput) = NULL;
+ struct file *tmp;
+ struct open_flags op;
+ int ret;
+
+ if (action->fd < AT_FDCWD || action->newfd < 0 || action->flags ||
+ !action->arg)
+ return -EINVAL;
+
+ if (copy_from_user(&open, u64_to_user_ptr(action->arg), sizeof(open)))
+ return -EFAULT;
+
+ ret = build_open_flags(&open.how, &op);
+ if (ret)
+ return ret;
+
+ CLASS(filename_flags, name)(u64_to_user_ptr(open.path), op.lookup_flags);
+ tmp = do_file_open(action->fd, name, &op);
+ if (IS_ERR(tmp))
+ return PTR_ERR(tmp);
+ file = tmp;
+
+ return replace_fd(action->newfd, file, open.how.flags & O_CLOEXEC);
+}
+
+static int spawn_template_apply_sigmask(const struct spawn_template_action *action)
+{
+ sigset_t mask;
+ int ret;
+
+ if (action->fd || action->newfd || action->flags)
+ return -EINVAL;
+
+ ret = spawn_template_copy_signal_set(action, &mask);
+ if (ret)
+ return ret;
+
+ set_current_blocked(&mask);
+ return 0;
+}
+
+static int spawn_template_apply_sigdefault(const struct spawn_template_action *action)
+{
+ sigset_t mask;
+ struct k_sigaction sa = {};
+ int ret;
+ int sig;
+
+ if (action->fd || action->newfd || action->flags)
+ return -EINVAL;
+
+ ret = spawn_template_copy_signal_set(action, &mask);
+ if (ret)
+ return ret;
+
+ sa.sa.sa_handler = SIG_DFL;
+ sigemptyset(&sa.sa.sa_mask);
+
+ for (sig = 1; sig < _NSIG; sig++) {
+ if (!sigismember(&mask, sig))
+ continue;
+ ret = do_sigaction(sig, &sa, NULL);
+ if (ret)
+ return ret;
+ }
+
+ return 0;
+}
+
+static int spawn_template_apply_action(const struct spawn_template_action *action)
+{
+ switch (action->type) {
+ case SPAWN_TEMPLATE_ACTION_CLOSE:
+ return close_fd(action->fd);
+ case SPAWN_TEMPLATE_ACTION_DUP2:
+ if (action->fd == action->newfd) {
+ if (action->flags)
+ return -EINVAL;
+ CLASS(fd, f)(action->fd);
+
+ if (fd_empty(f))
+ return -EBADF;
+ return 0;
+ }
+ return ksys_dup3(action->fd, action->newfd, action->flags);
+ case SPAWN_TEMPLATE_ACTION_FCHDIR: {
+ CLASS(fd, f)(action->fd);
+ int ret;
+
+ if (fd_empty(f))
+ return -EBADF;
+ if (!d_can_lookup(fd_file(f)->f_path.dentry))
+ return -ENOTDIR;
+
+ ret = file_permission(fd_file(f), MAY_EXEC | MAY_CHDIR);
+ if (!ret)
+ set_fs_pwd(current->fs, &fd_file(f)->f_path);
+ return ret;
+ }
+ case SPAWN_TEMPLATE_ACTION_OPEN:
+ return spawn_template_apply_open(action);
+ case SPAWN_TEMPLATE_ACTION_CLOSE_RANGE:
+ return do_close_range(action->fd, action->newfd, action->flags);
+ case SPAWN_TEMPLATE_ACTION_SIGMASK:
+ return spawn_template_apply_sigmask(action);
+ case SPAWN_TEMPLATE_ACTION_SIGDEFAULT:
+ return spawn_template_apply_sigdefault(action);
+ default:
+ return -EINVAL;
+ }
+}
+
+static int spawn_template_copy_actions(struct spawn_template_action **out_actions,
+ u64 count, u64 uaddr)
+{
+ struct spawn_template_action __user *uactions;
+ struct spawn_template_action *actions __free(kfree) = NULL;
+ struct spawn_template_action *tmp;
+ u64 i;
+
+ *out_actions = NULL;
+ if (!count)
+ return 0;
+ if (count > SPAWN_TEMPLATE_MAX_ACTIONS)
+ return -E2BIG;
+ if (!uaddr)
+ return -EINVAL;
+
+ uactions = u64_to_user_ptr(uaddr);
+ tmp = memdup_array_user(uactions, count, sizeof(*actions));
+ if (IS_ERR(tmp))
+ return PTR_ERR(tmp);
+ actions = tmp;
+
+ for (i = 0; i < count; i++) {
+ switch (actions[i].type) {
+ case SPAWN_TEMPLATE_ACTION_CLOSE:
+ if (actions[i].fd < 0 || actions[i].flags ||
+ actions[i].newfd || actions[i].arg)
+ return -EINVAL;
+ break;
+ case SPAWN_TEMPLATE_ACTION_DUP2:
+ if (actions[i].fd < 0 || actions[i].newfd < 0 ||
+ (actions[i].flags & ~O_CLOEXEC) || actions[i].arg)
+ return -EINVAL;
+ break;
+ case SPAWN_TEMPLATE_ACTION_FCHDIR:
+ if (actions[i].fd < 0 || actions[i].flags ||
+ actions[i].newfd || actions[i].arg)
+ return -EINVAL;
+ break;
+ case SPAWN_TEMPLATE_ACTION_OPEN:
+ if (actions[i].fd < AT_FDCWD || actions[i].newfd < 0 ||
+ actions[i].flags || !actions[i].arg)
+ return -EINVAL;
+ break;
+ case SPAWN_TEMPLATE_ACTION_CLOSE_RANGE:
+ if (actions[i].fd < 0 || actions[i].newfd < 0 ||
+ actions[i].fd > actions[i].newfd ||
+ (actions[i].flags &
+ ~(CLOSE_RANGE_UNSHARE | CLOSE_RANGE_CLOEXEC)) ||
+ actions[i].arg)
+ return -EINVAL;
+ break;
+ case SPAWN_TEMPLATE_ACTION_SIGMASK:
+ case SPAWN_TEMPLATE_ACTION_SIGDEFAULT:
+ if (actions[i].fd || actions[i].newfd ||
+ actions[i].flags || !actions[i].arg)
+ return -EINVAL;
+ break;
+ default:
+ return -EINVAL;
+ }
+ }
+
+ *out_actions = no_free_ptr(actions);
+ return 0;
+}
+
+static int spawn_template_child(void *data)
+{
+ struct spawn_template_spawn_context *ctx = data;
+ struct spawn_template *tmpl = ctx->tmpl;
+ int ret;
+ u64 i;
+
+ for (i = 0; i < ctx->args.actions_len; i++) {
+ ret = spawn_template_apply_action(&ctx->actions[i]);
+ if (ret < 0)
+ goto out_exec_error;
+ }
+
+ if (!(ctx->args.flags & SPAWN_TEMPLATE_SPAWN_INHERIT_FDS)) {
+ ret = do_close_range(3, ~0U, 0);
+ if (ret < 0)
+ goto out_exec_error;
+ }
+
+ ret = kernel_execveat_file(tmpl->exec_file, "",
+ u64_to_user_ptr(ctx->args.argv),
+ u64_to_user_ptr(ctx->args.envp),
+ AT_EMPTY_PATH);
+out_exec_error:
+ if (ret < 0)
+ do_exit(spawn_template_exit_status(ret));
+ return 0;
+}
+
static bool spawn_template_file_exec_allowed(struct file *file)
{
if (!S_ISREG(file_inode(file)->i_mode))
@@ -53,6 +317,18 @@ static const struct file_operations spawn_template_fops = {
.llseek = noop_llseek,
};
+static struct file *spawn_template_file_from_fd(int fd)
+{
+ CLASS(fd, f)(fd);
+
+ if (fd_empty(f))
+ return ERR_PTR(-EBADF);
+ if (fd_file(f)->f_op != &spawn_template_fops)
+ return ERR_PTR(-EINVAL);
+
+ return get_file(fd_file(f));
+}
+
static int spawn_template_open_execfd(int execfd, struct file **file,
bool *deny_write)
{
@@ -178,3 +454,73 @@ SYSCALL_DEFINE2(spawn_template_create,
kfree(tmpl);
return ret;
}
+
+SYSCALL_DEFINE3(spawn_template_spawn, int, template_fd,
+ struct spawn_template_spawn_args __user *, uargs,
+ size_t, usize)
+{
+ struct spawn_template_spawn_context *ctx;
+ struct kernel_clone_args kargs;
+ struct file *template_file;
+ int ret;
+
+ BUILD_BUG_ON(sizeof(struct spawn_template_spawn_args) !=
+ SPAWN_TEMPLATE_SPAWN_ARGS_SIZE_VER0);
+
+ if (usize < SPAWN_TEMPLATE_SPAWN_ARGS_SIZE_VER0)
+ return -EINVAL;
+ if (usize > PAGE_SIZE)
+ return -E2BIG;
+
+ template_file = spawn_template_file_from_fd(template_fd);
+ if (IS_ERR(template_file))
+ return PTR_ERR(template_file);
+
+ if (!spawn_template_cred_matches(template_file->private_data)) {
+ ret = -EACCES;
+ goto out_put_template;
+ }
+
+ ctx = kzalloc_obj(*ctx, GFP_KERNEL);
+ if (!ctx) {
+ ret = -ENOMEM;
+ goto out_put_template;
+ }
+
+ ctx->tmpl = template_file->private_data;
+
+ ret = copy_struct_from_user(&ctx->args, sizeof(ctx->args), uargs,
+ usize);
+ if (ret)
+ goto out_free_ctx;
+
+ if ((ctx->args.flags & ~SPAWN_TEMPLATE_SPAWN_INHERIT_FDS) ||
+ !ctx->args.pidfd || ctx->args.reserved[0] ||
+ ctx->args.reserved[1] || ctx->args.reserved[2] ||
+ ctx->args.reserved[3]) {
+ ret = -EINVAL;
+ goto out_free_ctx;
+ }
+
+ ret = spawn_template_copy_actions(&ctx->actions, ctx->args.actions_len,
+ ctx->args.actions);
+ if (ret)
+ goto out_free_ctx;
+
+ kargs = (struct kernel_clone_args) {
+ .flags = CLONE_VM | CLONE_VFORK | CLONE_PIDFD,
+ .pidfd = u64_to_user_ptr(ctx->args.pidfd),
+ .exit_signal = SIGCHLD,
+ .fn = spawn_template_child,
+ .fn_arg = ctx,
+ };
+
+ ret = kernel_clone(&kargs);
+
+ kfree(ctx->actions);
+out_free_ctx:
+ kfree(ctx);
+out_put_template:
+ fput(template_file);
+ return ret;
+}
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 4b41950488bd6..df7368edf6778 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -68,6 +68,7 @@ union bpf_attr;
struct io_uring_params;
struct clone_args;
struct spawn_template_create_args;
+struct spawn_template_spawn_args;
struct open_how;
struct mount_attr;
struct landlock_ruleset_attr;
@@ -824,6 +825,9 @@ asmlinkage long sys_clone(unsigned long, unsigned long, int __user *,
asmlinkage long sys_clone3(struct clone_args __user *uargs, size_t size);
asmlinkage long sys_spawn_template_create(struct spawn_template_create_args __user *uargs,
size_t size);
+asmlinkage long sys_spawn_template_spawn(int template_fd,
+ struct spawn_template_spawn_args __user *uargs,
+ size_t size);
asmlinkage long sys_execve(const char __user *filename,
const char __user *const __user *argv,
--
2.52.0
* [RFC PATCH v1 07/13] exec: validate spawn template executable identity
2026-05-28 9:52 [RFC PATCH v1 00/13] exec: add spawn templates for repeated executable startup Li Chen
` (5 preceding siblings ...)
2026-05-28 9:52 ` [RFC PATCH v1 06/13] exec: add spawn_template_spawn() Li Chen
@ 2026-05-28 9:52 ` Li Chen
2026-05-28 9:52 ` [RFC PATCH v1 08/13] binfmt_elf: cache ELF metadata for spawn templates Li Chen
` (8 subsequent siblings)
15 siblings, 0 replies; 20+ messages in thread
From: Li Chen @ 2026-05-28 9:52 UTC (permalink / raw)
To: Christian Brauner, Kees Cook, Alexander Viro
Cc: linux-fsdevel, linux-api, linux-kernel, linux-mm, linux-arch,
linux-doc, linux-kselftest, x86, Arnd Bergmann, Andy Lutomirski,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin, Jan Kara, Jonathan Corbet, Shuah Khan, Li Chen
Record a conservative executable identity key when a template is created:
device, inode, size, mode, owner uid and gid, ctime, and mtime. Recheck it
before each spawn. For path-created templates, also reopen the path and
compare the result against the key, so a template does not silently keep
spawning the old file after the path has been replaced.
Reject stale templates with ESTALE. Keep the check conservative by also
rechecking that the file is still a regular file that may be executed and
mapped.
Signed-off-by: Li Chen <me@linux.beauty>
---
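Note for reviewers, not part of the commit: a rough userspace analogue of
the identity key, assuming fstat() as a stand-in for the in-kernel inode
fields. struct file_key, file_key_fill(), and file_key_matches() are
hypothetical names for illustration only; the kernel code reads the inode
directly and compares uid/gid with uid_eq()/gid_eq().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical userspace analogue of struct spawn_template_file_key. */
struct file_key {
	dev_t dev;
	ino_t ino;
	off_t size;
	mode_t mode;
	uid_t uid;
	gid_t gid;
	struct timespec ctim;
	struct timespec mtim;
};

/* Snapshot the identity fields of an open file; 0 on success, -1 on error. */
static int file_key_fill(int fd, struct file_key *key)
{
	struct stat st;

	assert(key != NULL);
	if (fstat(fd, &st) < 0)
		return -1;

	key->dev = st.st_dev;
	key->ino = st.st_ino;
	key->size = st.st_size;
	key->mode = st.st_mode;
	key->uid = st.st_uid;
	key->gid = st.st_gid;
	key->ctim = st.st_ctim;
	key->mtim = st.st_mtim;
	return 0;
}

/* True only if every recorded field still matches the file's current state. */
static bool file_key_matches(int fd, const struct file_key *key)
{
	struct file_key cur;

	if (file_key_fill(fd, &cur) < 0)
		return false;

	return cur.dev == key->dev &&
	       cur.ino == key->ino &&
	       cur.size == key->size &&
	       cur.mode == key->mode &&
	       cur.uid == key->uid &&
	       cur.gid == key->gid &&
	       cur.ctim.tv_sec == key->ctim.tv_sec &&
	       cur.ctim.tv_nsec == key->ctim.tv_nsec &&
	       cur.mtim.tv_sec == key->mtim.tv_sec &&
	       cur.mtim.tv_nsec == key->mtim.tv_nsec;
}
```

In this sketch any write that changes the file size makes the key stale
even if the timestamps happen not to move, which is the same reason the
kernel-side key includes the size alongside ctime and mtime.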
MAINTAINERS | 1 +
fs/spawn_template.c | 75 ++++++++++++++++++++++++++++++++++
include/linux/spawn_template.h | 25 ++++++++++++
3 files changed, 101 insertions(+)
create mode 100644 include/linux/spawn_template.h
diff --git a/MAINTAINERS b/MAINTAINERS
index d5441812825c3..ea4134a188779 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9737,6 +9737,7 @@ F: fs/tests/binfmt_*_kunit.c
F: fs/tests/exec_kunit.c
F: include/linux/binfmts.h
F: include/linux/elf.h
+F: include/linux/spawn_template.h
F: include/uapi/linux/auxvec.h
F: include/uapi/linux/binfmts.h
F: include/uapi/linux/elf.h
diff --git a/fs/spawn_template.c b/fs/spawn_template.c
index 8c3711929cffb..268f804227987 100644
--- a/fs/spawn_template.c
+++ b/fs/spawn_template.c
@@ -15,6 +15,7 @@
#include <linux/sched/task.h>
#include <linux/signal.h>
#include <linux/slab.h>
+#include <linux/spawn_template.h>
#include <linux/string.h>
#include <linux/syscalls.h>
#include <linux/uaccess.h>
@@ -27,6 +28,7 @@
struct spawn_template {
struct file *exec_file;
+ struct spawn_template_file_key exec_key;
const struct cred *creator_cred;
char *filename;
bool deny_write;
@@ -40,6 +42,46 @@ struct spawn_template_spawn_context {
static const struct file_operations spawn_template_fops;
+static bool spawn_template_file_exec_allowed(struct file *file);
+
+void spawn_template_fill_file_key(struct file *file,
+ struct spawn_template_file_key *key)
+{
+ struct inode *inode = file_inode(file);
+ struct timespec64 ctime = inode_get_ctime(inode);
+ struct timespec64 mtime = inode_get_mtime(inode);
+
+ key->dev = inode->i_sb->s_dev;
+ key->ino = inode->i_ino;
+ key->size = i_size_read(inode);
+ key->mode = READ_ONCE(inode->i_mode);
+ key->uid = inode->i_uid;
+ key->gid = inode->i_gid;
+ key->ctime_sec = ctime.tv_sec;
+ key->ctime_nsec = ctime.tv_nsec;
+ key->mtime_sec = mtime.tv_sec;
+ key->mtime_nsec = mtime.tv_nsec;
+}
+
+bool spawn_template_file_key_matches(struct file *file,
+ const struct spawn_template_file_key *key)
+{
+ struct spawn_template_file_key cur;
+
+ spawn_template_fill_file_key(file, &cur);
+
+ return cur.dev == key->dev &&
+ cur.ino == key->ino &&
+ cur.size == key->size &&
+ cur.mode == key->mode &&
+ uid_eq(cur.uid, key->uid) &&
+ gid_eq(cur.gid, key->gid) &&
+ cur.ctime_sec == key->ctime_sec &&
+ cur.ctime_nsec == key->ctime_nsec &&
+ cur.mtime_sec == key->mtime_sec &&
+ cur.mtime_nsec == key->mtime_nsec;
+}
+
static int spawn_template_exit_status(int err)
{
switch (err) {
@@ -58,6 +100,32 @@ static bool spawn_template_cred_matches(struct spawn_template *tmpl)
return current_cred() == tmpl->creator_cred;
}
+static bool spawn_template_key_matches(struct spawn_template *tmpl)
+{
+ bool matches;
+
+ if (tmpl->filename) {
+ struct file *file __free(fput) = NULL;
+ struct file *tmp;
+
+ tmp = open_exec(tmpl->filename);
+ if (IS_ERR(tmp))
+ return false;
+ file = tmp;
+
+ matches = spawn_template_file_key_matches(file,
+ &tmpl->exec_key);
+ matches = matches && spawn_template_file_exec_allowed(file);
+ exe_file_allow_write_access(file);
+ if (!matches)
+ return false;
+ }
+
+ return spawn_template_file_exec_allowed(tmpl->exec_file) &&
+ spawn_template_file_key_matches(tmpl->exec_file,
+ &tmpl->exec_key);
+}
+
static int spawn_template_copy_signal_set(const struct spawn_template_action *action,
sigset_t *mask)
{
@@ -433,6 +501,7 @@ SYSCALL_DEFINE2(spawn_template_create,
&tmpl->deny_write);
if (ret)
goto out_free_tmpl;
+ spawn_template_fill_file_key(tmpl->exec_file, &tmpl->exec_key);
if (args.flags & SPAWN_TEMPLATE_CREATE_CLOEXEC)
fd_flags |= O_CLOEXEC;
@@ -507,6 +576,11 @@ SYSCALL_DEFINE3(spawn_template_spawn, int, template_fd,
if (ret)
goto out_free_ctx;
+ if (!spawn_template_key_matches(ctx->tmpl)) {
+ ret = -ESTALE;
+ goto out_free_actions;
+ }
+
kargs = (struct kernel_clone_args) {
.flags = CLONE_VM | CLONE_VFORK | CLONE_PIDFD,
.pidfd = u64_to_user_ptr(ctx->args.pidfd),
@@ -517,6 +591,7 @@ SYSCALL_DEFINE3(spawn_template_spawn, int, template_fd,
ret = kernel_clone(&kargs);
+out_free_actions:
kfree(ctx->actions);
out_free_ctx:
kfree(ctx);
diff --git a/include/linux/spawn_template.h b/include/linux/spawn_template.h
new file mode 100644
index 0000000000000..f14a7749fe55b
--- /dev/null
+++ b/include/linux/spawn_template.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_SPAWN_TEMPLATE_H
+#define _LINUX_SPAWN_TEMPLATE_H
+
+#include <linux/fs.h>
+
+struct spawn_template_file_key {
+ dev_t dev;
+ ino_t ino;
+ loff_t size;
+ umode_t mode;
+ kuid_t uid;
+ kgid_t gid;
+ u64 ctime_sec;
+ u64 ctime_nsec;
+ u64 mtime_sec;
+ u64 mtime_nsec;
+};
+
+void spawn_template_fill_file_key(struct file *file,
+ struct spawn_template_file_key *key);
+bool spawn_template_file_key_matches(struct file *file,
+ const struct spawn_template_file_key *key);
+
+#endif /* _LINUX_SPAWN_TEMPLATE_H */
--
2.52.0
* [RFC PATCH v1 08/13] binfmt_elf: cache ELF metadata for spawn templates
2026-05-28 9:52 [RFC PATCH v1 00/13] exec: add spawn templates for repeated executable startup Li Chen
` (6 preceding siblings ...)
2026-05-28 9:52 ` [RFC PATCH v1 07/13] exec: validate spawn template executable identity Li Chen
@ 2026-05-28 9:52 ` Li Chen
2026-05-28 9:52 ` [RFC PATCH v1 09/13] Documentation: describe " Li Chen
` (7 subsequent siblings)
15 siblings, 0 replies; 20+ messages in thread
From: Li Chen @ 2026-05-28 9:52 UTC (permalink / raw)
To: Christian Brauner, Kees Cook, Alexander Viro
Cc: linux-fsdevel, linux-api, linux-kernel, linux-mm, linux-arch,
linux-doc, linux-kselftest, x86, Arnd Bergmann, Andy Lutomirski,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin, Jan Kara, Jonathan Corbet, Shuah Khan, Li Chen
Spawn templates keep an opened executable and revalidate its file identity
before every spawn. Add an ELF-side template object for the main
executable.
It caches the executable identity key, ELF header, program header table,
and program header count so repeated spawns can reuse validated metadata.
Do not cache interpreter metadata, shared-library dependency state, or
derived mapping-layout state in this RFC.
Keep the normal exec security path intact. The child still executes through
bprm_execve(), so credential handling, permission checks, and LSM hooks run
as before. This only avoids rereading main-executable metadata that is
unchanged since template creation, as confirmed by revalidation.
Signed-off-by: Li Chen <me@linux.beauty>
---
fs/binfmt_elf.c | 104 ++++++++++++++++++++++++++++++++-
fs/exec.c | 37 +++++++++++-
fs/spawn_template.c | 38 +++++++-----
include/linux/binfmts.h | 6 ++
include/linux/spawn_template.h | 47 +++++++++++++++
5 files changed, 213 insertions(+), 19 deletions(-)
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 16a56b6b3f6ca..631dd029aeee7 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -48,6 +48,7 @@
#include <linux/uaccess.h>
#include <uapi/linux/rseq.h>
#include <linux/rseq.h>
+#include <linux/spawn_template.h>
#include <asm/param.h>
#include <asm/page.h>
@@ -552,6 +553,89 @@ static struct elf_phdr *load_elf_phdrs(const struct elfhdr *elf_ex,
return elf_phdata;
}
+#if !ELF_COMPAT
+void spawn_exec_template_put(struct spawn_exec_template *tmpl)
+{
+ if (!tmpl)
+ return;
+ if (!refcount_dec_and_test(&tmpl->refcount))
+ return;
+ kfree(tmpl->exec_phdrs);
+ kfree(tmpl);
+}
+
+struct spawn_exec_template *
+spawn_exec_template_get(struct spawn_exec_template *tmpl)
+{
+ refcount_inc(&tmpl->refcount);
+ return tmpl;
+}
+
+bool spawn_exec_template_matches(struct spawn_exec_template *tmpl,
+ struct file *file)
+{
+ if (!tmpl)
+ return false;
+ if (!spawn_template_file_key_matches(file, &tmpl->exec_key))
+ return false;
+ if (!can_mmap_file(file))
+ return false;
+ return true;
+}
+
+int spawn_exec_template_create(struct file *file,
+ struct spawn_exec_template **out)
+{
+ struct spawn_exec_template *tmpl;
+ loff_t pos = 0;
+ ssize_t nread;
+ int retval;
+
+ *out = NULL;
+
+ tmpl = kzalloc_obj(*tmpl, GFP_KERNEL);
+ if (!tmpl)
+ return -ENOMEM;
+ refcount_set(&tmpl->refcount, 1);
+
+ spawn_template_fill_file_key(file, &tmpl->exec_key);
+
+ nread = kernel_read(file, &tmpl->exec_ehdr, sizeof(tmpl->exec_ehdr),
+ &pos);
+ if (nread < 0) {
+ retval = nread;
+ goto out_put_template;
+ }
+
+ retval = -ENOEXEC;
+ if (nread != sizeof(tmpl->exec_ehdr))
+ goto out_put_template;
+ if (memcmp(tmpl->exec_ehdr.e_ident, ELFMAG, SELFMAG) != 0)
+ goto out_put_template;
+ if (tmpl->exec_ehdr.e_type != ET_EXEC &&
+ tmpl->exec_ehdr.e_type != ET_DYN)
+ goto out_put_template;
+ if (!elf_check_arch(&tmpl->exec_ehdr))
+ goto out_put_template;
+ if (elf_check_fdpic(&tmpl->exec_ehdr))
+ goto out_put_template;
+ if (!can_mmap_file(file))
+ goto out_put_template;
+
+ tmpl->exec_phdrs = load_elf_phdrs(&tmpl->exec_ehdr, file);
+ if (!tmpl->exec_phdrs)
+ goto out_put_template;
+ tmpl->exec_phnum = tmpl->exec_ehdr.e_phnum;
+
+ *out = tmpl;
+ return 0;
+
+out_put_template:
+ spawn_exec_template_put(tmpl);
+ return retval;
+}
+#endif
+
#ifndef CONFIG_ARCH_BINFMT_ELF_STATE
/**
@@ -832,6 +916,7 @@ static int parse_elf_properties(struct file *f, const struct elf_phdr *phdr,
static int load_elf_binary(struct linux_binprm *bprm)
{
struct file *interpreter = NULL; /* to shut gcc up */
+ struct spawn_exec_template *spawn_tmpl = bprm->spawn_template;
unsigned long load_bias = 0, phdr_addr = 0;
int first_pt_load = 1;
unsigned long error;
@@ -851,6 +936,12 @@ static int load_elf_binary(struct linux_binprm *bprm)
struct arch_elf_state arch_state = INIT_ARCH_ELF_STATE;
struct mm_struct *mm;
struct pt_regs *regs;
+ bool use_spawn_tmpl = spawn_exec_template_matches(spawn_tmpl, bprm->file);
+ bool free_elf_phdata = true;
+
+ if (use_spawn_tmpl)
+ memcpy(bprm->buf, &spawn_tmpl->exec_ehdr,
+ sizeof(spawn_tmpl->exec_ehdr));
retval = -ENOEXEC;
/* First of all, some simple consistency checks */
@@ -866,7 +957,12 @@ static int load_elf_binary(struct linux_binprm *bprm)
if (!can_mmap_file(bprm->file))
goto out;
- elf_phdata = load_elf_phdrs(elf_ex, bprm->file);
+ if (use_spawn_tmpl)
+ elf_phdata = spawn_tmpl->exec_phdrs;
+ else
+ elf_phdata = load_elf_phdrs(elf_ex, bprm->file);
+ if (use_spawn_tmpl)
+ free_elf_phdata = false;
if (!elf_phdata)
goto out;
@@ -1283,7 +1379,8 @@ static int load_elf_binary(struct linux_binprm *bprm)
}
}
- kfree(elf_phdata);
+ if (free_elf_phdata)
+ kfree(elf_phdata);
set_binfmt(&elf_format);
@@ -1390,7 +1487,8 @@ static int load_elf_binary(struct linux_binprm *bprm)
if (interpreter)
fput(interpreter);
out_free_ph:
- kfree(elf_phdata);
+ if (free_elf_phdata)
+ kfree(elf_phdata);
goto out;
}
diff --git a/fs/exec.c b/fs/exec.c
index 5b91a9b208a77..96b6f6274e0d3 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1914,9 +1914,12 @@ static inline struct user_arg_ptr native_arg(const char __user *const __user *p)
return (struct user_arg_ptr){.ptr.native = p};
}
-static int do_execveat_file_common(struct file *file, struct filename *filename,
- struct user_arg_ptr argv,
- struct user_arg_ptr envp, int flags)
+static int do_execveat_file_template_common(struct file *file,
+ struct filename *filename,
+ struct user_arg_ptr argv,
+ struct user_arg_ptr envp,
+ int flags,
+ struct spawn_exec_template *tmpl)
{
struct linux_binprm *bprm;
struct file *exec_file;
@@ -1940,11 +1943,20 @@ static int do_execveat_file_common(struct file *file, struct filename *filename,
if (IS_ERR(bprm))
return PTR_ERR(bprm);
+ bprm->spawn_template = tmpl;
retval = do_execveat_common_bprm(bprm, argv, envp);
free_bprm(bprm);
return retval;
}
+static int do_execveat_file_common(struct file *file, struct filename *filename,
+ struct user_arg_ptr argv,
+ struct user_arg_ptr envp, int flags)
+{
+ return do_execveat_file_template_common(file, filename, argv, envp,
+ flags, NULL);
+}
+
int kernel_execveat_file(struct file *file, const char *filename,
const void __user *argv,
const void __user *envp,
@@ -1962,6 +1974,25 @@ int kernel_execveat_file(struct file *file, const char *filename,
native_arg(user_envp), flags);
}
+int kernel_execveat_file_template(struct file *file, const char *filename,
+ const void __user *argv,
+ const void __user *envp, int flags,
+ struct spawn_exec_template *tmpl)
+{
+ const char __user *const __user *user_argv;
+ const char __user *const __user *user_envp;
+
+ CLASS(filename_kernel, name)(filename);
+
+ user_argv = (const char __user *const __user *)argv;
+ user_envp = (const char __user *const __user *)envp;
+
+ return do_execveat_file_template_common(file, name,
+ native_arg(user_argv),
+ native_arg(user_envp),
+ flags, tmpl);
+}
+
void set_binfmt(struct linux_binfmt *new)
{
struct mm_struct *mm = current->mm;
diff --git a/fs/spawn_template.c b/fs/spawn_template.c
index 268f804227987..a11a7ed676416 100644
--- a/fs/spawn_template.c
+++ b/fs/spawn_template.c
@@ -28,7 +28,7 @@
struct spawn_template {
struct file *exec_file;
- struct spawn_template_file_key exec_key;
+ struct spawn_exec_template *exec_template;
const struct cred *creator_cred;
char *filename;
bool deny_write;
@@ -36,6 +36,7 @@ struct spawn_template {
struct spawn_template_spawn_context {
struct spawn_template *tmpl;
+ struct spawn_exec_template *exec_template;
struct spawn_template_spawn_args args;
struct spawn_template_action *actions;
};
@@ -114,16 +115,16 @@ static bool spawn_template_key_matches(struct spawn_template *tmpl)
file = tmp;
matches = spawn_template_file_key_matches(file,
- &tmpl->exec_key);
+ &tmpl->exec_template->exec_key);
matches = matches && spawn_template_file_exec_allowed(file);
exe_file_allow_write_access(file);
if (!matches)
return false;
}
- return spawn_template_file_exec_allowed(tmpl->exec_file) &&
- spawn_template_file_key_matches(tmpl->exec_file,
- &tmpl->exec_key);
+ if (!spawn_template_file_exec_allowed(tmpl->exec_file))
+ return false;
+ return spawn_exec_template_matches(tmpl->exec_template, tmpl->exec_file);
}
static int spawn_template_copy_signal_set(const struct spawn_template_action *action,
@@ -331,26 +332,29 @@ static int spawn_template_child(void *data)
{
struct spawn_template_spawn_context *ctx = data;
struct spawn_template *tmpl = ctx->tmpl;
+ struct spawn_exec_template *exec_template = ctx->exec_template;
int ret;
u64 i;
for (i = 0; i < ctx->args.actions_len; i++) {
ret = spawn_template_apply_action(&ctx->actions[i]);
if (ret < 0)
- goto out_exec_error;
+ goto out_put_exec_template;
}
if (!(ctx->args.flags & SPAWN_TEMPLATE_SPAWN_INHERIT_FDS)) {
ret = do_close_range(3, ~0U, 0);
if (ret < 0)
- goto out_exec_error;
+ goto out_put_exec_template;
}
- ret = kernel_execveat_file(tmpl->exec_file, "",
- u64_to_user_ptr(ctx->args.argv),
- u64_to_user_ptr(ctx->args.envp),
- AT_EMPTY_PATH);
-out_exec_error:
+ ret = kernel_execveat_file_template(tmpl->exec_file, "",
+ u64_to_user_ptr(ctx->args.argv),
+ u64_to_user_ptr(ctx->args.envp),
+ AT_EMPTY_PATH,
+ exec_template);
+out_put_exec_template:
+ spawn_exec_template_put(exec_template);
if (ret < 0)
do_exit(spawn_template_exit_status(ret));
return 0;
@@ -373,6 +377,7 @@ static int spawn_template_release(struct inode *inode, struct file *file)
if (tmpl->deny_write)
exe_file_allow_write_access(tmpl->exec_file);
+ spawn_exec_template_put(tmpl->exec_template);
fput(tmpl->exec_file);
put_cred(tmpl->creator_cred);
kfree(tmpl->filename);
@@ -501,7 +506,10 @@ SYSCALL_DEFINE2(spawn_template_create,
&tmpl->deny_write);
if (ret)
goto out_free_tmpl;
- spawn_template_fill_file_key(tmpl->exec_file, &tmpl->exec_key);
+
+ ret = spawn_exec_template_create(tmpl->exec_file, &tmpl->exec_template);
+ if (ret)
+ goto out_put_exec;
if (args.flags & SPAWN_TEMPLATE_CREATE_CLOEXEC)
fd_flags |= O_CLOEXEC;
@@ -514,6 +522,7 @@ SYSCALL_DEFINE2(spawn_template_create,
return ret;
out_put_exec:
+ spawn_exec_template_put(tmpl->exec_template);
if (tmpl->deny_write)
exe_file_allow_write_access(tmpl->exec_file);
fput(tmpl->exec_file);
@@ -580,6 +589,7 @@ SYSCALL_DEFINE3(spawn_template_spawn, int, template_fd,
ret = -ESTALE;
goto out_free_actions;
}
+ ctx->exec_template = spawn_exec_template_get(ctx->tmpl->exec_template);
kargs = (struct kernel_clone_args) {
.flags = CLONE_VM | CLONE_VFORK | CLONE_PIDFD,
@@ -590,6 +600,8 @@ SYSCALL_DEFINE3(spawn_template_spawn, int, template_fd,
};
ret = kernel_clone(&kargs);
+ if (ret < 0)
+ spawn_exec_template_put(ctx->exec_template);
out_free_actions:
kfree(ctx->actions);
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index c0715678c9a06..4e76a94d331a8 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -9,6 +9,7 @@
struct filename;
struct coredump_params;
+struct spawn_exec_template;
#define CORENAME_MAX_SIZE 128
@@ -53,6 +54,7 @@ struct linux_binprm {
struct file *executable; /* Executable to pass to the interpreter */
struct file *interpreter;
struct file *file;
+ struct spawn_exec_template *spawn_template;
struct cred *cred; /* new credentials */
int unsafe; /* how unsafe this exec is (mask of LSM_UNSAFE_*) */
unsigned int per_clear; /* bits to clear in current->personality */
@@ -145,6 +147,10 @@ int kernel_execveat_file(struct file *file, const char *filename,
const void __user *argv,
const void __user *envp,
int flags);
+int kernel_execveat_file_template(struct file *file, const char *filename,
+ const void __user *argv,
+ const void __user *envp, int flags,
+ struct spawn_exec_template *tmpl);
extern void set_binfmt(struct linux_binfmt *new);
extern ssize_t read_code(struct file *, unsigned long, loff_t, size_t);
diff --git a/include/linux/spawn_template.h b/include/linux/spawn_template.h
index f14a7749fe55b..426413bc11eea 100644
--- a/include/linux/spawn_template.h
+++ b/include/linux/spawn_template.h
@@ -2,7 +2,9 @@
#ifndef _LINUX_SPAWN_TEMPLATE_H
#define _LINUX_SPAWN_TEMPLATE_H
+#include <linux/elf.h>
#include <linux/fs.h>
+#include <linux/refcount.h>
struct spawn_template_file_key {
dev_t dev;
@@ -17,9 +19,54 @@ struct spawn_template_file_key {
u64 mtime_nsec;
};
+struct spawn_exec_template {
+ refcount_t refcount;
+ struct spawn_template_file_key exec_key;
+ struct elfhdr exec_ehdr;
+ struct elf_phdr *exec_phdrs;
+ unsigned int exec_phnum;
+};
+
void spawn_template_fill_file_key(struct file *file,
struct spawn_template_file_key *key);
bool spawn_template_file_key_matches(struct file *file,
const struct spawn_template_file_key *key);
+#ifdef CONFIG_BINFMT_ELF
+int spawn_exec_template_create(struct file *file,
+ struct spawn_exec_template **out);
+struct spawn_exec_template *
+spawn_exec_template_get(struct spawn_exec_template *tmpl);
+void spawn_exec_template_put(struct spawn_exec_template *tmpl);
+bool spawn_exec_template_matches(struct spawn_exec_template *tmpl,
+ struct file *file);
+#else
+static inline int spawn_exec_template_create(struct file *file,
+ struct spawn_exec_template **out)
+{
+ (void)file;
+ (void)out;
+ return -ENOEXEC;
+}
+
+static inline void spawn_exec_template_put(struct spawn_exec_template *tmpl)
+{
+ (void)tmpl;
+}
+
+static inline struct spawn_exec_template *
+spawn_exec_template_get(struct spawn_exec_template *tmpl)
+{
+ return tmpl;
+}
+
+static inline bool spawn_exec_template_matches(struct spawn_exec_template *tmpl,
+ struct file *file)
+{
+ (void)tmpl;
+ (void)file;
+ return false;
+}
+#endif
+
#endif /* _LINUX_SPAWN_TEMPLATE_H */
--
2.52.0
* [RFC PATCH v1 09/13] Documentation: describe spawn templates
From: Li Chen @ 2026-05-28 9:52 UTC
To: Christian Brauner, Kees Cook, Alexander Viro
Cc: linux-fsdevel, linux-api, linux-kernel, linux-mm, linux-arch,
linux-doc, linux-kselftest, x86, Arnd Bergmann, Andy Lutomirski,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin, Jan Kara, Jonathan Corbet, Shuah Khan, Li Chen
Document the spawn_template userspace ABI, fd lifetime, per-spawn
actions, default fd-closing behavior, security model, invalidation, and
cached ELF metadata. Keep workload-specific benchmark details out of the
kernel documentation.
Add the spawn template files to the exec/binfmt MAINTAINERS entry so the
documentation, UAPI, internal header, and implementation are covered in
the same patch.
Signed-off-by: Li Chen <me@linux.beauty>
---
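For reviewers comparing the per-spawn actions with the fallback path the
documentation asks userspace to keep, here is a minimal sketch of that
fallback written against plain posix_spawn(). It does not use the proposed
syscalls. The helper name fallback_spawn_wait() and the choice of -ECHILD for
abnormal termination are assumptions made for illustration only.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/*
 * Hypothetical fallback helper: run argv[0] with stdout redirected to
 * out_fd and return the child's exit status. Errors that happen before
 * a child exists are returned as -errno, similar in spirit to the
 * pre-clone errors described for spawn_template_spawn().
 */
static int fallback_spawn_wait(char *const argv[], int out_fd)
{
	posix_spawn_file_actions_t fa;
	int status;
	pid_t pid;
	int err;

	err = posix_spawn_file_actions_init(&fa);
	if (err)
		return -err;

	/* Counterpart of a DUP2 action: out_fd becomes the child's stdout. */
	err = posix_spawn_file_actions_adddup2(&fa, out_fd, STDOUT_FILENO);
	/* Counterpart of a CLOSE action: drop the original descriptor. */
	if (!err && out_fd != STDOUT_FILENO)
		err = posix_spawn_file_actions_addclose(&fa, out_fd);
	if (!err)
		err = posix_spawn(&pid, argv[0], &fa, NULL, argv, environ);
	posix_spawn_file_actions_destroy(&fa);
	if (err)
		return -err;

	/* Child success or failure is observed by waiting, not by the spawn call. */
	while (waitpid(pid, &status, 0) < 0) {
		if (errno != EINTR)
			return -errno;
	}
	/* -ECHILD for a signal-terminated child is an arbitrary choice here. */
	return WIFEXITED(status) ? WEXITSTATUS(status) : -ECHILD;
}
```

A launcher that opts in to templates could call a helper like this whenever
template creation fails for an unsupported format or spawn reports a stale
template before child creation.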
Documentation/userspace-api/index.rst | 1 +
.../userspace-api/spawn_template.rst | 141 ++++++++++++++++++
MAINTAINERS | 2 +
3 files changed, 144 insertions(+)
create mode 100644 Documentation/userspace-api/spawn_template.rst
diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst
index a68b1bea57a85..28520d16d3862 100644
--- a/Documentation/userspace-api/index.rst
+++ b/Documentation/userspace-api/index.rst
@@ -22,6 +22,7 @@ System calls
ioctl/index
mseal
rseq
+ spawn_template
Security-related interfaces
===========================
diff --git a/Documentation/userspace-api/spawn_template.rst b/Documentation/userspace-api/spawn_template.rst
new file mode 100644
index 0000000000000..0396d292fd17d
--- /dev/null
+++ b/Documentation/userspace-api/spawn_template.rst
@@ -0,0 +1,141 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============
+Spawn templates
+===============
+
+``spawn_template`` is a userspace-controlled interface for workloads that
+repeatedly start the same executable with different arguments, environment, and
+file-descriptor setup.
+
+Userspace creates a template fd for an executable with
+``spawn_template_create()``. Later calls to ``spawn_template_spawn()`` create a
+new child from that template and return both a pid and a pidfd. The child still
+executes through the normal ``execve`` path. The template only lets the kernel
+reuse metadata that is safe to reuse after revalidation.
+
+This is intended for launchers, shells, and agent runtimes that already know
+which tools are hot. The kernel does not decide policy for names such as
+``rg``, ``git``, or ``sed``. Userspace should keep its existing spawn path as a
+fallback for unsupported files, invalidated templates, and policy decisions.
+
+This RFC version supports ELF executable templates only. Scripts, binfmt_misc
+targets, and other non-ELF formats are expected to use the fallback path.
+
+Template lifetime
+=================
+
+``spawn_template_create()`` takes ``struct spawn_template_create_args`` and
+returns a template fd. The fd is an ordinary file descriptor backed by an
+anonymous inode. Closing the fd releases the template.
+
+Userspace can identify the executable either by an existing executable fd or by
+path. Exactly one of ``execfd`` and ``filename`` must be supplied. Passing
+``SPAWN_TEMPLATE_CREATE_CLOEXEC`` sets ``O_CLOEXEC`` on the returned template
+fd.
+
+Creating a template for an unsupported executable format fails. For this RFC
+that means non-ELF executables fail template creation rather than becoming a
+partially cached template.
+
+Create-time fd actions are not supported. ``actions`` and ``actions_len`` in
+``struct spawn_template_create_args`` are reserved and must be zero. File
+descriptor numbers are per-process state, so reusable fd actions would be
+ambiguous once the creating process changes its fd table.
+
+Spawning
+========
+
+``spawn_template_spawn()`` takes a template fd and
+``struct spawn_template_spawn_args``. ``argv`` and ``envp`` point to the normal
+userspace argument and environment vectors for the new image. ``pidfd`` points
+to an ``int`` in userspace where the kernel stores the new pidfd. The syscall
+return value is the new pid on success.
+
+A successful ``spawn_template_spawn()`` return means the child has been created
+and the pidfd has been installed. After that point, per-spawn action failures
+or exec failures are reported by the child exit status, not by changing the
+syscall return value. The syscall itself returns a negative errno only for
+errors detected before child creation, such as bad arguments, a bad template
+fd, stale executable identity, or clone failure.
+
+Per-spawn actions run in the child before exec. They are intended for the same
+kind of setup that ``posix_spawn_file_actions_t`` commonly performs:
+
+``SPAWN_TEMPLATE_ACTION_CLOSE``
+ Close one fd.
+
+``SPAWN_TEMPLATE_ACTION_DUP2``
+ Duplicate one fd to another fd, optionally with ``O_CLOEXEC``.
+
+``SPAWN_TEMPLATE_ACTION_FCHDIR``
+ Change the child's current working directory to an open directory fd.
+
+``SPAWN_TEMPLATE_ACTION_OPEN``
+ Open a path using ``struct open_how`` and install it at ``newfd``.
+
+``SPAWN_TEMPLATE_ACTION_CLOSE_RANGE``
+ Apply ``close_range()`` to a child fd range.
+
+``SPAWN_TEMPLATE_ACTION_SIGMASK``
+ Set the child signal mask.
+
+``SPAWN_TEMPLATE_ACTION_SIGDEFAULT``
+ Reset selected signal dispositions to ``SIG_DFL``.
+
+By default, the child closes all inherited file descriptors above standard
+error after the requested actions have run. Passing
+``SPAWN_TEMPLATE_SPAWN_INHERIT_FDS`` keeps the traditional inheritance model.
+Launchers for untrusted or secret-bearing workloads should prefer the default.
+
+Security model
+==============
+
+``spawn_template_spawn()`` is not a shortcut around ``execve`` security. Each
+spawn still reaches the normal binary handler and credential commit path, so
+permission checks, LSM hooks, secure-exec handling, and ``no_new_privs`` remain
+part of execution.
+
+The template fd does not grant ambient authority to unrelated tasks. The
+current implementation requires the caller to hold the same credential object
+as the task that created the template. Passing the fd with ``SCM_RIGHTS`` is
+therefore not enough to delegate spawn authority after credentials change.
+
+The kernel pins the executable inode against writes while the template exists.
+An in-place writer therefore fails while a template fd is alive. A package
+manager can still replace a tool with a rename; a path-created template then
+sees that the absolute path resolves to a different executable and spawn fails
+before creating a child. Userspace can close the old template fd and create a
+new one after such an update.
+
+Each spawn revalidates the cached identity key before using any other cached
+metadata. The key includes device, inode, size, mode, owner, ctime, and mtime.
+Path-created templates re-open the path before child creation and reject reuse
+if the path now names a different executable.
+
+Cached metadata
+===============
+
+For ELF executables, the template caches only the main executable ELF header,
+program headers, and executable identity key. The cached program headers are
+used to avoid repeated metadata reads for hot executables after the executable
+identity has been revalidated.
+
+The cache does not include the shared-library dependency graph. Shared
+libraries are found by the userspace dynamic linker after exec and depend on
+userspace policy such as ``LD_LIBRARY_PATH``, ``RPATH``, ``RUNPATH``,
+``/etc/ld.so.cache``, mount namespaces, and secure-exec state. The kernel
+therefore does not try to duplicate dynamic-linker policy in a spawn template.
+
+Errors and fallback
+===================
+
+If template creation reports an unsupported format, or if spawn reports a stale
+template before child creation, the caller should use its existing spawn
+implementation. A launcher may also drop the template fd and create a new
+template after a failure. Once spawn has returned a pid, the caller should
+observe child success or failure by waiting on the pid or pidfd.
+
+The interface is designed so ordinary tools do not need to be modified.
+Runtimes that already centralize process launch can opt in one executable at a
+time and preserve their existing fallback behavior.
diff --git a/MAINTAINERS b/MAINTAINERS
index ea4134a188779..3e737097940f9 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9728,7 +9728,9 @@ M: Kees Cook <kees@kernel.org>
L: linux-mm@kvack.org
S: Supported
T: git git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git for-next/execve
+F: arch/x86/entry/syscalls/syscall_64.tbl
F: Documentation/userspace-api/ELF.rst
+F: Documentation/userspace-api/spawn_template.rst
F: fs/*binfmt_*.c
F: fs/Kconfig.binfmt
F: fs/exec.c
--
2.52.0
* [RFC PATCH v1 10/13] exec: require absolute paths for path-created templates
From: Li Chen @ 2026-05-28 9:52 UTC
To: Christian Brauner, Kees Cook, Alexander Viro
Cc: linux-fsdevel, linux-api, linux-kernel, linux-mm, linux-arch,
linux-doc, linux-kselftest, x86, Arnd Bergmann, Andy Lutomirski,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin, Jan Kara, Jonathan Corbet, Shuah Khan, Li Chen
Path-created spawn templates re-open the stored path during spawn-time
revalidation. A relative path would be resolved against the caller's cwd
at spawn time, which is not necessarily the cwd in effect when the template
was created.
Reject relative paths for now. Userspace can resolve the executable to an
absolute path first, or create the template from an executable fd when it
needs cwd-relative lookup.
Signed-off-by: Li Chen <me@linux.beauty>
---
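As an illustration of the "resolve the executable first" option, here is a
minimal userspace sketch that turns a possibly relative path into an absolute
one before it is used as the template filename. The helper name
resolve_template_path() is hypothetical and not part of this series. Note
that realpath() also follows symlinks, so the path that is later revalidated
is the resolved target, which may or may not be what a launcher wants.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper: resolve @path against the cwd that is current
 * right now and store the absolute result in @out, which must provide
 * PATH_MAX bytes. Returns 0 on success or -errno.
 */
static int resolve_template_path(const char *path, char out[PATH_MAX])
{
	if (!path || !path[0])
		return -EINVAL;

	/* Fix the lookup to the current cwd; later chdir() calls do not matter. */
	if (!realpath(path, out))
		return -errno;

	/* Same shape as the kernel-side check: only absolute paths pass. */
	return out[0] == '/' ? 0 : -EINVAL;
}
```

A caller that instead wants the lookup to stay tied to a directory can open
the executable itself and create the template from execfd, as the commit
message says.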
Documentation/userspace-api/spawn_template.rst | 17 ++++++++++++++---
fs/spawn_template.c | 2 ++
2 files changed, 16 insertions(+), 3 deletions(-)
diff --git a/Documentation/userspace-api/spawn_template.rst b/Documentation/userspace-api/spawn_template.rst
index 0396d292fd17d..afe215e51db6f 100644
--- a/Documentation/userspace-api/spawn_template.rst
+++ b/Documentation/userspace-api/spawn_template.rst
@@ -30,9 +30,20 @@ returns a template fd. The fd is an ordinary file descriptor backed by an
anonymous inode. Closing the fd releases the template.
Userspace can identify the executable either by an existing executable fd or by
-path. Exactly one of ``execfd`` and ``filename`` must be supplied. Passing
-``SPAWN_TEMPLATE_CREATE_CLOEXEC`` sets ``O_CLOEXEC`` on the returned template
-fd.
+an absolute path. Exactly one of ``execfd`` and ``filename`` must be supplied.
+Passing ``SPAWN_TEMPLATE_CREATE_CLOEXEC`` sets ``O_CLOEXEC`` on the returned
+template fd.
+
+Relative paths are rejected for path-created templates. The kernel stores the
+filename and re-opens it at spawn time to check that the path still names the
+same executable. A relative filename would be resolved against the caller's
+current working directory at spawn time, not the directory that was current
+when the template was created. For example, a template created for ``bin/tool``
+while the caller is in ``/repo-a`` could later be spawned after the caller has
+changed to ``/repo-b``. Revalidating ``bin/tool`` would then look under
+``/repo-b`` and compare against a different file than the executable that was
+originally templated. Userspace that wants directory-relative lookup should
+open the executable itself and create the template from ``execfd``.
Creating a template for an unsupported executable format fails. For this RFC
that means non-ELF executables fail template creation rather than becoming a
diff --git a/fs/spawn_template.c b/fs/spawn_template.c
index a11a7ed676416..6430a6645fb57 100644
--- a/fs/spawn_template.c
+++ b/fs/spawn_template.c
@@ -441,6 +441,8 @@ static int spawn_template_open_filename(u64 filename, struct file **file,
tmp = strndup_user(u64_to_user_ptr(filename), PATH_MAX);
if (IS_ERR(tmp))
return PTR_ERR(tmp);
+ if (tmp[0] != '/')
+ return -EINVAL;
kfilename = tmp;
tmp_file = open_exec(kfilename);
--
2.52.0
* [RFC PATCH v1 11/13] exec: let close-range actions target the max fd
From: Li Chen @ 2026-05-28 9:52 UTC
To: Christian Brauner, Kees Cook, Alexander Viro
Cc: linux-fsdevel, linux-api, linux-kernel, linux-mm, linux-arch,
linux-doc, linux-kselftest, x86, Arnd Bergmann, Andy Lutomirski,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin, Jan Kara, Jonathan Corbet, Shuah Khan, Li Chen
Allow CLOSE_RANGE actions to pass newfd == -1 to mean the largest
possible fd. The UAPI action uses signed fd fields so that OPEN actions
can carry AT_FDCWD, which leaves no direct way to express ~0U. The -1
sentinel gives userspace a compact way to request the common
close_range(first, ~0U, flags) pattern.
Signed-off-by: Li Chen <me@linux.beauty>
---
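The validation and sentinel mapping described above can be modelled in
userspace as follows. This is only a sketch for reasoning about edge cases:
struct close_range_action_model is a stand-in that carries the fields the
kernel checks and is not the real UAPI layout, and the MODEL_* flag macros
are local copies of the close_range() flag values.

```c
#include <errno.h>
#include <stdint.h>

/* Local copies of the close_range() flag bits, named to avoid clashes. */
#define MODEL_CLOSE_RANGE_UNSHARE	(1U << 1)
#define MODEL_CLOSE_RANGE_CLOEXEC	(1U << 2)

/* Illustrative stand-in; NOT the layout of struct spawn_template_action. */
struct close_range_action_model {
	int32_t fd;	/* first fd of the range */
	int32_t newfd;	/* last fd of the range, or -1 for "max" */
	uint32_t flags;
	uint64_t arg;	/* must be zero for CLOSE_RANGE */
};

/* Mirrors the copy-time check: -1 is the only negative newfd accepted. */
static int model_validate_close_range(const struct close_range_action_model *a)
{
	if (a->fd < 0 || a->newfd < -1 ||
	    (a->newfd >= 0 && a->fd > a->newfd) ||
	    (a->flags & ~(MODEL_CLOSE_RANGE_UNSHARE |
			  MODEL_CLOSE_RANGE_CLOEXEC)) ||
	    a->arg)
		return -EINVAL;
	return 0;
}

/* Mirrors the apply-time mapping of the sentinel to the largest fd. */
static unsigned int model_close_range_max(const struct close_range_action_model *a)
{
	return a->newfd == -1 ? ~0U : (unsigned int)a->newfd;
}
```

With this model, { .fd = 3, .newfd = -1 } validates and maps to ~0U, while
{ .fd = 3, .newfd = 2 } and { .fd = 3, .newfd = -2 } are both rejected.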
Documentation/userspace-api/spawn_template.rst | 3 ++-
fs/spawn_template.c | 10 +++++++---
2 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/Documentation/userspace-api/spawn_template.rst b/Documentation/userspace-api/spawn_template.rst
index afe215e51db6f..be66be20d4fde 100644
--- a/Documentation/userspace-api/spawn_template.rst
+++ b/Documentation/userspace-api/spawn_template.rst
@@ -86,7 +86,8 @@ kind of setup that ``posix_spawn_file_actions_t`` commonly performs:
Open a path using ``struct open_how`` and install it at ``newfd``.
``SPAWN_TEMPLATE_ACTION_CLOSE_RANGE``
- Apply ``close_range()`` to a child fd range.
+ Apply ``close_range()`` to a child fd range. Passing ``newfd == -1`` means
+ the range extends to the largest possible fd.
``SPAWN_TEMPLATE_ACTION_SIGMASK``
Set the child signal mask.
diff --git a/fs/spawn_template.c b/fs/spawn_template.c
index 6430a6645fb57..82b833bc9865a 100644
--- a/fs/spawn_template.c
+++ b/fs/spawn_template.c
@@ -220,6 +220,8 @@ static int spawn_template_apply_sigdefault(const struct spawn_template_action *a
static int spawn_template_apply_action(const struct spawn_template_action *action)
{
+ unsigned int max_fd;
+
switch (action->type) {
case SPAWN_TEMPLATE_ACTION_CLOSE:
return close_fd(action->fd);
@@ -251,7 +253,8 @@ static int spawn_template_apply_action(const struct spawn_template_action *actio
case SPAWN_TEMPLATE_ACTION_OPEN:
return spawn_template_apply_open(action);
case SPAWN_TEMPLATE_ACTION_CLOSE_RANGE:
- return do_close_range(action->fd, action->newfd, action->flags);
+ max_fd = action->newfd == -1 ? ~0U : action->newfd;
+ return do_close_range(action->fd, max_fd, action->flags);
case SPAWN_TEMPLATE_ACTION_SIGMASK:
return spawn_template_apply_sigmask(action);
case SPAWN_TEMPLATE_ACTION_SIGDEFAULT:
@@ -306,8 +309,9 @@ static int spawn_template_copy_actions(struct spawn_template_action **out_action
return -EINVAL;
break;
case SPAWN_TEMPLATE_ACTION_CLOSE_RANGE:
- if (actions[i].fd < 0 || actions[i].newfd < 0 ||
- actions[i].fd > actions[i].newfd ||
+ if (actions[i].fd < 0 || actions[i].newfd < -1 ||
+ (actions[i].newfd >= 0 &&
+ actions[i].fd > actions[i].newfd) ||
(actions[i].flags &
~(CLOSE_RANGE_UNSHARE | CLOSE_RANGE_CLOEXEC)) ||
actions[i].arg)
--
2.52.0
* [RFC PATCH v1 12/13] syscalls: add generic spawn template entries
From: Li Chen @ 2026-05-28 9:52 UTC
To: Christian Brauner, Kees Cook, Alexander Viro
Cc: linux-fsdevel, linux-api, linux-kernel, linux-mm, linux-arch,
linux-doc, linux-kselftest, x86, Arnd Bergmann, Andy Lutomirski,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin, Jan Kara, Jonathan Corbet, Shuah Khan, Li Chen
Add spawn_template_create() and spawn_template_spawn() to the generic
syscall table and asm-generic UAPI numbering. This lets architectures
using the generic table pick up the spawn-template ABI instead of
leaving the mechanism x86-only.
Signed-off-by: Li Chen <me@linux.beauty>
---
arch/x86/entry/syscalls/syscall_64.tbl | 2 ++
include/uapi/asm-generic/unistd.h | 7 ++++++-
scripts/syscall.tbl | 2 ++
3 files changed, 10 insertions(+), 1 deletion(-)
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index d6c1667e8f3b8..e9dcfc6de79bc 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -396,6 +396,8 @@
469 common file_setattr sys_file_setattr
470 common listns sys_listns
471 common rseq_slice_yield sys_rseq_slice_yield
+472 64 spawn_template_create sys_spawn_template_create
+473 64 spawn_template_spawn sys_spawn_template_spawn
#
# Due to a historical design error, certain syscalls are numbered differently
# in x32 as compared to native x86_64. These syscalls have numbers 512-547.
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index a627acc8fb5fe..8589f2b9696a7 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -863,8 +863,13 @@ __SYSCALL(__NR_listns, sys_listns)
#define __NR_rseq_slice_yield 471
__SYSCALL(__NR_rseq_slice_yield, sys_rseq_slice_yield)
+#define __NR_spawn_template_create 472
+__SYSCALL(__NR_spawn_template_create, sys_spawn_template_create)
+#define __NR_spawn_template_spawn 473
+__SYSCALL(__NR_spawn_template_spawn, sys_spawn_template_spawn)
+
#undef __NR_syscalls
-#define __NR_syscalls 472
+#define __NR_syscalls 474
/*
* 32 bit systems traditionally used different
diff --git a/scripts/syscall.tbl b/scripts/syscall.tbl
index 7a42b32b65776..7f8e74e866e48 100644
--- a/scripts/syscall.tbl
+++ b/scripts/syscall.tbl
@@ -412,3 +412,5 @@
469 common file_setattr sys_file_setattr
470 common listns sys_listns
471 common rseq_slice_yield sys_rseq_slice_yield
+472 common spawn_template_create sys_spawn_template_create
+473 common spawn_template_spawn sys_spawn_template_spawn
--
2.52.0
* [RFC PATCH v1 13/13] selftests/exec: cover spawn template basics
From: Li Chen @ 2026-05-28 9:52 UTC
To: Christian Brauner, Kees Cook, Alexander Viro
Cc: linux-fsdevel, linux-api, linux-kernel, linux-mm, linux-arch,
linux-doc, linux-kselftest, x86, Arnd Bergmann, Andy Lutomirski,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin, Jan Kara, Jonathan Corbet, Shuah Khan, Li Chen
Add exec selftests for the spawn_template ABI. Cover basic spawning,
relative path rejection, execfd execute-permission checks, default fd
closing, close-range actions using newfd -1, and stale path rejection
after executable metadata changes.
Also cover atomic path replacement while a template fd created for the
original executable is still alive. The old template must reject the
replaced path with ESTALE, and a new template for the same path must
execute the replacement.
Signed-off-by: Li Chen <me@linux.beauty>
---
MAINTAINERS | 1 +
tools/testing/selftests/exec/Makefile | 1 +
tools/testing/selftests/exec/spawn_template.c | 997 ++++++++++++++++++
3 files changed, 999 insertions(+)
create mode 100644 tools/testing/selftests/exec/spawn_template.c
diff --git a/MAINTAINERS b/MAINTAINERS
index 3e737097940f9..77b3da32b4d2a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9747,6 +9747,7 @@ F: include/uapi/linux/spawn_template.h
F: kernel/fork.c
F: mm/vma_exec.c
F: tools/testing/selftests/exec/
+F: tools/testing/selftests/exec/spawn_template.c
N: asm/elf.h
N: binfmt
diff --git a/tools/testing/selftests/exec/Makefile b/tools/testing/selftests/exec/Makefile
index 45a3cfc435cfd..cf39fe916b9ba 100644
--- a/tools/testing/selftests/exec/Makefile
+++ b/tools/testing/selftests/exec/Makefile
@@ -20,6 +20,7 @@ TEST_FILES := Makefile
TEST_GEN_PROGS += recursion-depth
TEST_GEN_PROGS += null-argv
TEST_GEN_PROGS += check-exec
+TEST_GEN_PROGS += spawn_template
EXTRA_CLEAN := $(OUTPUT)/subdir.moved $(OUTPUT)/execveat.moved $(OUTPUT)/xxxxx* \
$(OUTPUT)/S_I*.test
diff --git a/tools/testing/selftests/exec/spawn_template.c b/tools/testing/selftests/exec/spawn_template.c
new file mode 100644
index 0000000000000..26708143ac9dc
--- /dev/null
+++ b/tools/testing/selftests/exec/spawn_template.c
@@ -0,0 +1,997 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+#include <errno.h>
+#include <fcntl.h>
+#include <limits.h>
+#include <signal.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/stat.h>
+#include <sys/syscall.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <unistd.h>
+
+#include <linux/spawn_template.h>
+
+#include "kselftest.h"
+
+#ifndef __NR_spawn_template_create
+#define __NR_spawn_template_create 472
+#endif
+
+#ifndef __NR_spawn_template_spawn
+#define __NR_spawn_template_spawn 473
+#endif
+
+#define SPAWN_TEMPLATE_MISSING_SYSCALL_ERRNO 38
+#define SPAWN_TEMPLATE_KERNEL_NSIG 64
+#define SPAWN_TEMPLATE_KERNEL_SIGSET_WORDS \
+ (SPAWN_TEMPLATE_KERNEL_NSIG / (8 * sizeof(unsigned long)))
+
+static const char *true_path;
+static char self_path[PATH_MAX];
+
+struct spawn_template_kernel_sigset {
+ unsigned long sig[SPAWN_TEMPLATE_KERNEL_SIGSET_WORDS];
+};
+
+static void spawn_template_kernel_sigempty(struct spawn_template_kernel_sigset *set)
+{
+ memset(set, 0, sizeof(*set));
+}
+
+static void spawn_template_kernel_sigadd(struct spawn_template_kernel_sigset *set,
+ int sig)
+{
+ sig--;
+ set->sig[sig / (8 * sizeof(unsigned long))] |=
+ 1UL << (sig % (8 * sizeof(unsigned long)));
+}
+
+static int read_fd_string(int fd, const char *expected)
+{
+ char buf[128];
+ ssize_t nread;
+
+ nread = read(fd, buf, sizeof(buf) - 1);
+ if (nread < 0)
+ return -errno;
+
+ buf[nread] = '\0';
+ return strcmp(buf, expected) ? -EINVAL : 0;
+}
+
+static int write_file(const char *path, const char *data, mode_t mode)
+{
+ size_t left = strlen(data);
+ const char *p = data;
+ int fd;
+ int ret = 0;
+
+ fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_CLOEXEC, mode);
+ if (fd < 0)
+ return -errno;
+
+ while (left) {
+ ssize_t written = write(fd, p, left);
+
+ if (written < 0) {
+ ret = -errno;
+ break;
+ }
+ left -= written;
+ p += written;
+ }
+
+ close(fd);
+ return ret;
+}
+
+static int create_template_path(const char *path)
+{
+ struct spawn_template_create_args args = {
+ .flags = SPAWN_TEMPLATE_CREATE_CLOEXEC,
+ .execfd = -1,
+ .filename = (uintptr_t)path,
+ };
+
+ return syscall(__NR_spawn_template_create, &args, sizeof(args));
+}
+
+static int create_template_fd(int execfd)
+{
+ struct spawn_template_create_args args = {
+ .flags = SPAWN_TEMPLATE_CREATE_CLOEXEC,
+ .execfd = execfd,
+ };
+
+ return syscall(__NR_spawn_template_create, &args, sizeof(args));
+}
+
+static int spawn_template_start(int template_fd, char *const argv[],
+ struct spawn_template_action *actions,
+ unsigned int actions_len,
+ unsigned long long flags, pid_t *pid_out,
+ int *pidfd_out)
+{
+ char *const envp[] = { "PATH=/usr/bin:/bin", NULL };
+ struct spawn_template_spawn_args args = {
+ .flags = flags,
+ .argv = (uintptr_t)argv,
+ .envp = (uintptr_t)envp,
+ .actions = (uintptr_t)actions,
+ .actions_len = actions_len,
+ };
+ int pidfd = -1;
+ pid_t pid;
+ int ret;
+
+ args.pidfd = (uintptr_t)&pidfd;
+
+ pid = syscall(__NR_spawn_template_spawn, template_fd, &args,
+ sizeof(args));
+ if (pid < 0) {
+ ret = -errno;
+ if (pidfd >= 0) {
+ siginfo_t info;
+
+ waitid(P_PIDFD, pidfd, &info, WEXITED);
+ close(pidfd);
+ }
+ return ret;
+ }
+
+ *pid_out = pid;
+ *pidfd_out = pidfd;
+ return 0;
+}
+
+static int spawn_template(int template_fd, char *const argv[],
+ struct spawn_template_action *actions,
+ unsigned int actions_len, unsigned long long flags)
+{
+ siginfo_t info = {};
+ int pidfd;
+ pid_t pid;
+ int ret;
+
+ ret = spawn_template_start(template_fd, argv, actions, actions_len, flags,
+ &pid, &pidfd);
+ if (ret)
+ return ret;
+ (void)pid;
+
+ ret = waitid(P_PIDFD, pidfd, &info, WEXITED);
+ if (ret < 0) {
+ ret = -errno;
+ goto out_close_pidfd;
+ }
+
+ if (info.si_code != CLD_EXITED) {
+ ret = -EINVAL;
+ goto out_close_pidfd;
+ }
+
+ ret = info.si_status;
+
+out_close_pidfd:
+ if (pidfd >= 0)
+ close(pidfd);
+ return ret;
+}
+
+static const char *find_true(void)
+{
+ static const char * const paths[] = {
+ "/usr/bin/true",
+ "/bin/true",
+ };
+ unsigned int i;
+
+ for (i = 0; i < ARRAY_SIZE(paths); i++) {
+ if (access(paths[i], X_OK) == 0)
+ return paths[i];
+ }
+ return NULL;
+}
+
+static int copy_file(const char *src, const char *dst)
+{
+ char buf[8192];
+ ssize_t nread;
+ int infd;
+ int outfd;
+ int ret = 0;
+
+ infd = open(src, O_RDONLY | O_CLOEXEC);
+ if (infd < 0)
+ return -errno;
+
+ outfd = open(dst, O_WRONLY | O_CREAT | O_TRUNC | O_CLOEXEC, 0700);
+ if (outfd < 0) {
+ ret = -errno;
+ goto out_close_in;
+ }
+
+ while ((nread = read(infd, buf, sizeof(buf))) > 0) {
+ char *p = buf;
+ ssize_t left = nread;
+
+ while (left > 0) {
+ ssize_t written = write(outfd, p, left);
+
+ if (written < 0) {
+ ret = -errno;
+ goto out_close_out;
+ }
+ left -= written;
+ p += written;
+ }
+ }
+ if (nread < 0)
+ ret = -errno;
+
+out_close_out:
+ close(outfd);
+out_close_in:
+ close(infd);
+ return ret;
+}
+
+static int test_basic_spawn(void)
+{
+ char *const argv[] = { (char *)true_path, NULL };
+ int template_fd;
+ int ret;
+
+ template_fd = create_template_path(true_path);
+ if (template_fd < 0)
+ return -errno;
+
+ ret = spawn_template(template_fd, argv, NULL, 0, 0);
+ close(template_fd);
+ return ret;
+}
+
+static int test_relative_path_rejected(void)
+{
+ int template_fd;
+
+ template_fd = create_template_path("true");
+ if (template_fd >= 0) {
+ close(template_fd);
+ return -EINVAL;
+ }
+
+ return errno == EINVAL ? 0 : -errno;
+}
+
+static int test_execfd_requires_execute(void)
+{
+ char path[] = "/tmp/spawn-template-noexec-XXXXXX";
+ int template_fd;
+ int fd;
+ int ret = 0;
+
+ fd = mkstemp(path);
+ if (fd < 0)
+ return -errno;
+
+ if (fchmod(fd, 0600)) {
+ ret = -errno;
+ goto out;
+ }
+
+ template_fd = create_template_fd(fd);
+ if (template_fd >= 0) {
+ close(template_fd);
+ ret = -EINVAL;
+ goto out;
+ }
+
+ ret = errno == EACCES ? 0 : -errno;
+
+out:
+ close(fd);
+ unlink(path);
+ return ret;
+}
+
+static int test_default_closes_extra_fds(void)
+{
+ char fdarg[32];
+ char *const argv[] = {
+ self_path,
+ "--check-fd-closed",
+ fdarg,
+ NULL,
+ };
+ int template_fd;
+ int extra_fd;
+ int ret;
+
+ extra_fd = open("/dev/null", O_RDONLY);
+ if (extra_fd < 0)
+ return -errno;
+
+ snprintf(fdarg, sizeof(fdarg), "%d", extra_fd);
+
+ template_fd = create_template_path(self_path);
+ if (template_fd < 0) {
+ ret = -errno;
+ goto out_close_extra;
+ }
+
+ ret = spawn_template(template_fd, argv, NULL, 0, 0);
+ close(template_fd);
+
+out_close_extra:
+ close(extra_fd);
+ return ret;
+}
+
+static int test_close_range_max_action(void)
+{
+ char fdarg[32];
+ char *const argv[] = {
+ self_path,
+ "--check-fd-closed",
+ fdarg,
+ NULL,
+ };
+ struct spawn_template_action action = {
+ .type = SPAWN_TEMPLATE_ACTION_CLOSE_RANGE,
+ .fd = -1,
+ .newfd = -1,
+ };
+ int template_fd;
+ int extra_fd;
+ int ret;
+
+ extra_fd = open("/dev/null", O_RDONLY | O_CLOEXEC);
+ if (extra_fd < 0)
+ return -errno;
+
+ action.fd = extra_fd;
+ snprintf(fdarg, sizeof(fdarg), "%d", extra_fd);
+
+ template_fd = create_template_path(self_path);
+ if (template_fd < 0) {
+ ret = -errno;
+ goto out_close_extra;
+ }
+
+ ret = spawn_template(template_fd, argv, &action, 1,
+ SPAWN_TEMPLATE_SPAWN_INHERIT_FDS);
+ close(template_fd);
+
+out_close_extra:
+ close(extra_fd);
+ return ret;
+}
+
+static int test_dup2_stdio_actions(void)
+{
+ char *const argv[] = { self_path, "--write-stdio", NULL };
+ struct spawn_template_action actions[2];
+ char out_buf[32];
+ char err_buf[32];
+ int out_pipe[2];
+ int err_pipe[2];
+ int template_fd;
+ int ret = 0;
+
+ if (pipe2(out_pipe, O_CLOEXEC))
+ return -errno;
+ if (pipe2(err_pipe, O_CLOEXEC)) {
+ ret = -errno;
+ goto out_close_out_pipe;
+ }
+
+ actions[0] = (struct spawn_template_action) {
+ .type = SPAWN_TEMPLATE_ACTION_DUP2,
+ .fd = out_pipe[1],
+ .newfd = STDOUT_FILENO,
+ };
+ actions[1] = (struct spawn_template_action) {
+ .type = SPAWN_TEMPLATE_ACTION_DUP2,
+ .fd = err_pipe[1],
+ .newfd = STDERR_FILENO,
+ };
+
+ template_fd = create_template_path(self_path);
+ if (template_fd < 0) {
+ ret = -errno;
+ goto out_close_err_pipe;
+ }
+
+ ret = spawn_template(template_fd, argv, actions, ARRAY_SIZE(actions), 0);
+ close(template_fd);
+ if (ret)
+ goto out_close_err_pipe;
+
+ close(out_pipe[1]);
+ out_pipe[1] = -1;
+ close(err_pipe[1]);
+ err_pipe[1] = -1;
+
+ memset(out_buf, 0, sizeof(out_buf));
+ memset(err_buf, 0, sizeof(err_buf));
+ if (read(out_pipe[0], out_buf, sizeof(out_buf) - 1) < 0) {
+ ret = -errno;
+ goto out_close_err_pipe;
+ }
+ if (read(err_pipe[0], err_buf, sizeof(err_buf) - 1) < 0) {
+ ret = -errno;
+ goto out_close_err_pipe;
+ }
+ if (strcmp(out_buf, "stdout-token\n") ||
+ strcmp(err_buf, "stderr-token\n"))
+ ret = -EINVAL;
+
+out_close_err_pipe:
+ if (err_pipe[1] >= 0)
+ close(err_pipe[1]);
+ close(err_pipe[0]);
+out_close_out_pipe:
+ if (out_pipe[1] >= 0)
+ close(out_pipe[1]);
+ close(out_pipe[0]);
+ return ret;
+}
+
+static int test_open_action_stdin(void)
+{
+ char dir[] = "/tmp/spawn-template-open-XXXXXX";
+ char path[PATH_MAX];
+ char *const argv[] = {
+ self_path,
+ "--check-fd-content",
+ "0",
+ "open-action-token\n",
+ NULL,
+ };
+ struct spawn_template_open open_arg = {
+ .path = (uintptr_t)path,
+ .how = {
+ .flags = O_RDONLY,
+ },
+ };
+ struct spawn_template_action action = {
+ .type = SPAWN_TEMPLATE_ACTION_OPEN,
+ .fd = AT_FDCWD,
+ .newfd = STDIN_FILENO,
+ .arg = (uintptr_t)&open_arg,
+ };
+ int template_fd;
+ int ret;
+
+ if (!mkdtemp(dir))
+ return -errno;
+
+ snprintf(path, sizeof(path), "%s/input", dir);
+ ret = write_file(path, "open-action-token\n", 0600);
+ if (ret)
+ goto out_unlink;
+
+ template_fd = create_template_path(self_path);
+ if (template_fd < 0) {
+ ret = -errno;
+ goto out_unlink;
+ }
+
+ ret = spawn_template(template_fd, argv, &action, 1, 0);
+ close(template_fd);
+
+out_unlink:
+ unlink(path);
+ rmdir(dir);
+ return ret;
+}
+
+static int test_fchdir_action(void)
+{
+ char dir[] = "/tmp/spawn-template-fchdir-XXXXXX";
+ char resolved[PATH_MAX];
+ char *const argv[] = {
+ self_path,
+ "--check-cwd",
+ resolved,
+ NULL,
+ };
+ struct spawn_template_action action = {
+ .type = SPAWN_TEMPLATE_ACTION_FCHDIR,
+ };
+ int template_fd;
+ int dirfd;
+ int ret;
+
+ if (!mkdtemp(dir))
+ return -errno;
+ if (!realpath(dir, resolved)) {
+ ret = -errno;
+ goto out_rmdir;
+ }
+
+ dirfd = open(dir, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
+ if (dirfd < 0) {
+ ret = -errno;
+ goto out_rmdir;
+ }
+ action.fd = dirfd;
+
+ template_fd = create_template_path(self_path);
+ if (template_fd < 0) {
+ ret = -errno;
+ goto out_close_dirfd;
+ }
+
+ ret = spawn_template(template_fd, argv, &action, 1, 0);
+ close(template_fd);
+
+out_close_dirfd:
+ close(dirfd);
+out_rmdir:
+ rmdir(dir);
+ return ret;
+}
+
+static int test_sigmask_action(void)
+{
+ char sigarg[16];
+ char *const argv[] = {
+ self_path,
+ "--check-sigmask",
+ sigarg,
+ NULL,
+ };
+ struct spawn_template_kernel_sigset mask;
+ struct spawn_template_sigset sigset_arg = {
+ .sigset = (uintptr_t)&mask,
+ .sigsetsize = sizeof(mask),
+ };
+ struct spawn_template_action action = {
+ .type = SPAWN_TEMPLATE_ACTION_SIGMASK,
+ .arg = (uintptr_t)&sigset_arg,
+ };
+ int template_fd;
+ int ret;
+
+ spawn_template_kernel_sigempty(&mask);
+ spawn_template_kernel_sigadd(&mask, SIGUSR1);
+ snprintf(sigarg, sizeof(sigarg), "%d", SIGUSR1);
+
+ template_fd = create_template_path(self_path);
+ if (template_fd < 0)
+ return -errno;
+
+ ret = spawn_template(template_fd, argv, &action, 1, 0);
+ close(template_fd);
+ return ret;
+}
+
+static int test_sigdefault_action(void)
+{
+ char sigarg[16];
+ char *const argv[] = {
+ self_path,
+ "--check-sigdefault",
+ sigarg,
+ NULL,
+ };
+ struct spawn_template_kernel_sigset mask;
+ struct sigaction old_sa;
+ struct sigaction ignore_sa = {
+ .sa_handler = SIG_IGN,
+ };
+ struct spawn_template_sigset sigset_arg = {
+ .sigset = (uintptr_t)&mask,
+ .sigsetsize = sizeof(mask),
+ };
+ struct spawn_template_action action = {
+ .type = SPAWN_TEMPLATE_ACTION_SIGDEFAULT,
+ .arg = (uintptr_t)&sigset_arg,
+ };
+ int template_fd;
+ int ret;
+
+ spawn_template_kernel_sigempty(&mask);
+ spawn_template_kernel_sigadd(&mask, SIGUSR1);
+ snprintf(sigarg, sizeof(sigarg), "%d", SIGUSR1);
+
+ if (sigaction(SIGUSR1, &ignore_sa, &old_sa))
+ return -errno;
+
+ template_fd = create_template_path(self_path);
+ if (template_fd < 0) {
+ ret = -errno;
+ goto out_restore_signal;
+ }
+
+ ret = spawn_template(template_fd, argv, &action, 1, 0);
+ close(template_fd);
+
+out_restore_signal:
+ sigaction(SIGUSR1, &old_sa, NULL);
+ return ret;
+}
+
+static int test_inherit_fds_flag(void)
+{
+ char fdarg[32];
+ char *const argv[] = {
+ self_path,
+ "--check-fd-open",
+ fdarg,
+ NULL,
+ };
+ int template_fd;
+ int extra_fd;
+ int ret;
+
+ extra_fd = open("/dev/null", O_RDONLY);
+ if (extra_fd < 0)
+ return -errno;
+ snprintf(fdarg, sizeof(fdarg), "%d", extra_fd);
+
+ template_fd = create_template_path(self_path);
+ if (template_fd < 0) {
+ ret = -errno;
+ goto out_close_extra;
+ }
+
+ ret = spawn_template(template_fd, argv, NULL, 0,
+ SPAWN_TEMPLATE_SPAWN_INHERIT_FDS);
+ close(template_fd);
+
+out_close_extra:
+ close(extra_fd);
+ return ret;
+}
+
+static int test_pidfd_waitid(void)
+{
+ char *const argv[] = { (char *)true_path, NULL };
+ siginfo_t info = {};
+ int template_fd;
+ int pidfd;
+ pid_t pid;
+ int ret;
+
+ template_fd = create_template_path(true_path);
+ if (template_fd < 0)
+ return -errno;
+
+ ret = spawn_template_start(template_fd, argv, NULL, 0, 0, &pid, &pidfd);
+ close(template_fd);
+ if (ret)
+ return ret;
+
+ ret = waitid(P_PIDFD, pidfd, &info, WEXITED);
+ if (ret < 0) {
+ ret = -errno;
+ waitpid(pid, NULL, 0);
+ goto out_close_pidfd;
+ }
+ if (info.si_code != CLD_EXITED || info.si_status)
+ ret = -EINVAL;
+
+out_close_pidfd:
+ close(pidfd);
+ return ret;
+}
+
+static int test_create_actions_rejected(void)
+{
+ struct spawn_template_action action = {
+ .type = SPAWN_TEMPLATE_ACTION_CLOSE,
+ .fd = STDIN_FILENO,
+ };
+ struct spawn_template_create_args args = {
+ .flags = SPAWN_TEMPLATE_CREATE_CLOEXEC,
+ .execfd = -1,
+ .filename = (uintptr_t)true_path,
+ .actions = (uintptr_t)&action,
+ .actions_len = 1,
+ };
+ int template_fd;
+
+ template_fd = syscall(__NR_spawn_template_create, &args, sizeof(args));
+ if (template_fd >= 0) {
+ close(template_fd);
+ return -EINVAL;
+ }
+
+ return errno == EINVAL ? 0 : -errno;
+}
+
+static int test_script_template_unsupported(void)
+{
+ char dir[] = "/tmp/spawn-template-script-XXXXXX";
+ char path[PATH_MAX];
+ int template_fd;
+ int ret;
+
+ if (!mkdtemp(dir))
+ return -errno;
+
+ snprintf(path, sizeof(path), "%s/script", dir);
+ ret = write_file(path, "#!/bin/sh\nexit 0\n", 0700);
+ if (ret)
+ goto out_unlink;
+
+ template_fd = create_template_path(path);
+ if (template_fd >= 0) {
+ close(template_fd);
+ ret = -EINVAL;
+ goto out_unlink;
+ }
+ ret = errno == ENOEXEC ? 0 : -errno;
+
+out_unlink:
+ unlink(path);
+ rmdir(dir);
+ return ret;
+}
+
+static int test_deny_write_while_template_alive(void)
+{
+ char dir[] = "/tmp/spawn-template-deny-write-XXXXXX";
+ char path[PATH_MAX];
+ int template_fd;
+ int write_fd;
+ int ret = 0;
+
+ if (!mkdtemp(dir))
+ return -errno;
+
+ snprintf(path, sizeof(path), "%s/copy", dir);
+ ret = copy_file(self_path, path);
+ if (ret)
+ goto out_unlink;
+
+ template_fd = create_template_path(path);
+ if (template_fd < 0) {
+ ret = -errno;
+ goto out_unlink;
+ }
+
+ write_fd = open(path, O_WRONLY | O_TRUNC | O_CLOEXEC);
+ if (write_fd >= 0) {
+ close(write_fd);
+ ret = -EINVAL;
+ } else {
+ ret = errno == ETXTBSY ? 0 : -errno;
+ }
+
+ close(template_fd);
+out_unlink:
+ unlink(path);
+ rmdir(dir);
+ return ret;
+}
+
+static int test_stale_path_rejected(void)
+{
+ char dir[] = "/tmp/spawn-template-stale-XXXXXX";
+ char path[PATH_MAX];
+ char *const argv[] = { path, "--exit-zero", NULL };
+ int template_fd;
+ int ret = 0;
+
+ if (!mkdtemp(dir))
+ return -errno;
+
+ snprintf(path, sizeof(path), "%s/copy", dir);
+ ret = copy_file(self_path, path);
+ if (ret)
+ goto out_unlink;
+
+ template_fd = create_template_path(path);
+ if (template_fd < 0) {
+ ret = -errno;
+ goto out_unlink;
+ }
+
+ if (chmod(path, 0600)) {
+ ret = -errno;
+ goto out_close_template;
+ }
+
+ ret = spawn_template(template_fd, argv, NULL, 0, 0);
+ if (ret >= 0)
+ ret = -EINVAL;
+ else
+ ret = ret == -ESTALE ? 0 : ret;
+
+out_close_template:
+ close(template_fd);
+out_unlink:
+ unlink(path);
+ rmdir(dir);
+ return ret;
+}
+
+static int test_path_replacement_allows_tool_update(void)
+{
+ char dir[] = "/tmp/spawn-template-update-XXXXXX";
+ char path[PATH_MAX];
+ char new_path[PATH_MAX];
+ char *const argv[] = { path, "--exit-zero", NULL };
+ int new_template_fd = -1;
+ int template_fd = -1;
+ int ret;
+
+ if (!mkdtemp(dir))
+ return -errno;
+
+ snprintf(path, sizeof(path), "%s/tool", dir);
+ snprintf(new_path, sizeof(new_path), "%s/tool.new", dir);
+ ret = copy_file(self_path, path);
+ if (ret)
+ goto out;
+ ret = copy_file(self_path, new_path);
+ if (ret)
+ goto out;
+
+ template_fd = create_template_path(path);
+ if (template_fd < 0) {
+ ret = -errno;
+ goto out;
+ }
+
+ if (rename(new_path, path)) {
+ ret = -errno;
+ goto out;
+ }
+
+ ret = spawn_template(template_fd, argv, NULL, 0, 0);
+ if (ret != -ESTALE) {
+ ret = ret < 0 ? ret : -EINVAL;
+ goto out;
+ }
+
+ new_template_fd = create_template_path(path);
+ if (new_template_fd < 0) {
+ ret = -errno;
+ goto out;
+ }
+
+ ret = spawn_template(new_template_fd, argv, NULL, 0, 0);
+
+out:
+ if (new_template_fd >= 0)
+ close(new_template_fd);
+ if (template_fd >= 0)
+ close(template_fd);
+ unlink(new_path);
+ unlink(path);
+ rmdir(dir);
+ return ret;
+}
+
+static void run_test(const char *name, int (*fn)(void))
+{
+ int ret = fn();
+
+ if (!ret)
+ ksft_test_result_pass("%s\n", name);
+ else
+ ksft_test_result_fail("%s failed: %s (%d)\n", name,
+ ret < 0 ? strerror(-ret) : "child exit status", abs(ret));
+}
+
+static void check_syscall_available(void)
+{
+ int template_fd;
+
+ template_fd = create_template_path(true_path);
+ if (template_fd >= 0) {
+ close(template_fd);
+ return;
+ }
+
+ if (errno == SPAWN_TEMPLATE_MISSING_SYSCALL_ERRNO)
+ ksft_exit_skip("spawn_template syscalls are not available\n");
+
+ ksft_exit_fail_msg("spawn_template_create failed: %s (%d)\n",
+ strerror(errno), errno);
+}
+
+int main(int argc, char **argv)
+{
+ ssize_t len;
+
+ if (argc == 2 && !strcmp(argv[1], "--exit-zero"))
+ return 0;
+
+ if (argc == 3 && !strcmp(argv[1], "--check-fd-closed")) {
+ int fd = atoi(argv[2]);
+
+ return fcntl(fd, F_GETFD) < 0 && errno == EBADF ? 0 : 1;
+ }
+
+ if (argc == 3 && !strcmp(argv[1], "--check-fd-open")) {
+ int fd = atoi(argv[2]);
+
+ return fcntl(fd, F_GETFD) >= 0 ? 0 : 1;
+ }
+
+ if (argc == 4 && !strcmp(argv[1], "--check-fd-content"))
+ return read_fd_string(atoi(argv[2]), argv[3]) ? 1 : 0;
+
+ if (argc == 3 && !strcmp(argv[1], "--check-cwd")) {
+ char cwd[PATH_MAX];
+
+ if (!getcwd(cwd, sizeof(cwd)))
+ return 1;
+ return strcmp(cwd, argv[2]) ? 1 : 0;
+ }
+
+ if (argc == 3 && !strcmp(argv[1], "--check-sigmask")) {
+ sigset_t mask;
+ int sig = atoi(argv[2]);
+
+ if (sigprocmask(SIG_BLOCK, NULL, &mask))
+ return 1;
+ return sigismember(&mask, sig) == 1 ? 0 : 1;
+ }
+
+ if (argc == 3 && !strcmp(argv[1], "--check-sigdefault")) {
+ struct sigaction sa;
+ int sig = atoi(argv[2]);
+
+ if (sigaction(sig, NULL, &sa))
+ return 1;
+ return sa.sa_handler == SIG_DFL ? 0 : 1;
+ }
+
+ if (argc == 2 && !strcmp(argv[1], "--write-stdio")) {
+ if (write(STDOUT_FILENO, "stdout-token\n", 13) != 13)
+ return 1;
+ if (write(STDERR_FILENO, "stderr-token\n", 13) != 13)
+ return 1;
+ return 0;
+ }
+
+ true_path = find_true();
+ if (!true_path)
+ ksft_exit_skip("could not find true executable\n");
+
+ len = readlink("/proc/self/exe", self_path, sizeof(self_path) - 1);
+ if (len < 0)
+ ksft_exit_fail_msg("readlink(/proc/self/exe) failed: %s\n",
+ strerror(errno));
+ self_path[len] = '\0';
+
+ check_syscall_available();
+
+ ksft_print_header();
+ ksft_set_plan(17);
+
+ run_test("basic spawn", test_basic_spawn);
+ run_test("relative path rejected", test_relative_path_rejected);
+ run_test("execfd execute permission checked",
+ test_execfd_requires_execute);
+ run_test("default fd close", test_default_closes_extra_fds);
+ run_test("close_range action max fd", test_close_range_max_action);
+ run_test("dup2 stdio actions", test_dup2_stdio_actions);
+ run_test("open action stdin", test_open_action_stdin);
+ run_test("fchdir action", test_fchdir_action);
+ run_test("sigmask action", test_sigmask_action);
+ run_test("sigdefault action", test_sigdefault_action);
+ run_test("inherit fds flag", test_inherit_fds_flag);
+ run_test("pidfd waitid", test_pidfd_waitid);
+ run_test("create-time actions rejected", test_create_actions_rejected);
+ run_test("script template unsupported", test_script_template_unsupported);
+ run_test("deny write while template alive",
+ test_deny_write_while_template_alive);
+ run_test("stale path rejected", test_stale_path_rejected);
+ run_test("path replacement allows tool update",
+ test_path_replacement_allows_tool_update);
+
+ ksft_finished();
+}
--
2.52.0
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [RFC PATCH v1 00/13] exec: add spawn templates for repeated executable startup
2026-05-28 9:52 [RFC PATCH v1 00/13] exec: add spawn templates for repeated executable startup Li Chen
` (12 preceding siblings ...)
2026-05-28 9:52 ` [RFC PATCH v1 13/13] selftests/exec: cover spawn template basics Li Chen
@ 2026-05-28 11:02 ` Christian Brauner
2026-06-01 2:47 ` Li Chen
2026-06-01 19:55 ` Kees Cook
2026-05-28 12:55 ` Mateusz Guzik
2026-05-28 18:27 ` Andy Lutomirski
15 siblings, 2 replies; 20+ messages in thread
From: Christian Brauner @ 2026-05-28 11:02 UTC (permalink / raw)
To: Li Chen
Cc: Kees Cook, Alexander Viro, linux-fsdevel, linux-api, linux-kernel,
linux-mm, linux-arch, linux-doc, linux-kselftest, x86,
Arnd Bergmann, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, H. Peter Anvin, Jan Kara,
Jonathan Corbet, Shuah Khan
On Thu, May 28, 2026 at 05:52:21PM +0800, Li Chen wrote:
> Hi,
>
> This is an early RFC for an idea that is probably still rough in both the
> UAPI and implementation details. Sorry for the rough edges; I am sending
> it now to check whether this direction is worth pursuing and to get
> feedback on the kernel/userspace boundary.
The idea of having a builder api for exec isn't all that crazy. But it
should simply be built on top of pidfds and thus pidfs itself instead.
It has all the basic infrastructure in place already. Any implementation
should also allow userspace to implement posix_spawn() on top of it.
fd = pidfd_open(0, PIDFD_EMPTY /* or better name */)
pidfd_config(fd, ...) // modeled similar to fsconfig()
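For reference, a minimal sketch of what posix_spawn() can already express
today, which is roughly the feature set a pidfd-based builder would have to
cover before libc could sit on top of it. It uses only the existing POSIX
interface. spawn_capture_stdout() is a hypothetical helper name, and nothing
here models the suggested PIDFD_EMPTY or pidfd_config() calls, which do not
exist yet:

```c
#include <errno.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/*
 * Spawn path with argv, capture up to len - 1 bytes of the child's stdout
 * in buf, and return its exit status (>= 0) or a negative errno value.
 */
int spawn_capture_stdout(const char *path, char *const argv[],
			 char *buf, size_t len)
{
	posix_spawn_file_actions_t fa;
	size_t used = 0;
	int pipefd[2];
	int status;
	pid_t pid;
	int ret;

	if (!len)
		return -EINVAL;
	if (pipe(pipefd))
		return -errno;

	/* posix_spawn*() calls return an errno value directly, not -1. */
	ret = posix_spawn_file_actions_init(&fa);
	if (ret) {
		ret = -ret;
		goto out_close;
	}

	/* File actions run in the child, in order, before the exec step. */
	ret = posix_spawn_file_actions_adddup2(&fa, pipefd[1], STDOUT_FILENO);
	if (!ret)
		ret = posix_spawn_file_actions_addclose(&fa, pipefd[0]);
	if (!ret)
		ret = posix_spawn_file_actions_addclose(&fa, pipefd[1]);
	if (!ret)
		ret = posix_spawn(&pid, path, &fa, NULL, argv, environ);
	posix_spawn_file_actions_destroy(&fa);
	if (ret) {
		ret = -ret;
		goto out_close;
	}

	/* Drop the parent's write end so read() sees EOF when the child exits. */
	close(pipefd[1]);
	pipefd[1] = -1;

	while (used < len - 1) {
		ssize_t n = read(pipefd[0], buf + used, len - 1 - used);

		if (n < 0 && errno == EINTR)
			continue;
		if (n <= 0)
			break;
		used += (size_t)n;
	}
	buf[used] = '\0';

	/* Close the read end first so a chatty child cannot block forever. */
	close(pipefd[0]);
	pipefd[0] = -1;

	if (waitpid(pid, &status, 0) < 0)
		ret = -errno;
	else
		ret = WIFEXITED(status) ? WEXITSTATUS(status) : -EINVAL;

out_close:
	if (pipefd[1] >= 0)
		close(pipefd[1]);
	if (pipefd[0] >= 0)
		close(pipefd[0]);
	return ret;
}
```

The dup2 and close file actions here correspond to the per-spawn fd setup
in the RFC, so a builder that can express them (plus the signal and cwd
attributes) should be enough to back this kind of caller.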
* Re: [RFC PATCH v1 00/13] exec: add spawn templates for repeated executable startup
2026-05-28 9:52 [RFC PATCH v1 00/13] exec: add spawn templates for repeated executable startup Li Chen
` (13 preceding siblings ...)
2026-05-28 11:02 ` [RFC PATCH v1 00/13] exec: add spawn templates for repeated executable startup Christian Brauner
@ 2026-05-28 12:55 ` Mateusz Guzik
2026-06-01 15:11 ` Li Chen
2026-05-28 18:27 ` Andy Lutomirski
15 siblings, 1 reply; 20+ messages in thread
From: Mateusz Guzik @ 2026-05-28 12:55 UTC (permalink / raw)
To: Li Chen
Cc: Christian Brauner, Kees Cook, Alexander Viro, linux-fsdevel,
linux-api, linux-kernel, linux-mm, linux-arch, linux-doc,
linux-kselftest, x86, Arnd Bergmann, Andy Lutomirski,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin, Jan Kara, Jonathan Corbet, Shuah Khan
On Thu, May 28, 2026 at 05:52:21PM +0800, Li Chen wrote:
> This RFC adds spawn_template, a userspace-controlled exec acceleration
> mechanism for runtimes that repeatedly start the same executable with
> different argv, envp, and per-spawn file descriptor setup.
>
> The main target is agent runtimes. Modern coding agents repeatedly start
> short-lived helper tools such as rg, git, sed, awk, python, node, and
> shell wrappers while they inspect and edit a workspace. Those runtimes
> already know which tools are hot, and they are also the right place to
> decide policy. The kernel does not choose names such as rg, git, or sed.
> Userspace opts in by creating a template fd for one executable, then uses
> that fd for later spawns. Launchers, shells, and build systems have a
> similar repeated-startup shape and could use the same primitive, but the
> agent runtime case is the main motivation for this RFC.
>
[..]
> A typical agent runtime would keep one template per hot executable and
> still build argv, envp, cwd, and pipe wiring for each tool call:
>
> rg_tmpl = spawn_template_create("/usr/bin/rg");
>
> for each search request:
> out_r, out_w = pipe_cloexec();
> err_r, err_w = pipe_cloexec();
> actions = [
> FCHDIR(worktree_fd),
> DUP2(out_w, STDOUT_FILENO),
> DUP2(err_w, STDERR_FILENO),
> ];
> child = spawn_template_spawn(rg_tmpl, rg_argv, envp, actions);
> close(out_w);
> close(err_w);
> read out_r and err_r;
> waitid(P_PIDFD, child.pidfd, ...);
>
>
[..]
> The cached state is intentionally small. The template fd keeps the opened
> main executable file, an optional absolute path string, the creator
> credential pointer, and the deny-write state. The executable identity key
> records device, inode, size, mode, owner, ctime, and mtime, and is
> rechecked before cached metadata is used. The ELF cache keeps only the
> main executable's ELF header, program header table, and program header
> count.
>
> cached in this RFC not cached in this RFC
> ------------------ ----------------------
> opened main executable PT_INTERP metadata
> executable identity key shared-library graph
> main ELF header VMA layout metadata
> main ELF program headers cross-process metadata sharing
> creator cred pointer
> deny-write state
>
> This RFC does not cache ELF interpreter metadata, shared-library
> dependency state, or derived mapping-layout state. Shared-library
> resolution is dynamic linker policy and depends on LD_LIBRARY_PATH,
> RPATH, RUNPATH, /etc/ld.so.cache, mount namespaces, and secure-exec
> state. It also does not share cached executable metadata between template
> fds created by different processes. Each template owns its small cached
> metadata object in this RFC.
>
> Performance
> ===========
>
[..]
> Workload Calls subprocess spawn_template time_s Delta
> (workers) calls calls/s calls/s seconds
> 1x16 6144 411.04 420.32 14.95/14.62 +2.26%
> 2x8 6144 666.78 690.08 9.21/8.90 +3.49%
> 4x4 6144 955.61 1003.25 6.43/6.12 +4.99%
> 8x2 6144 1048.25 1069.18 5.86/5.75 +2.00%
>
This problem is dear to my heart and I have been pondering it on and off
for some time now. The entire fork + exec idiom is terrible and needs to
be retired.
Is this vibe-coded? I asked claude for in-kernel posix_spawn for kicks
some time ago and it generated remarkably similar code. But that's a
tangent.
I'm rather confused by the angle in the patchset. Most of this shaves
off a tiny amount of work, while retaining the primary avoidable reason
for bad performance: the very fact that fork is part of the picture,
especially the part mucking with mm. Creating a pristine process is the
way to go.
Additionally there is a known problem where transiently copied file
descriptors on fork + exec cause a headache in multithreaded programs
doing something like this in parallel. I only did a cursory reading, but it
seems your patchset keeps the same problem in place.
There are numerous impactful ways to speed up execs both in terms of
single-threaded cost and their multicore scalability, most of which
would be immediately usable by all programs without an opt-in. imo these
need to be exhausted before something like a "template" can be
considered.
Per the above, the primary win would stem from *NOT* messing with mm.
As in, whatever the interface, it needs to create an "empty" target
process (for lack of a better term).
In terms of userspace-visible APIs, a clean solution escapes me.
Some time ago I proposed returning a handle which is populated over time
by the parent-to-be. One of the problems with it I failed to consider at
the time is NUMA locality -- what if the process to be created is going
to run on another domain? For example, opening and installing a file for
its later use will result in avoidable loss of locality for some of the
in-kernel data. That's on top of the fd vs fork problem.
From a perf standpoint, the final goal of whatever mechanism should be a
state where the target process avoided copying any state it did not need
to and which allocated any memory it needed from the local NUMA node
(whatever it may happen to be). Of course if no affinity is assigned it
may happen to move again and lose such locality, nothing can be done
about that. But pretend the process is to run in a specific node the
parent is NOT running in.
So I think the pragmatic way forward is to implement something close to
posix_spawn in the kernel. It may make sense for the thing to take the
PATH argument for repeated exec attempts. I understand this is of no use
in your particular case, but it very much IS of use for most of the
real world. The initial implementation might even start with doing vfork
just to get it off the ground.
The next step would be to extend the interface with means to AVOID
copying any file descriptors. There could be a dedicated file action
which tells the kernel to avoid such copies or something like a
close_range file action (or close_from) -- with a range like <0, INT_MAX>
you know no fds are copied.
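As a rough illustration of the close_range idea with what exists today: the
sketch below closes everything from fd 3 upward in a freshly forked child
and checks that a descriptor opened by the parent is gone there. The helper
names close_from() and child_sees_fd_closed() are made up for this example,
keeping fds 0-2 open is an arbitrary choice, and it assumes Linux 5.9+ with
headers that define SYS_close_range. Note that it still pays for the
fork-time fd table copy, which is exactly what an in-kernel file action
could skip:

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Close every descriptor >= first with close_range(2).
 * Returns 0 on success or a negative errno value.
 */
int close_from(unsigned int first)
{
#ifdef SYS_close_range
	if (syscall(SYS_close_range, first, ~0U, 0) == 0)
		return 0;
	return -errno;
#else
	(void)first;
	return -ENOSYS;	/* headers too old to know the syscall number */
#endif
}

/*
 * Fork a child, close all descriptors >= 3 in it, and report whether fd
 * (opened by the parent) is gone there. Returns 1 if the child saw it
 * closed, 0 if it was still open, or a negative errno value on error.
 */
int child_sees_fd_closed(int fd)
{
	int status;
	pid_t pid;

	pid = fork();
	if (pid < 0)
		return -errno;
	if (pid == 0) {
		if (close_from(3))
			_exit(2);
		_exit(fcntl(fd, F_GETFD) < 0 && errno == EBADF ? 0 : 1);
	}

	if (waitpid(pid, &status, 0) < 0)
		return -errno;
	if (!WIFEXITED(status) || WEXITSTATUS(status) > 1)
		return -EINVAL;
	return WEXITSTATUS(status) == 0;
}
```

The parent's own descriptor is untouched by this; only the child's copy of
the table is trimmed.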
For the NUMA angle to be sorted out, any file action which opens a file
or dups from the parent needs to execute in the child. And frankly
something would be needed to ask the scheduler where it thinks the
child is going to run, so that the task_struct itself can also be
allocated with the right backing.
I have not looked into what's needed to create a new process and NOT
mess with mm, but I don't think there are unsolvable problems there, at
worst some churn.
There are of course other parameters which need to be sorted out, that's
covered by the posix_spawn thing.
This e-mail is long enough, so I'm not going to go into issues
concerning exec itself right now.
tl;dr I would suggest redoing the patchset as posix_spawn and then doing
the actual optimization of not cloning mm itself.
* Re: [RFC PATCH v1 00/13] exec: add spawn templates for repeated executable startup
2026-05-28 9:52 [RFC PATCH v1 00/13] exec: add spawn templates for repeated executable startup Li Chen
` (14 preceding siblings ...)
2026-05-28 12:55 ` Mateusz Guzik
@ 2026-05-28 18:27 ` Andy Lutomirski
15 siblings, 0 replies; 20+ messages in thread
From: Andy Lutomirski @ 2026-05-28 18:27 UTC (permalink / raw)
To: Li Chen
Cc: Christian Brauner, Kees Cook, Alexander Viro, linux-fsdevel,
linux-api, linux-kernel, linux-mm, linux-arch, linux-doc,
linux-kselftest, x86, Arnd Bergmann, Andy Lutomirski,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin, Jan Kara, Jonathan Corbet, Shuah Khan
On Thu, May 28, 2026 at 2:55 AM Li Chen <me@linux.beauty> wrote:
>
>
> The template pins the executable and denies writes to that file while the
> template fd is alive,
Please don't. *Maybe* detect when it gets modified and clear your cache.
Or develop a generic way to open a new fd that's an immutable view
into an existing file such that the fd retains its contents even if
the file changes. (Think a reflink that's not persistent and has no
name -- you'll need some way to avoid resource exhaustion.)
>
> Workload Calls subprocess spawn_template time_s Delta
> (workers) calls calls/s calls/s seconds
> 1x16 6144 411.04 420.32 14.95/14.62 +2.26%
> 2x8 6144 666.78 690.08 9.21/8.90 +3.49%
> 4x4 6144 955.61 1003.25 6.43/6.12 +4.99%
> 8x2 6144 1048.25 1069.18 5.86/5.75 +2.00%
This is a lot of complexity in the kernel for a teeny tiny gain.
I'm with Christian -- a better spawn API would be great (and much
faster than fork/vfork + exec), but that's a different patch.
* Re: [RFC PATCH v1 00/13] exec: add spawn templates for repeated executable startup
2026-05-28 11:02 ` [RFC PATCH v1 00/13] exec: add spawn templates for repeated executable startup Christian Brauner
@ 2026-06-01 2:47 ` Li Chen
2026-06-01 19:55 ` Kees Cook
1 sibling, 0 replies; 20+ messages in thread
From: Li Chen @ 2026-06-01 2:47 UTC (permalink / raw)
To: Christian Brauner
Cc: Kees Cook, Alexander Viro, linux-fsdevel, linux-api, linux-kernel,
linux-mm, linux-arch, linux-doc, linux-kselftest, x86,
Arnd Bergmann, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, H. Peter Anvin, Jan Kara,
Jonathan Corbet, Shuah Khan
Hi Christian,
Thanks a lot for your great review!
---- On Thu, 28 May 2026 19:02:53 +0800 Christian Brauner <brauner@kernel.org> wrote ---
> On Thu, May 28, 2026 at 05:52:21PM +0800, Li Chen wrote:
> > Hi,
> >
> > This is an early RFC for an idea that is probably still rough in both the
> > UAPI and implementation details. Sorry for the rough edges; I am sending
> > it now to check whether this direction is worth pursuing and to get
> > feedback on the kernel/userspace boundary.
>
> The idea of having a builder api for exec isn't all that crazy. But it
> should simply be built on top of pidfds and thus pidfs itself instead.
> It has all the basic infrastructure in place already.
Yes, that makes a lot more sense. I was staring too hard at the "hot
executable" part and made the cache/template the API, which was probably
the wrong thing to expose. Sorry about that.
> Any implementation
> should also allow userspace to implement posix_spawn() on top of it.
That's so cool, and this is a really useful point. I had not thought about this as
something that could sit under posix_spawn(), but that makes the target
much clearer. It should be a generic exec/spawn builder first, and the
agent use case should just be one user of it.
> fd = pidfd_open(0, PIDFD_EMPTY /* or better name */)
>
> pidfd_config(fd, ...) // modeled similar to fsconfig()
Reusing pidfd_open() with an empty target is nice because it keeps the API close
to pidfds, but I wonder if a separate entry point such as
pidfd_spawn_open() or pidfd_create() would make the "new process
builder" case a bit more explicit? Either way, the configuration side
being fsconfig-like makes sense to me.
Thanks again for pointing me in this direction. It helps a lot.
Regards,
Li
* Re: [RFC PATCH v1 00/13] exec: add spawn templates for repeated executable startup
2026-05-28 12:55 ` Mateusz Guzik
@ 2026-06-01 15:11 ` Li Chen
0 siblings, 0 replies; 20+ messages in thread
From: Li Chen @ 2026-06-01 15:11 UTC (permalink / raw)
To: Mateusz Guzik
Cc: Christian Brauner, Kees Cook, Alexander Viro, linux-fsdevel,
linux-api, linux-kernel, linux-mm, linux-arch, linux-doc,
linux-kselftest, x86, Arnd Bergmann, Andy Lutomirski,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin, Jan Kara, Jonathan Corbet, Shuah Khan
Hi Mateusz,
---- On Thu, 28 May 2026 20:55:32 +0800 Mateusz Guzik <mjguzik@gmail.com> wrote ---
> On Thu, May 28, 2026 at 05:52:21PM +0800, Li Chen wrote:
> > This RFC adds spawn_template, a userspace-controlled exec acceleration
> > mechanism for runtimes that repeatedly start the same executable with
> > different argv, envp, and per-spawn file descriptor setup.
> >
> > The main target is agent runtimes. Modern coding agents repeatedly start
> > short-lived helper tools such as rg, git, sed, awk, python, node, and
> > shell wrappers while they inspect and edit a workspace. Those runtimes
> > already know which tools are hot, and they are also the right place to
> > decide policy. The kernel does not choose names such as rg, git, or sed.
> > Userspace opts in by creating a template fd for one executable, then uses
> > that fd for later spawns. Launchers, shells, and build systems have a
> > similar repeated-startup shape and could use the same primitive, but the
> > agent runtime case is the main motivation for this RFC.
> >
> [..]
> > A typical agent runtime would keep one template per hot executable and
> > still build argv, envp, cwd, and pipe wiring for each tool call:
> >
> > rg_tmpl = spawn_template_create("/usr/bin/rg");
> >
> > for each search request:
> > out_r, out_w = pipe_cloexec();
> > err_r, err_w = pipe_cloexec();
> > actions = [
> > FCHDIR(worktree_fd),
> > DUP2(out_w, STDOUT_FILENO),
> > DUP2(err_w, STDERR_FILENO),
> > ];
> > child = spawn_template_spawn(rg_tmpl, rg_argv, envp, actions);
> > close(out_w);
> > close(err_w);
> > read out_r and err_r;
> > waitid(P_PIDFD, child.pidfd, ...);
> >
> >
> [..]
> > The cached state is intentionally small. The template fd keeps the opened
> > main executable file, an optional absolute path string, the creator
> > credential pointer, and the deny-write state. The executable identity key
> > records device, inode, size, mode, owner, ctime, and mtime, and is
> > rechecked before cached metadata is used. The ELF cache keeps only the
> > main executable's ELF header, program header table, and program header
> > count.
> >
> >   cached in this RFC          not cached in this RFC
> >   ------------------          ----------------------
> >   opened main executable      PT_INTERP metadata
> >   executable identity key     shared-library graph
> >   main ELF header             VMA layout metadata
> >   main ELF program headers    cross-process metadata sharing
> >   creator cred pointer
> >   deny-write state
> >
> > This RFC does not cache ELF interpreter metadata, shared-library
> > dependency state, or derived mapping-layout state. Shared-library
> > resolution is dynamic linker policy and depends on LD_LIBRARY_PATH,
> > RPATH, RUNPATH, /etc/ld.so.cache, mount namespaces, and secure-exec
> > state. It also does not share cached executable metadata between template
> > fds created by different processes. Each template owns its small cached
> > metadata object in this RFC.
> >
> > Performance
> > ===========
> >
> [..]
> >   Workload   Calls  subprocess  spawn_template  time_s       Delta
> >   (workers)  calls  calls/s     calls/s         seconds
> >   1x16       6144    411.04      420.32         14.95/14.62  +2.26%
> >   2x8        6144    666.78      690.08          9.21/8.90   +3.49%
> >   4x4        6144    955.61     1003.25          6.43/6.12   +4.99%
> >   8x2        6144   1048.25     1069.18          5.86/5.75   +2.00%
> >
>
> This problem is dear to my heart and I have been pondering it on and off
> for some time now. The entire fork + exec idiom is terrible and needs to
> be retired.
>
> Is this vibe-coded? I asked claude for in-kernel posix_spawn for kicks
> some time ago and it generated remarkably similar code. But that's a
> tangent.
Partly, yes. The original idea came from using agents myself and noticing
that they spend a lot of time starting short-lived tools such as rg, sed,
git, bash, and python. I was wondering whether repeated tool calls could
be made cheaper.
After that I used an LLM to bounce around the smallest kernel prototype
for the idea. I did some review, patch splitting, testing, benchmarking,
and leak-check work, and threw away some caching code that was not
actually useful.
> I'm rather confused by the angle in the patchset. Most of this shaves
> off a tiny amount of work, while retaining the primary avoidable reason
> for bad performance: the very fact that fork is part of the picture,
> especially the part mucking with mm. Creating a pristine process is the
> way to go.
>
> Additionally there is a known problem where transiently copied file
> descriptors on fork + exec cause a headache in multithreaded programs
> doing something like this in parallel. I only did cursory reading, it
> seems your patchset keeps the same problem in place.
>
> There are numerous impactful ways to speed up execs both in terms of
> single-threaded cost and their multicore scalability, most of which
> would be immediately usable by all programs without an opt-in. imo these
> need to be exhausted before something like a "template" can be
> considered.
>
> Per the above, the primary win would stem from *NOT* messing with mm.
>
> As in, whatever the interface, it needs to create an "empty" target
> process (for lack of a better term).
>
> In terms of userspace-visible APIs, a clean solution escapes me.
>
> Some time ago I proposed returning a handle which is populated over time
> by the parent-to-be. One of the problems with it I failed to consider at
> the time is NUMA locality -- what if the process to be created is going
> to run on another domain? For example, opening and installing a file for
> its later use will result in avoidable loss of locality for some of the
> in-kernel data. That's on top of the fd vs fork problem.
>
> From perf standpoint, the final goal of whatever mechanism should be a
> state where the target process avoided copying any state it did not need
> to and which allocated any memory it needed from local NUMA node
> (whatever it may happen to be). Of course if no affinity is assigned it
> may happen to move again and lose such locality, nothing can be done
> about that. But pretend the process is to run in a specific node the
> parent is NOT running in.
>
> So I think the pragmatic way forward is to implement something close to
> posix_spawn in the kernel. It may make sense for the thing to take the
> PATH argument for repeated exec attempts. I understand this is of no use
> in your particular case, but it very much IS of use for most of the
> real-world. The initial implementation might even start with doing vfork
> just to get it off the ground.
>
> The next step would be to extend the interface with means to AVOID
> copying any file descriptors. There could be a dedicated file action
> which tells the kernel to avoid such copies or something like a
> close_range file action (or close_from) -- with a range like <0, INT_MAX>
> you know no fds are copied.
>
> For the NUMA angle to be sorted out, any file action which opens a file
> or dups from the parent needs to execute in the child. And frankly
> something would be needed to ask the scheduler where does it think the
> child is going to run, so that the task_struct itself can also be
> allocated with the right backing.
>
> I have not looked into what's needed to create a new process and NOT
> mess with mm, but I don't think there are unsolvable problems there, at
> worst some churn.
>
> There are of course other parameters which need to be sorted out, that's
> covered by the posix_spawn thing.
>
> This e-mail is long enough, so I'm not going to go into issues
> concerning exec itself right now.
>
> tl;dr I would suggest redoing the patchset as posix_spawn and then doing
> the actual optimization of not cloning mm itself.
>
Thanks a lot for writing this up. I clearly had too narrow a view of the
problem. I was mostly thinking about repeated executable startup, but your
reply, along with Christian's and Andy's, made me see that the more useful
target is probably a pidfd/pidfs-backed process builder which can sit
under posix_spawn and then grow into something that avoids the fork-shaped
mm and fd costs. I learned a lot from this thread.
At a high level, Windows CreateProcess/NtCreateUserProcess also looks
closer to this direction than fork+exec: create the target process
directly, pass explicit startup attributes and handle inheritance state,
and avoid starting from a copy of the parent address space. That seems
to be the same basic advantage here: build the child closer to its final
shape instead of copying parent state and then throwing much of it away.
I will study the process creation, exec, pidfd/pidfs, and posix_spawn
code more carefully, then try the direction you suggested and benchmark
the mm/fd costs.
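For reference, the userspace baseline that such a builder would have to sit
under can be sketched with today's posix_spawn() and its file actions. This
is only a minimal sketch, not code from the series: spawn_capture() is a
hypothetical helper name, it wires up stdout only, and it uses plain pipe(),
so the pipe ends are still visible to any concurrent fork+exec in another
thread, which is the fd problem described above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <spawn.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/*
 * Hypothetical helper (not part of the RFC): run the executable at argv[0]
 * with stdout redirected into a pipe, copy at most cap - 1 bytes of its
 * output into buf (NUL-terminated), and return the child's exit status,
 * or -1 on any failure or abnormal termination.
 */
int spawn_capture(char *const argv[], char *buf, size_t cap)
{
	posix_spawn_file_actions_t fa;
	int out[2], status, err;
	size_t len = 0;
	ssize_t n;
	pid_t pid;

	assert(argv && argv[0] && buf && cap > 0);

	if (pipe(out) < 0)
		return -1;

	if (posix_spawn_file_actions_init(&fa) != 0) {
		close(out[0]);
		close(out[1]);
		return -1;
	}

	/* File actions are applied in the child before the new image runs. */
	err = posix_spawn_file_actions_adddup2(&fa, out[1], STDOUT_FILENO);
	if (!err)
		err = posix_spawn_file_actions_addclose(&fa, out[0]);
	if (!err)
		err = posix_spawn_file_actions_addclose(&fa, out[1]);
	if (!err)
		err = posix_spawn(&pid, argv[0], &fa, NULL, argv, environ);

	posix_spawn_file_actions_destroy(&fa);
	close(out[1]);		/* parent keeps only the read end */
	if (err) {
		close(out[0]);
		return -1;
	}

	/* Drain the pipe until EOF or until the buffer is full. */
	while (len < cap - 1) {
		n = read(out[0], buf + len, cap - 1 - len);
		if (n < 0 && errno == EINTR)
			continue;
		if (n <= 0)
			break;
		len += (size_t)n;
	}
	buf[len] = '\0';
	close(out[0]);

	while (waitpid(pid, &status, 0) < 0) {
		if (errno != EINTR)
			return -1;
	}
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller would pass an absolute path in argv[0], for example
{ "/bin/echo", "hello", NULL }, and read the captured output from buf. How
the C library implements posix_spawn() underneath is up to the library, so
this only shows the interface shape, not the mm or fd-table costs themselves.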
Regards,
Li
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC PATCH v1 00/13] exec: add spawn templates for repeated executable startup
2026-05-28 11:02 ` [RFC PATCH v1 00/13] exec: add spawn templates for repeated executable startup Christian Brauner
2026-06-01 2:47 ` Li Chen
@ 2026-06-01 19:55 ` Kees Cook
1 sibling, 0 replies; 20+ messages in thread
From: Kees Cook @ 2026-06-01 19:55 UTC (permalink / raw)
To: Christian Brauner
Cc: Li Chen, Alexander Viro, linux-fsdevel, linux-api, linux-kernel,
linux-mm, linux-arch, linux-doc, linux-kselftest, x86,
Arnd Bergmann, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, H. Peter Anvin, Jan Kara,
Jonathan Corbet, Shuah Khan
On Thu, May 28, 2026 at 01:02:53PM +0200, Christian Brauner wrote:
> On Thu, May 28, 2026 at 05:52:21PM +0800, Li Chen wrote:
> > Hi,
> >
> > This is an early RFC for an idea that is probably still rough in both the
> > UAPI and implementation details. Sorry for the rough edges; I am sending
> > it now to check whether this direction is worth pursuing and to get
> > feedback on the kernel/userspace boundary.
>
> The idea of having a builder api for exec isn't all that crazy. But it
> should simply be built on top of pidfds and thus pidfs itself instead.
> It has all the basic infrastructure in place already. Any implementation
> should also allow userspace to implement posix_spawn() on top of it.
>
> fd = pidfd_open(0, PIDFD_EMPTY /* or better name */)
>
> pidfd_config(fd, ...) // modeled similar to fsconfig()
FWIW, I agree this should be modelled after fsconfig and built on pidfs.
Doing so will avoid a bunch of design issues, etc.
-Kees
--
Kees Cook
^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2026-06-01 19:55 UTC | newest]
Thread overview: 20+ messages
2026-05-28 9:52 [RFC PATCH v1 00/13] exec: add spawn templates for repeated executable startup Li Chen
2026-05-28 9:52 ` [RFC PATCH v1 01/13] exec: factor argument setup out of do_execveat_common() Li Chen
2026-05-28 9:52 ` [RFC PATCH v1 02/13] exec: add an internal helper for opened executables Li Chen
2026-05-28 9:52 ` [RFC PATCH v1 03/13] file: expose helpers for in-kernel fd actions Li Chen
2026-05-28 9:52 ` [RFC PATCH v1 04/13] exec: add spawn template UAPI definitions Li Chen
2026-05-28 9:52 ` [RFC PATCH v1 05/13] exec: add spawn template file descriptors Li Chen
2026-05-28 9:52 ` [RFC PATCH v1 06/13] exec: add spawn_template_spawn() Li Chen
2026-05-28 9:52 ` [RFC PATCH v1 07/13] exec: validate spawn template executable identity Li Chen
2026-05-28 9:52 ` [RFC PATCH v1 08/13] binfmt_elf: cache ELF metadata for spawn templates Li Chen
2026-05-28 9:52 ` [RFC PATCH v1 09/13] Documentation: describe " Li Chen
2026-05-28 9:52 ` [RFC PATCH v1 10/13] exec: require absolute paths for path-created templates Li Chen
2026-05-28 9:52 ` [RFC PATCH v1 11/13] exec: let close-range actions target the max fd Li Chen
2026-05-28 9:52 ` [RFC PATCH v1 12/13] syscalls: add generic spawn template entries Li Chen
2026-05-28 9:52 ` [RFC PATCH v1 13/13] selftests/exec: cover spawn template basics Li Chen
2026-05-28 11:02 ` [RFC PATCH v1 00/13] exec: add spawn templates for repeated executable startup Christian Brauner
  2026-06-01 2:47 ` Li Chen
  2026-06-01 19:55 ` Kees Cook
2026-05-28 12:55 ` Mateusz Guzik
  2026-06-01 15:11 ` Li Chen
2026-05-28 18:27 ` Andy Lutomirski