The Linux Kernel Mailing List
From: Renzo Davoli <renzo@cs.unibo.it>
To: Oleg Nesterov <oleg@redhat.com>
Cc: linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Shuah Khan <shuah@kernel.org>, Alexey Gladkov <legion@kernel.org>,
	Eugene Syromyatnikov <evgsyr@gmail.com>,
	Davide Berardi <berardi.dav@gmail.com>,
	strace-devel@lists.strace.io, "Dmitry V . Levin" <ldv@strace.io>
Subject: Re: [PATCH v2 4/5] ptrace: add PTRACE_SYSCALL_INFO_FLAG_SET_IP
Date: Fri, 3 Jul 2026 17:01:47 +0200
Message-ID: <akfO2xqhDjWU-uDs@cs.unibo.it>
In-Reply-To: <akeXWxiS0GcW1Hh0@redhat.com>

On Fri, Jul 03, 2026 at 01:04:59PM +0200, Oleg Nesterov wrote:
> On 07/03, Renzo Davoli wrote:
> >
> > This flag adds support for modifying the tracee's instruction pointer.
> >
> > To do this, the tracer stores the new instruction pointer value in the
> > instruction_pointer field of the ptrace_syscall_info structure and
> > sets the PTRACE_SYSCALL_INFO_FLAG_SET_IP flag in the flags field.
> 
> But why? Who will use this feature and for what? How often?
> 
> I think the changelog should be more convincing...

I'll add this to the v3 cover letter.

	renzo


PTRACE_SYSCALL_INFO_FLAG_SET_IP

The proposal does not add any new ptrace capability. It merely provides a
portable interface for a capability that already exists and is already relied
upon by existing applications.

WHY

PTRACE_SYSCALL_INFO_FLAG_SET_IP completes the set of actions that a tracer can
request when intercepting a system call.

A tracer can currently instruct a tracee to:

* execute the original system call;
* execute a different system call (or the same system call with modified arguments);
* skip the system call and provide the desired return value and/or errno.

The proposed PTRACE_SYSCALL_INFO_FLAG_SET_IP adds a fourth possibility:

* execute an arbitrary sequence of two or more system calls in place of the original one.

The mechanism is straightforward. At a syscall-exit stop
(PTRACE_SYSCALL_INFO_EXIT), the tracer moves the instruction pointer back to
the system call instruction (by 2 bytes on x86-64, the length of the syscall
instruction, or by the appropriate amount on other architectures). When the
tracee resumes, it re-executes that instruction and immediately generates a new
syscall-entry stop, allowing the tracer to provide a new system call number and
arguments. By repeating this process, a tracer can transparently replace a
single system call with any sequence of system calls.
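
The arithmetic behind the rewind can be sketched as follows. This is only an
illustration, not code from the patch series: the enum and the sketch_*
function names are hypothetical, and the table covers just three
architectures whose system call instruction has a fixed length.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical architecture tag, used only by this sketch. */
enum sketch_arch { SKETCH_X86_64, SKETCH_AARCH64, SKETCH_RISCV64 };

/* Length in bytes of the system call instruction on each architecture. */
unsigned sketch_syscall_insn_len(enum sketch_arch arch)
{
	switch (arch) {
	case SKETCH_X86_64:  return 2;	/* syscall (0f 05) */
	case SKETCH_AARCH64: return 4;	/* svc #0 */
	case SKETCH_RISCV64: return 4;	/* ecall */
	}
	return 0;			/* unknown architecture */
}

/*
 * Given the instruction pointer reported at a syscall-exit stop (which
 * points just past the system call instruction), return the address of
 * the system call instruction itself.
 */
uint64_t sketch_rewind_ip(enum sketch_arch arch, uint64_t exit_ip)
{
	unsigned len = sketch_syscall_insn_len(arch);

	assert(len != 0 && exit_ip >= len);
	return exit_ip - len;
}
```

The returned value is what a tracer would write back as the tracee's
instruction pointer before resuming it.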

This capability already exists on all architectures through
architecture-specific interfaces such as PTRACE_POKEUSER, PTRACE_SETREGS, or
PTRACE_SETREGSET. PTRACE_SYSCALL_INFO_FLAG_SET_IP only makes it available
through the portable PTRACE_GET_SYSCALL_INFO/PTRACE_SET_SYSCALL_INFO API.
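
For comparison, this is roughly what the architecture-specific route looks
like today on x86-64 with PTRACE_GETREGS/PTRACE_SETREGS. It is a minimal
sketch, not the VUOS code: sketch_rewind_ip_setregs is a hypothetical helper,
and it assumes the tracee is already attached and sitting in a syscall-exit
stop.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/ptrace.h>
#include <sys/user.h>

/*
 * Move a stopped tracee's instruction pointer back over the 2-byte
 * x86-64 syscall instruction, using the existing register interface.
 * Returns 0 on success, -1 with errno set on failure.
 */
int sketch_rewind_ip_setregs(pid_t pid)
{
#if defined(__x86_64__)
	struct user_regs_struct regs;

	if (ptrace(PTRACE_GETREGS, pid, NULL, &regs) == -1)
		return -1;
	regs.rip -= 2;		/* length of the syscall instruction */
	if (ptrace(PTRACE_SETREGS, pid, NULL, &regs) == -1)
		return -1;
	return 0;
#else
	/* Other architectures need their own register layout and offset. */
	(void)pid;
	errno = ENOSYS;
	return -1;
#endif
}
```

Every architecture needs its own variant of this function (arm64, for
instance, goes through PTRACE_SETREGSET with NT_PRSTATUS instead), which is
the portability cost the new flag is meant to remove.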

WHO

The VUOS project uses this mechanism extensively.
VUOS provides namespace-like execution environments implemented entirely in
user space, without relying on kernel namespaces.
https://wiki.virtualsquare.org/#/tutorials/vuosbasics

For example, VUOS allows unprivileged processes to use user-space
implementations of filesystems (FUSE), networking stacks, virtual devices, and
other resources.

To improve scalability on multicore systems, VUOS implements what we call the
guardian angel model: each traced thread has its own dedicated tracer thread.
This avoids a single tracer becoming a bottleneck.

When a traced thread creates a child, ownership of the new tracee must be
transferred to a newly created guardian angel. This requires delaying execution
of the child's first system call until the new tracer has attached.

The current implementation proceeds as follows:
* save the original system call number and arguments;
* replace the system call with a blocking ppoll(NULL, 0, NULL, NULL) call;
* detach the original tracer from the child;
* attach the new guardian angel using PTRACE_SEIZE;
* interrupt the blocking ppoll() with PTRACE_INTERRUPT;
* at the subsequent syscall-exit stop, rewind the instruction pointer to the system call instruction;
* at the following syscall-entry stop, restore the original system call number and arguments.
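
The first two steps and the last one can be sketched on top of struct
ptrace_syscall_info from <linux/ptrace.h>. This is an illustration under
stated assumptions, not the VUOS implementation: struct sketch_saved_call and
both helpers are hypothetical names, the target is assumed to provide
__NR_ppoll, and fetching the structure from the tracee and writing it back
are left out.

```c
#include <assert.h>
#include <string.h>
#include <sys/syscall.h>
#include <linux/ptrace.h>

/* Hypothetical container for the system call being postponed. */
struct sketch_saved_call {
	__u64 nr;
	__u64 args[6];
};

/*
 * At a syscall-entry stop: remember the original call, then turn it
 * into ppoll(NULL, 0, NULL, NULL), which blocks until interrupted.
 */
void sketch_park_in_ppoll(struct ptrace_syscall_info *info,
			  struct sketch_saved_call *saved)
{
	assert(info->op == PTRACE_SYSCALL_INFO_ENTRY);
	saved->nr = info->entry.nr;
	memcpy(saved->args, info->entry.args, sizeof(saved->args));

	info->entry.nr = __NR_ppoll;
	memset(info->entry.args, 0, sizeof(info->entry.args));
}

/*
 * At the syscall-entry stop that follows the rewind: put the original
 * system call number and arguments back.
 */
void sketch_restore_call(struct ptrace_syscall_info *info,
			 const struct sketch_saved_call *saved)
{
	assert(info->op == PTRACE_SYSCALL_INFO_ENTRY);
	info->entry.nr = saved->nr;
	memcpy(info->entry.args, saved->args, sizeof(saved->args));
}
```

Between the two calls the old tracer detaches, the new guardian angel
attaches with PTRACE_SEIZE and issues PTRACE_INTERRUPT, and the instruction
pointer is rewound at the syscall-exit stop, as listed above.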

This mechanism is currently implemented using architecture-specific register
manipulation. PTRACE_SYSCALL_INFO_FLAG_SET_IP would allow the same
implementation to be written using the portable ptrace syscall information API.

Although VUOS is the primary motivation for this proposal, the feature is
generally useful for any project implementing ptrace-based system call
interposition, including PRoot, strace's syscall injection machinery, and
similar frameworks.

Thread overview: 11+ messages
2026-07-03 10:50 [PATCH v2 0/5] ptrace_set_syscall_info: add support for seccomp syscall skipping and instruction pointer modification Renzo Davoli
2026-07-03 10:50 ` [PATCH v2 1/5] ptrace: PTRACE_SET_SYSCALL_INFO syscall skipping support Renzo Davoli
2026-07-03 10:58   ` Oleg Nesterov
2026-07-03 11:48   ` Oleg Nesterov
2026-07-03 10:50 ` [PATCH v2 2/5] selftests/ptrace: add a test case for PTRACE_SYSCALL_INFO syscall skipping Renzo Davoli
2026-07-03 10:50 ` [PATCH v2 3/5] asm/ptrace.h: add instruction_pointer_set Renzo Davoli
2026-07-03 10:50 ` [PATCH v2 4/5] ptrace: add PTRACE_SYSCALL_INFO_FLAG_SET_IP Renzo Davoli
2026-07-03 11:04   ` Oleg Nesterov
2026-07-03 15:01     ` Renzo Davoli [this message]
2026-07-03 15:54       ` Oleg Nesterov
2026-07-03 10:50 ` [PATCH v2 5/5] selftests/ptrace: add a test case for PTRACE_SYSCALL_INFO_FLAG_SET_IP Renzo Davoli
