Date: Fri, 3 Jul 2026 17:01:47 +0200
From: Renzo Davoli
To: Oleg Nesterov
Cc: linux-kernel@vger.kernel.org, Andrew Morton, Shuah Khan,
    Alexey Gladkov, Eugene Syromyatnikov, Davide Berardi,
    strace-devel@lists.strace.io, "Dmitry V . Levin"
Subject: Re: [PATCH v2 4/5] ptrace: add PTRACE_SYSCALL_INFO_FLAG_SET_IP
References: <20260703105027.539399-1-renzo@cs.unibo.it>
    <20260703105027.539399-5-renzo@cs.unibo.it>

On Fri, Jul 03, 2026 at 01:04:59PM +0200, Oleg Nesterov wrote:
> On 07/03, Renzo Davoli wrote:
> >
> > This flag adds support for modifying the tracee's instruction pointer.
> >
> > To do this, the tracer stores the new instruction pointer value in the
> > instruction_pointer field of the ptrace_syscall_info structure and
> > sets the PTRACE_SYSCALL_INFO_FLAG_SET_IP flag in the flags field.
>
> But why? Who will use this feature and for what? How often?
>
> I think the changelog should be more convincing...

I'll add this to V3 cover letter.

    renzo

PTRACE_SYSCALL_INFO_FLAG_SET_IP

The proposal does not add any new ptrace capability. It merely provides
a portable interface for a capability that already exists and is already
relied upon by existing applications.

WHY

PTRACE_SYSCALL_INFO_FLAG_SET_IP completes the set of actions that a
tracer can request when intercepting a system call.

A tracer can currently instruct a tracee to:

* execute the original system call;
* execute a different system call (or the same system call with
  modified arguments);
* skip the system call and provide the desired return value and/or
  errno.

The proposed PTRACE_SYSCALL_INFO_FLAG_SET_IP adds a fourth possibility:

* execute an arbitrary sequence of two or more system calls in place of
  the original one.

The mechanism is straightforward. During a PTRACE_SYSCALL_INFO_EXIT
stop, the tracer rewinds the instruction pointer to the system call
instruction (e.g. by 2 bytes on x86-64 for syscall, or by the
appropriate amount on other architectures). When the tracee resumes, it
immediately generates a new syscall-entry stop, allowing the tracer to
provide a new system call number and arguments. By repeating this
process, a tracer can transparently replace a single system call with
any sequence of system calls.

This capability already exists on all architectures through
architecture-specific interfaces such as PTRACE_POKEUSER,
PTRACE_SETREGS, or PTRACE_SETREGSET. PTRACE_SYSCALL_INFO_FLAG_SET_IP
does not introduce a new capability; it merely exposes an existing one
through the portable PTRACE_GET_SYSCALL_INFO/PTRACE_SET_SYSCALL_INFO
API.

WHO

The VUOS project uses this mechanism extensively. VUOS provides
namespace-like execution environments implemented entirely in user
space, without relying on kernel namespaces.

https://wiki.virtualsquare.org/#/tutorials/vuosbasics

For example, VUOS allows unprivileged processes to use user-space
implementations of filesystems (FUSE), networking stacks, virtual
devices, and other resources.

To improve scalability on multicore systems, VUOS implements what we
call the guardian angel model: each traced thread has its own dedicated
tracer thread. This avoids a single tracer becoming a bottleneck.

When a traced thread creates a child, ownership of the new tracee must
be transferred to a newly created guardian angel. This requires delaying
execution of the child's first system call until the new tracer has
attached.

The current implementation proceeds as follows:

* save the original system call number and arguments;
* replace the system call with a blocking ppoll(NULL, 0, NULL, NULL)
  call;
* detach the original tracer;
* attach the new guardian angel using PTRACE_SEIZE;
* interrupt the blocking ppoll() with PTRACE_INTERRUPT;
* at the subsequent syscall-exit stop, rewind the instruction pointer to
  the system call instruction;
* at the following syscall-entry stop, restore the original system call
  number and arguments.

This mechanism is currently implemented using architecture-specific
register manipulation. PTRACE_SYSCALL_INFO_FLAG_SET_IP would allow the
same implementation to be written using the portable ptrace syscall
information API.

Although VUOS is the primary motivation for this proposal, the feature
is generally useful for any project implementing ptrace-based system
call interposition, including PRoot, strace's syscall injection
machinery, and similar frameworks.
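
To make the save / park / rewind / restore steps listed above concrete,
here is a minimal sketch of the bookkeeping only. It is not the VUOS
code and it performs no ptrace requests. struct tracee_view and the
three helpers are hypothetical names invented for illustration; the
struct is a simplified stand-in, not the kernel's struct
ptrace_syscall_info. The 2-byte length for x86-64 comes from the text
above; the 4-byte aarch64 value is added here as a second example.

```c
#include <stdint.h>

/* Hypothetical, simplified view of a stopped tracee.
 * This is NOT the kernel's struct ptrace_syscall_info. */
struct tracee_view {
    uint64_t ip;       /* instruction pointer reported at the stop */
    int64_t  nr;       /* system call number (entry stops) */
    uint64_t args[6];  /* system call arguments (entry stops) */
};

/* Size in bytes of the system call instruction:
 * "syscall" on x86-64 is 2 bytes, "svc #0" on aarch64 is 4 bytes. */
enum {
    SYSCALL_INSN_LEN_X86_64  = 2,
    SYSCALL_INSN_LEN_AARCH64 = 4
};

/* Syscall-entry stop of the child's first system call: save the
 * original request, then substitute a blocking call. All-zero
 * arguments correspond to ppoll(NULL, 0, NULL, NULL) when blocking_nr
 * is the ppoll system call number for the architecture. */
static void park_in_blocking_call(struct tracee_view *t,
                                  struct tracee_view *saved,
                                  int64_t blocking_nr)
{
    *saved = *t;
    t->nr = blocking_nr;
    for (int i = 0; i < 6; i++)
        t->args[i] = 0;
}

/* Syscall-exit stop after the blocking call was interrupted: move the
 * instruction pointer back onto the system call instruction so that
 * the tracee executes it again when resumed. */
static void rewind_to_syscall_insn(struct tracee_view *t,
                                   unsigned int insn_len)
{
    t->ip -= insn_len;
}

/* Following syscall-entry stop: put back the original number and
 * arguments. The instruction pointer is left untouched. */
static void restore_original_call(struct tracee_view *t,
                                  const struct tracee_view *saved)
{
    t->nr = saved->nr;
    for (int i = 0; i < 6; i++)
        t->args[i] = saved->args[i];
}
```

In a real tracer each helper would be followed by the request that
writes the modified state back to the tracee. Today the rewind step
needs an architecture-specific register write; with the proposed flag
it would presumably be expressed by storing the new value in
instruction_pointer and setting PTRACE_SYSCALL_INFO_FLAG_SET_IP, as
described in the quoted patch text.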