Linux s390 Architecture development
From: Thomas Gleixner <tglx@kernel.org>
To: "H. Peter Anvin" <hpa@zytor.com>,
	"Michal Suchánek" <msuchanek@suse.de>,
	"Peter Zijlstra" <peterz@infradead.org>
Cc: Jonathan Corbet <corbet@lwn.net>,
	Shuah Khan <skhan@linuxfoundation.org>,
	Huacai Chen <chenhuacai@kernel.org>,
	WANG Xuerui <kernel@xen0n.name>,
	Madhavan Srinivasan <maddy@linux.ibm.com>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Nicholas Piggin <npiggin@gmail.com>,
	"Christophe Leroy (CS GROUP)" <chleroy@kernel.org>,
	Paul Walmsley <pjw@kernel.org>,
	Palmer Dabbelt <palmer@dabbelt.com>,
	Albert Ou <aou@eecs.berkeley.edu>,
	Alexandre Ghiti <alex@ghiti.fr>,
	Heiko Carstens <hca@linux.ibm.com>,
	Vasily Gorbik <gor@linux.ibm.com>,
	Alexander Gordeev <agordeev@linux.ibm.com>,
	Christian Borntraeger <borntraeger@linux.ibm.com>,
	Sven Schnelle <svens@linux.ibm.com>,
	Andy Lutomirski <luto@kernel.org>, Ingo Molnar <mingo@redhat.com>,
	Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	x86@kernel.org, Andrew Donnellan <andrew+kernel@donnellan.id.au>,
	Mark Rutland <mark.rutland@arm.com>,
	Arnd Bergmann <arnd@arndb.de>,
	Jiaxun Yang <jiaxun.yang@flygoat.com>,
	Ryan Roberts <ryan.roberts@arm.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Mukesh Kumar Chaurasiya <mkchauras@linux.ibm.com>,
	Shrikanth Hegde <sshegde@linux.ibm.com>,
	Zong Li <zong.li@sifive.com>, Nam Cao <namcao@linutronix.de>,
	Deepak Gupta <debug@rivosinc.com>,
	Lukas Gerlach <lukas.gerlach@cispa.de>,
	Rui Qi <qirui.001@bytedance.com>, Kees Cook <kees@kernel.org>,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	loongarch@lists.linux.dev, linuxppc-dev@lists.ozlabs.org,
	linux-riscv@lists.infradead.org, linux-s390@vger.kernel.org
Subject: Re: [RFC] entry: Untangle the return value of syscall_enter_from_user_mode from syscall NR
Date: Thu, 02 Jul 2026 23:49:56 +0200
Message-ID: <87h5mhnjsr.ffs@fw13>
In-Reply-To: <BA7CD91D-C0E5-47A1-B49C-BC6AF6604182@zytor.com>

On Wed, Jul 01 2026 at 11:29, H. Peter Anvin wrote:

Can you please trim your replies? Scrolling through a hundred lines of
useless quoted text is just annoying.

> On July 1, 2026 10:42:08 AM PDT, "Michal Suchánek" <msuchanek@suse.de> wrote:
>>-static __always_inline long syscall_enter_from_user_mode(struct pt_regs *regs, long syscall)
>>+static __always_inline long syscall_enter_from_user_mode(struct pt_regs *regs, long *syscall)
>> {
>> 	long ret;
>>

> 1. The type for a system call is int.

That ship has sailed long ago. man syscall ...

> 2. A valid system call number is always going to be positive.

That's true today.

> 3. Bits [30:24] are available for architecture ABI use. The
>    "architecture independent" part of the system call number is therefore
>    24 bits wide.
>
> 4. The exact ABI is platform-specific, obviously, but as a general
>    guideline (especially for new platforms/ABIs) should follow the rules
>    for a platform "int" if practical. Notably, when passing a value in a
>    register larger than 32 bits, which side of the calling interface is
>    responsible for sign-extending a value passed in a register. If caller
>    side, the kernel should validate, if callee side the kernel should
>    ignore the additional bits and do the extension.

The kernel already sign-extends today, e.g. for compat syscalls.
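
As an illustration of the sign extension mentioned above, here is a
minimal user-space sketch. It assumes the syscall number arrives in a
64-bit register and only the low 32 bits are significant. The helper
name sign_extend_nr() is hypothetical and is not a kernel function.

```c
#include <stdint.h>

/*
 * Hypothetical helper: treat the low 32 bits of a 64-bit register
 * value as a signed int and sign-extend it to long. The upper 32 bits
 * of the register are ignored.
 *
 * The uint32_t -> int32_t conversion is implementation-defined for
 * values above INT32_MAX before C23; GCC and Clang wrap as expected.
 */
static long sign_extend_nr(uint64_t reg)
{
	return (long)(int32_t)(uint32_t)reg;
}
```

With this, a register value of 0x00000000ffffffff becomes -1, and any
garbage in the upper half of the register has no effect on the result.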

> 5. A negative system call number is guaranteed to return -ENOSYS
>    (unless intercepted by seccomp, ptrace, or another mechanism under
>    user space control.)

That's true today.

ASM entry:
       regs->eax = -ENOSYS;

C entry:
       nr = syscall_enter_from_user_mode(regs, nr);

       if ((unsigned)nr < SYSCALL_MAX)
               regs->eax = handle_syscall();
       else if (nr != -1)
               regs->eax = -ENOSYS;

       ....

If seccomp overwrites regs->eax and aborts any syscall (including -1) by
returning -1, then the value seccomp wrote into regs->eax is preserved
and returned to user space.

The same applies for syscall_user_dispatch() and ptrace...() if they
decide to overwrite regs->eax _and_ abort the syscall by letting
syscall_enter_from_user_mode() return -1.

trace_syscall_enter() is not any different. If the magic BPF in there
rewrites the syscall number to -1 then either the original -ENOSYS or
the BPF induced overwrite is returned to user space.
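
The behaviour described above can be modelled in user space. This is
only a sketch under stated assumptions: struct regs, enum hook_action,
enter_from_user_mode(), handle_syscall(), do_syscall() and the value of
SYSCALL_MAX are all made up for illustration. The single hook stands in
for seccomp, ptrace, syscall_user_dispatch() or a tracepoint BPF program.

```c
#include <errno.h>

#define SYSCALL_MAX 4	/* hypothetical table size for this sketch */

/* Stand-in for pt_regs; only the return value slot is modelled. */
struct regs {
	long eax;
};

/* What the hypothetical entry hook decides to do. */
enum hook_action {
	HOOK_PASS,		/* let the syscall through unchanged */
	HOOK_ABORT_KEEP,	/* abort, leave regs->eax untouched */
	HOOK_ABORT_SET,		/* abort and overwrite regs->eax */
};

/* Models the entry hook: returns the syscall number, or -1 on abort. */
static long enter_from_user_mode(struct regs *regs, long nr,
				 enum hook_action act, long forced)
{
	switch (act) {
	case HOOK_ABORT_SET:
		regs->eax = forced;
		return -1;
	case HOOK_ABORT_KEEP:
		return -1;
	default:
		return nr;
	}
}

/* Dummy handler so that a dispatched syscall is distinguishable. */
static long handle_syscall(long nr)
{
	return 100 + nr;
}

static long do_syscall(long nr, enum hook_action act, long forced)
{
	struct regs regs;

	regs.eax = -ENOSYS;	/* what the ASM entry presets */

	nr = enter_from_user_mode(&regs, nr, act, forced);

	if ((unsigned int)nr < SYSCALL_MAX)
		regs.eax = handle_syscall(nr);
	else if (nr != -1)
		regs.eax = -ENOSYS;
	/* nr == -1: whatever is in regs.eax is returned as is */

	return regs.eax;
}
```

In this model do_syscall(2, HOOK_ABORT_SET, -EPERM) yields -EPERM,
while do_syscall(-1, HOOK_PASS, 0) and do_syscall(2, HOOK_ABORT_KEEP, 0)
both yield the preset -ENOSYS.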

It's less than obvious and I have no objections to cleaning that up and
making it more intuitive, but I still fail to see what Michal is actually
trying to solve and what the magic flag is for. If s390 requires it,
then that's an s390 problem, but x86 definitely does not.

> 6. If the platform needs to algorithmically modify the system call
>    number due to platform-specific concerns (say, the platform uses a
>    16-bit special purpose register for the syscall number, or it has
>    multiple kernel entry points with different behavior), it should if at
>    all possible transcode the system call number as necessary to match
>    this convention in APIs that are exposed to general kernel code.
>
> For example, in the future I could very much see the IA32 code in the
> x86 kernel using bit 29 internally to indicate an ia32 system call,
> simplifying the is_compat implementation on x86.

I don't see how that makes it simpler. Those are two different entry
code paths and magic bits won't make that go away.

> It should not mean that passing bit 29 to either the syscall
> instruction or int $0x80 will be accepted.

Your proposal looks even more like a solution in search of a problem
than the original one.

Thanks,

        tglx

