The Linux Kernel Mailing List
 help / color / mirror / Atom feed
From: Thomas Gleixner <tglx@kernel.org>
To: "Michal Suchánek" <msuchanek@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Jonathan Corbet <corbet@lwn.net>,
	Shuah Khan <skhan@linuxfoundation.org>,
	Huacai Chen <chenhuacai@kernel.org>,
	WANG Xuerui <kernel@xen0n.name>,
	Madhavan Srinivasan <maddy@linux.ibm.com>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Nicholas Piggin <npiggin@gmail.com>,
	"Christophe Leroy (CS GROUP)" <chleroy@kernel.org>,
	Paul Walmsley <pjw@kernel.org>,
	Palmer Dabbelt <palmer@dabbelt.com>,
	Albert Ou <aou@eecs.berkeley.edu>,
	Alexandre Ghiti <alex@ghiti.fr>,
	Heiko Carstens <hca@linux.ibm.com>,
	Vasily Gorbik <gor@linux.ibm.com>,
	Alexander Gordeev <agordeev@linux.ibm.com>,
	Christian Borntraeger <borntraeger@linux.ibm.com>,
	Sven Schnelle <svens@linux.ibm.com>,
	Andy Lutomirski <luto@kernel.org>, Ingo Molnar <mingo@redhat.com>,
	Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
	Andrew Donnellan <andrew+kernel@donnellan.id.au>,
	Mark Rutland <mark.rutland@arm.com>,
	Arnd Bergmann <arnd@arndb.de>,
	Jiaxun Yang <jiaxun.yang@flygoat.com>,
	Ryan Roberts <ryan.roberts@arm.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Mukesh Kumar Chaurasiya <mkchauras@linux.ibm.com>,
	Shrikanth Hegde <sshegde@linux.ibm.com>,
	Zong Li <zong.li@sifive.com>, Nam Cao <namcao@linutronix.de>,
	Deepak Gupta <debug@rivosinc.com>,
	Lukas Gerlach <lukas.gerlach@cispa.de>,
	Rui Qi <qirui.001@bytedance.com>, Kees Cook <kees@kernel.org>,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	loongarch@lists.linux.dev, linuxppc-dev@lists.ozlabs.org,
	linux-riscv@lists.infradead.org, linux-s390@vger.kernel.org
Subject: Re: [RFC] entry: Untangle the return value of syscall_enter_from_user_mode from syscall NR
Date: Thu, 02 Jul 2026 22:45:54 +0200	[thread overview]
Message-ID: <87jyrdnmrh.ffs@fw13> (raw)
In-Reply-To: <akZPakNl6JT_jgGd@kunlun.suse.cz>

On Thu, Jul 02 2026 at 13:45, Michal Suchánek wrote:
> On Thu, Jul 02, 2026 at 01:24:57PM +0200, Thomas Gleixner wrote:
>> On Wed, Jul 01 2026 at 19:42, Michal Suchánek wrote:
>> > The return value of syscall_enter_from_user_mode is used both for the
>> > adjusted syscall number and the indicator that a syscall should be
>> > skipped.
>> >
>> > As seccomp can be invoked on any syscall, including invalid ones this
>> > somewhat undermines seccomp.
>> >
>> > While the seccomp variants that terminate the process do not need to
>> > care about this for the filter that sets the syscall return value this
>> > disctinction is required.
>> 
>> You completely fail to explain why and what actual problem you are
>> trying to solve. At least I can't figure it out from the above word
>> salad.
>
> syscall_enter_from_user_mode returns the new syscall number after doing
> something arbitrarry with it, including running seccomp.
>
> Wehn the syscall is already handled, eg. by seccomp filtering it returns
> -1 as the new syscall number. -1 is an invalid syscall number but it can
> still be filtered by seccomp.

Once syscall_enter_from_user_mode() returns -1 nothing can filter it
anymore.

> When the syscall number was -1 to start with it's not possible to
> determine if the syscall was fileterd from the return value. s390
> returns the filtered state in a flag it sets on the regs structure,
> avoiding this problem.

What needs to determine whether the syscall was filtered or not?

> However, the API should be specified in a way that does not require
> everyone implementing such flag.

Which exact problem does the flag solve?

>> > -	instrumentation_begin();
>> > -	if (!invoke_syscall(regs, nr) && nr != -1)
>> > -	 	result_reg(regs) = __sys_ni_syscall(regs);
>> > -	instrumentation_end();
>> > +	/* Skip syscall when -1 is returned */
>> > +	if (!syscall_enter_from_user_mode(regs, &nr)) {
>> 
>> Seriously?
>> 
>> If we go and separate the syscall number from the return value, then the
>> return value 0 means success and anything else fail. Which in other
>> words is a boolean. So instead of tastelessly adding a completely
>> nonsensical comment about -1 here, syscall_enter_from_user_mode() wants
>> to have the return value type bool with a proper boolean logic: true =
>> success, false = abort.
>
> We have that very same API down to __secure_computing() which returns
> boolean represented as -1 and 0 values. That does not mean it's not
> tasteless.

Hahahahaha.

We have a lot of functions which have a boolean return value but a
int/long return type for historical reasons.

We've added bool because it's not ambiguous and allows the compiler to
optimize better. It also makes the code more clear. Modern code uses a
non-boolean return type only when there is an actual reason for it,
e.g. propagating an error code all the way back through the call chain.
The historical 0=success <0 = errorcode model really want's to be
restricted to such cases.

Just for the record:

  https://lore.kernel.org/all/67c3ae5c-d88b-4172-9996-4e2046b7e0dc@huawei.com/
  https://lore.kernel.org/all/20260629130616.642022-2-ruanjinjie@huawei.com/

So I stand with my comment that it is sloppy and tasteless to slap an
argument into a pile of functions, claim separation of return value and
syscall number and leave the return value in an ill-defined state.

>> > @@ -168,8 +168,7 @@ __visible noinstr void do_int80_emulation(struct pt_regs *regs)
>> >  	nr = syscall_32_enter(regs);
>> >  
>> >  	local_irq_enable();
>> > -	nr = syscall_enter_from_user_mode_work(regs, nr);
>> > -	do_syscall_32_irqs_on(regs, nr);
>> > +	syscall_enter_from_user_mode_work(regs, &nr);
>> 
>> How exactly is this ever going to invoke a valid syscall?
>
> That's one of the problems with giant all-in-one patch, things like this
> easily slip in. However, it is in cluded mostly for illustration, I
> don't expect anyone to merge this as-is.

It's a problem with hastily cobbled together slop. Even RFC patches
should at least be functional.
 
>> > +	if (!syscall_enter_from_user_mode_work(regs, &nr)) {
>> > +		nr &= GENMASK(31, 0);
>> > +		do_syscall_32_irqs_on(regs, nr);
>> 
>>   do_syscall_32_irqs_on(regs, (int)nr);
>> 
>> would be too simple, right?
>
> Also way less explicit.

Now you care about explicit, but the return value mess can be left
ambiguous, right?

Aside of that, the cast is very much explicit for people who can read C.

It would be great if you could sit back and come up with a very explicit
and comprehensible explanation for the problem you are trying to solve.

Thanks,

        tglx

      reply	other threads:[~2026-07-02 20:45 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-07-01 17:42 [RFC] entry: Untangle the return value of syscall_enter_from_user_mode from syscall NR Michal Suchánek
2026-07-01 18:29 ` H. Peter Anvin
2026-07-02  9:30   ` Michal Suchánek
2026-07-02 21:49   ` Thomas Gleixner
2026-07-02  8:12 ` Sven Schnelle
2026-07-02  9:12   ` Michal Suchánek
2026-07-02 12:01     ` Sven Schnelle
2026-07-02 12:13       ` Michal Suchánek
2026-07-02 11:24 ` Thomas Gleixner
2026-07-02 11:45   ` Michal Suchánek
2026-07-02 20:45     ` Thomas Gleixner [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87jyrdnmrh.ffs@fw13 \
    --to=tglx@kernel.org \
    --cc=agordeev@linux.ibm.com \
    --cc=alex@ghiti.fr \
    --cc=andrew+kernel@donnellan.id.au \
    --cc=aou@eecs.berkeley.edu \
    --cc=arnd@arndb.de \
    --cc=borntraeger@linux.ibm.com \
    --cc=bp@alien8.de \
    --cc=chenhuacai@kernel.org \
    --cc=chleroy@kernel.org \
    --cc=corbet@lwn.net \
    --cc=dave.hansen@linux.intel.com \
    --cc=debug@rivosinc.com \
    --cc=gor@linux.ibm.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=hca@linux.ibm.com \
    --cc=hpa@zytor.com \
    --cc=jiaxun.yang@flygoat.com \
    --cc=kees@kernel.org \
    --cc=kernel@xen0n.name \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-riscv@lists.infradead.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=loongarch@lists.linux.dev \
    --cc=lukas.gerlach@cispa.de \
    --cc=luto@kernel.org \
    --cc=maddy@linux.ibm.com \
    --cc=mark.rutland@arm.com \
    --cc=mingo@redhat.com \
    --cc=mkchauras@linux.ibm.com \
    --cc=mpe@ellerman.id.au \
    --cc=msuchanek@suse.de \
    --cc=namcao@linutronix.de \
    --cc=npiggin@gmail.com \
    --cc=palmer@dabbelt.com \
    --cc=peterz@infradead.org \
    --cc=pjw@kernel.org \
    --cc=qirui.001@bytedance.com \
    --cc=ryan.roberts@arm.com \
    --cc=skhan@linuxfoundation.org \
    --cc=sshegde@linux.ibm.com \
    --cc=svens@linux.ibm.com \
    --cc=x86@kernel.org \
    --cc=zong.li@sifive.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox