* Re: [PATCH v5] riscv: entry: set a0 = -ENOSYS only when syscall != -1
@ 2024-06-27 9:43 ` Björn Töpel
0 siblings, 0 replies; 22+ messages in thread
From: Björn Töpel @ 2024-06-27 9:43 UTC (permalink / raw)
To: Celeste Liu
Cc: Dmitry V. Levin, Palmer Dabbelt, Paul Walmsley, Albert Ou,
Guo Ren, Conor Dooley, linux-riscv, linux-kernel, Andreas Schwab,
David Laight, Felix Yan, Ruizhe Pan, Shiqi Zhang,
Emil Renner Berthing, Ivan A. Melnikov
On Thu, Jun 27, 2024 at 9:47 AM Celeste Liu <coelacanthushex@gmail.com> wrote:
>
> On 2024-06-27 15:14, Dmitry V. Levin wrote:
>
> > Hi,
> >
> > On Tue, Aug 01, 2023 at 10:15:16PM +0800, Celeste Liu wrote:
> >> When we test seccomp with 6.4 kernel, we found errno has wrong value.
> >> If we deny NETLINK_AUDIT with EAFNOSUPPORT, after f0bddf50586d, we will
> >> get ENOSYS instead. We got same result with commit 9c2598d43510 ("riscv:
> >> entry: Save a0 prior syscall_enter_from_user_mode()").
> >>
> >> After analysing code, we think that regs->a0 = -ENOSYS should only be
> >> executed when syscall != -1. In __seccomp_filter, when seccomp rejected
> >> this syscall with specified errno, they will set a0 to return number as
> >> syscall ABI, and then return -1. This return number is finally pass as
> >> return number of syscall_enter_from_user_mode, and then is compared with
> >> NR_syscalls after converted to ulong (so it will be ULONG_MAX). The
> >> condition syscall < NR_syscalls will always be false, so regs->a0 = -ENOSYS
> >> is always executed. It covered a0 set by seccomp, so we always get
> >> ENOSYS when match seccomp RET_ERRNO rule.
> >>
> >> Fixes: f0bddf50586d ("riscv: entry: Convert to generic entry")
> >> Reported-by: Felix Yan <felixonmars@archlinux.org>
> >> Co-developed-by: Ruizhe Pan <c141028@gmail.com>
> >> Signed-off-by: Ruizhe Pan <c141028@gmail.com>
> >> Co-developed-by: Shiqi Zhang <shiqi@isrc.iscas.ac.cn>
> >> Signed-off-by: Shiqi Zhang <shiqi@isrc.iscas.ac.cn>
> >> Signed-off-by: Celeste Liu <CoelacanthusHex@gmail.com>
> >> Tested-by: Felix Yan <felixonmars@archlinux.org>
> >> Tested-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
> >> Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
> >> Reviewed-by: Guo Ren <guoren@kernel.org>
> >> ---
> >>
> >> v4 -> v5: add Tested-by Emil Renner Berthing <emil.renner.berthing@canonical.com>
> >> v3 -> v4: use long instead of ulong to reduce type cast and avoid
> >> implementation-defined behavior, and make the judgment of syscall
> >> invalid more explicit
> >> v2 -> v3: use if-statement instead of set default value,
> >> clarify the type of syscall
> >> v1 -> v2: added explanation on why always got ENOSYS
> >>
> >> arch/riscv/kernel/traps.c | 6 +++---
> >> 1 file changed, 3 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
> >> index f910dfccbf5d2..729f79c97e2bf 100644
> >> --- a/arch/riscv/kernel/traps.c
> >> +++ b/arch/riscv/kernel/traps.c
> >> @@ -297,7 +297,7 @@ asmlinkage __visible __trap_section void do_trap_break(struct pt_regs *regs)
> >> asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs)
> >> {
> >> if (user_mode(regs)) {
> >> - ulong syscall = regs->a7;
> >> + long syscall = regs->a7;
> >>
> >> regs->epc += 4;
> >> regs->orig_a0 = regs->a0;
> >> @@ -306,9 +306,9 @@ asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs)
> >>
> >> syscall = syscall_enter_from_user_mode(regs, syscall);
> >>
> >> - if (syscall < NR_syscalls)
> >> + if (syscall >= 0 && syscall < NR_syscalls)
> >> syscall_handler(regs, syscall);
> >> - else
> >> + else if (syscall != -1)
> >> regs->a0 = -ENOSYS;
> >>
> >> syscall_exit_to_user_mode(regs);
> >
> > Unfortunately, this change introduced a regression: it broke strace
> > syscall tampering on riscv. When the tracer changes syscall number to -1,
> > the kernel fails to initialize a0 with -ENOSYS and subsequently fails to
> > return the error code of the failed syscall to userspace.
>
> In patch v2 [1], we actually did the right thing. But following Björn
> Töpel's suggestion, and because we found that casting long to ulong is
> implementation-defined behavior in C, we changed it to the current form.
> So reverting this patch and applying patch v2 should fix this issue.
> Patch v2 takes the same approach as other architectures.
>
> [1]: https://lore.kernel.org/all/20230718162940.226118-1-CoelacanthusHex@gmail.com/
Not reverting, but a fix to make sure that a0 is initialized to -ENOSYS, e.g.:

--8<--
diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
index 05a16b1f0aee..51ebfd23e007 100644
--- a/arch/riscv/kernel/traps.c
+++ b/arch/riscv/kernel/traps.c
@@ -319,6 +319,7 @@ void do_trap_ecall_u(struct pt_regs *regs)
 
 		regs->epc += 4;
 		regs->orig_a0 = regs->a0;
+		regs->a0 = -ENOSYS;
 
 		riscv_v_vstate_discard(regs);
 
@@ -328,8 +329,7 @@ void do_trap_ecall_u(struct pt_regs *regs)
 
 		if (syscall >= 0 && syscall < NR_syscalls)
 			syscall_handler(regs, syscall);
-		else if (syscall != -1)
-			regs->a0 = -ENOSYS;
+
 		/*
 		 * Ultimately, this value will get limited by KSTACK_OFFSET_MAX(),
 		 * so the maximum stack offset is 1k bytes (10 bits).
--8<--

Celeste, do you want to cook that fix properly?
Björn
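
To compare the three behaviours discussed in this message, here is a small user-space model (not kernel code). It is only a sketch under stated assumptions: NR_SYSCALLS_MODEL is a made-up table size, model_handler() stands in for syscall_handler() and always returns 0, and the tracer is assumed to rewrite only the syscall number and leave a0 untouched. All names are invented for illustration. As a side note, ISO C fully defines the conversion of -1 from long to unsigned long (it yields ULONG_MAX); it is the unsigned-to-signed direction that is implementation-defined when the value does not fit.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

#define NR_SYSCALLS_MODEL 452L	/* hypothetical table size, model only */

struct model_regs { long a0; long orig_a0; long a7; };

enum variant  { PRE_V5, V5, PROPOSED };
enum scenario { PLAIN, SECCOMP_ERRNO, TRACER_SKIP };

/* Stand-in for syscall_handler(): every valid syscall "returns" 0. */
static void model_handler(struct model_regs *r, long nr)
{
	(void)nr;
	r->a0 = 0;
}

/* Stand-in for syscall_enter_from_user_mode(): returns the number to run. */
static long model_enter(struct model_regs *r, long nr, enum scenario s)
{
	switch (s) {
	case SECCOMP_ERRNO:
		r->a0 = -EAFNOSUPPORT;	/* filter chose the errno */
		return -1;
	case TRACER_SKIP:
		return -1;		/* tracer set the number to -1, a0 untouched */
	default:
		return nr;
	}
}

/* Returns the a0 value userspace would see after the ecall in this model. */
long model_ecall(enum variant v, enum scenario s, long a0, long a7)
{
	struct model_regs r = { .a0 = a0, .orig_a0 = a0, .a7 = a7 };
	long nr;

	if (v == PROPOSED)
		r.a0 = -ENOSYS;		/* default set before the hooks run */

	nr = model_enter(&r, a7, s);

	if (v == PRE_V5) {
		/* unsigned compare: -1 becomes ULONG_MAX and hits the else */
		if ((unsigned long)nr < (unsigned long)NR_SYSCALLS_MODEL)
			model_handler(&r, nr);
		else
			r.a0 = -ENOSYS;
	} else {
		if (nr >= 0 && nr < NR_SYSCALLS_MODEL)
			model_handler(&r, nr);
		else if (v == V5 && nr != -1)
			r.a0 = -ENOSYS;
	}
	return r.a0;
}
```

In this model, model_ecall(V5, TRACER_SKIP, 16, 1) returns 16, i.e. the untouched first argument leaks back as the "result", which is the regression reported above; with PROPOSED the same call returns -ENOSYS while the seccomp case still returns -EAFNOSUPPORT.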
* Re: [PATCH v5] riscv: entry: set a0 = -ENOSYS only when syscall != -1
2024-06-27 9:43 ` Björn Töpel
@ 2024-06-27 9:52 ` Dmitry V. Levin
-1 siblings, 0 replies; 22+ messages in thread
From: Dmitry V. Levin @ 2024-06-27 9:52 UTC (permalink / raw)
To: Björn Töpel
Cc: Celeste Liu, Palmer Dabbelt, Paul Walmsley, Albert Ou, Guo Ren,
Conor Dooley, linux-riscv, linux-kernel, Andreas Schwab,
David Laight, Felix Yan, Ruizhe Pan, Shiqi Zhang,
Emil Renner Berthing, Ivan A. Melnikov
On Thu, Jun 27, 2024 at 11:43:03AM +0200, Björn Töpel wrote:
> On Thu, Jun 27, 2024 at 9:47 AM Celeste Liu <coelacanthushex@gmail.com> wrote:
> > On 2024-06-27 15:14, Dmitry V. Levin wrote:
> >
> > > Hi,
> > >
> > > On Tue, Aug 01, 2023 at 10:15:16PM +0800, Celeste Liu wrote:
> > >> When we test seccomp with 6.4 kernel, we found errno has wrong value.
> > >> If we deny NETLINK_AUDIT with EAFNOSUPPORT, after f0bddf50586d, we will
> > >> get ENOSYS instead. We got same result with commit 9c2598d43510 ("riscv:
> > >> entry: Save a0 prior syscall_enter_from_user_mode()").
> > >>
> > >> After analysing code, we think that regs->a0 = -ENOSYS should only be
> > >> executed when syscall != -1. In __seccomp_filter, when seccomp rejected
> > >> this syscall with specified errno, they will set a0 to return number as
> > >> syscall ABI, and then return -1. This return number is finally pass as
> > >> return number of syscall_enter_from_user_mode, and then is compared with
> > >> NR_syscalls after converted to ulong (so it will be ULONG_MAX). The
> > >> condition syscall < NR_syscalls will always be false, so regs->a0 = -ENOSYS
> > >> is always executed. It covered a0 set by seccomp, so we always get
> > >> ENOSYS when match seccomp RET_ERRNO rule.
> > >>
> > >> Fixes: f0bddf50586d ("riscv: entry: Convert to generic entry")
> > >> Reported-by: Felix Yan <felixonmars@archlinux.org>
> > >> Co-developed-by: Ruizhe Pan <c141028@gmail.com>
> > >> Signed-off-by: Ruizhe Pan <c141028@gmail.com>
> > >> Co-developed-by: Shiqi Zhang <shiqi@isrc.iscas.ac.cn>
> > >> Signed-off-by: Shiqi Zhang <shiqi@isrc.iscas.ac.cn>
> > >> Signed-off-by: Celeste Liu <CoelacanthusHex@gmail.com>
> > >> Tested-by: Felix Yan <felixonmars@archlinux.org>
> > >> Tested-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
> > >> Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
> > >> Reviewed-by: Guo Ren <guoren@kernel.org>
> > >> ---
> > >>
> > >> v4 -> v5: add Tested-by Emil Renner Berthing <emil.renner.berthing@canonical.com>
> > >> v3 -> v4: use long instead of ulong to reduce type cast and avoid
> > >> implementation-defined behavior, and make the judgment of syscall
> > >> invalid more explicit
> > >> v2 -> v3: use if-statement instead of set default value,
> > >> clarify the type of syscall
> > >> v1 -> v2: added explanation on why always got ENOSYS
> > >>
> > >> arch/riscv/kernel/traps.c | 6 +++---
> > >> 1 file changed, 3 insertions(+), 3 deletions(-)
> > >>
> > >> diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
> > >> index f910dfccbf5d2..729f79c97e2bf 100644
> > >> --- a/arch/riscv/kernel/traps.c
> > >> +++ b/arch/riscv/kernel/traps.c
> > >> @@ -297,7 +297,7 @@ asmlinkage __visible __trap_section void do_trap_break(struct pt_regs *regs)
> > >> asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs)
> > >> {
> > >> if (user_mode(regs)) {
> > >> - ulong syscall = regs->a7;
> > >> + long syscall = regs->a7;
> > >>
> > >> regs->epc += 4;
> > >> regs->orig_a0 = regs->a0;
> > >> @@ -306,9 +306,9 @@ asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs)
> > >>
> > >> syscall = syscall_enter_from_user_mode(regs, syscall);
> > >>
> > >> - if (syscall < NR_syscalls)
> > >> + if (syscall >= 0 && syscall < NR_syscalls)
> > >> syscall_handler(regs, syscall);
> > >> - else
> > >> + else if (syscall != -1)
> > >> regs->a0 = -ENOSYS;
> > >>
> > >> syscall_exit_to_user_mode(regs);
> > >
> > > Unfortunately, this change introduced a regression: it broke strace
> > > syscall tampering on riscv. When the tracer changes syscall number to -1,
> > > the kernel fails to initialize a0 with -ENOSYS and subsequently fails to
> > > return the error code of the failed syscall to userspace.
> >
> > In patch v2 [1], we actually did the right thing. But following Björn
> > Töpel's suggestion, and because we found that casting long to ulong is
> > implementation-defined behavior in C, we changed it to the current form.
> > So reverting this patch and applying patch v2 should fix this issue.
> > Patch v2 takes the same approach as other architectures.
> >
> > [1]: https://lore.kernel.org/all/20230718162940.226118-1-CoelacanthusHex@gmail.com/
>
> Not reverting, but a fix to make sure that a0 is initialized to -ENOSYS, e.g.:
>
> --8<--
> diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
> index 05a16b1f0aee..51ebfd23e007 100644
> --- a/arch/riscv/kernel/traps.c
> +++ b/arch/riscv/kernel/traps.c
> @@ -319,6 +319,7 @@ void do_trap_ecall_u(struct pt_regs *regs)
>
> regs->epc += 4;
> regs->orig_a0 = regs->a0;
> + regs->a0 = -ENOSYS;
Given that struct user_regs_struct doesn't have orig_a0, wouldn't this
clobber a0 too early so that the tracer will get -ENOSYS in place of the
first syscall argument?
--
ldv
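
The question above can be illustrated with a toy model. This is only a sketch of the concern, not a statement about how the real ptrace path behaves: it assumes that the register view handed to a tracer has no orig_a0 field and copies a0 directly, and every name below (model_pt_regs, model_user_regs, entry_stop_view) is hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* Kernel-side frame in the model: keeps both a0 and the saved orig_a0. */
struct model_pt_regs { long a0; long orig_a0; long a1; long a7; };

/* Tracer-visible view in the model: no orig_a0, as the question assumes. */
struct model_user_regs { long a0; long a1; long a7; };

static struct model_user_regs tracer_view(const struct model_pt_regs *r)
{
	struct model_user_regs u = { .a0 = r->a0, .a1 = r->a1, .a7 = r->a7 };
	return u;
}

/*
 * What a tracer stopped at syscall entry would read in this model.
 * early_default != 0 models the proposed "regs->a0 = -ENOSYS" line.
 */
struct model_user_regs entry_stop_view(long arg0, long arg1, long nr,
				       int early_default)
{
	struct model_pt_regs r = { .a0 = arg0, .a1 = arg1, .a7 = nr };

	r.orig_a0 = r.a0;
	if (early_default)
		r.a0 = -ENOSYS;	/* a0 no longer holds the first argument */

	return tracer_view(&r);
}
```

Under these assumptions, entry_stop_view(16, 3, 1, 1).a0 is -ENOSYS rather than 16, while a1 and a7 are unaffected. Whether the real kernel exposes the first argument to the tracer some other way is exactly what is being asked here; the model does not settle it.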
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
* Re: [PATCH v5] riscv: entry: set a0 = -ENOSYS only when syscall != -1
2024-06-27 9:52 ` Dmitry V. Levin
@ 2024-06-27 10:23 ` Björn Töpel
-1 siblings, 0 replies; 22+ messages in thread
From: Björn Töpel @ 2024-06-27 10:23 UTC (permalink / raw)
To: Dmitry V. Levin
Cc: Celeste Liu, Palmer Dabbelt, Paul Walmsley, Albert Ou, Guo Ren,
Conor Dooley, linux-riscv, linux-kernel, Andreas Schwab,
David Laight, Felix Yan, Ruizhe Pan, Shiqi Zhang,
Emil Renner Berthing, Ivan A. Melnikov
On Thu, Jun 27, 2024 at 11:52 AM Dmitry V. Levin <ldv@strace.io> wrote:
>
> On Thu, Jun 27, 2024 at 11:43:03AM +0200, Björn Töpel wrote:
> > On Thu, Jun 27, 2024 at 9:47 AM Celeste Liu <coelacanthushex@gmail.com> wrote:
> > > On 2024-06-27 15:14, Dmitry V. Levin wrote:
> > >
> > > > Hi,
> > > >
> > > > On Tue, Aug 01, 2023 at 10:15:16PM +0800, Celeste Liu wrote:
> > > >> When we test seccomp with 6.4 kernel, we found errno has wrong value.
> > > >> If we deny NETLINK_AUDIT with EAFNOSUPPORT, after f0bddf50586d, we will
> > > >> get ENOSYS instead. We got same result with commit 9c2598d43510 ("riscv:
> > > >> entry: Save a0 prior syscall_enter_from_user_mode()").
> > > >>
> > > >> After analysing code, we think that regs->a0 = -ENOSYS should only be
> > > >> executed when syscall != -1. In __seccomp_filter, when seccomp rejected
> > > >> this syscall with specified errno, they will set a0 to return number as
> > > >> syscall ABI, and then return -1. This return number is finally pass as
> > > >> return number of syscall_enter_from_user_mode, and then is compared with
> > > >> NR_syscalls after converted to ulong (so it will be ULONG_MAX). The
> > > >> condition syscall < NR_syscalls will always be false, so regs->a0 = -ENOSYS
> > > >> is always executed. It covered a0 set by seccomp, so we always get
> > > >> ENOSYS when match seccomp RET_ERRNO rule.
> > > >>
> > > >> Fixes: f0bddf50586d ("riscv: entry: Convert to generic entry")
> > > >> Reported-by: Felix Yan <felixonmars@archlinux.org>
> > > >> Co-developed-by: Ruizhe Pan <c141028@gmail.com>
> > > >> Signed-off-by: Ruizhe Pan <c141028@gmail.com>
> > > >> Co-developed-by: Shiqi Zhang <shiqi@isrc.iscas.ac.cn>
> > > >> Signed-off-by: Shiqi Zhang <shiqi@isrc.iscas.ac.cn>
> > > >> Signed-off-by: Celeste Liu <CoelacanthusHex@gmail.com>
> > > >> Tested-by: Felix Yan <felixonmars@archlinux.org>
> > > >> Tested-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
> > > >> Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
> > > >> Reviewed-by: Guo Ren <guoren@kernel.org>
> > > >> ---
> > > >>
> > > >> v4 -> v5: add Tested-by Emil Renner Berthing <emil.renner.berthing@canonical.com>
> > > >> v3 -> v4: use long instead of ulong to reduce type cast and avoid
> > > >> implementation-defined behavior, and make the judgment of syscall
> > > >> invalid more explicit
> > > >> v2 -> v3: use if-statement instead of set default value,
> > > >> clarify the type of syscall
> > > >> v1 -> v2: added explanation on why always got ENOSYS
> > > >>
> > > >> arch/riscv/kernel/traps.c | 6 +++---
> > > >> 1 file changed, 3 insertions(+), 3 deletions(-)
> > > >>
> > > >> diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
> > > >> index f910dfccbf5d2..729f79c97e2bf 100644
> > > >> --- a/arch/riscv/kernel/traps.c
> > > >> +++ b/arch/riscv/kernel/traps.c
> > > >> @@ -297,7 +297,7 @@ asmlinkage __visible __trap_section void do_trap_break(struct pt_regs *regs)
> > > >> asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs)
> > > >> {
> > > >> if (user_mode(regs)) {
> > > >> - ulong syscall = regs->a7;
> > > >> + long syscall = regs->a7;
> > > >>
> > > >> regs->epc += 4;
> > > >> regs->orig_a0 = regs->a0;
> > > >> @@ -306,9 +306,9 @@ asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs)
> > > >>
> > > >> syscall = syscall_enter_from_user_mode(regs, syscall);
> > > >>
> > > >> - if (syscall < NR_syscalls)
> > > >> + if (syscall >= 0 && syscall < NR_syscalls)
> > > >> syscall_handler(regs, syscall);
> > > >> - else
> > > >> + else if (syscall != -1)
> > > >> regs->a0 = -ENOSYS;
> > > >>
> > > >> syscall_exit_to_user_mode(regs);
> > > >
> > > > Unfortunately, this change introduced a regression: it broke strace
> > > > syscall tampering on riscv. When the tracer changes syscall number to -1,
> > > > the kernel fails to initialize a0 with -ENOSYS and subsequently fails to
> > > > return the error code of the failed syscall to userspace.
> > >
> > > In patch v2 [1], we actually did the right thing. But following Björn
> > > Töpel's suggestion, and because we found that casting long to ulong is
> > > implementation-defined behavior in C, we changed it to the current
> > > form. So reverting this patch and applying patch v2 should fix this
> > > issue. Patch v2 takes the same approach as other architectures.
> > >
> > > [1]: https://lore.kernel.org/all/20230718162940.226118-1-CoelacanthusHex@gmail.com/
> >
> > Not reverting, but a fix to make sure that a0 is initialized to -ENOSYS, e.g.:
> >
> > --8<--
> > diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
> > index 05a16b1f0aee..51ebfd23e007 100644
> > --- a/arch/riscv/kernel/traps.c
> > +++ b/arch/riscv/kernel/traps.c
> > @@ -319,6 +319,7 @@ void do_trap_ecall_u(struct pt_regs *regs)
> >
> > regs->epc += 4;
> > regs->orig_a0 = regs->a0;
> > + regs->a0 = -ENOSYS;
>
> Given that struct user_regs_struct doesn't have orig_a0, wouldn't this
> clobber a0 too early so that the tracer will get -ENOSYS in place of the
> first syscall argument?
No, that's ok. It's handled by various wrappers where the arguments
are pulled out.
Björn
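
One reading of this reply is that the in-kernel syscall path takes its first argument from the saved orig_a0 rather than from a0, so overwriting a0 early does not disturb the handler. The sketch below models that reading only; the extractor, the toy syscall body and all names are hypothetical, and it says nothing about what a ptrace tracer reads.

```c
#include <assert.h>
#include <errno.h>

struct model_regs { long a0, orig_a0, a1, a2; };

/* Hypothetical argument extractor: first argument comes from orig_a0. */
static void model_get_args(const struct model_regs *r, long args[3])
{
	args[0] = r->orig_a0;
	args[1] = r->a1;
	args[2] = r->a2;
}

/* Hypothetical syscall body, chosen so the result depends on every argument. */
static long model_sys_add3(long a, long b, long c)
{
	return a + b + c;
}

/* Entry path with the early -ENOSYS default, followed by dispatch. */
long model_ecall_add3(long a, long b, long c)
{
	struct model_regs r = { .a0 = a, .a1 = b, .a2 = c };
	long args[3];

	r.orig_a0 = r.a0;	/* save the first argument */
	r.a0 = -ENOSYS;		/* early default, as in the proposed fix */

	model_get_args(&r, args);
	r.a0 = model_sys_add3(args[0], args[1], args[2]);
	return r.a0;
}
```

Here model_ecall_add3(1, 2, 3) returns 6: the handler still sees the original first argument even though a0 was overwritten before dispatch.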
* Re: [PATCH v5] riscv: entry: set a0 = -ENOSYS only when syscall != -1
2024-06-27 9:43 ` Björn Töpel
@ 2024-06-27 10:11 ` Celeste Liu
-1 siblings, 0 replies; 22+ messages in thread
From: Celeste Liu @ 2024-06-27 10:11 UTC (permalink / raw)
To: Björn Töpel
Cc: Dmitry V. Levin, Palmer Dabbelt, Paul Walmsley, Albert Ou,
Guo Ren, Conor Dooley, linux-riscv, linux-kernel, Andreas Schwab,
David Laight, Felix Yan, Ruizhe Pan, Shiqi Zhang,
Emil Renner Berthing, Ivan A. Melnikov
On 2024-06-27 17:43, Björn Töpel wrote:
> On Thu, Jun 27, 2024 at 9:47 AM Celeste Liu <coelacanthushex@gmail.com> wrote:
>>
>> On 2024-06-27 15:14, Dmitry V. Levin wrote:
>>
>>> Hi,
>>>
>>> On Tue, Aug 01, 2023 at 10:15:16PM +0800, Celeste Liu wrote:
>>>> When we test seccomp with 6.4 kernel, we found errno has wrong value.
>>>> If we deny NETLINK_AUDIT with EAFNOSUPPORT, after f0bddf50586d, we will
>>>> get ENOSYS instead. We got same result with commit 9c2598d43510 ("riscv:
>>>> entry: Save a0 prior syscall_enter_from_user_mode()").
>>>>
>>>> After analysing code, we think that regs->a0 = -ENOSYS should only be
>>>> executed when syscall != -1. In __seccomp_filter, when seccomp rejected
>>>> this syscall with specified errno, they will set a0 to return number as
>>>> syscall ABI, and then return -1. This return number is finally pass as
>>>> return number of syscall_enter_from_user_mode, and then is compared with
>>>> NR_syscalls after converted to ulong (so it will be ULONG_MAX). The
>>>> condition syscall < NR_syscalls will always be false, so regs->a0 = -ENOSYS
>>>> is always executed. It covered a0 set by seccomp, so we always get
>>>> ENOSYS when match seccomp RET_ERRNO rule.
>>>>
>>>> Fixes: f0bddf50586d ("riscv: entry: Convert to generic entry")
>>>> Reported-by: Felix Yan <felixonmars@archlinux.org>
>>>> Co-developed-by: Ruizhe Pan <c141028@gmail.com>
>>>> Signed-off-by: Ruizhe Pan <c141028@gmail.com>
>>>> Co-developed-by: Shiqi Zhang <shiqi@isrc.iscas.ac.cn>
>>>> Signed-off-by: Shiqi Zhang <shiqi@isrc.iscas.ac.cn>
>>>> Signed-off-by: Celeste Liu <CoelacanthusHex@gmail.com>
>>>> Tested-by: Felix Yan <felixonmars@archlinux.org>
>>>> Tested-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
>>>> Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
>>>> Reviewed-by: Guo Ren <guoren@kernel.org>
>>>> ---
>>>>
>>>> v4 -> v5: add Tested-by Emil Renner Berthing <emil.renner.berthing@canonical.com>
>>>> v3 -> v4: use long instead of ulong to reduce type cast and avoid
>>>> implementation-defined behavior, and make the judgment of syscall
>>>> invalid more explicit
>>>> v2 -> v3: use if-statement instead of set default value,
>>>> clarify the type of syscall
>>>> v1 -> v2: added explanation on why always got ENOSYS
>>>>
>>>> arch/riscv/kernel/traps.c | 6 +++---
>>>> 1 file changed, 3 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
>>>> index f910dfccbf5d2..729f79c97e2bf 100644
>>>> --- a/arch/riscv/kernel/traps.c
>>>> +++ b/arch/riscv/kernel/traps.c
>>>> @@ -297,7 +297,7 @@ asmlinkage __visible __trap_section void do_trap_break(struct pt_regs *regs)
>>>> asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs)
>>>> {
>>>> if (user_mode(regs)) {
>>>> - ulong syscall = regs->a7;
>>>> + long syscall = regs->a7;
>>>>
>>>> regs->epc += 4;
>>>> regs->orig_a0 = regs->a0;
>>>> @@ -306,9 +306,9 @@ asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs)
>>>>
>>>> syscall = syscall_enter_from_user_mode(regs, syscall);
>>>>
>>>> - if (syscall < NR_syscalls)
>>>> + if (syscall >= 0 && syscall < NR_syscalls)
>>>> syscall_handler(regs, syscall);
>>>> - else
>>>> + else if (syscall != -1)
>>>> regs->a0 = -ENOSYS;
>>>>
>>>> syscall_exit_to_user_mode(regs);
>>>
>>> Unfortunately, this change introduced a regression: it broke strace
>>> syscall tampering on riscv. When the tracer changes syscall number to -1,
>>> the kernel fails to initialize a0 with -ENOSYS and subsequently fails to
>>> return the error code of the failed syscall to userspace.
>>
>> In patch v2 [1], we actually did the right thing. But following Björn
>> Töpel's suggestion, and because we found that casting long to ulong is
>> implementation-defined behavior in C, we changed it to the current form.
>> So reverting this patch and applying patch v2 should fix this issue.
>> Patch v2 takes the same approach as other architectures.
>>
>> [1]: https://lore.kernel.org/all/20230718162940.226118-1-CoelacanthusHex@gmail.com/
>
> Not reverting, but a fix to make sure that a0 is initialized to -ENOSYS, e.g.:
Oh, I only meant to describe the change we need, not a literal 'git revert'.
>
> --8<--
> diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
> index 05a16b1f0aee..51ebfd23e007 100644
> --- a/arch/riscv/kernel/traps.c
> +++ b/arch/riscv/kernel/traps.c
> @@ -319,6 +319,7 @@ void do_trap_ecall_u(struct pt_regs *regs)
>
>  		regs->epc += 4;
>  		regs->orig_a0 = regs->a0;
> +		regs->a0 = -ENOSYS;
>
>  		riscv_v_vstate_discard(regs);
>
> @@ -328,8 +329,7 @@ void do_trap_ecall_u(struct pt_regs *regs)
>
>  		if (syscall >= 0 && syscall < NR_syscalls)
>  			syscall_handler(regs, syscall);
> -		else if (syscall != -1)
> -			regs->a0 = -ENOSYS;
> +
>  		/*
>  		 * Ultimately, this value will get limited by KSTACK_OFFSET_MAX(),
>  		 * so the maximum stack offset is 1k bytes (10 bits).
> --8<--
This is also what I think.
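To make the difference concrete, here is a minimal user-space sketch of the
two variants. It is only a model under stated assumptions, not the kernel
code: struct regs_sketch, enter_sketch(), handler_sketch() and
NR_SYSCALLS_SKETCH are hypothetical stand-ins for pt_regs,
syscall_enter_from_user_mode(), syscall_handler() and NR_syscalls, and the
"hook" parameters model a seccomp RET_ERRNO rule that stores its own value
in a0 before returning -1.

```c
#include <errno.h>

#define NR_SYSCALLS_SKETCH 450L	/* placeholder; the real NR_syscalls differs */

/* Stand-in for the a0 register in struct pt_regs. */
struct regs_sketch {
	long a0;
};

/*
 * Stand-in for syscall_enter_from_user_mode(): nr_after_hooks is the syscall
 * number left after seccomp/ptrace ran, and hook_sets_a0 says whether a hook
 * (e.g. seccomp RET_ERRNO) already stored a return value in a0.
 */
static long enter_sketch(struct regs_sketch *regs, long nr_after_hooks,
			 int hook_sets_a0, long hook_a0)
{
	if (hook_sets_a0)
		regs->a0 = hook_a0;
	return nr_after_hooks;
}

/* Stand-in for a real handler: pretend every valid syscall returns 0. */
static void handler_sketch(struct regs_sketch *regs)
{
	regs->a0 = 0;
}

/* v5 logic: a0 is left untouched when the syscall number is -1. */
long a0_after_v5(long user_a0, long nr, int hook_sets_a0, long hook_a0)
{
	struct regs_sketch regs = { .a0 = user_a0 };
	long syscall = enter_sketch(&regs, nr, hook_sets_a0, hook_a0);

	if (syscall >= 0 && syscall < NR_SYSCALLS_SKETCH)
		handler_sketch(&regs);
	else if (syscall != -1)
		regs.a0 = -ENOSYS;
	return regs.a0;
}

/* Proposed fix: preset a0 to -ENOSYS before the entry hooks run. */
long a0_after_fix(long user_a0, long nr, int hook_sets_a0, long hook_a0)
{
	struct regs_sketch regs = { .a0 = user_a0 };
	long syscall;

	regs.a0 = -ENOSYS;
	syscall = enter_sketch(&regs, nr, hook_sets_a0, hook_a0);

	if (syscall >= 0 && syscall < NR_SYSCALLS_SKETCH)
		handler_sketch(&regs);
	return regs.a0;
}
```

In this model, a tracer that only rewrites the syscall number to -1
corresponds to a0_after_v5(42, -1, 0, 0), which hands the stale first
argument (42) back to userspace, while a0_after_fix(42, -1, 0, 0) yields
-ENOSYS. The seccomp case, e.g. (42, -1, 1, -EAFNOSUPPORT), keeps
-EAFNOSUPPORT in both variants, because the hook runs after the preset.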
> Celeste, do you want to cook that fix properly?
Yeah. I will send the patch to the mailing list soon.
>
>
> Björn
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v5] riscv: entry: set a0 = -ENOSYS only when syscall != -1
2024-06-27 9:43 ` Björn Töpel
@ 2024-06-27 10:38 ` Celeste Liu
-1 siblings, 0 replies; 22+ messages in thread
From: Celeste Liu @ 2024-06-27 10:38 UTC (permalink / raw)
To: Björn Töpel
Cc: Dmitry V. Levin, Palmer Dabbelt, Paul Walmsley, Albert Ou,
Guo Ren, Conor Dooley, linux-riscv, linux-kernel, Andreas Schwab,
David Laight, Felix Yan, Ruizhe Pan, Shiqi Zhang,
Emil Renner Berthing, Ivan A. Melnikov
On 2024-06-27 17:43, Björn Töpel wrote:
> On Thu, Jun 27, 2024 at 9:47 AM Celeste Liu <coelacanthushex@gmail.com> wrote:
>>
>> On 2024-06-27 15:14, Dmitry V. Levin wrote:
>>
>>> Hi,
>>>
>>> On Tue, Aug 01, 2023 at 10:15:16PM +0800, Celeste Liu wrote:
>>>> When testing seccomp on a 6.4 kernel, we found that errno had the wrong
>>>> value. If we deny NETLINK_AUDIT with EAFNOSUPPORT, then after
>>>> f0bddf50586d we get ENOSYS instead. We got the same result with commit
>>>> 9c2598d43510 ("riscv: entry: Save a0 prior
>>>> syscall_enter_from_user_mode()").
>>>>
>>>> After analysing the code, we think that regs->a0 = -ENOSYS should only
>>>> be executed when syscall != -1. In __seccomp_filter, when seccomp rejects
>>>> a syscall with a specified errno, it sets a0 to the return value as
>>>> required by the syscall ABI and then returns -1. That -1 ends up as the
>>>> return value of syscall_enter_from_user_mode, and is then compared with
>>>> NR_syscalls after being converted to ulong (so it becomes ULONG_MAX). The
>>>> condition syscall < NR_syscalls is therefore always false, so
>>>> regs->a0 = -ENOSYS is always executed. This overwrites the a0 set by
>>>> seccomp, so we always get ENOSYS when a seccomp RET_ERRNO rule matches.
>>>>
>>>> Fixes: f0bddf50586d ("riscv: entry: Convert to generic entry")
>>>> Reported-by: Felix Yan <felixonmars@archlinux.org>
>>>> Co-developed-by: Ruizhe Pan <c141028@gmail.com>
>>>> Signed-off-by: Ruizhe Pan <c141028@gmail.com>
>>>> Co-developed-by: Shiqi Zhang <shiqi@isrc.iscas.ac.cn>
>>>> Signed-off-by: Shiqi Zhang <shiqi@isrc.iscas.ac.cn>
>>>> Signed-off-by: Celeste Liu <CoelacanthusHex@gmail.com>
>>>> Tested-by: Felix Yan <felixonmars@archlinux.org>
>>>> Tested-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
>>>> Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
>>>> Reviewed-by: Guo Ren <guoren@kernel.org>
>>>> ---
>>>>
>>>> v4 -> v5: add Tested-by Emil Renner Berthing <emil.renner.berthing@canonical.com>
>>>> v3 -> v4: use long instead of ulong to reduce type casts and avoid
>>>> implementation-defined behavior, and make the check for an invalid
>>>> syscall more explicit
>>>> v2 -> v3: use an if-statement instead of setting a default value,
>>>> clarify the type of syscall
>>>> v1 -> v2: added an explanation of why we always got ENOSYS
>>>>
>>>> arch/riscv/kernel/traps.c | 6 +++---
>>>> 1 file changed, 3 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
>>>> index f910dfccbf5d2..729f79c97e2bf 100644
>>>> --- a/arch/riscv/kernel/traps.c
>>>> +++ b/arch/riscv/kernel/traps.c
>>>> @@ -297,7 +297,7 @@ asmlinkage __visible __trap_section void do_trap_break(struct pt_regs *regs)
>>>>  asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs)
>>>>  {
>>>>  	if (user_mode(regs)) {
>>>> -		ulong syscall = regs->a7;
>>>> +		long syscall = regs->a7;
>>>>
>>>>  		regs->epc += 4;
>>>>  		regs->orig_a0 = regs->a0;
>>>> @@ -306,9 +306,9 @@ asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs)
>>>>
>>>>  		syscall = syscall_enter_from_user_mode(regs, syscall);
>>>>
>>>> -		if (syscall < NR_syscalls)
>>>> +		if (syscall >= 0 && syscall < NR_syscalls)
>>>>  			syscall_handler(regs, syscall);
>>>> -		else
>>>> +		else if (syscall != -1)
>>>>  			regs->a0 = -ENOSYS;
>>>>
>>>>  		syscall_exit_to_user_mode(regs);
>>>
>>> Unfortunately, this change introduced a regression: it broke strace
>>> syscall tampering on riscv. When the tracer changes syscall number to -1,
>>> the kernel fails to initialize a0 with -ENOSYS and subsequently fails to
>>> return the error code of the failed syscall to userspace.
>>
>> Patch v2 [1] actually did the right thing. But following Björn Töpel's
>> suggestion, and because we found that casting long to ulong is
>> implementation-defined behavior in C, we changed it to the current form.
>> So reverting this patch and applying patch v2 should fix this issue.
>> Patch v2 handles this the same way as other architectures.
>>
>> [1]: https://lore.kernel.org/all/20230718162940.226118-1-CoelacanthusHex@gmail.com/
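For reference, a small sketch of the two range checks discussed above, as
plain user-space C. NR_SYSCALLS_MODEL and both function names are
hypothetical, chosen only for illustration. As far as the C standard goes
(6.3.1.3), converting a signed value to an unsigned type is defined as
reduction modulo 2^N, so -1 becomes ULONG_MAX; it is the opposite direction,
an out-of-range unsigned value converted to a signed type, that is
implementation-defined.

```c
#include <limits.h>

#define NR_SYSCALLS_MODEL 450UL	/* placeholder, not the real NR_syscalls */

/* Pre-v5 style check: the syscall number is held in an unsigned long. */
int passes_unsigned_check(long nr)
{
	unsigned long syscall = (unsigned long)nr;	/* -1 wraps to ULONG_MAX */

	return syscall < NR_SYSCALLS_MODEL;
}

/* v5 style check: the syscall number is held in a signed long. */
int passes_signed_check(long nr)
{
	return nr >= 0 && nr < (long)NR_SYSCALLS_MODEL;
}
```

Both checks reject -1 and accept a small valid number, so they only differ
in how the "invalid" case is spelled; what matters for the regression is
which value a0 holds when neither check passes.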
>
> Not reverting, but a fix to make sure that a0 is initialized to -ENOSYS, e.g.:
>
> --8<--
> diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
> index 05a16b1f0aee..51ebfd23e007 100644
> --- a/arch/riscv/kernel/traps.c
> +++ b/arch/riscv/kernel/traps.c
> @@ -319,6 +319,7 @@ void do_trap_ecall_u(struct pt_regs *regs)
>
>  		regs->epc += 4;
>  		regs->orig_a0 = regs->a0;
> +		regs->a0 = -ENOSYS;
>
>  		riscv_v_vstate_discard(regs);
>
> @@ -328,8 +329,7 @@ void do_trap_ecall_u(struct pt_regs *regs)
>
>  		if (syscall >= 0 && syscall < NR_syscalls)
>  			syscall_handler(regs, syscall);
> -		else if (syscall != -1)
> -			regs->a0 = -ENOSYS;
> +
>  		/*
>  		 * Ultimately, this value will get limited by KSTACK_OFFSET_MAX(),
>  		 * so the maximum stack offset is 1k bytes (10 bits).
> --8<--
>
> Celeste, do you want to cook that fix properly?
The patch has been sent:
https://lore.kernel.org/all/20240627103205.27914-2-CoelacanthusHex@gmail.com/
Also, the linux-riscv mailing list probably has some issue: there are two
copies of the patch, the original one and another one with the linux-riscv
footer.
>
>
> Björn
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply [flat|nested] 22+ messages in thread