* [PATCH v2 1/7] powerpc: properly negate error in syscall_set_return_value()
[not found] <20250113170925.GA392@strace.io>
@ 2025-01-13 17:10 ` Dmitry V. Levin
2025-01-13 17:34 ` Christophe Leroy
2025-01-14 13:00 ` Alexey Gladkov
2025-01-13 17:11 ` [PATCH v2 3/7] syscall.h: add syscall_set_arguments() and syscall_set_return_value() Dmitry V. Levin
2025-01-13 17:11 ` [PATCH v2 4/7] syscall.h: introduce syscall_set_nr() Dmitry V. Levin
2 siblings, 2 replies; 39+ messages in thread
From: Dmitry V. Levin @ 2025-01-13 17:10 UTC (permalink / raw)
To: Oleg Nesterov, Michael Ellerman
Cc: Eugene Syromyatnikov, Mike Frysinger, Renzo Davoli,
Davide Berardi, strace-devel, Madhavan Srinivasan,
Nicholas Piggin, Christophe Leroy, Naveen N Rao, linuxppc-dev,
linux-kernel
Bring syscall_set_return_value() in sync with syscall_get_error(),
and let upcoming ptrace/set_syscall_info selftest pass on powerpc.
This reverts commit 1b1a3702a65c ("powerpc: Don't negate error in
syscall_set_return_value()").
Signed-off-by: Dmitry V. Levin <ldv@strace.io>
---
arch/powerpc/include/asm/syscall.h | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/include/asm/syscall.h b/arch/powerpc/include/asm/syscall.h
index 3dd36c5e334a..422d7735ace6 100644
--- a/arch/powerpc/include/asm/syscall.h
+++ b/arch/powerpc/include/asm/syscall.h
@@ -82,7 +82,11 @@ static inline void syscall_set_return_value(struct task_struct *task,
*/
if (error) {
regs->ccr |= 0x10000000L;
- regs->gpr[3] = error;
+ /*
+ * In case of an error regs->gpr[3] contains
+ * a positive ERRORCODE.
+ */
+ regs->gpr[3] = -error;
} else {
regs->ccr &= ~0x10000000L;
regs->gpr[3] = val;
--
ldv
^ permalink raw reply related [flat|nested] 39+ messages in thread
* [PATCH v2 3/7] syscall.h: add syscall_set_arguments() and syscall_set_return_value()
[not found] <20250113170925.GA392@strace.io>
2025-01-13 17:10 ` [PATCH v2 1/7] powerpc: properly negate error in syscall_set_return_value() Dmitry V. Levin
@ 2025-01-13 17:11 ` Dmitry V. Levin
2025-01-16 2:20 ` Charlie Jenkins
2025-01-13 17:11 ` [PATCH v2 4/7] syscall.h: introduce syscall_set_nr() Dmitry V. Levin
2 siblings, 1 reply; 39+ messages in thread
From: Dmitry V. Levin @ 2025-01-13 17:11 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Eugene Syromyatnikov, Mike Frysinger, Renzo Davoli,
Davide Berardi, strace-devel, Vineet Gupta, Russell King,
Will Deacon, Guo Ren, Brian Cain, Huacai Chen, WANG Xuerui,
Thomas Bogendoerfer, Dinh Nguyen, Jonas Bonn, Stefan Kristiansson,
Stafford Horne, James E.J. Bottomley, Helge Deller,
Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
Christophe Leroy, Naveen N Rao, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, Yoshinori Sato, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Andreas Larsson,
Richard Weinberger, Anton Ivanov, Johannes Berg, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Chris Zankel, Max Filippov, Arnd Bergmann, linux-snps-arc,
linux-kernel, linux-arm-kernel, linux-csky, linux-hexagon,
loongarch, linux-mips, linux-openrisc, linux-parisc, linuxppc-dev,
linux-riscv, linux-s390, linux-sh, sparclinux, linux-um,
linux-arch
These functions are going to be needed on all HAVE_ARCH_TRACEHOOK
architectures to implement PTRACE_SET_SYSCALL_INFO API.
This partially reverts commit 7962c2eddbfe ("arch: remove unused
function syscall_set_arguments()") by reusing some of old
syscall_set_arguments() implementations.
Signed-off-by: Dmitry V. Levin <ldv@strace.io>
---
Note that I'm not a MIPS expert, I just added mips_set_syscall_arg() by
looking at mips_get_syscall_arg() and the result passes tests in qemu on
mips O32, mips64 O32, mips64 N32, and mips64 N64.
arch/arc/include/asm/syscall.h | 14 +++++++++++
arch/arm/include/asm/syscall.h | 13 ++++++++++
arch/arm64/include/asm/syscall.h | 13 ++++++++++
arch/csky/include/asm/syscall.h | 13 ++++++++++
arch/hexagon/include/asm/syscall.h | 14 +++++++++++
arch/loongarch/include/asm/syscall.h | 8 ++++++
arch/mips/include/asm/syscall.h | 32 ++++++++++++++++++++++++
arch/nios2/include/asm/syscall.h | 11 ++++++++
arch/openrisc/include/asm/syscall.h | 7 ++++++
arch/parisc/include/asm/syscall.h | 12 +++++++++
arch/powerpc/include/asm/syscall.h | 10 ++++++++
arch/riscv/include/asm/syscall.h | 9 +++++++
arch/s390/include/asm/syscall.h | 12 +++++++++
arch/sh/include/asm/syscall_32.h | 12 +++++++++
arch/sparc/include/asm/syscall.h | 10 ++++++++
arch/um/include/asm/syscall-generic.h | 14 +++++++++++
arch/x86/include/asm/syscall.h | 36 +++++++++++++++++++++++++++
arch/xtensa/include/asm/syscall.h | 11 ++++++++
include/asm-generic/syscall.h | 16 ++++++++++++
19 files changed, 267 insertions(+)
diff --git a/arch/arc/include/asm/syscall.h b/arch/arc/include/asm/syscall.h
index 9709256e31c8..89c1e1736356 100644
--- a/arch/arc/include/asm/syscall.h
+++ b/arch/arc/include/asm/syscall.h
@@ -67,6 +67,20 @@ syscall_get_arguments(struct task_struct *task, struct pt_regs *regs,
}
}
+static inline void
+syscall_set_arguments(struct task_struct *task, struct pt_regs *regs,
+ unsigned long *args)
+{
+ unsigned long *inside_ptregs = ®s->r0;
+ unsigned int n = 6;
+ unsigned int i = 0;
+
+ while (n--) {
+ *inside_ptregs = args[i++];
+ inside_ptregs--;
+ }
+}
+
static inline int
syscall_get_arch(struct task_struct *task)
{
diff --git a/arch/arm/include/asm/syscall.h b/arch/arm/include/asm/syscall.h
index fe4326d938c1..21927fa0ae2b 100644
--- a/arch/arm/include/asm/syscall.h
+++ b/arch/arm/include/asm/syscall.h
@@ -80,6 +80,19 @@ static inline void syscall_get_arguments(struct task_struct *task,
memcpy(args, ®s->ARM_r0 + 1, 5 * sizeof(args[0]));
}
+static inline void syscall_set_arguments(struct task_struct *task,
+ struct pt_regs *regs,
+ const unsigned long *args)
+{
+ memcpy(®s->ARM_r0, args, 6 * sizeof(args[0]));
+ /*
+ * Also copy the first argument into ARM_ORIG_r0
+ * so that syscall_get_arguments() would return it
+ * instead of the previous value.
+ */
+ regs->ARM_ORIG_r0 = regs->ARM_r0;
+}
+
static inline int syscall_get_arch(struct task_struct *task)
{
/* ARM tasks don't change audit architectures on the fly. */
diff --git a/arch/arm64/include/asm/syscall.h b/arch/arm64/include/asm/syscall.h
index ab8e14b96f68..76020b66286b 100644
--- a/arch/arm64/include/asm/syscall.h
+++ b/arch/arm64/include/asm/syscall.h
@@ -73,6 +73,19 @@ static inline void syscall_get_arguments(struct task_struct *task,
memcpy(args, ®s->regs[1], 5 * sizeof(args[0]));
}
+static inline void syscall_set_arguments(struct task_struct *task,
+ struct pt_regs *regs,
+ const unsigned long *args)
+{
+ memcpy(®s->regs[0], args, 6 * sizeof(args[0]));
+ /*
+ * Also copy the first argument into orig_x0
+ * so that syscall_get_arguments() would return it
+ * instead of the previous value.
+ */
+ regs->orig_x0 = regs->regs[0];
+}
+
/*
* We don't care about endianness (__AUDIT_ARCH_LE bit) here because
* AArch64 has the same system calls both on little- and big- endian.
diff --git a/arch/csky/include/asm/syscall.h b/arch/csky/include/asm/syscall.h
index 0de5734950bf..30403f7a0487 100644
--- a/arch/csky/include/asm/syscall.h
+++ b/arch/csky/include/asm/syscall.h
@@ -59,6 +59,19 @@ syscall_get_arguments(struct task_struct *task, struct pt_regs *regs,
memcpy(args, ®s->a1, 5 * sizeof(args[0]));
}
+static inline void
+syscall_set_arguments(struct task_struct *task, struct pt_regs *regs,
+ const unsigned long *args)
+{
+ memcpy(®s->a0, args, 6 * sizeof(regs->a0));
+ /*
+ * Also copy the first argument into orig_x0
+ * so that syscall_get_arguments() would return it
+ * instead of the previous value.
+ */
+ regs->orig_a0 = regs->a0;
+}
+
static inline int
syscall_get_arch(struct task_struct *task)
{
diff --git a/arch/hexagon/include/asm/syscall.h b/arch/hexagon/include/asm/syscall.h
index f6e454f18038..1024a6548d78 100644
--- a/arch/hexagon/include/asm/syscall.h
+++ b/arch/hexagon/include/asm/syscall.h
@@ -33,6 +33,13 @@ static inline void syscall_get_arguments(struct task_struct *task,
memcpy(args, &(®s->r00)[0], 6 * sizeof(args[0]));
}
+static inline void syscall_set_arguments(struct task_struct *task,
+ struct pt_regs *regs,
+ unsigned long *args)
+{
+ memcpy(&(®s->r00)[0], args, 6 * sizeof(args[0]));
+}
+
static inline long syscall_get_error(struct task_struct *task,
struct pt_regs *regs)
{
@@ -45,6 +52,13 @@ static inline long syscall_get_return_value(struct task_struct *task,
return regs->r00;
}
+static inline void syscall_set_return_value(struct task_struct *task,
+ struct pt_regs *regs,
+ int error, long val)
+{
+ regs->r00 = (long) error ?: val;
+}
+
static inline int syscall_get_arch(struct task_struct *task)
{
return AUDIT_ARCH_HEXAGON;
diff --git a/arch/loongarch/include/asm/syscall.h b/arch/loongarch/include/asm/syscall.h
index e286dc58476e..ff415b3c0a8e 100644
--- a/arch/loongarch/include/asm/syscall.h
+++ b/arch/loongarch/include/asm/syscall.h
@@ -61,6 +61,14 @@ static inline void syscall_get_arguments(struct task_struct *task,
memcpy(&args[1], ®s->regs[5], 5 * sizeof(long));
}
+static inline void syscall_set_arguments(struct task_struct *task,
+ struct pt_regs *regs,
+ unsigned long *args)
+{
+ regs->orig_a0 = args[0];
+ memcpy(®s->regs[5], &args[1], 5 * sizeof(long));
+}
+
static inline int syscall_get_arch(struct task_struct *task)
{
return AUDIT_ARCH_LOONGARCH64;
diff --git a/arch/mips/include/asm/syscall.h b/arch/mips/include/asm/syscall.h
index 2f85f2d8f754..3163d1506fae 100644
--- a/arch/mips/include/asm/syscall.h
+++ b/arch/mips/include/asm/syscall.h
@@ -76,6 +76,23 @@ static inline void mips_get_syscall_arg(unsigned long *arg,
#endif
}
+static inline void mips_set_syscall_arg(unsigned long *arg,
+ struct task_struct *task, struct pt_regs *regs, unsigned int n)
+{
+#ifdef CONFIG_32BIT
+ switch (n) {
+ case 0: case 1: case 2: case 3:
+ regs->regs[4 + n] = *arg;
+ return;
+ case 4: case 5: case 6: case 7:
+ *arg = regs->pad0[n] = *arg;
+ return;
+ }
+#else
+ regs->regs[4 + n] = *arg;
+#endif
+}
+
static inline long syscall_get_error(struct task_struct *task,
struct pt_regs *regs)
{
@@ -122,6 +139,21 @@ static inline void syscall_get_arguments(struct task_struct *task,
mips_get_syscall_arg(args++, task, regs, i++);
}
+static inline void syscall_set_arguments(struct task_struct *task,
+ struct pt_regs *regs,
+ unsigned long *args)
+{
+ unsigned int i = 0;
+ unsigned int n = 6;
+
+ /* O32 ABI syscall() */
+ if (mips_syscall_is_indirect(task, regs))
+ i++;
+
+ while (n--)
+ mips_set_syscall_arg(args++, task, regs, i++);
+}
+
extern const unsigned long sys_call_table[];
extern const unsigned long sys32_call_table[];
extern const unsigned long sysn32_call_table[];
diff --git a/arch/nios2/include/asm/syscall.h b/arch/nios2/include/asm/syscall.h
index fff52205fb65..526449edd768 100644
--- a/arch/nios2/include/asm/syscall.h
+++ b/arch/nios2/include/asm/syscall.h
@@ -58,6 +58,17 @@ static inline void syscall_get_arguments(struct task_struct *task,
*args = regs->r9;
}
+static inline void syscall_set_arguments(struct task_struct *task,
+ struct pt_regs *regs, const unsigned long *args)
+{
+ regs->r4 = *args++;
+ regs->r5 = *args++;
+ regs->r6 = *args++;
+ regs->r7 = *args++;
+ regs->r8 = *args++;
+ regs->r9 = *args;
+}
+
static inline int syscall_get_arch(struct task_struct *task)
{
return AUDIT_ARCH_NIOS2;
diff --git a/arch/openrisc/include/asm/syscall.h b/arch/openrisc/include/asm/syscall.h
index 903ed882bdec..e6383be2a195 100644
--- a/arch/openrisc/include/asm/syscall.h
+++ b/arch/openrisc/include/asm/syscall.h
@@ -57,6 +57,13 @@ syscall_get_arguments(struct task_struct *task, struct pt_regs *regs,
memcpy(args, ®s->gpr[3], 6 * sizeof(args[0]));
}
+static inline void
+syscall_set_arguments(struct task_struct *task, struct pt_regs *regs,
+ const unsigned long *args)
+{
+ memcpy(®s->gpr[3], args, 6 * sizeof(args[0]));
+}
+
static inline int syscall_get_arch(struct task_struct *task)
{
return AUDIT_ARCH_OPENRISC;
diff --git a/arch/parisc/include/asm/syscall.h b/arch/parisc/include/asm/syscall.h
index 00b127a5e09b..b146d0ae4c77 100644
--- a/arch/parisc/include/asm/syscall.h
+++ b/arch/parisc/include/asm/syscall.h
@@ -29,6 +29,18 @@ static inline void syscall_get_arguments(struct task_struct *tsk,
args[0] = regs->gr[26];
}
+static inline void syscall_set_arguments(struct task_struct *tsk,
+ struct pt_regs *regs,
+ unsigned long *args)
+{
+ regs->gr[21] = args[5];
+ regs->gr[22] = args[4];
+ regs->gr[23] = args[3];
+ regs->gr[24] = args[2];
+ regs->gr[25] = args[1];
+ regs->gr[26] = args[0];
+}
+
static inline long syscall_get_error(struct task_struct *task,
struct pt_regs *regs)
{
diff --git a/arch/powerpc/include/asm/syscall.h b/arch/powerpc/include/asm/syscall.h
index 422d7735ace6..521f279e6b33 100644
--- a/arch/powerpc/include/asm/syscall.h
+++ b/arch/powerpc/include/asm/syscall.h
@@ -114,6 +114,16 @@ static inline void syscall_get_arguments(struct task_struct *task,
}
}
+static inline void syscall_set_arguments(struct task_struct *task,
+ struct pt_regs *regs,
+ const unsigned long *args)
+{
+ memcpy(®s->gpr[3], args, 6 * sizeof(args[0]));
+
+ /* Also copy the first argument into orig_gpr3 */
+ regs->orig_gpr3 = args[0];
+}
+
static inline int syscall_get_arch(struct task_struct *task)
{
if (is_tsk_32bit_task(task))
diff --git a/arch/riscv/include/asm/syscall.h b/arch/riscv/include/asm/syscall.h
index 121fff429dce..8d389ba995c8 100644
--- a/arch/riscv/include/asm/syscall.h
+++ b/arch/riscv/include/asm/syscall.h
@@ -66,6 +66,15 @@ static inline void syscall_get_arguments(struct task_struct *task,
memcpy(args, ®s->a1, 5 * sizeof(args[0]));
}
+static inline void syscall_set_arguments(struct task_struct *task,
+ struct pt_regs *regs,
+ const unsigned long *args)
+{
+ regs->orig_a0 = args[0];
+ args++;
+ memcpy(®s->a1, args, 5 * sizeof(regs->a1));
+}
+
static inline int syscall_get_arch(struct task_struct *task)
{
#ifdef CONFIG_64BIT
diff --git a/arch/s390/include/asm/syscall.h b/arch/s390/include/asm/syscall.h
index 27e3d804b311..b3dd883699e7 100644
--- a/arch/s390/include/asm/syscall.h
+++ b/arch/s390/include/asm/syscall.h
@@ -78,6 +78,18 @@ static inline void syscall_get_arguments(struct task_struct *task,
args[0] = regs->orig_gpr2 & mask;
}
+static inline void syscall_set_arguments(struct task_struct *task,
+ struct pt_regs *regs,
+ const unsigned long *args)
+{
+ unsigned int n = 6;
+
+ while (n-- > 0)
+ if (n > 0)
+ regs->gprs[2 + n] = args[n];
+ regs->orig_gpr2 = args[0];
+}
+
static inline int syscall_get_arch(struct task_struct *task)
{
#ifdef CONFIG_COMPAT
diff --git a/arch/sh/include/asm/syscall_32.h b/arch/sh/include/asm/syscall_32.h
index d87738eebe30..cb51a7528384 100644
--- a/arch/sh/include/asm/syscall_32.h
+++ b/arch/sh/include/asm/syscall_32.h
@@ -57,6 +57,18 @@ static inline void syscall_get_arguments(struct task_struct *task,
args[0] = regs->regs[4];
}
+static inline void syscall_set_arguments(struct task_struct *task,
+ struct pt_regs *regs,
+ const unsigned long *args)
+{
+ regs->regs[1] = args[5];
+ regs->regs[0] = args[4];
+ regs->regs[7] = args[3];
+ regs->regs[6] = args[2];
+ regs->regs[5] = args[1];
+ regs->regs[4] = args[0];
+}
+
static inline int syscall_get_arch(struct task_struct *task)
{
int arch = AUDIT_ARCH_SH;
diff --git a/arch/sparc/include/asm/syscall.h b/arch/sparc/include/asm/syscall.h
index 20c109ac8cc9..62a5a78804c4 100644
--- a/arch/sparc/include/asm/syscall.h
+++ b/arch/sparc/include/asm/syscall.h
@@ -117,6 +117,16 @@ static inline void syscall_get_arguments(struct task_struct *task,
}
}
+static inline void syscall_set_arguments(struct task_struct *task,
+ struct pt_regs *regs,
+ const unsigned long *args)
+{
+ unsigned int i;
+
+ for (i = 0; i < 6; i++)
+ regs->u_regs[UREG_I0 + i] = args[i];
+}
+
static inline int syscall_get_arch(struct task_struct *task)
{
#if defined(CONFIG_SPARC64) && defined(CONFIG_COMPAT)
diff --git a/arch/um/include/asm/syscall-generic.h b/arch/um/include/asm/syscall-generic.h
index 172b74143c4b..2984feb9d576 100644
--- a/arch/um/include/asm/syscall-generic.h
+++ b/arch/um/include/asm/syscall-generic.h
@@ -62,6 +62,20 @@ static inline void syscall_get_arguments(struct task_struct *task,
*args = UPT_SYSCALL_ARG6(r);
}
+static inline void syscall_set_arguments(struct task_struct *task,
+ struct pt_regs *regs,
+ const unsigned long *args)
+{
+ struct uml_pt_regs *r = ®s->regs;
+
+ UPT_SYSCALL_ARG1(r) = *args++;
+ UPT_SYSCALL_ARG2(r) = *args++;
+ UPT_SYSCALL_ARG3(r) = *args++;
+ UPT_SYSCALL_ARG4(r) = *args++;
+ UPT_SYSCALL_ARG5(r) = *args++;
+ UPT_SYSCALL_ARG6(r) = *args;
+}
+
/* See arch/x86/um/asm/syscall.h for syscall_get_arch() definition. */
#endif /* __UM_SYSCALL_GENERIC_H */
diff --git a/arch/x86/include/asm/syscall.h b/arch/x86/include/asm/syscall.h
index 7c488ff0c764..b9c249dd9e3d 100644
--- a/arch/x86/include/asm/syscall.h
+++ b/arch/x86/include/asm/syscall.h
@@ -90,6 +90,18 @@ static inline void syscall_get_arguments(struct task_struct *task,
args[5] = regs->bp;
}
+static inline void syscall_set_arguments(struct task_struct *task,
+ struct pt_regs *regs,
+ const unsigned long *args)
+{
+ regs->bx = args[0];
+ regs->cx = args[1];
+ regs->dx = args[2];
+ regs->si = args[3];
+ regs->di = args[4];
+ regs->bp = args[5];
+}
+
static inline int syscall_get_arch(struct task_struct *task)
{
return AUDIT_ARCH_I386;
@@ -121,6 +133,30 @@ static inline void syscall_get_arguments(struct task_struct *task,
}
}
+static inline void syscall_set_arguments(struct task_struct *task,
+ struct pt_regs *regs,
+ const unsigned long *args)
+{
+# ifdef CONFIG_IA32_EMULATION
+ if (task->thread_info.status & TS_COMPAT) {
+ regs->bx = *args++;
+ regs->cx = *args++;
+ regs->dx = *args++;
+ regs->si = *args++;
+ regs->di = *args++;
+ regs->bp = *args;
+ } else
+# endif
+ {
+ regs->di = *args++;
+ regs->si = *args++;
+ regs->dx = *args++;
+ regs->r10 = *args++;
+ regs->r8 = *args++;
+ regs->r9 = *args;
+ }
+}
+
static inline int syscall_get_arch(struct task_struct *task)
{
/* x32 tasks should be considered AUDIT_ARCH_X86_64. */
diff --git a/arch/xtensa/include/asm/syscall.h b/arch/xtensa/include/asm/syscall.h
index 5ee974bf8330..f9a671cbf933 100644
--- a/arch/xtensa/include/asm/syscall.h
+++ b/arch/xtensa/include/asm/syscall.h
@@ -68,6 +68,17 @@ static inline void syscall_get_arguments(struct task_struct *task,
args[i] = regs->areg[reg[i]];
}
+static inline void syscall_set_arguments(struct task_struct *task,
+ struct pt_regs *regs,
+ const unsigned long *args)
+{
+ static const unsigned int reg[] = XTENSA_SYSCALL_ARGUMENT_REGS;
+ unsigned int i;
+
+ for (i = 0; i < 6; ++i)
+ regs->areg[reg[i]] = args[i];
+}
+
asmlinkage long xtensa_rt_sigreturn(void);
asmlinkage long xtensa_shmat(int, char __user *, int);
asmlinkage long xtensa_fadvise64_64(int, int,
diff --git a/include/asm-generic/syscall.h b/include/asm-generic/syscall.h
index 5a80fe728dc8..0f7b9a493de7 100644
--- a/include/asm-generic/syscall.h
+++ b/include/asm-generic/syscall.h
@@ -117,6 +117,22 @@ void syscall_set_return_value(struct task_struct *task, struct pt_regs *regs,
void syscall_get_arguments(struct task_struct *task, struct pt_regs *regs,
unsigned long *args);
+/**
+ * syscall_set_arguments - change system call parameter value
+ * @task: task of interest, must be in system call entry tracing
+ * @regs: task_pt_regs() of @task
+ * @args: array of argument values to store
+ *
+ * Changes 6 arguments to the system call.
+ * The first argument gets value @args[0], and so on.
+ *
+ * It's only valid to call this when @task is stopped for tracing on
+ * entry to a system call, due to %SYSCALL_WORK_SYSCALL_TRACE or
+ * %SYSCALL_WORK_SYSCALL_AUDIT.
+ */
+void syscall_set_arguments(struct task_struct *task, struct pt_regs *regs,
+ const unsigned long *args);
+
/**
* syscall_get_arch - return the AUDIT_ARCH for the current system call
* @task: task of interest, must be blocked
--
ldv
^ permalink raw reply related [flat|nested] 39+ messages in thread
* [PATCH v2 4/7] syscall.h: introduce syscall_set_nr()
[not found] <20250113170925.GA392@strace.io>
2025-01-13 17:10 ` [PATCH v2 1/7] powerpc: properly negate error in syscall_set_return_value() Dmitry V. Levin
2025-01-13 17:11 ` [PATCH v2 3/7] syscall.h: add syscall_set_arguments() and syscall_set_return_value() Dmitry V. Levin
@ 2025-01-13 17:11 ` Dmitry V. Levin
2025-01-16 2:20 ` Charlie Jenkins
2 siblings, 1 reply; 39+ messages in thread
From: Dmitry V. Levin @ 2025-01-13 17:11 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Eugene Syromyatnikov, Mike Frysinger, Renzo Davoli,
Davide Berardi, strace-devel, Vineet Gupta, Russell King,
Catalin Marinas, Will Deacon, Brian Cain, Huacai Chen,
WANG Xuerui, Geert Uytterhoeven, Michal Simek,
Thomas Bogendoerfer, Dinh Nguyen, Jonas Bonn, Stefan Kristiansson,
Stafford Horne, James E.J. Bottomley, Helge Deller,
Michael Ellerman, Nicholas Piggin, Christophe Leroy, Naveen N Rao,
Madhavan Srinivasan, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, Yoshinori Sato, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Andreas Larsson,
Richard Weinberger, Anton Ivanov, Johannes Berg, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Chris Zankel, Max Filippov, Arnd Bergmann, linux-snps-arc,
linux-kernel, linux-arm-kernel, linux-hexagon, loongarch,
linux-m68k, linux-mips, linux-openrisc, linux-parisc,
linuxppc-dev, linux-riscv, linux-s390, linux-sh, sparclinux,
linux-um, linux-arch
Similar to syscall_set_arguments() that complements
syscall_get_arguments(), introduce syscall_set_nr()
that complements syscall_get_nr().
syscall_set_nr() is going to be needed along with
syscall_set_arguments() on all HAVE_ARCH_TRACEHOOK
architectures to implement PTRACE_SET_SYSCALL_INFO API.
Signed-off-by: Dmitry V. Levin <ldv@strace.io>
---
arch/arc/include/asm/syscall.h | 11 +++++++++++
arch/arm/include/asm/syscall.h | 24 ++++++++++++++++++++++++
arch/arm64/include/asm/syscall.h | 16 ++++++++++++++++
arch/hexagon/include/asm/syscall.h | 7 +++++++
arch/loongarch/include/asm/syscall.h | 7 +++++++
arch/m68k/include/asm/syscall.h | 7 +++++++
arch/microblaze/include/asm/syscall.h | 7 +++++++
arch/mips/include/asm/syscall.h | 14 ++++++++++++++
arch/nios2/include/asm/syscall.h | 5 +++++
arch/openrisc/include/asm/syscall.h | 6 ++++++
arch/parisc/include/asm/syscall.h | 7 +++++++
arch/powerpc/include/asm/syscall.h | 10 ++++++++++
arch/riscv/include/asm/syscall.h | 7 +++++++
arch/s390/include/asm/syscall.h | 12 ++++++++++++
arch/sh/include/asm/syscall_32.h | 12 ++++++++++++
arch/sparc/include/asm/syscall.h | 12 ++++++++++++
arch/um/include/asm/syscall-generic.h | 5 +++++
arch/x86/include/asm/syscall.h | 7 +++++++
arch/xtensa/include/asm/syscall.h | 7 +++++++
include/asm-generic/syscall.h | 14 ++++++++++++++
20 files changed, 197 insertions(+)
diff --git a/arch/arc/include/asm/syscall.h b/arch/arc/include/asm/syscall.h
index 89c1e1736356..728d625a10f1 100644
--- a/arch/arc/include/asm/syscall.h
+++ b/arch/arc/include/asm/syscall.h
@@ -23,6 +23,17 @@ syscall_get_nr(struct task_struct *task, struct pt_regs *regs)
return -1;
}
+static inline void
+syscall_set_nr(struct task_struct *task, struct pt_regs *regs, int nr)
+{
+ /*
+ * Unlike syscall_get_nr(), syscall_set_nr() can be called only when
+ * the target task is stopped for tracing on entering syscall, so
+ * there is no need to have the same check syscall_get_nr() has.
+ */
+ regs->r8 = nr;
+}
+
static inline void
syscall_rollback(struct task_struct *task, struct pt_regs *regs)
{
diff --git a/arch/arm/include/asm/syscall.h b/arch/arm/include/asm/syscall.h
index 21927fa0ae2b..18b102a30741 100644
--- a/arch/arm/include/asm/syscall.h
+++ b/arch/arm/include/asm/syscall.h
@@ -68,6 +68,30 @@ static inline void syscall_set_return_value(struct task_struct *task,
regs->ARM_r0 = (long) error ? error : val;
}
+static inline void syscall_set_nr(struct task_struct *task,
+ struct pt_regs *regs,
+ int nr)
+{
+ if (nr == -1) {
+ task_thread_info(task)->abi_syscall = -1;
+ /*
+ * When the syscall number is set to -1, the syscall will be
+ * skipped. In this case the syscall return value has to be
+ * set explicitly, otherwise the first syscall argument is
+ * returned as the syscall return value.
+ */
+ syscall_set_return_value(task, regs, -ENOSYS, 0);
+ return;
+ }
+ if ((IS_ENABLED(CONFIG_AEABI) && !IS_ENABLED(CONFIG_OABI_COMPAT))) {
+ task_thread_info(task)->abi_syscall = nr;
+ return;
+ }
+ task_thread_info(task)->abi_syscall =
+ (task_thread_info(task)->abi_syscall & ~__NR_SYSCALL_MASK) |
+ (nr & __NR_SYSCALL_MASK);
+}
+
#define SYSCALL_MAX_ARGS 7
static inline void syscall_get_arguments(struct task_struct *task,
diff --git a/arch/arm64/include/asm/syscall.h b/arch/arm64/include/asm/syscall.h
index 76020b66286b..712daa90e643 100644
--- a/arch/arm64/include/asm/syscall.h
+++ b/arch/arm64/include/asm/syscall.h
@@ -61,6 +61,22 @@ static inline void syscall_set_return_value(struct task_struct *task,
regs->regs[0] = val;
}
+static inline void syscall_set_nr(struct task_struct *task,
+ struct pt_regs *regs,
+ int nr)
+{
+ regs->syscallno = nr;
+ if (nr == -1) {
+ /*
+ * When the syscall number is set to -1, the syscall will be
+ * skipped. In this case the syscall return value has to be
+ * set explicitly, otherwise the first syscall argument is
+ * returned as the syscall return value.
+ */
+ syscall_set_return_value(task, regs, -ENOSYS, 0);
+ }
+}
+
#define SYSCALL_MAX_ARGS 6
static inline void syscall_get_arguments(struct task_struct *task,
diff --git a/arch/hexagon/include/asm/syscall.h b/arch/hexagon/include/asm/syscall.h
index 1024a6548d78..70637261817a 100644
--- a/arch/hexagon/include/asm/syscall.h
+++ b/arch/hexagon/include/asm/syscall.h
@@ -26,6 +26,13 @@ static inline long syscall_get_nr(struct task_struct *task,
return regs->r06;
}
+static inline void syscall_set_nr(struct task_struct *task,
+ struct pt_regs *regs,
+ int nr)
+{
+ regs->r06 = nr;
+}
+
static inline void syscall_get_arguments(struct task_struct *task,
struct pt_regs *regs,
unsigned long *args)
diff --git a/arch/loongarch/include/asm/syscall.h b/arch/loongarch/include/asm/syscall.h
index ff415b3c0a8e..81d2733f7b94 100644
--- a/arch/loongarch/include/asm/syscall.h
+++ b/arch/loongarch/include/asm/syscall.h
@@ -26,6 +26,13 @@ static inline long syscall_get_nr(struct task_struct *task,
return regs->regs[11];
}
+static inline void syscall_set_nr(struct task_struct *task,
+ struct pt_regs *regs,
+ int nr)
+{
+ regs->regs[11] = nr;
+}
+
static inline void syscall_rollback(struct task_struct *task,
struct pt_regs *regs)
{
diff --git a/arch/m68k/include/asm/syscall.h b/arch/m68k/include/asm/syscall.h
index d1453e850cdd..bf84b160c2eb 100644
--- a/arch/m68k/include/asm/syscall.h
+++ b/arch/m68k/include/asm/syscall.h
@@ -14,6 +14,13 @@ static inline int syscall_get_nr(struct task_struct *task,
return regs->orig_d0;
}
+static inline void syscall_set_nr(struct task_struct *task,
+ struct pt_regs *regs,
+ int nr)
+{
+ regs->orig_d0 = nr;
+}
+
static inline void syscall_rollback(struct task_struct *task,
struct pt_regs *regs)
{
diff --git a/arch/microblaze/include/asm/syscall.h b/arch/microblaze/include/asm/syscall.h
index 5eb3f624cc59..b5b6b91fae3e 100644
--- a/arch/microblaze/include/asm/syscall.h
+++ b/arch/microblaze/include/asm/syscall.h
@@ -14,6 +14,13 @@ static inline long syscall_get_nr(struct task_struct *task,
return regs->r12;
}
+static inline void syscall_set_nr(struct task_struct *task,
+ struct pt_regs *regs,
+ int nr)
+{
+ regs->r12 = nr;
+}
+
static inline void syscall_rollback(struct task_struct *task,
struct pt_regs *regs)
{
diff --git a/arch/mips/include/asm/syscall.h b/arch/mips/include/asm/syscall.h
index 3163d1506fae..58d68205fd2c 100644
--- a/arch/mips/include/asm/syscall.h
+++ b/arch/mips/include/asm/syscall.h
@@ -41,6 +41,20 @@ static inline long syscall_get_nr(struct task_struct *task,
return task_thread_info(task)->syscall;
}
+static inline void syscall_set_nr(struct task_struct *task,
+ struct pt_regs *regs,
+ int nr)
+{
+ /*
+ * New syscall number has to be assigned to regs[2] because
+ * syscall_trace_entry() loads it from there unconditionally.
+ *
+ * Consequently, if the syscall was indirect and nr != __NR_syscall,
+ * then after this assignment the syscall will cease to be indirect.
+ */
+ task_thread_info(task)->syscall = regs->regs[2] = nr;
+}
+
static inline void mips_syscall_update_nr(struct task_struct *task,
struct pt_regs *regs)
{
diff --git a/arch/nios2/include/asm/syscall.h b/arch/nios2/include/asm/syscall.h
index 526449edd768..8e3eb1d689bb 100644
--- a/arch/nios2/include/asm/syscall.h
+++ b/arch/nios2/include/asm/syscall.h
@@ -15,6 +15,11 @@ static inline int syscall_get_nr(struct task_struct *task, struct pt_regs *regs)
return regs->r2;
}
+static inline void syscall_set_nr(struct task_struct *task, struct pt_regs *regs, int nr)
+{
+ regs->r2 = nr;
+}
+
static inline void syscall_rollback(struct task_struct *task,
struct pt_regs *regs)
{
diff --git a/arch/openrisc/include/asm/syscall.h b/arch/openrisc/include/asm/syscall.h
index e6383be2a195..5e037d9659c5 100644
--- a/arch/openrisc/include/asm/syscall.h
+++ b/arch/openrisc/include/asm/syscall.h
@@ -25,6 +25,12 @@ syscall_get_nr(struct task_struct *task, struct pt_regs *regs)
return regs->orig_gpr11;
}
+static inline void
+syscall_set_nr(struct task_struct *task, struct pt_regs *regs, int nr)
+{
+ regs->orig_gpr11 = nr;
+}
+
static inline void
syscall_rollback(struct task_struct *task, struct pt_regs *regs)
{
diff --git a/arch/parisc/include/asm/syscall.h b/arch/parisc/include/asm/syscall.h
index b146d0ae4c77..c11222798ab2 100644
--- a/arch/parisc/include/asm/syscall.h
+++ b/arch/parisc/include/asm/syscall.h
@@ -17,6 +17,13 @@ static inline long syscall_get_nr(struct task_struct *tsk,
return regs->gr[20];
}
+static inline void syscall_set_nr(struct task_struct *tsk,
+ struct pt_regs *regs,
+ int nr)
+{
+ regs->gr[20] = nr;
+}
+
static inline void syscall_get_arguments(struct task_struct *tsk,
struct pt_regs *regs,
unsigned long *args)
diff --git a/arch/powerpc/include/asm/syscall.h b/arch/powerpc/include/asm/syscall.h
index 521f279e6b33..7505dcfed247 100644
--- a/arch/powerpc/include/asm/syscall.h
+++ b/arch/powerpc/include/asm/syscall.h
@@ -39,6 +39,16 @@ static inline int syscall_get_nr(struct task_struct *task, struct pt_regs *regs)
return -1;
}
+static inline void syscall_set_nr(struct task_struct *task, struct pt_regs *regs, int nr)
+{
+ /*
+ * Unlike syscall_get_nr(), syscall_set_nr() can be called only when
+ * the target task is stopped for tracing on entering syscall, so
+ * there is no need to have the same check syscall_get_nr() has.
+ */
+ regs->gpr[0] = nr;
+}
+
static inline void syscall_rollback(struct task_struct *task,
struct pt_regs *regs)
{
diff --git a/arch/riscv/include/asm/syscall.h b/arch/riscv/include/asm/syscall.h
index 8d389ba995c8..a5281cdf2b10 100644
--- a/arch/riscv/include/asm/syscall.h
+++ b/arch/riscv/include/asm/syscall.h
@@ -30,6 +30,13 @@ static inline int syscall_get_nr(struct task_struct *task,
return regs->a7;
}
+static inline void syscall_set_nr(struct task_struct *task,
+ struct pt_regs *regs,
+ int nr)
+{
+ regs->a7 = nr;
+}
+
static inline void syscall_rollback(struct task_struct *task,
struct pt_regs *regs)
{
diff --git a/arch/s390/include/asm/syscall.h b/arch/s390/include/asm/syscall.h
index b3dd883699e7..12cd0c60c07b 100644
--- a/arch/s390/include/asm/syscall.h
+++ b/arch/s390/include/asm/syscall.h
@@ -24,6 +24,18 @@ static inline long syscall_get_nr(struct task_struct *task,
(regs->int_code & 0xffff) : -1;
}
+static inline void syscall_set_nr(struct task_struct *task,
+ struct pt_regs *regs,
+ int nr)
+{
+ /*
+ * Unlike syscall_get_nr(), syscall_set_nr() can be called only when
+ * the target task is stopped for tracing on entering syscall, so
+ * there is no need to have the same check syscall_get_nr() has.
+ */
+ regs->int_code = (regs->int_code & ~0xffff) | (nr & 0xffff);
+}
+
static inline void syscall_rollback(struct task_struct *task,
struct pt_regs *regs)
{
diff --git a/arch/sh/include/asm/syscall_32.h b/arch/sh/include/asm/syscall_32.h
index cb51a7528384..7027d87d901d 100644
--- a/arch/sh/include/asm/syscall_32.h
+++ b/arch/sh/include/asm/syscall_32.h
@@ -15,6 +15,18 @@ static inline long syscall_get_nr(struct task_struct *task,
return (regs->tra >= 0) ? regs->regs[3] : -1L;
}
+static inline void syscall_set_nr(struct task_struct *task,
+ struct pt_regs *regs,
+ int nr)
+{
+ /*
+ * Unlike syscall_get_nr(), syscall_set_nr() can be called only when
+ * the target task is stopped for tracing on entering syscall, so
+ * there is no need to have the same check syscall_get_nr() has.
+ */
+ regs->regs[3] = nr;
+}
+
static inline void syscall_rollback(struct task_struct *task,
struct pt_regs *regs)
{
diff --git a/arch/sparc/include/asm/syscall.h b/arch/sparc/include/asm/syscall.h
index 62a5a78804c4..b0233924d323 100644
--- a/arch/sparc/include/asm/syscall.h
+++ b/arch/sparc/include/asm/syscall.h
@@ -25,6 +25,18 @@ static inline long syscall_get_nr(struct task_struct *task,
return (syscall_p ? regs->u_regs[UREG_G1] : -1L);
}
+static inline void syscall_set_nr(struct task_struct *task,
+ struct pt_regs *regs,
+ int nr)
+{
+ /*
+ * Unlike syscall_get_nr(), syscall_set_nr() can be called only when
+ * the target task is stopped for tracing on entering syscall, so
+ * there is no need to have the same check syscall_get_nr() has.
+ */
+ regs->u_regs[UREG_G1] = nr;
+}
+
static inline void syscall_rollback(struct task_struct *task,
struct pt_regs *regs)
{
diff --git a/arch/um/include/asm/syscall-generic.h b/arch/um/include/asm/syscall-generic.h
index 2984feb9d576..bcd73bcfe577 100644
--- a/arch/um/include/asm/syscall-generic.h
+++ b/arch/um/include/asm/syscall-generic.h
@@ -21,6 +21,11 @@ static inline int syscall_get_nr(struct task_struct *task, struct pt_regs *regs)
return PT_REGS_SYSCALL_NR(regs);
}
+static inline void syscall_set_nr(struct task_struct *task, struct pt_regs *regs, int nr)
+{
+ PT_REGS_SYSCALL_NR(regs) = nr;
+}
+
static inline void syscall_rollback(struct task_struct *task,
struct pt_regs *regs)
{
diff --git a/arch/x86/include/asm/syscall.h b/arch/x86/include/asm/syscall.h
index b9c249dd9e3d..c10dbb74cd00 100644
--- a/arch/x86/include/asm/syscall.h
+++ b/arch/x86/include/asm/syscall.h
@@ -38,6 +38,13 @@ static inline int syscall_get_nr(struct task_struct *task, struct pt_regs *regs)
return regs->orig_ax;
}
+static inline void syscall_set_nr(struct task_struct *task,
+ struct pt_regs *regs,
+ int nr)
+{
+ regs->orig_ax = nr;
+}
+
static inline void syscall_rollback(struct task_struct *task,
struct pt_regs *regs)
{
diff --git a/arch/xtensa/include/asm/syscall.h b/arch/xtensa/include/asm/syscall.h
index f9a671cbf933..7db3b489c8ad 100644
--- a/arch/xtensa/include/asm/syscall.h
+++ b/arch/xtensa/include/asm/syscall.h
@@ -28,6 +28,13 @@ static inline long syscall_get_nr(struct task_struct *task,
return regs->syscall;
}
+static inline void syscall_set_nr(struct task_struct *task,
+ struct pt_regs *regs,
+ int nr)
+{
+ regs->syscall = nr;
+}
+
static inline void syscall_rollback(struct task_struct *task,
struct pt_regs *regs)
{
diff --git a/include/asm-generic/syscall.h b/include/asm-generic/syscall.h
index 0f7b9a493de7..e33fd4e783c1 100644
--- a/include/asm-generic/syscall.h
+++ b/include/asm-generic/syscall.h
@@ -37,6 +37,20 @@ struct pt_regs;
*/
int syscall_get_nr(struct task_struct *task, struct pt_regs *regs);
+/**
+ * syscall_set_nr - change the system call a task is executing
+ * @task: task of interest, must be blocked
+ * @regs: task_pt_regs() of @task
+ * @nr: system call number
+ *
+ * Changes the system call number @task is about to execute.
+ *
+ * It's only valid to call this when @task is stopped for tracing on
+ * entry to a system call, due to %SYSCALL_WORK_SYSCALL_TRACE or
+ * %SYSCALL_WORK_SYSCALL_AUDIT.
+ */
+void syscall_set_nr(struct task_struct *task, struct pt_regs *regs, int nr);
+
/**
* syscall_rollback - roll back registers after an aborted system call
* @task: task of interest, must be in system call exit tracing
--
ldv
^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [PATCH v2 1/7] powerpc: properly negate error in syscall_set_return_value()
2025-01-13 17:10 ` [PATCH v2 1/7] powerpc: properly negate error in syscall_set_return_value() Dmitry V. Levin
@ 2025-01-13 17:34 ` Christophe Leroy
2025-01-13 17:54 ` Dmitry V. Levin
2025-01-14 17:04 ` Dmitry V. Levin
2025-01-14 13:00 ` Alexey Gladkov
1 sibling, 2 replies; 39+ messages in thread
From: Christophe Leroy @ 2025-01-13 17:34 UTC (permalink / raw)
To: Dmitry V. Levin, Oleg Nesterov, Michael Ellerman
Cc: Eugene Syromyatnikov, Mike Frysinger, Renzo Davoli,
Davide Berardi, strace-devel, Madhavan Srinivasan,
Nicholas Piggin, Naveen N Rao, linuxppc-dev, linux-kernel
Le 13/01/2025 à 18:10, Dmitry V. Levin a écrit :
> Bring syscall_set_return_value() in sync with syscall_get_error(),
> and let upcoming ptrace/set_syscall_info selftest pass on powerpc.
>
> This reverts commit 1b1a3702a65c ("powerpc: Don't negate error in
> syscall_set_return_value()").
There is a clear detailed explanation in that commit of why it needs to
be done.
If you think that commit is wrong you have to explain why with at least
the same level of details.
>
> Signed-off-by: Dmitry V. Levin <ldv@strace.io>
> ---
> arch/powerpc/include/asm/syscall.h | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/include/asm/syscall.h b/arch/powerpc/include/asm/syscall.h
> index 3dd36c5e334a..422d7735ace6 100644
> --- a/arch/powerpc/include/asm/syscall.h
> +++ b/arch/powerpc/include/asm/syscall.h
> @@ -82,7 +82,11 @@ static inline void syscall_set_return_value(struct task_struct *task,
> */
> if (error) {
> regs->ccr |= 0x10000000L;
> - regs->gpr[3] = error;
> + /*
> + * In case of an error regs->gpr[3] contains
> + * a positive ERRORCODE.
> + */
> + regs->gpr[3] = -error;
> } else {
> regs->ccr &= ~0x10000000L;
> regs->gpr[3] = val;
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 1/7] powerpc: properly negate error in syscall_set_return_value()
2025-01-13 17:34 ` Christophe Leroy
@ 2025-01-13 17:54 ` Dmitry V. Levin
2025-01-14 17:04 ` Dmitry V. Levin
1 sibling, 0 replies; 39+ messages in thread
From: Dmitry V. Levin @ 2025-01-13 17:54 UTC (permalink / raw)
To: Christophe Leroy
Cc: Oleg Nesterov, Michael Ellerman, Eugene Syromyatnikov,
Mike Frysinger, Renzo Davoli, Davide Berardi, strace-devel,
Madhavan Srinivasan, Nicholas Piggin, Naveen N Rao, linuxppc-dev,
linux-kernel
On Mon, Jan 13, 2025 at 06:34:44PM +0100, Christophe Leroy wrote:
> Le 13/01/2025 à 18:10, Dmitry V. Levin a écrit :
> > Bring syscall_set_return_value() in sync with syscall_get_error(),
> > and let upcoming ptrace/set_syscall_info selftest pass on powerpc.
> >
> > This reverts commit 1b1a3702a65c ("powerpc: Don't negate error in
> > syscall_set_return_value()").
>
> There is a clear detailed explanation in that commit of why it needs to
> be done.
>
> If you think that commit is wrong you have to explain why with at least
> the same level of details.
I'm sorry, I'm not by any means a powerpc expert to explain why that
commit was added in the first place, I wish Michael would be able to do it
himself. All I can say is that for some mysterious reason current
syscall_set_return_value() implementation assumes that in case of an error
regs->gpr[3] has to be negative, while, according to well-tested
syscall_get_error(), it has to be positive.
This is very visible with PTRACE_SET_SYSCALL_INFO that exposes
syscall_set_return_value() to userspace, and, in particular, with the
architecture-agnostic ptrace/set_syscall_info selftest added later in the
series.
> > diff --git a/arch/powerpc/include/asm/syscall.h b/arch/powerpc/include/asm/syscall.h
> > index 3dd36c5e334a..422d7735ace6 100644
> > --- a/arch/powerpc/include/asm/syscall.h
> > +++ b/arch/powerpc/include/asm/syscall.h
> > @@ -82,7 +82,11 @@ static inline void syscall_set_return_value(struct task_struct *task,
> > */
> > if (error) {
> > regs->ccr |= 0x10000000L;
> > - regs->gpr[3] = error;
> > + /*
> > + * In case of an error regs->gpr[3] contains
> > + * a positive ERRORCODE.
> > + */
> > + regs->gpr[3] = -error;
> > } else {
> > regs->ccr &= ~0x10000000L;
> > regs->gpr[3] = val;
--
ldv
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 1/7] powerpc: properly negate error in syscall_set_return_value()
2025-01-13 17:10 ` [PATCH v2 1/7] powerpc: properly negate error in syscall_set_return_value() Dmitry V. Levin
2025-01-13 17:34 ` Christophe Leroy
@ 2025-01-14 13:00 ` Alexey Gladkov
2025-01-14 13:48 ` Dmitry V. Levin
1 sibling, 1 reply; 39+ messages in thread
From: Alexey Gladkov @ 2025-01-14 13:00 UTC (permalink / raw)
To: Dmitry V. Levin
Cc: Oleg Nesterov, Michael Ellerman, Eugene Syromyatnikov,
Mike Frysinger, Renzo Davoli, Davide Berardi, strace-devel,
Madhavan Srinivasan, Nicholas Piggin, Christophe Leroy,
Naveen N Rao, linuxppc-dev, linux-kernel
On Mon, Jan 13, 2025 at 07:10:54PM +0200, Dmitry V. Levin wrote:
> Bring syscall_set_return_value() in sync with syscall_get_error(),
> and let upcoming ptrace/set_syscall_info selftest pass on powerpc.
>
> This reverts commit 1b1a3702a65c ("powerpc: Don't negate error in
> syscall_set_return_value()").
>
> Signed-off-by: Dmitry V. Levin <ldv@strace.io>
> ---
> arch/powerpc/include/asm/syscall.h | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/include/asm/syscall.h b/arch/powerpc/include/asm/syscall.h
> index 3dd36c5e334a..422d7735ace6 100644
> --- a/arch/powerpc/include/asm/syscall.h
> +++ b/arch/powerpc/include/asm/syscall.h
> @@ -82,7 +82,11 @@ static inline void syscall_set_return_value(struct task_struct *task,
> */
> if (error) {
> regs->ccr |= 0x10000000L;
> - regs->gpr[3] = error;
> + /*
> + * In case of an error regs->gpr[3] contains
> + * a positive ERRORCODE.
> + */
> + regs->gpr[3] = -error;
After this change the syscall_get_error() will return positive value if
the system call failed. Since syscall_get_error() still believes
regs->gpr[3] is still positive in case !trap_is_scv().
Or am I missing something?
It looks like the selftest you mentioned in the commit message doesn't
check the !trap_is_scv() branch.
> } else {
> regs->ccr &= ~0x10000000L;
> regs->gpr[3] = val;
> --
> ldv
--
Rgrds, legion
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 1/7] powerpc: properly negate error in syscall_set_return_value()
2025-01-14 13:00 ` Alexey Gladkov
@ 2025-01-14 13:48 ` Dmitry V. Levin
2025-01-14 14:53 ` Alexey Gladkov
0 siblings, 1 reply; 39+ messages in thread
From: Dmitry V. Levin @ 2025-01-14 13:48 UTC (permalink / raw)
To: Alexey Gladkov
Cc: Oleg Nesterov, Michael Ellerman, Eugene Syromyatnikov,
Mike Frysinger, Renzo Davoli, Davide Berardi, strace-devel,
Madhavan Srinivasan, Nicholas Piggin, Christophe Leroy,
Naveen N Rao, linuxppc-dev, linux-kernel
On Tue, Jan 14, 2025 at 02:00:16PM +0100, Alexey Gladkov wrote:
> On Mon, Jan 13, 2025 at 07:10:54PM +0200, Dmitry V. Levin wrote:
> > Bring syscall_set_return_value() in sync with syscall_get_error(),
> > and let upcoming ptrace/set_syscall_info selftest pass on powerpc.
> >
> > This reverts commit 1b1a3702a65c ("powerpc: Don't negate error in
> > syscall_set_return_value()").
> >
> > Signed-off-by: Dmitry V. Levin <ldv@strace.io>
> > ---
> > arch/powerpc/include/asm/syscall.h | 6 +++++-
> > 1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/powerpc/include/asm/syscall.h b/arch/powerpc/include/asm/syscall.h
> > index 3dd36c5e334a..422d7735ace6 100644
> > --- a/arch/powerpc/include/asm/syscall.h
> > +++ b/arch/powerpc/include/asm/syscall.h
> > @@ -82,7 +82,11 @@ static inline void syscall_set_return_value(struct task_struct *task,
> > */
> > if (error) {
> > regs->ccr |= 0x10000000L;
> > - regs->gpr[3] = error;
> > + /*
> > + * In case of an error regs->gpr[3] contains
> > + * a positive ERRORCODE.
> > + */
> > + regs->gpr[3] = -error;
>
> After this change the syscall_get_error() will return positive value if
> the system call failed. Since syscall_get_error() still believes
> regs->gpr[3] is still positive in case !trap_is_scv().
>
> Or am I missing something?
syscall_get_error() does the following in case of !trap_is_scv():
/*
* If the system call failed,
* regs->gpr[3] contains a positive ERRORCODE.
*/
return (regs->ccr & 0x10000000UL) ? -regs->gpr[3] : 0;
That is, in !trap_is_scv() case it assumes that regs->gpr[3] is positive
and is going to return a negative value (-ERRORCODE).
> It looks like the selftest you mentioned in the commit message doesn't
> check the !trap_is_scv() branch.
The selftest is architecture-agnostic, it just executes syscalls and
checks whether the data returned by PTRACE_GET_SYSCALL_INFO meets
expectations. Do you mean that syscall() is not good enough for syscall
invocation from coverage perspective on powerpc?
See also commit d72500f99284 ("powerpc/64s/syscall: Fix ptrace syscall
info with scv syscalls").
--
ldv
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 1/7] powerpc: properly negate error in syscall_set_return_value()
2025-01-14 13:48 ` Dmitry V. Levin
@ 2025-01-14 14:53 ` Alexey Gladkov
0 siblings, 0 replies; 39+ messages in thread
From: Alexey Gladkov @ 2025-01-14 14:53 UTC (permalink / raw)
To: Dmitry V. Levin
Cc: Oleg Nesterov, Michael Ellerman, Eugene Syromyatnikov,
Mike Frysinger, Renzo Davoli, Davide Berardi, strace-devel,
Madhavan Srinivasan, Nicholas Piggin, Christophe Leroy,
Naveen N Rao, linuxppc-dev, linux-kernel
On Tue, Jan 14, 2025 at 03:48:44PM +0200, Dmitry V. Levin wrote:
> On Tue, Jan 14, 2025 at 02:00:16PM +0100, Alexey Gladkov wrote:
> > On Mon, Jan 13, 2025 at 07:10:54PM +0200, Dmitry V. Levin wrote:
> > > Bring syscall_set_return_value() in sync with syscall_get_error(),
> > > and let upcoming ptrace/set_syscall_info selftest pass on powerpc.
> > >
> > > This reverts commit 1b1a3702a65c ("powerpc: Don't negate error in
> > > syscall_set_return_value()").
> > >
> > > Signed-off-by: Dmitry V. Levin <ldv@strace.io>
> > > ---
> > > arch/powerpc/include/asm/syscall.h | 6 +++++-
> > > 1 file changed, 5 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/arch/powerpc/include/asm/syscall.h b/arch/powerpc/include/asm/syscall.h
> > > index 3dd36c5e334a..422d7735ace6 100644
> > > --- a/arch/powerpc/include/asm/syscall.h
> > > +++ b/arch/powerpc/include/asm/syscall.h
> > > @@ -82,7 +82,11 @@ static inline void syscall_set_return_value(struct task_struct *task,
> > > */
> > > if (error) {
> > > regs->ccr |= 0x10000000L;
> > > - regs->gpr[3] = error;
> > > + /*
> > > + * In case of an error regs->gpr[3] contains
> > > + * a positive ERRORCODE.
> > > + */
> > > + regs->gpr[3] = -error;
> >
> > After this change the syscall_get_error() will return positive value if
> > the system call failed. Since syscall_get_error() still believes
> > regs->gpr[3] is still positive in case !trap_is_scv().
> >
> > Or am I missing something?
>
> syscall_get_error() does the following in case of !trap_is_scv():
>
> /*
> * If the system call failed,
> * regs->gpr[3] contains a positive ERRORCODE.
> */
> return (regs->ccr & 0x10000000UL) ? -regs->gpr[3] : 0;
>
> That is, in !trap_is_scv() case it assumes that regs->gpr[3] is positive
> and is going to return a negative value (-ERRORCODE).
Yeah. Now I see it.
if (trap_is_scv(regs)) {
regs->result = -EINTR;
regs->gpr[3] = -EINTR;
} else {
regs->result = -EINTR;
regs->gpr[3] = EINTR;
regs->ccr |= 0x10000000;
}
Two different APIs imply gpr[3] with a different sign.
You can add:
Reviewed-by: Alexey Gladkov <legion@kernel.org>
> > It looks like the selftest you mentioned in the commit message doesn't
> > check the !trap_is_scv() branch.
>
> The selftest is architecture-agnostic, it just executes syscalls and
> checks whether the data returned by PTRACE_GET_SYSCALL_INFO meets
> expectations. Do you mean that syscall() is not good enough for syscall
> invocation from coverage perspective on powerpc?
>
> See also commit d72500f99284 ("powerpc/64s/syscall: Fix ptrace syscall
> info with scv syscalls").
>
>
> --
> ldv
--
Rgrds, legion
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 1/7] powerpc: properly negate error in syscall_set_return_value()
2025-01-13 17:34 ` Christophe Leroy
2025-01-13 17:54 ` Dmitry V. Levin
@ 2025-01-14 17:04 ` Dmitry V. Levin
2025-01-20 13:51 ` Christophe Leroy
1 sibling, 1 reply; 39+ messages in thread
From: Dmitry V. Levin @ 2025-01-14 17:04 UTC (permalink / raw)
To: Christophe Leroy
Cc: Alexey Gladkov, Oleg Nesterov, Michael Ellerman,
Eugene Syromyatnikov, Mike Frysinger, Renzo Davoli,
Davide Berardi, strace-devel, Madhavan Srinivasan,
Nicholas Piggin, Naveen N Rao, linuxppc-dev, linux-kernel
On Mon, Jan 13, 2025 at 06:34:44PM +0100, Christophe Leroy wrote:
> Le 13/01/2025 à 18:10, Dmitry V. Levin a écrit :
> > Bring syscall_set_return_value() in sync with syscall_get_error(),
> > and let upcoming ptrace/set_syscall_info selftest pass on powerpc.
> >
> > This reverts commit 1b1a3702a65c ("powerpc: Don't negate error in
> > syscall_set_return_value()").
>
> There is a clear detailed explanation in that commit of why it needs to
> be done.
>
> If you think that commit is wrong you have to explain why with at least
> the same level of details.
OK, please have a look whether this explanation is clear and detailed enough:
=======
powerpc: properly negate error in syscall_set_return_value()
When syscall_set_return_value() is used to set an error code, the caller
specifies it as a negative value in -ERRORCODE form.
In !trap_is_scv case the error code is traditionally stored as follows:
gpr[3] contains a positive ERRORCODE, and ccr has 0x10000000 flag set.
Here are a few examples to illustrate this convention. The first one
is from syscall_get_error():
/*
* If the system call failed,
* regs->gpr[3] contains a positive ERRORCODE.
*/
return (regs->ccr & 0x10000000UL) ? -regs->gpr[3] : 0;
The second example is from regs_return_value():
if (is_syscall_success(regs))
return regs->gpr[3];
else
return -regs->gpr[3];
The third example is from check_syscall_restart():
regs->result = -EINTR;
regs->gpr[3] = EINTR;
regs->ccr |= 0x10000000;
Compared with these examples, the failure of syscall_set_return_value()
to assign a positive ERRORCODE into regs->gpr[3] is clearly visible:
/*
* In the general case it's not obvious that we must deal with
* CCR here, as the syscall exit path will also do that for us.
* However there are some places, eg. the signal code, which
* check ccr to decide if the value in r3 is actually an error.
*/
if (error) {
regs->ccr |= 0x10000000L;
regs->gpr[3] = error;
} else {
regs->ccr &= ~0x10000000L;
regs->gpr[3] = val;
}
This fix brings syscall_set_return_value() in sync with syscall_get_error()
and lets upcoming ptrace/set_syscall_info selftest pass on powerpc.
Fixes: 1b1a3702a65c ("powerpc: Don't negate error in syscall_set_return_value()").
=======
--
ldv
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 3/7] syscall.h: add syscall_set_arguments() and syscall_set_return_value()
2025-01-13 17:11 ` [PATCH v2 3/7] syscall.h: add syscall_set_arguments() and syscall_set_return_value() Dmitry V. Levin
@ 2025-01-16 2:20 ` Charlie Jenkins
2025-01-17 0:59 ` H. Peter Anvin
0 siblings, 1 reply; 39+ messages in thread
From: Charlie Jenkins @ 2025-01-16 2:20 UTC (permalink / raw)
To: Dmitry V. Levin
Cc: Oleg Nesterov, Eugene Syromyatnikov, Mike Frysinger, Renzo Davoli,
Davide Berardi, strace-devel, Vineet Gupta, Russell King,
Will Deacon, Guo Ren, Brian Cain, Huacai Chen, WANG Xuerui,
Thomas Bogendoerfer, Dinh Nguyen, Jonas Bonn, Stefan Kristiansson,
Stafford Horne, James E.J. Bottomley, Helge Deller,
Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
Christophe Leroy, Naveen N Rao, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, Yoshinori Sato, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Andreas Larsson,
Richard Weinberger, Anton Ivanov, Johannes Berg, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Chris Zankel, Max Filippov, Arnd Bergmann, linux-snps-arc,
linux-kernel, linux-arm-kernel, linux-csky, linux-hexagon,
loongarch, linux-mips, linux-openrisc, linux-parisc, linuxppc-dev,
linux-riscv, linux-s390, linux-sh, sparclinux, linux-um,
linux-arch
On Mon, Jan 13, 2025 at 07:11:40PM +0200, Dmitry V. Levin wrote:
> These functions are going to be needed on all HAVE_ARCH_TRACEHOOK
> architectures to implement PTRACE_SET_SYSCALL_INFO API.
>
> This partially reverts commit 7962c2eddbfe ("arch: remove unused
> function syscall_set_arguments()") by reusing some of old
> syscall_set_arguments() implementations.
>
> Signed-off-by: Dmitry V. Levin <ldv@strace.io>
> ---
>
> Note that I'm not a MIPS expert, I just added mips_set_syscall_arg() by
> looking at mips_get_syscall_arg() and the result passes tests in qemu on
> mips O32, mips64 O32, mips64 N32, and mips64 N64.
>
> arch/arc/include/asm/syscall.h | 14 +++++++++++
> arch/arm/include/asm/syscall.h | 13 ++++++++++
> arch/arm64/include/asm/syscall.h | 13 ++++++++++
> arch/csky/include/asm/syscall.h | 13 ++++++++++
> arch/hexagon/include/asm/syscall.h | 14 +++++++++++
> arch/loongarch/include/asm/syscall.h | 8 ++++++
> arch/mips/include/asm/syscall.h | 32 ++++++++++++++++++++++++
> arch/nios2/include/asm/syscall.h | 11 ++++++++
> arch/openrisc/include/asm/syscall.h | 7 ++++++
> arch/parisc/include/asm/syscall.h | 12 +++++++++
> arch/powerpc/include/asm/syscall.h | 10 ++++++++
> arch/riscv/include/asm/syscall.h | 9 +++++++
> arch/s390/include/asm/syscall.h | 12 +++++++++
> arch/sh/include/asm/syscall_32.h | 12 +++++++++
> arch/sparc/include/asm/syscall.h | 10 ++++++++
> arch/um/include/asm/syscall-generic.h | 14 +++++++++++
> arch/x86/include/asm/syscall.h | 36 +++++++++++++++++++++++++++
> arch/xtensa/include/asm/syscall.h | 11 ++++++++
> include/asm-generic/syscall.h | 16 ++++++++++++
> 19 files changed, 267 insertions(+)
>
> diff --git a/arch/arc/include/asm/syscall.h b/arch/arc/include/asm/syscall.h
> index 9709256e31c8..89c1e1736356 100644
> --- a/arch/arc/include/asm/syscall.h
> +++ b/arch/arc/include/asm/syscall.h
> @@ -67,6 +67,20 @@ syscall_get_arguments(struct task_struct *task, struct pt_regs *regs,
> }
> }
>
> +static inline void
> +syscall_set_arguments(struct task_struct *task, struct pt_regs *regs,
> + unsigned long *args)
> +{
> + unsigned long *inside_ptregs = ®s->r0;
> + unsigned int n = 6;
> + unsigned int i = 0;
> +
> + while (n--) {
> + *inside_ptregs = args[i++];
> + inside_ptregs--;
> + }
> +}
> +
> static inline int
> syscall_get_arch(struct task_struct *task)
> {
> diff --git a/arch/arm/include/asm/syscall.h b/arch/arm/include/asm/syscall.h
> index fe4326d938c1..21927fa0ae2b 100644
> --- a/arch/arm/include/asm/syscall.h
> +++ b/arch/arm/include/asm/syscall.h
> @@ -80,6 +80,19 @@ static inline void syscall_get_arguments(struct task_struct *task,
> memcpy(args, ®s->ARM_r0 + 1, 5 * sizeof(args[0]));
> }
>
> +static inline void syscall_set_arguments(struct task_struct *task,
> + struct pt_regs *regs,
> + const unsigned long *args)
> +{
> + memcpy(®s->ARM_r0, args, 6 * sizeof(args[0]));
> + /*
> + * Also copy the first argument into ARM_ORIG_r0
> + * so that syscall_get_arguments() would return it
> + * instead of the previous value.
> + */
> + regs->ARM_ORIG_r0 = regs->ARM_r0;
> +}
> +
> static inline int syscall_get_arch(struct task_struct *task)
> {
> /* ARM tasks don't change audit architectures on the fly. */
> diff --git a/arch/arm64/include/asm/syscall.h b/arch/arm64/include/asm/syscall.h
> index ab8e14b96f68..76020b66286b 100644
> --- a/arch/arm64/include/asm/syscall.h
> +++ b/arch/arm64/include/asm/syscall.h
> @@ -73,6 +73,19 @@ static inline void syscall_get_arguments(struct task_struct *task,
> memcpy(args, ®s->regs[1], 5 * sizeof(args[0]));
> }
>
> +static inline void syscall_set_arguments(struct task_struct *task,
> + struct pt_regs *regs,
> + const unsigned long *args)
> +{
> + memcpy(®s->regs[0], args, 6 * sizeof(args[0]));
> + /*
> + * Also copy the first argument into orig_x0
> + * so that syscall_get_arguments() would return it
> + * instead of the previous value.
> + */
> + regs->orig_x0 = regs->regs[0];
> +}
> +
> /*
> * We don't care about endianness (__AUDIT_ARCH_LE bit) here because
> * AArch64 has the same system calls both on little- and big- endian.
> diff --git a/arch/csky/include/asm/syscall.h b/arch/csky/include/asm/syscall.h
> index 0de5734950bf..30403f7a0487 100644
> --- a/arch/csky/include/asm/syscall.h
> +++ b/arch/csky/include/asm/syscall.h
> @@ -59,6 +59,19 @@ syscall_get_arguments(struct task_struct *task, struct pt_regs *regs,
> memcpy(args, ®s->a1, 5 * sizeof(args[0]));
> }
>
> +static inline void
> +syscall_set_arguments(struct task_struct *task, struct pt_regs *regs,
> + const unsigned long *args)
> +{
> + memcpy(®s->a0, args, 6 * sizeof(regs->a0));
> + /*
> + * Also copy the first argument into orig_x0
> + * so that syscall_get_arguments() would return it
> + * instead of the previous value.
> + */
> + regs->orig_a0 = regs->a0;
> +}
> +
> static inline int
> syscall_get_arch(struct task_struct *task)
> {
> diff --git a/arch/hexagon/include/asm/syscall.h b/arch/hexagon/include/asm/syscall.h
> index f6e454f18038..1024a6548d78 100644
> --- a/arch/hexagon/include/asm/syscall.h
> +++ b/arch/hexagon/include/asm/syscall.h
> @@ -33,6 +33,13 @@ static inline void syscall_get_arguments(struct task_struct *task,
> memcpy(args, &(®s->r00)[0], 6 * sizeof(args[0]));
> }
>
> +static inline void syscall_set_arguments(struct task_struct *task,
> + struct pt_regs *regs,
> + unsigned long *args)
> +{
> + memcpy(&(®s->r00)[0], args, 6 * sizeof(args[0]));
> +}
> +
> static inline long syscall_get_error(struct task_struct *task,
> struct pt_regs *regs)
> {
> @@ -45,6 +52,13 @@ static inline long syscall_get_return_value(struct task_struct *task,
> return regs->r00;
> }
>
> +static inline void syscall_set_return_value(struct task_struct *task,
> + struct pt_regs *regs,
> + int error, long val)
> +{
> + regs->r00 = (long) error ?: val;
> +}
> +
> static inline int syscall_get_arch(struct task_struct *task)
> {
> return AUDIT_ARCH_HEXAGON;
> diff --git a/arch/loongarch/include/asm/syscall.h b/arch/loongarch/include/asm/syscall.h
> index e286dc58476e..ff415b3c0a8e 100644
> --- a/arch/loongarch/include/asm/syscall.h
> +++ b/arch/loongarch/include/asm/syscall.h
> @@ -61,6 +61,14 @@ static inline void syscall_get_arguments(struct task_struct *task,
> memcpy(&args[1], ®s->regs[5], 5 * sizeof(long));
> }
>
> +static inline void syscall_set_arguments(struct task_struct *task,
> + struct pt_regs *regs,
> + unsigned long *args)
> +{
> + regs->orig_a0 = args[0];
> + memcpy(®s->regs[5], &args[1], 5 * sizeof(long));
> +}
> +
> static inline int syscall_get_arch(struct task_struct *task)
> {
> return AUDIT_ARCH_LOONGARCH64;
> diff --git a/arch/mips/include/asm/syscall.h b/arch/mips/include/asm/syscall.h
> index 2f85f2d8f754..3163d1506fae 100644
> --- a/arch/mips/include/asm/syscall.h
> +++ b/arch/mips/include/asm/syscall.h
> @@ -76,6 +76,23 @@ static inline void mips_get_syscall_arg(unsigned long *arg,
> #endif
> }
>
> +static inline void mips_set_syscall_arg(unsigned long *arg,
> + struct task_struct *task, struct pt_regs *regs, unsigned int n)
> +{
> +#ifdef CONFIG_32BIT
> + switch (n) {
> + case 0: case 1: case 2: case 3:
> + regs->regs[4 + n] = *arg;
> + return;
> + case 4: case 5: case 6: case 7:
> + *arg = regs->pad0[n] = *arg;
> + return;
> + }
> +#else
> + regs->regs[4 + n] = *arg;
> +#endif
> +}
> +
> static inline long syscall_get_error(struct task_struct *task,
> struct pt_regs *regs)
> {
> @@ -122,6 +139,21 @@ static inline void syscall_get_arguments(struct task_struct *task,
> mips_get_syscall_arg(args++, task, regs, i++);
> }
>
> +static inline void syscall_set_arguments(struct task_struct *task,
> + struct pt_regs *regs,
> + unsigned long *args)
> +{
> + unsigned int i = 0;
> + unsigned int n = 6;
> +
> + /* O32 ABI syscall() */
> + if (mips_syscall_is_indirect(task, regs))
> + i++;
> +
> + while (n--)
> + mips_set_syscall_arg(args++, task, regs, i++);
> +}
> +
> extern const unsigned long sys_call_table[];
> extern const unsigned long sys32_call_table[];
> extern const unsigned long sysn32_call_table[];
> diff --git a/arch/nios2/include/asm/syscall.h b/arch/nios2/include/asm/syscall.h
> index fff52205fb65..526449edd768 100644
> --- a/arch/nios2/include/asm/syscall.h
> +++ b/arch/nios2/include/asm/syscall.h
> @@ -58,6 +58,17 @@ static inline void syscall_get_arguments(struct task_struct *task,
> *args = regs->r9;
> }
>
> +static inline void syscall_set_arguments(struct task_struct *task,
> + struct pt_regs *regs, const unsigned long *args)
> +{
> + regs->r4 = *args++;
> + regs->r5 = *args++;
> + regs->r6 = *args++;
> + regs->r7 = *args++;
> + regs->r8 = *args++;
> + regs->r9 = *args;
> +}
> +
> static inline int syscall_get_arch(struct task_struct *task)
> {
> return AUDIT_ARCH_NIOS2;
> diff --git a/arch/openrisc/include/asm/syscall.h b/arch/openrisc/include/asm/syscall.h
> index 903ed882bdec..e6383be2a195 100644
> --- a/arch/openrisc/include/asm/syscall.h
> +++ b/arch/openrisc/include/asm/syscall.h
> @@ -57,6 +57,13 @@ syscall_get_arguments(struct task_struct *task, struct pt_regs *regs,
> memcpy(args, ®s->gpr[3], 6 * sizeof(args[0]));
> }
>
> +static inline void
> +syscall_set_arguments(struct task_struct *task, struct pt_regs *regs,
> + const unsigned long *args)
> +{
> + memcpy(®s->gpr[3], args, 6 * sizeof(args[0]));
> +}
> +
> static inline int syscall_get_arch(struct task_struct *task)
> {
> return AUDIT_ARCH_OPENRISC;
> diff --git a/arch/parisc/include/asm/syscall.h b/arch/parisc/include/asm/syscall.h
> index 00b127a5e09b..b146d0ae4c77 100644
> --- a/arch/parisc/include/asm/syscall.h
> +++ b/arch/parisc/include/asm/syscall.h
> @@ -29,6 +29,18 @@ static inline void syscall_get_arguments(struct task_struct *tsk,
> args[0] = regs->gr[26];
> }
>
> +static inline void syscall_set_arguments(struct task_struct *tsk,
> + struct pt_regs *regs,
> + unsigned long *args)
> +{
> + regs->gr[21] = args[5];
> + regs->gr[22] = args[4];
> + regs->gr[23] = args[3];
> + regs->gr[24] = args[2];
> + regs->gr[25] = args[1];
> + regs->gr[26] = args[0];
> +}
> +
> static inline long syscall_get_error(struct task_struct *task,
> struct pt_regs *regs)
> {
> diff --git a/arch/powerpc/include/asm/syscall.h b/arch/powerpc/include/asm/syscall.h
> index 422d7735ace6..521f279e6b33 100644
> --- a/arch/powerpc/include/asm/syscall.h
> +++ b/arch/powerpc/include/asm/syscall.h
> @@ -114,6 +114,16 @@ static inline void syscall_get_arguments(struct task_struct *task,
> }
> }
>
> +static inline void syscall_set_arguments(struct task_struct *task,
> + struct pt_regs *regs,
> + const unsigned long *args)
> +{
> + memcpy(®s->gpr[3], args, 6 * sizeof(args[0]));
> +
> + /* Also copy the first argument into orig_gpr3 */
> + regs->orig_gpr3 = args[0];
> +}
> +
> static inline int syscall_get_arch(struct task_struct *task)
> {
> if (is_tsk_32bit_task(task))
> diff --git a/arch/riscv/include/asm/syscall.h b/arch/riscv/include/asm/syscall.h
> index 121fff429dce..8d389ba995c8 100644
> --- a/arch/riscv/include/asm/syscall.h
> +++ b/arch/riscv/include/asm/syscall.h
> @@ -66,6 +66,15 @@ static inline void syscall_get_arguments(struct task_struct *task,
> memcpy(args, ®s->a1, 5 * sizeof(args[0]));
> }
>
> +static inline void syscall_set_arguments(struct task_struct *task,
> + struct pt_regs *regs,
> + const unsigned long *args)
> +{
> + regs->orig_a0 = args[0];
> + args++;
> + memcpy(®s->a1, args, 5 * sizeof(regs->a1));
> +}
Looks good for riscv.
Tested-by: Charlie Jenkins <charlie@rivosinc.com>
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com
> +
> static inline int syscall_get_arch(struct task_struct *task)
> {
> #ifdef CONFIG_64BIT
> diff --git a/arch/s390/include/asm/syscall.h b/arch/s390/include/asm/syscall.h
> index 27e3d804b311..b3dd883699e7 100644
> --- a/arch/s390/include/asm/syscall.h
> +++ b/arch/s390/include/asm/syscall.h
> @@ -78,6 +78,18 @@ static inline void syscall_get_arguments(struct task_struct *task,
> args[0] = regs->orig_gpr2 & mask;
> }
>
> +static inline void syscall_set_arguments(struct task_struct *task,
> + struct pt_regs *regs,
> + const unsigned long *args)
> +{
> + unsigned int n = 6;
> +
> + while (n-- > 0)
> + if (n > 0)
> + regs->gprs[2 + n] = args[n];
> + regs->orig_gpr2 = args[0];
> +}
> +
> static inline int syscall_get_arch(struct task_struct *task)
> {
> #ifdef CONFIG_COMPAT
> diff --git a/arch/sh/include/asm/syscall_32.h b/arch/sh/include/asm/syscall_32.h
> index d87738eebe30..cb51a7528384 100644
> --- a/arch/sh/include/asm/syscall_32.h
> +++ b/arch/sh/include/asm/syscall_32.h
> @@ -57,6 +57,18 @@ static inline void syscall_get_arguments(struct task_struct *task,
> args[0] = regs->regs[4];
> }
>
> +static inline void syscall_set_arguments(struct task_struct *task,
> + struct pt_regs *regs,
> + const unsigned long *args)
> +{
> + regs->regs[1] = args[5];
> + regs->regs[0] = args[4];
> + regs->regs[7] = args[3];
> + regs->regs[6] = args[2];
> + regs->regs[5] = args[1];
> + regs->regs[4] = args[0];
> +}
> +
> static inline int syscall_get_arch(struct task_struct *task)
> {
> int arch = AUDIT_ARCH_SH;
> diff --git a/arch/sparc/include/asm/syscall.h b/arch/sparc/include/asm/syscall.h
> index 20c109ac8cc9..62a5a78804c4 100644
> --- a/arch/sparc/include/asm/syscall.h
> +++ b/arch/sparc/include/asm/syscall.h
> @@ -117,6 +117,16 @@ static inline void syscall_get_arguments(struct task_struct *task,
> }
> }
>
> +static inline void syscall_set_arguments(struct task_struct *task,
> + struct pt_regs *regs,
> + const unsigned long *args)
> +{
> + unsigned int i;
> +
> + for (i = 0; i < 6; i++)
> + regs->u_regs[UREG_I0 + i] = args[i];
> +}
> +
> static inline int syscall_get_arch(struct task_struct *task)
> {
> #if defined(CONFIG_SPARC64) && defined(CONFIG_COMPAT)
> diff --git a/arch/um/include/asm/syscall-generic.h b/arch/um/include/asm/syscall-generic.h
> index 172b74143c4b..2984feb9d576 100644
> --- a/arch/um/include/asm/syscall-generic.h
> +++ b/arch/um/include/asm/syscall-generic.h
> @@ -62,6 +62,20 @@ static inline void syscall_get_arguments(struct task_struct *task,
> *args = UPT_SYSCALL_ARG6(r);
> }
>
> +static inline void syscall_set_arguments(struct task_struct *task,
> + struct pt_regs *regs,
> + const unsigned long *args)
> +{
> + struct uml_pt_regs *r = ®s->regs;
> +
> + UPT_SYSCALL_ARG1(r) = *args++;
> + UPT_SYSCALL_ARG2(r) = *args++;
> + UPT_SYSCALL_ARG3(r) = *args++;
> + UPT_SYSCALL_ARG4(r) = *args++;
> + UPT_SYSCALL_ARG5(r) = *args++;
> + UPT_SYSCALL_ARG6(r) = *args;
> +}
> +
> /* See arch/x86/um/asm/syscall.h for syscall_get_arch() definition. */
>
> #endif /* __UM_SYSCALL_GENERIC_H */
> diff --git a/arch/x86/include/asm/syscall.h b/arch/x86/include/asm/syscall.h
> index 7c488ff0c764..b9c249dd9e3d 100644
> --- a/arch/x86/include/asm/syscall.h
> +++ b/arch/x86/include/asm/syscall.h
> @@ -90,6 +90,18 @@ static inline void syscall_get_arguments(struct task_struct *task,
> args[5] = regs->bp;
> }
>
> +static inline void syscall_set_arguments(struct task_struct *task,
> + struct pt_regs *regs,
> + const unsigned long *args)
> +{
> + regs->bx = args[0];
> + regs->cx = args[1];
> + regs->dx = args[2];
> + regs->si = args[3];
> + regs->di = args[4];
> + regs->bp = args[5];
> +}
> +
> static inline int syscall_get_arch(struct task_struct *task)
> {
> return AUDIT_ARCH_I386;
> @@ -121,6 +133,30 @@ static inline void syscall_get_arguments(struct task_struct *task,
> }
> }
>
> +static inline void syscall_set_arguments(struct task_struct *task,
> + struct pt_regs *regs,
> + const unsigned long *args)
> +{
> +# ifdef CONFIG_IA32_EMULATION
> + if (task->thread_info.status & TS_COMPAT) {
> + regs->bx = *args++;
> + regs->cx = *args++;
> + regs->dx = *args++;
> + regs->si = *args++;
> + regs->di = *args++;
> + regs->bp = *args;
> + } else
> +# endif
> + {
> + regs->di = *args++;
> + regs->si = *args++;
> + regs->dx = *args++;
> + regs->r10 = *args++;
> + regs->r8 = *args++;
> + regs->r9 = *args;
> + }
> +}
> +
> static inline int syscall_get_arch(struct task_struct *task)
> {
> /* x32 tasks should be considered AUDIT_ARCH_X86_64. */
> diff --git a/arch/xtensa/include/asm/syscall.h b/arch/xtensa/include/asm/syscall.h
> index 5ee974bf8330..f9a671cbf933 100644
> --- a/arch/xtensa/include/asm/syscall.h
> +++ b/arch/xtensa/include/asm/syscall.h
> @@ -68,6 +68,17 @@ static inline void syscall_get_arguments(struct task_struct *task,
> args[i] = regs->areg[reg[i]];
> }
>
> +static inline void syscall_set_arguments(struct task_struct *task,
> + struct pt_regs *regs,
> + const unsigned long *args)
> +{
> + static const unsigned int reg[] = XTENSA_SYSCALL_ARGUMENT_REGS;
> + unsigned int i;
> +
> + for (i = 0; i < 6; ++i)
> + regs->areg[reg[i]] = args[i];
> +}
> +
> asmlinkage long xtensa_rt_sigreturn(void);
> asmlinkage long xtensa_shmat(int, char __user *, int);
> asmlinkage long xtensa_fadvise64_64(int, int,
> diff --git a/include/asm-generic/syscall.h b/include/asm-generic/syscall.h
> index 5a80fe728dc8..0f7b9a493de7 100644
> --- a/include/asm-generic/syscall.h
> +++ b/include/asm-generic/syscall.h
> @@ -117,6 +117,22 @@ void syscall_set_return_value(struct task_struct *task, struct pt_regs *regs,
> void syscall_get_arguments(struct task_struct *task, struct pt_regs *regs,
> unsigned long *args);
>
> +/**
> + * syscall_set_arguments - change system call parameter value
> + * @task: task of interest, must be in system call entry tracing
> + * @regs: task_pt_regs() of @task
> + * @args: array of argument values to store
> + *
> + * Changes 6 arguments to the system call.
> + * The first argument gets value @args[0], and so on.
> + *
> + * It's only valid to call this when @task is stopped for tracing on
> + * entry to a system call, due to %SYSCALL_WORK_SYSCALL_TRACE or
> + * %SYSCALL_WORK_SYSCALL_AUDIT.
> + */
> +void syscall_set_arguments(struct task_struct *task, struct pt_regs *regs,
> + const unsigned long *args);
> +
> /**
> * syscall_get_arch - return the AUDIT_ARCH for the current system call
> * @task: task of interest, must be blocked
> --
> ldv
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 4/7] syscall.h: introduce syscall_set_nr()
2025-01-13 17:11 ` [PATCH v2 4/7] syscall.h: introduce syscall_set_nr() Dmitry V. Levin
@ 2025-01-16 2:20 ` Charlie Jenkins
0 siblings, 0 replies; 39+ messages in thread
From: Charlie Jenkins @ 2025-01-16 2:20 UTC (permalink / raw)
To: Dmitry V. Levin
Cc: Oleg Nesterov, Eugene Syromyatnikov, Mike Frysinger, Renzo Davoli,
Davide Berardi, strace-devel, Vineet Gupta, Russell King,
Catalin Marinas, Will Deacon, Brian Cain, Huacai Chen,
WANG Xuerui, Geert Uytterhoeven, Michal Simek,
Thomas Bogendoerfer, Dinh Nguyen, Jonas Bonn, Stefan Kristiansson,
Stafford Horne, James E.J. Bottomley, Helge Deller,
Michael Ellerman, Nicholas Piggin, Christophe Leroy, Naveen N Rao,
Madhavan Srinivasan, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, Yoshinori Sato, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Andreas Larsson,
Richard Weinberger, Anton Ivanov, Johannes Berg, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Chris Zankel, Max Filippov, Arnd Bergmann, linux-snps-arc,
linux-kernel, linux-arm-kernel, linux-hexagon, loongarch,
linux-m68k, linux-mips, linux-openrisc, linux-parisc,
linuxppc-dev, linux-riscv, linux-s390, linux-sh, sparclinux,
linux-um, linux-arch
On Mon, Jan 13, 2025 at 07:11:51PM +0200, Dmitry V. Levin wrote:
> Similar to syscall_set_arguments() that complements
> syscall_get_arguments(), introduce syscall_set_nr()
> that complements syscall_get_nr().
>
> syscall_set_nr() is going to be needed along with
> syscall_set_arguments() on all HAVE_ARCH_TRACEHOOK
> architectures to implement PTRACE_SET_SYSCALL_INFO API.
>
> Signed-off-by: Dmitry V. Levin <ldv@strace.io>
> ---
> arch/arc/include/asm/syscall.h | 11 +++++++++++
> arch/arm/include/asm/syscall.h | 24 ++++++++++++++++++++++++
> arch/arm64/include/asm/syscall.h | 16 ++++++++++++++++
> arch/hexagon/include/asm/syscall.h | 7 +++++++
> arch/loongarch/include/asm/syscall.h | 7 +++++++
> arch/m68k/include/asm/syscall.h | 7 +++++++
> arch/microblaze/include/asm/syscall.h | 7 +++++++
> arch/mips/include/asm/syscall.h | 14 ++++++++++++++
> arch/nios2/include/asm/syscall.h | 5 +++++
> arch/openrisc/include/asm/syscall.h | 6 ++++++
> arch/parisc/include/asm/syscall.h | 7 +++++++
> arch/powerpc/include/asm/syscall.h | 10 ++++++++++
> arch/riscv/include/asm/syscall.h | 7 +++++++
> arch/s390/include/asm/syscall.h | 12 ++++++++++++
> arch/sh/include/asm/syscall_32.h | 12 ++++++++++++
> arch/sparc/include/asm/syscall.h | 12 ++++++++++++
> arch/um/include/asm/syscall-generic.h | 5 +++++
> arch/x86/include/asm/syscall.h | 7 +++++++
> arch/xtensa/include/asm/syscall.h | 7 +++++++
> include/asm-generic/syscall.h | 14 ++++++++++++++
> 20 files changed, 197 insertions(+)
>
> diff --git a/arch/arc/include/asm/syscall.h b/arch/arc/include/asm/syscall.h
> index 89c1e1736356..728d625a10f1 100644
> --- a/arch/arc/include/asm/syscall.h
> +++ b/arch/arc/include/asm/syscall.h
> @@ -23,6 +23,17 @@ syscall_get_nr(struct task_struct *task, struct pt_regs *regs)
> return -1;
> }
>
> +static inline void
> +syscall_set_nr(struct task_struct *task, struct pt_regs *regs, int nr)
> +{
> + /*
> + * Unlike syscall_get_nr(), syscall_set_nr() can be called only when
> + * the target task is stopped for tracing on entering syscall, so
> + * there is no need to have the same check syscall_get_nr() has.
> + */
> + regs->r8 = nr;
> +}
> +
> static inline void
> syscall_rollback(struct task_struct *task, struct pt_regs *regs)
> {
> diff --git a/arch/arm/include/asm/syscall.h b/arch/arm/include/asm/syscall.h
> index 21927fa0ae2b..18b102a30741 100644
> --- a/arch/arm/include/asm/syscall.h
> +++ b/arch/arm/include/asm/syscall.h
> @@ -68,6 +68,30 @@ static inline void syscall_set_return_value(struct task_struct *task,
> regs->ARM_r0 = (long) error ? error : val;
> }
>
> +static inline void syscall_set_nr(struct task_struct *task,
> + struct pt_regs *regs,
> + int nr)
> +{
> + if (nr == -1) {
> + task_thread_info(task)->abi_syscall = -1;
> + /*
> + * When the syscall number is set to -1, the syscall will be
> + * skipped. In this case the syscall return value has to be
> + * set explicitly, otherwise the first syscall argument is
> + * returned as the syscall return value.
> + */
> + syscall_set_return_value(task, regs, -ENOSYS, 0);
> + return;
> + }
> + if ((IS_ENABLED(CONFIG_AEABI) && !IS_ENABLED(CONFIG_OABI_COMPAT))) {
> + task_thread_info(task)->abi_syscall = nr;
> + return;
> + }
> + task_thread_info(task)->abi_syscall =
> + (task_thread_info(task)->abi_syscall & ~__NR_SYSCALL_MASK) |
> + (nr & __NR_SYSCALL_MASK);
> +}
> +
> #define SYSCALL_MAX_ARGS 7
>
> static inline void syscall_get_arguments(struct task_struct *task,
> diff --git a/arch/arm64/include/asm/syscall.h b/arch/arm64/include/asm/syscall.h
> index 76020b66286b..712daa90e643 100644
> --- a/arch/arm64/include/asm/syscall.h
> +++ b/arch/arm64/include/asm/syscall.h
> @@ -61,6 +61,22 @@ static inline void syscall_set_return_value(struct task_struct *task,
> regs->regs[0] = val;
> }
>
> +static inline void syscall_set_nr(struct task_struct *task,
> + struct pt_regs *regs,
> + int nr)
> +{
> + regs->syscallno = nr;
> + if (nr == -1) {
> + /*
> + * When the syscall number is set to -1, the syscall will be
> + * skipped. In this case the syscall return value has to be
> + * set explicitly, otherwise the first syscall argument is
> + * returned as the syscall return value.
> + */
> + syscall_set_return_value(task, regs, -ENOSYS, 0);
> + }
> +}
> +
> #define SYSCALL_MAX_ARGS 6
>
> static inline void syscall_get_arguments(struct task_struct *task,
> diff --git a/arch/hexagon/include/asm/syscall.h b/arch/hexagon/include/asm/syscall.h
> index 1024a6548d78..70637261817a 100644
> --- a/arch/hexagon/include/asm/syscall.h
> +++ b/arch/hexagon/include/asm/syscall.h
> @@ -26,6 +26,13 @@ static inline long syscall_get_nr(struct task_struct *task,
> return regs->r06;
> }
>
> +static inline void syscall_set_nr(struct task_struct *task,
> + struct pt_regs *regs,
> + int nr)
> +{
> + regs->r06 = nr;
> +}
> +
> static inline void syscall_get_arguments(struct task_struct *task,
> struct pt_regs *regs,
> unsigned long *args)
> diff --git a/arch/loongarch/include/asm/syscall.h b/arch/loongarch/include/asm/syscall.h
> index ff415b3c0a8e..81d2733f7b94 100644
> --- a/arch/loongarch/include/asm/syscall.h
> +++ b/arch/loongarch/include/asm/syscall.h
> @@ -26,6 +26,13 @@ static inline long syscall_get_nr(struct task_struct *task,
> return regs->regs[11];
> }
>
> +static inline void syscall_set_nr(struct task_struct *task,
> + struct pt_regs *regs,
> + int nr)
> +{
> + regs->regs[11] = nr;
> +}
> +
> static inline void syscall_rollback(struct task_struct *task,
> struct pt_regs *regs)
> {
> diff --git a/arch/m68k/include/asm/syscall.h b/arch/m68k/include/asm/syscall.h
> index d1453e850cdd..bf84b160c2eb 100644
> --- a/arch/m68k/include/asm/syscall.h
> +++ b/arch/m68k/include/asm/syscall.h
> @@ -14,6 +14,13 @@ static inline int syscall_get_nr(struct task_struct *task,
> return regs->orig_d0;
> }
>
> +static inline void syscall_set_nr(struct task_struct *task,
> + struct pt_regs *regs,
> + int nr)
> +{
> + regs->orig_d0 = nr;
> +}
> +
> static inline void syscall_rollback(struct task_struct *task,
> struct pt_regs *regs)
> {
> diff --git a/arch/microblaze/include/asm/syscall.h b/arch/microblaze/include/asm/syscall.h
> index 5eb3f624cc59..b5b6b91fae3e 100644
> --- a/arch/microblaze/include/asm/syscall.h
> +++ b/arch/microblaze/include/asm/syscall.h
> @@ -14,6 +14,13 @@ static inline long syscall_get_nr(struct task_struct *task,
> return regs->r12;
> }
>
> +static inline void syscall_set_nr(struct task_struct *task,
> + struct pt_regs *regs,
> + int nr)
> +{
> + regs->r12 = nr;
> +}
> +
> static inline void syscall_rollback(struct task_struct *task,
> struct pt_regs *regs)
> {
> diff --git a/arch/mips/include/asm/syscall.h b/arch/mips/include/asm/syscall.h
> index 3163d1506fae..58d68205fd2c 100644
> --- a/arch/mips/include/asm/syscall.h
> +++ b/arch/mips/include/asm/syscall.h
> @@ -41,6 +41,20 @@ static inline long syscall_get_nr(struct task_struct *task,
> return task_thread_info(task)->syscall;
> }
>
> +static inline void syscall_set_nr(struct task_struct *task,
> + struct pt_regs *regs,
> + int nr)
> +{
> + /*
> + * New syscall number has to be assigned to regs[2] because
> + * syscall_trace_entry() loads it from there unconditionally.
> + *
> + * Consequently, if the syscall was indirect and nr != __NR_syscall,
> + * then after this assignment the syscall will cease to be indirect.
> + */
> + task_thread_info(task)->syscall = regs->regs[2] = nr;
> +}
> +
> static inline void mips_syscall_update_nr(struct task_struct *task,
> struct pt_regs *regs)
> {
> diff --git a/arch/nios2/include/asm/syscall.h b/arch/nios2/include/asm/syscall.h
> index 526449edd768..8e3eb1d689bb 100644
> --- a/arch/nios2/include/asm/syscall.h
> +++ b/arch/nios2/include/asm/syscall.h
> @@ -15,6 +15,11 @@ static inline int syscall_get_nr(struct task_struct *task, struct pt_regs *regs)
> return regs->r2;
> }
>
> +static inline void syscall_set_nr(struct task_struct *task, struct pt_regs *regs, int nr)
> +{
> + regs->r2 = nr;
> +}
> +
> static inline void syscall_rollback(struct task_struct *task,
> struct pt_regs *regs)
> {
> diff --git a/arch/openrisc/include/asm/syscall.h b/arch/openrisc/include/asm/syscall.h
> index e6383be2a195..5e037d9659c5 100644
> --- a/arch/openrisc/include/asm/syscall.h
> +++ b/arch/openrisc/include/asm/syscall.h
> @@ -25,6 +25,12 @@ syscall_get_nr(struct task_struct *task, struct pt_regs *regs)
> return regs->orig_gpr11;
> }
>
> +static inline void
> +syscall_set_nr(struct task_struct *task, struct pt_regs *regs, int nr)
> +{
> + regs->orig_gpr11 = nr;
> +}
> +
> static inline void
> syscall_rollback(struct task_struct *task, struct pt_regs *regs)
> {
> diff --git a/arch/parisc/include/asm/syscall.h b/arch/parisc/include/asm/syscall.h
> index b146d0ae4c77..c11222798ab2 100644
> --- a/arch/parisc/include/asm/syscall.h
> +++ b/arch/parisc/include/asm/syscall.h
> @@ -17,6 +17,13 @@ static inline long syscall_get_nr(struct task_struct *tsk,
> return regs->gr[20];
> }
>
> +static inline void syscall_set_nr(struct task_struct *tsk,
> + struct pt_regs *regs,
> + int nr)
> +{
> + regs->gr[20] = nr;
> +}
> +
> static inline void syscall_get_arguments(struct task_struct *tsk,
> struct pt_regs *regs,
> unsigned long *args)
> diff --git a/arch/powerpc/include/asm/syscall.h b/arch/powerpc/include/asm/syscall.h
> index 521f279e6b33..7505dcfed247 100644
> --- a/arch/powerpc/include/asm/syscall.h
> +++ b/arch/powerpc/include/asm/syscall.h
> @@ -39,6 +39,16 @@ static inline int syscall_get_nr(struct task_struct *task, struct pt_regs *regs)
> return -1;
> }
>
> +static inline void syscall_set_nr(struct task_struct *task, struct pt_regs *regs, int nr)
> +{
> + /*
> + * Unlike syscall_get_nr(), syscall_set_nr() can be called only when
> + * the target task is stopped for tracing on entering syscall, so
> + * there is no need to have the same check syscall_get_nr() has.
> + */
> + regs->gpr[0] = nr;
> +}
> +
> static inline void syscall_rollback(struct task_struct *task,
> struct pt_regs *regs)
> {
> diff --git a/arch/riscv/include/asm/syscall.h b/arch/riscv/include/asm/syscall.h
> index 8d389ba995c8..a5281cdf2b10 100644
> --- a/arch/riscv/include/asm/syscall.h
> +++ b/arch/riscv/include/asm/syscall.h
> @@ -30,6 +30,13 @@ static inline int syscall_get_nr(struct task_struct *task,
> return regs->a7;
> }
>
> +static inline void syscall_set_nr(struct task_struct *task,
> + struct pt_regs *regs,
> + int nr)
> +{
> + regs->a7 = nr;
> +}
Looks good for riscv.
Tested-by: Charlie Jenkins <charlie@rivosinc.com>
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
> +
> static inline void syscall_rollback(struct task_struct *task,
> struct pt_regs *regs)
> {
> diff --git a/arch/s390/include/asm/syscall.h b/arch/s390/include/asm/syscall.h
> index b3dd883699e7..12cd0c60c07b 100644
> --- a/arch/s390/include/asm/syscall.h
> +++ b/arch/s390/include/asm/syscall.h
> @@ -24,6 +24,18 @@ static inline long syscall_get_nr(struct task_struct *task,
> (regs->int_code & 0xffff) : -1;
> }
>
> +static inline void syscall_set_nr(struct task_struct *task,
> + struct pt_regs *regs,
> + int nr)
> +{
> + /*
> + * Unlike syscall_get_nr(), syscall_set_nr() can be called only when
> + * the target task is stopped for tracing on entering syscall, so
> + * there is no need to have the same check syscall_get_nr() has.
> + */
> + regs->int_code = (regs->int_code & ~0xffff) | (nr & 0xffff);
> +}
> +
> static inline void syscall_rollback(struct task_struct *task,
> struct pt_regs *regs)
> {
> diff --git a/arch/sh/include/asm/syscall_32.h b/arch/sh/include/asm/syscall_32.h
> index cb51a7528384..7027d87d901d 100644
> --- a/arch/sh/include/asm/syscall_32.h
> +++ b/arch/sh/include/asm/syscall_32.h
> @@ -15,6 +15,18 @@ static inline long syscall_get_nr(struct task_struct *task,
> return (regs->tra >= 0) ? regs->regs[3] : -1L;
> }
>
> +static inline void syscall_set_nr(struct task_struct *task,
> + struct pt_regs *regs,
> + int nr)
> +{
> + /*
> + * Unlike syscall_get_nr(), syscall_set_nr() can be called only when
> + * the target task is stopped for tracing on entering syscall, so
> + * there is no need to have the same check syscall_get_nr() has.
> + */
> + regs->regs[3] = nr;
> +}
> +
> static inline void syscall_rollback(struct task_struct *task,
> struct pt_regs *regs)
> {
> diff --git a/arch/sparc/include/asm/syscall.h b/arch/sparc/include/asm/syscall.h
> index 62a5a78804c4..b0233924d323 100644
> --- a/arch/sparc/include/asm/syscall.h
> +++ b/arch/sparc/include/asm/syscall.h
> @@ -25,6 +25,18 @@ static inline long syscall_get_nr(struct task_struct *task,
> return (syscall_p ? regs->u_regs[UREG_G1] : -1L);
> }
>
> +static inline void syscall_set_nr(struct task_struct *task,
> + struct pt_regs *regs,
> + int nr)
> +{
> + /*
> + * Unlike syscall_get_nr(), syscall_set_nr() can be called only when
> + * the target task is stopped for tracing on entering syscall, so
> + * there is no need to have the same check syscall_get_nr() has.
> + */
> + regs->u_regs[UREG_G1] = nr;
> +}
> +
> static inline void syscall_rollback(struct task_struct *task,
> struct pt_regs *regs)
> {
> diff --git a/arch/um/include/asm/syscall-generic.h b/arch/um/include/asm/syscall-generic.h
> index 2984feb9d576..bcd73bcfe577 100644
> --- a/arch/um/include/asm/syscall-generic.h
> +++ b/arch/um/include/asm/syscall-generic.h
> @@ -21,6 +21,11 @@ static inline int syscall_get_nr(struct task_struct *task, struct pt_regs *regs)
> return PT_REGS_SYSCALL_NR(regs);
> }
>
> +static inline void syscall_set_nr(struct task_struct *task, struct pt_regs *regs, int nr)
> +{
> + PT_REGS_SYSCALL_NR(regs) = nr;
> +}
> +
> static inline void syscall_rollback(struct task_struct *task,
> struct pt_regs *regs)
> {
> diff --git a/arch/x86/include/asm/syscall.h b/arch/x86/include/asm/syscall.h
> index b9c249dd9e3d..c10dbb74cd00 100644
> --- a/arch/x86/include/asm/syscall.h
> +++ b/arch/x86/include/asm/syscall.h
> @@ -38,6 +38,13 @@ static inline int syscall_get_nr(struct task_struct *task, struct pt_regs *regs)
> return regs->orig_ax;
> }
>
> +static inline void syscall_set_nr(struct task_struct *task,
> + struct pt_regs *regs,
> + int nr)
> +{
> + regs->orig_ax = nr;
> +}
> +
> static inline void syscall_rollback(struct task_struct *task,
> struct pt_regs *regs)
> {
> diff --git a/arch/xtensa/include/asm/syscall.h b/arch/xtensa/include/asm/syscall.h
> index f9a671cbf933..7db3b489c8ad 100644
> --- a/arch/xtensa/include/asm/syscall.h
> +++ b/arch/xtensa/include/asm/syscall.h
> @@ -28,6 +28,13 @@ static inline long syscall_get_nr(struct task_struct *task,
> return regs->syscall;
> }
>
> +static inline void syscall_set_nr(struct task_struct *task,
> + struct pt_regs *regs,
> + int nr)
> +{
> + regs->syscall = nr;
> +}
> +
> static inline void syscall_rollback(struct task_struct *task,
> struct pt_regs *regs)
> {
> diff --git a/include/asm-generic/syscall.h b/include/asm-generic/syscall.h
> index 0f7b9a493de7..e33fd4e783c1 100644
> --- a/include/asm-generic/syscall.h
> +++ b/include/asm-generic/syscall.h
> @@ -37,6 +37,20 @@ struct pt_regs;
> */
> int syscall_get_nr(struct task_struct *task, struct pt_regs *regs);
>
> +/**
> + * syscall_set_nr - change the system call a task is executing
> + * @task: task of interest, must be blocked
> + * @regs: task_pt_regs() of @task
> + * @nr: system call number
> + *
> + * Changes the system call number @task is about to execute.
> + *
> + * It's only valid to call this when @task is stopped for tracing on
> + * entry to a system call, due to %SYSCALL_WORK_SYSCALL_TRACE or
> + * %SYSCALL_WORK_SYSCALL_AUDIT.
> + */
> +void syscall_set_nr(struct task_struct *task, struct pt_regs *regs, int nr);
> +
> /**
> * syscall_rollback - roll back registers after an aborted system call
> * @task: task of interest, must be in system call exit tracing
> --
> ldv
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 3/7] syscall.h: add syscall_set_arguments() and syscall_set_return_value()
2025-01-16 2:20 ` Charlie Jenkins
@ 2025-01-17 0:59 ` H. Peter Anvin
2025-01-17 15:45 ` Eugene Syromyatnikov
0 siblings, 1 reply; 39+ messages in thread
From: H. Peter Anvin @ 2025-01-17 0:59 UTC (permalink / raw)
To: Charlie Jenkins, Dmitry V. Levin
Cc: Oleg Nesterov, Eugene Syromyatnikov, Mike Frysinger, Renzo Davoli,
Davide Berardi, strace-devel, Vineet Gupta, Russell King,
Will Deacon, Guo Ren, Brian Cain, Huacai Chen, WANG Xuerui,
Thomas Bogendoerfer, Dinh Nguyen, Jonas Bonn, Stefan Kristiansson,
Stafford Horne, James E.J. Bottomley, Helge Deller,
Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
Christophe Leroy, Naveen N Rao, Paul Walmsley, Palmer Dabbelt,
Albert Ou, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
Christian Borntraeger, Sven Schnelle, Yoshinori Sato, Rich Felker,
John Paul Adrian Glaubitz, David S. Miller, Andreas Larsson,
Richard Weinberger, Anton Ivanov, Johannes Berg, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, Chris Zankel,
Max Filippov, Arnd Bergmann, linux-snps-arc, linux-kernel,
linux-arm-kernel, linux-csky, linux-hexagon, loongarch,
linux-mips, linux-openrisc, linux-parisc, linuxppc-dev,
linux-riscv, linux-s390, linux-sh, sparclinux, linux-um,
linux-arch
I link the concept of this patchset, but *please* make it clear in the
comments that this does not solve the issue of 64-bit kernel arguments
on 32-bit systems being ABI specific.
This isn't unique to this patch in any way; the only way to handle it is
by keeping track of each ABI.
On 1/15/25 18:20, Charlie Jenkins wrote:
> On Mon, Jan 13, 2025 at 07:11:40PM +0200, Dmitry V. Levin wrote:
>> These functions are going to be needed on all HAVE_ARCH_TRACEHOOK
>> architectures to implement PTRACE_SET_SYSCALL_INFO API.
>>
>> This partially reverts commit 7962c2eddbfe ("arch: remove unused
>> function syscall_set_arguments()") by reusing some of old
>> syscall_set_arguments() implementations.
>>
>> Signed-off-by: Dmitry V. Levin <ldv@strace.io>
>> ---
>>
>> Note that I'm not a MIPS expert, I just added mips_set_syscall_arg() by
>> looking at mips_get_syscall_arg() and the result passes tests in qemu on
>> mips O32, mips64 O32, mips64 N32, and mips64 N64.
>>
>> arch/arc/include/asm/syscall.h | 14 +++++++++++
>> arch/arm/include/asm/syscall.h | 13 ++++++++++
>> arch/arm64/include/asm/syscall.h | 13 ++++++++++
>> arch/csky/include/asm/syscall.h | 13 ++++++++++
>> arch/hexagon/include/asm/syscall.h | 14 +++++++++++
>> arch/loongarch/include/asm/syscall.h | 8 ++++++
>> arch/mips/include/asm/syscall.h | 32 ++++++++++++++++++++++++
>> arch/nios2/include/asm/syscall.h | 11 ++++++++
>> arch/openrisc/include/asm/syscall.h | 7 ++++++
>> arch/parisc/include/asm/syscall.h | 12 +++++++++
>> arch/powerpc/include/asm/syscall.h | 10 ++++++++
>> arch/riscv/include/asm/syscall.h | 9 +++++++
>> arch/s390/include/asm/syscall.h | 12 +++++++++
>> arch/sh/include/asm/syscall_32.h | 12 +++++++++
>> arch/sparc/include/asm/syscall.h | 10 ++++++++
>> arch/um/include/asm/syscall-generic.h | 14 +++++++++++
>> arch/x86/include/asm/syscall.h | 36 +++++++++++++++++++++++++++
>> arch/xtensa/include/asm/syscall.h | 11 ++++++++
>> include/asm-generic/syscall.h | 16 ++++++++++++
>> 19 files changed, 267 insertions(+)
>>
>> diff --git a/arch/arc/include/asm/syscall.h b/arch/arc/include/asm/syscall.h
>> index 9709256e31c8..89c1e1736356 100644
>> --- a/arch/arc/include/asm/syscall.h
>> +++ b/arch/arc/include/asm/syscall.h
>> @@ -67,6 +67,20 @@ syscall_get_arguments(struct task_struct *task, struct pt_regs *regs,
>> }
>> }
>>
>> +static inline void
>> +syscall_set_arguments(struct task_struct *task, struct pt_regs *regs,
>> + unsigned long *args)
>> +{
>> + unsigned long *inside_ptregs = ®s->r0;
>> + unsigned int n = 6;
>> + unsigned int i = 0;
>> +
>> + while (n--) {
>> + *inside_ptregs = args[i++];
>> + inside_ptregs--;
>> + }
>> +}
>> +
>> static inline int
>> syscall_get_arch(struct task_struct *task)
>> {
>> diff --git a/arch/arm/include/asm/syscall.h b/arch/arm/include/asm/syscall.h
>> index fe4326d938c1..21927fa0ae2b 100644
>> --- a/arch/arm/include/asm/syscall.h
>> +++ b/arch/arm/include/asm/syscall.h
>> @@ -80,6 +80,19 @@ static inline void syscall_get_arguments(struct task_struct *task,
>> memcpy(args, ®s->ARM_r0 + 1, 5 * sizeof(args[0]));
>> }
>>
>> +static inline void syscall_set_arguments(struct task_struct *task,
>> + struct pt_regs *regs,
>> + const unsigned long *args)
>> +{
>> + memcpy(®s->ARM_r0, args, 6 * sizeof(args[0]));
>> + /*
>> + * Also copy the first argument into ARM_ORIG_r0
>> + * so that syscall_get_arguments() would return it
>> + * instead of the previous value.
>> + */
>> + regs->ARM_ORIG_r0 = regs->ARM_r0;
>> +}
>> +
>> static inline int syscall_get_arch(struct task_struct *task)
>> {
>> /* ARM tasks don't change audit architectures on the fly. */
>> diff --git a/arch/arm64/include/asm/syscall.h b/arch/arm64/include/asm/syscall.h
>> index ab8e14b96f68..76020b66286b 100644
>> --- a/arch/arm64/include/asm/syscall.h
>> +++ b/arch/arm64/include/asm/syscall.h
>> @@ -73,6 +73,19 @@ static inline void syscall_get_arguments(struct task_struct *task,
>> memcpy(args, ®s->regs[1], 5 * sizeof(args[0]));
>> }
>>
>> +static inline void syscall_set_arguments(struct task_struct *task,
>> + struct pt_regs *regs,
>> + const unsigned long *args)
>> +{
>> + memcpy(®s->regs[0], args, 6 * sizeof(args[0]));
>> + /*
>> + * Also copy the first argument into orig_x0
>> + * so that syscall_get_arguments() would return it
>> + * instead of the previous value.
>> + */
>> + regs->orig_x0 = regs->regs[0];
>> +}
>> +
>> /*
>> * We don't care about endianness (__AUDIT_ARCH_LE bit) here because
>> * AArch64 has the same system calls both on little- and big- endian.
>> diff --git a/arch/csky/include/asm/syscall.h b/arch/csky/include/asm/syscall.h
>> index 0de5734950bf..30403f7a0487 100644
>> --- a/arch/csky/include/asm/syscall.h
>> +++ b/arch/csky/include/asm/syscall.h
>> @@ -59,6 +59,19 @@ syscall_get_arguments(struct task_struct *task, struct pt_regs *regs,
>> memcpy(args, ®s->a1, 5 * sizeof(args[0]));
>> }
>>
>> +static inline void
>> +syscall_set_arguments(struct task_struct *task, struct pt_regs *regs,
>> + const unsigned long *args)
>> +{
>> + memcpy(®s->a0, args, 6 * sizeof(regs->a0));
>> + /*
>> + * Also copy the first argument into orig_x0
>> + * so that syscall_get_arguments() would return it
>> + * instead of the previous value.
>> + */
>> + regs->orig_a0 = regs->a0;
>> +}
>> +
>> static inline int
>> syscall_get_arch(struct task_struct *task)
>> {
>> diff --git a/arch/hexagon/include/asm/syscall.h b/arch/hexagon/include/asm/syscall.h
>> index f6e454f18038..1024a6548d78 100644
>> --- a/arch/hexagon/include/asm/syscall.h
>> +++ b/arch/hexagon/include/asm/syscall.h
>> @@ -33,6 +33,13 @@ static inline void syscall_get_arguments(struct task_struct *task,
>> memcpy(args, &(®s->r00)[0], 6 * sizeof(args[0]));
>> }
>>
>> +static inline void syscall_set_arguments(struct task_struct *task,
>> + struct pt_regs *regs,
>> + unsigned long *args)
>> +{
>> + memcpy(&(®s->r00)[0], args, 6 * sizeof(args[0]));
>> +}
>> +
>> static inline long syscall_get_error(struct task_struct *task,
>> struct pt_regs *regs)
>> {
>> @@ -45,6 +52,13 @@ static inline long syscall_get_return_value(struct task_struct *task,
>> return regs->r00;
>> }
>>
>> +static inline void syscall_set_return_value(struct task_struct *task,
>> + struct pt_regs *regs,
>> + int error, long val)
>> +{
>> + regs->r00 = (long) error ?: val;
>> +}
>> +
>> static inline int syscall_get_arch(struct task_struct *task)
>> {
>> return AUDIT_ARCH_HEXAGON;
>> diff --git a/arch/loongarch/include/asm/syscall.h b/arch/loongarch/include/asm/syscall.h
>> index e286dc58476e..ff415b3c0a8e 100644
>> --- a/arch/loongarch/include/asm/syscall.h
>> +++ b/arch/loongarch/include/asm/syscall.h
>> @@ -61,6 +61,14 @@ static inline void syscall_get_arguments(struct task_struct *task,
>> memcpy(&args[1], ®s->regs[5], 5 * sizeof(long));
>> }
>>
>> +static inline void syscall_set_arguments(struct task_struct *task,
>> + struct pt_regs *regs,
>> + unsigned long *args)
>> +{
>> + regs->orig_a0 = args[0];
>> + memcpy(®s->regs[5], &args[1], 5 * sizeof(long));
>> +}
>> +
>> static inline int syscall_get_arch(struct task_struct *task)
>> {
>> return AUDIT_ARCH_LOONGARCH64;
>> diff --git a/arch/mips/include/asm/syscall.h b/arch/mips/include/asm/syscall.h
>> index 2f85f2d8f754..3163d1506fae 100644
>> --- a/arch/mips/include/asm/syscall.h
>> +++ b/arch/mips/include/asm/syscall.h
>> @@ -76,6 +76,23 @@ static inline void mips_get_syscall_arg(unsigned long *arg,
>> #endif
>> }
>>
>> +static inline void mips_set_syscall_arg(unsigned long *arg,
>> + struct task_struct *task, struct pt_regs *regs, unsigned int n)
>> +{
>> +#ifdef CONFIG_32BIT
>> + switch (n) {
>> + case 0: case 1: case 2: case 3:
>> + regs->regs[4 + n] = *arg;
>> + return;
>> + case 4: case 5: case 6: case 7:
>> + *arg = regs->pad0[n] = *arg;
>> + return;
>> + }
>> +#else
>> + regs->regs[4 + n] = *arg;
>> +#endif
>> +}
>> +
>> static inline long syscall_get_error(struct task_struct *task,
>> struct pt_regs *regs)
>> {
>> @@ -122,6 +139,21 @@ static inline void syscall_get_arguments(struct task_struct *task,
>> mips_get_syscall_arg(args++, task, regs, i++);
>> }
>>
>> +static inline void syscall_set_arguments(struct task_struct *task,
>> + struct pt_regs *regs,
>> + unsigned long *args)
>> +{
>> + unsigned int i = 0;
>> + unsigned int n = 6;
>> +
>> + /* O32 ABI syscall() */
>> + if (mips_syscall_is_indirect(task, regs))
>> + i++;
>> +
>> + while (n--)
>> + mips_set_syscall_arg(args++, task, regs, i++);
>> +}
>> +
>> extern const unsigned long sys_call_table[];
>> extern const unsigned long sys32_call_table[];
>> extern const unsigned long sysn32_call_table[];
>> diff --git a/arch/nios2/include/asm/syscall.h b/arch/nios2/include/asm/syscall.h
>> index fff52205fb65..526449edd768 100644
>> --- a/arch/nios2/include/asm/syscall.h
>> +++ b/arch/nios2/include/asm/syscall.h
>> @@ -58,6 +58,17 @@ static inline void syscall_get_arguments(struct task_struct *task,
>> *args = regs->r9;
>> }
>>
>> +static inline void syscall_set_arguments(struct task_struct *task,
>> + struct pt_regs *regs, const unsigned long *args)
>> +{
>> + regs->r4 = *args++;
>> + regs->r5 = *args++;
>> + regs->r6 = *args++;
>> + regs->r7 = *args++;
>> + regs->r8 = *args++;
>> + regs->r9 = *args;
>> +}
>> +
>> static inline int syscall_get_arch(struct task_struct *task)
>> {
>> return AUDIT_ARCH_NIOS2;
>> diff --git a/arch/openrisc/include/asm/syscall.h b/arch/openrisc/include/asm/syscall.h
>> index 903ed882bdec..e6383be2a195 100644
>> --- a/arch/openrisc/include/asm/syscall.h
>> +++ b/arch/openrisc/include/asm/syscall.h
>> @@ -57,6 +57,13 @@ syscall_get_arguments(struct task_struct *task, struct pt_regs *regs,
>> memcpy(args, ®s->gpr[3], 6 * sizeof(args[0]));
>> }
>>
>> +static inline void
>> +syscall_set_arguments(struct task_struct *task, struct pt_regs *regs,
>> + const unsigned long *args)
>> +{
>> + memcpy(®s->gpr[3], args, 6 * sizeof(args[0]));
>> +}
>> +
>> static inline int syscall_get_arch(struct task_struct *task)
>> {
>> return AUDIT_ARCH_OPENRISC;
>> diff --git a/arch/parisc/include/asm/syscall.h b/arch/parisc/include/asm/syscall.h
>> index 00b127a5e09b..b146d0ae4c77 100644
>> --- a/arch/parisc/include/asm/syscall.h
>> +++ b/arch/parisc/include/asm/syscall.h
>> @@ -29,6 +29,18 @@ static inline void syscall_get_arguments(struct task_struct *tsk,
>> args[0] = regs->gr[26];
>> }
>>
>> +static inline void syscall_set_arguments(struct task_struct *tsk,
>> + struct pt_regs *regs,
>> + unsigned long *args)
>> +{
>> + regs->gr[21] = args[5];
>> + regs->gr[22] = args[4];
>> + regs->gr[23] = args[3];
>> + regs->gr[24] = args[2];
>> + regs->gr[25] = args[1];
>> + regs->gr[26] = args[0];
>> +}
>> +
>> static inline long syscall_get_error(struct task_struct *task,
>> struct pt_regs *regs)
>> {
>> diff --git a/arch/powerpc/include/asm/syscall.h b/arch/powerpc/include/asm/syscall.h
>> index 422d7735ace6..521f279e6b33 100644
>> --- a/arch/powerpc/include/asm/syscall.h
>> +++ b/arch/powerpc/include/asm/syscall.h
>> @@ -114,6 +114,16 @@ static inline void syscall_get_arguments(struct task_struct *task,
>> }
>> }
>>
>> +static inline void syscall_set_arguments(struct task_struct *task,
>> + struct pt_regs *regs,
>> + const unsigned long *args)
>> +{
>> + memcpy(®s->gpr[3], args, 6 * sizeof(args[0]));
>> +
>> + /* Also copy the first argument into orig_gpr3 */
>> + regs->orig_gpr3 = args[0];
>> +}
>> +
>> static inline int syscall_get_arch(struct task_struct *task)
>> {
>> if (is_tsk_32bit_task(task))
>> diff --git a/arch/riscv/include/asm/syscall.h b/arch/riscv/include/asm/syscall.h
>> index 121fff429dce..8d389ba995c8 100644
>> --- a/arch/riscv/include/asm/syscall.h
>> +++ b/arch/riscv/include/asm/syscall.h
>> @@ -66,6 +66,15 @@ static inline void syscall_get_arguments(struct task_struct *task,
>> memcpy(args, ®s->a1, 5 * sizeof(args[0]));
>> }
>>
>> +static inline void syscall_set_arguments(struct task_struct *task,
>> + struct pt_regs *regs,
>> + const unsigned long *args)
>> +{
>> + regs->orig_a0 = args[0];
>> + args++;
>> + memcpy(®s->a1, args, 5 * sizeof(regs->a1));
>> +}
>
> Looks good for riscv.
>
> Tested-by: Charlie Jenkins <charlie@rivosinc.com>
> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com
>
>> +
>> static inline int syscall_get_arch(struct task_struct *task)
>> {
>> #ifdef CONFIG_64BIT
>> diff --git a/arch/s390/include/asm/syscall.h b/arch/s390/include/asm/syscall.h
>> index 27e3d804b311..b3dd883699e7 100644
>> --- a/arch/s390/include/asm/syscall.h
>> +++ b/arch/s390/include/asm/syscall.h
>> @@ -78,6 +78,18 @@ static inline void syscall_get_arguments(struct task_struct *task,
>> args[0] = regs->orig_gpr2 & mask;
>> }
>>
>> +static inline void syscall_set_arguments(struct task_struct *task,
>> + struct pt_regs *regs,
>> + const unsigned long *args)
>> +{
>> + unsigned int n = 6;
>> +
>> + while (n-- > 0)
>> + if (n > 0)
>> + regs->gprs[2 + n] = args[n];
>> + regs->orig_gpr2 = args[0];
>> +}
>> +
>> static inline int syscall_get_arch(struct task_struct *task)
>> {
>> #ifdef CONFIG_COMPAT
>> diff --git a/arch/sh/include/asm/syscall_32.h b/arch/sh/include/asm/syscall_32.h
>> index d87738eebe30..cb51a7528384 100644
>> --- a/arch/sh/include/asm/syscall_32.h
>> +++ b/arch/sh/include/asm/syscall_32.h
>> @@ -57,6 +57,18 @@ static inline void syscall_get_arguments(struct task_struct *task,
>> args[0] = regs->regs[4];
>> }
>>
>> +static inline void syscall_set_arguments(struct task_struct *task,
>> + struct pt_regs *regs,
>> + const unsigned long *args)
>> +{
>> + regs->regs[1] = args[5];
>> + regs->regs[0] = args[4];
>> + regs->regs[7] = args[3];
>> + regs->regs[6] = args[2];
>> + regs->regs[5] = args[1];
>> + regs->regs[4] = args[0];
>> +}
>> +
>> static inline int syscall_get_arch(struct task_struct *task)
>> {
>> int arch = AUDIT_ARCH_SH;
>> diff --git a/arch/sparc/include/asm/syscall.h b/arch/sparc/include/asm/syscall.h
>> index 20c109ac8cc9..62a5a78804c4 100644
>> --- a/arch/sparc/include/asm/syscall.h
>> +++ b/arch/sparc/include/asm/syscall.h
>> @@ -117,6 +117,16 @@ static inline void syscall_get_arguments(struct task_struct *task,
>> }
>> }
>>
>> +static inline void syscall_set_arguments(struct task_struct *task,
>> + struct pt_regs *regs,
>> + const unsigned long *args)
>> +{
>> + unsigned int i;
>> +
>> + for (i = 0; i < 6; i++)
>> + regs->u_regs[UREG_I0 + i] = args[i];
>> +}
>> +
>> static inline int syscall_get_arch(struct task_struct *task)
>> {
>> #if defined(CONFIG_SPARC64) && defined(CONFIG_COMPAT)
>> diff --git a/arch/um/include/asm/syscall-generic.h b/arch/um/include/asm/syscall-generic.h
>> index 172b74143c4b..2984feb9d576 100644
>> --- a/arch/um/include/asm/syscall-generic.h
>> +++ b/arch/um/include/asm/syscall-generic.h
>> @@ -62,6 +62,20 @@ static inline void syscall_get_arguments(struct task_struct *task,
>> *args = UPT_SYSCALL_ARG6(r);
>> }
>>
>> +static inline void syscall_set_arguments(struct task_struct *task,
>> + struct pt_regs *regs,
>> + const unsigned long *args)
>> +{
>> + struct uml_pt_regs *r = ®s->regs;
>> +
>> + UPT_SYSCALL_ARG1(r) = *args++;
>> + UPT_SYSCALL_ARG2(r) = *args++;
>> + UPT_SYSCALL_ARG3(r) = *args++;
>> + UPT_SYSCALL_ARG4(r) = *args++;
>> + UPT_SYSCALL_ARG5(r) = *args++;
>> + UPT_SYSCALL_ARG6(r) = *args;
>> +}
>> +
>> /* See arch/x86/um/asm/syscall.h for syscall_get_arch() definition. */
>>
>> #endif /* __UM_SYSCALL_GENERIC_H */
>> diff --git a/arch/x86/include/asm/syscall.h b/arch/x86/include/asm/syscall.h
>> index 7c488ff0c764..b9c249dd9e3d 100644
>> --- a/arch/x86/include/asm/syscall.h
>> +++ b/arch/x86/include/asm/syscall.h
>> @@ -90,6 +90,18 @@ static inline void syscall_get_arguments(struct task_struct *task,
>> args[5] = regs->bp;
>> }
>>
>> +static inline void syscall_set_arguments(struct task_struct *task,
>> + struct pt_regs *regs,
>> + const unsigned long *args)
>> +{
>> + regs->bx = args[0];
>> + regs->cx = args[1];
>> + regs->dx = args[2];
>> + regs->si = args[3];
>> + regs->di = args[4];
>> + regs->bp = args[5];
>> +}
>> +
>> static inline int syscall_get_arch(struct task_struct *task)
>> {
>> return AUDIT_ARCH_I386;
>> @@ -121,6 +133,30 @@ static inline void syscall_get_arguments(struct task_struct *task,
>> }
>> }
>>
>> +static inline void syscall_set_arguments(struct task_struct *task,
>> + struct pt_regs *regs,
>> + const unsigned long *args)
>> +{
>> +# ifdef CONFIG_IA32_EMULATION
>> + if (task->thread_info.status & TS_COMPAT) {
>> + regs->bx = *args++;
>> + regs->cx = *args++;
>> + regs->dx = *args++;
>> + regs->si = *args++;
>> + regs->di = *args++;
>> + regs->bp = *args;
>> + } else
>> +# endif
>> + {
>> + regs->di = *args++;
>> + regs->si = *args++;
>> + regs->dx = *args++;
>> + regs->r10 = *args++;
>> + regs->r8 = *args++;
>> + regs->r9 = *args;
>> + }
>> +}
>> +
>> static inline int syscall_get_arch(struct task_struct *task)
>> {
>> /* x32 tasks should be considered AUDIT_ARCH_X86_64. */
>> diff --git a/arch/xtensa/include/asm/syscall.h b/arch/xtensa/include/asm/syscall.h
>> index 5ee974bf8330..f9a671cbf933 100644
>> --- a/arch/xtensa/include/asm/syscall.h
>> +++ b/arch/xtensa/include/asm/syscall.h
>> @@ -68,6 +68,17 @@ static inline void syscall_get_arguments(struct task_struct *task,
>> args[i] = regs->areg[reg[i]];
>> }
>>
>> +static inline void syscall_set_arguments(struct task_struct *task,
>> + struct pt_regs *regs,
>> + const unsigned long *args)
>> +{
>> + static const unsigned int reg[] = XTENSA_SYSCALL_ARGUMENT_REGS;
>> + unsigned int i;
>> +
>> + for (i = 0; i < 6; ++i)
>> + regs->areg[reg[i]] = args[i];
>> +}
>> +
>> asmlinkage long xtensa_rt_sigreturn(void);
>> asmlinkage long xtensa_shmat(int, char __user *, int);
>> asmlinkage long xtensa_fadvise64_64(int, int,
>> diff --git a/include/asm-generic/syscall.h b/include/asm-generic/syscall.h
>> index 5a80fe728dc8..0f7b9a493de7 100644
>> --- a/include/asm-generic/syscall.h
>> +++ b/include/asm-generic/syscall.h
>> @@ -117,6 +117,22 @@ void syscall_set_return_value(struct task_struct *task, struct pt_regs *regs,
>> void syscall_get_arguments(struct task_struct *task, struct pt_regs *regs,
>> unsigned long *args);
>>
>> +/**
>> + * syscall_set_arguments - change system call parameter value
>> + * @task: task of interest, must be in system call entry tracing
>> + * @regs: task_pt_regs() of @task
>> + * @args: array of argument values to store
>> + *
>> + * Changes 6 arguments to the system call.
>> + * The first argument gets value @args[0], and so on.
>> + *
>> + * It's only valid to call this when @task is stopped for tracing on
>> + * entry to a system call, due to %SYSCALL_WORK_SYSCALL_TRACE or
>> + * %SYSCALL_WORK_SYSCALL_AUDIT.
>> + */
>> +void syscall_set_arguments(struct task_struct *task, struct pt_regs *regs,
>> + const unsigned long *args);
>> +
>> /**
>> * syscall_get_arch - return the AUDIT_ARCH for the current system call
>> * @task: task of interest, must be blocked
>> --
>> ldv
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 3/7] syscall.h: add syscall_set_arguments() and syscall_set_return_value()
2025-01-17 0:59 ` H. Peter Anvin
@ 2025-01-17 15:45 ` Eugene Syromyatnikov
2025-01-18 4:34 ` H. Peter Anvin
0 siblings, 1 reply; 39+ messages in thread
From: Eugene Syromyatnikov @ 2025-01-17 15:45 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Charlie Jenkins, Dmitry V. Levin, Oleg Nesterov, Mike Frysinger,
Renzo Davoli, Davide Berardi, strace-devel, Vineet Gupta,
Russell King, Will Deacon, Guo Ren, Brian Cain, Huacai Chen,
WANG Xuerui, Thomas Bogendoerfer, Dinh Nguyen, Jonas Bonn,
Stefan Kristiansson, Stafford Horne, James E.J. Bottomley,
Helge Deller, Madhavan Srinivasan, Michael Ellerman,
Nicholas Piggin, Christophe Leroy, Naveen N Rao, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Yoshinori Sato, Rich Felker, John Paul Adrian Glaubitz,
David S. Miller, Andreas Larsson, Richard Weinberger,
Anton Ivanov, Johannes Berg, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, Chris Zankel, Max Filippov,
Arnd Bergmann, linux-snps-arc, linux-kernel, linux-arm-kernel,
linux-csky, linux-hexagon, loongarch, linux-mips, linux-openrisc,
linux-parisc, linuxppc-dev, linux-riscv, linux-s390, linux-sh,
sparclinux, linux-um, linux-arch
On Fri, Jan 17, 2025 at 2:03 AM H. Peter Anvin <hpa@zytor.com> wrote:
>
> I link the concept of this patchset, but *please* make it clear in the
> comments that this does not solve the issue of 64-bit kernel arguments
> on 32-bit systems being ABI specific.
Sorry, but I don't see how this is relevant; each architecture has its
own ABI with its own set of peculiarities, and there's a lot of
(completely unrelated) work needed in order to make an ABI that is
architecture-agnostic. All this patch set does is provides a
consistent way to manipulate scno and args across architectures; it
doesn't address the fact that some architectures have mmap2/mmap_pgoff
syscall, or that some have fadvise64_64 in addition to fadvise64, or
the existence of clone2, or socketcall, or ipc; or that some
architectures don't have open or stat; or that scnos on different
architectures or even different bit-widths within the "same"
architecture are different.
> This isn't unique to this patch in any way; the only way to handle it is
> by keeping track of each ABI.
That's true, but this patch doesn't even try to address that.
--
Eugene Syromyatnikov
mailto:evgsyr@gmail.com
xmpp:esyr@jabber.{ru|org}
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 3/7] syscall.h: add syscall_set_arguments() and syscall_set_return_value()
2025-01-17 15:45 ` Eugene Syromyatnikov
@ 2025-01-18 4:34 ` H. Peter Anvin
0 siblings, 0 replies; 39+ messages in thread
From: H. Peter Anvin @ 2025-01-18 4:34 UTC (permalink / raw)
To: Eugene Syromyatnikov
Cc: Charlie Jenkins, Dmitry V. Levin, Oleg Nesterov, Mike Frysinger,
Renzo Davoli, Davide Berardi, strace-devel, Vineet Gupta,
Russell King, Will Deacon, Guo Ren, Brian Cain, Huacai Chen,
WANG Xuerui, Thomas Bogendoerfer, Dinh Nguyen, Jonas Bonn,
Stefan Kristiansson, Stafford Horne, James E.J. Bottomley,
Helge Deller, Madhavan Srinivasan, Michael Ellerman,
Nicholas Piggin, Christophe Leroy, Naveen N Rao, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Yoshinori Sato, Rich Felker, John Paul Adrian Glaubitz,
David S. Miller, Andreas Larsson, Richard Weinberger,
Anton Ivanov, Johannes Berg, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, Chris Zankel, Max Filippov,
Arnd Bergmann, linux-snps-arc, linux-kernel, linux-arm-kernel,
linux-csky, linux-hexagon, loongarch, linux-mips, linux-openrisc,
linux-parisc, linuxppc-dev, linux-riscv, linux-s390, linux-sh,
sparclinux, linux-um, linux-arch
On January 17, 2025 7:45:02 AM PST, Eugene Syromyatnikov <evgsyr@gmail.com> wrote:
>On Fri, Jan 17, 2025 at 2:03 AM H. Peter Anvin <hpa@zytor.com> wrote:
>>
>> I link the concept of this patchset, but *please* make it clear in the
>> comments that this does not solve the issue of 64-bit kernel arguments
>> on 32-bit systems being ABI specific.
>
>Sorry, but I don't see how this is relevant; each architecture has its
>own ABI with its own set of peculiarities, and there's a lot of
>(completely unrelated) work needed in order to make an ABI that is
>architecture-agnostic. All this patch set does is provides a
>consistent way to manipulate scno and args across architectures; it
>doesn't address the fact that some architectures have mmap2/mmap_pgoff
>syscall, or that some have fadvise64_64 in addition to fadvise64, or
>the existence of clone2, or socketcall, or ipc; or that some
>architectures don't have open or stat; or that scnos on different
>architectures or even different bit-widths within the "same"
>architecture are different.
>
>> This isn't unique to this patch in any way; the only way to handle it is
>> by keeping track of each ABI.
>
>That's true, but this patch doesn't even try to address that.
>
I just want it noted in the comment, that's all.
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 1/7] powerpc: properly negate error in syscall_set_return_value()
2025-01-14 17:04 ` Dmitry V. Levin
@ 2025-01-20 13:51 ` Christophe Leroy
2025-01-20 17:12 ` Dmitry V. Levin
2025-01-23 18:28 ` Dmitry V. Levin
0 siblings, 2 replies; 39+ messages in thread
From: Christophe Leroy @ 2025-01-20 13:51 UTC (permalink / raw)
To: Dmitry V. Levin
Cc: Alexey Gladkov, Oleg Nesterov, Michael Ellerman,
Eugene Syromyatnikov, Mike Frysinger, Renzo Davoli,
Davide Berardi, strace-devel, Madhavan Srinivasan,
Nicholas Piggin, Naveen N Rao, linuxppc-dev, linux-kernel
Le 14/01/2025 à 18:04, Dmitry V. Levin a écrit :
> On Mon, Jan 13, 2025 at 06:34:44PM +0100, Christophe Leroy wrote:
>> Le 13/01/2025 à 18:10, Dmitry V. Levin a écrit :
>>> Bring syscall_set_return_value() in sync with syscall_get_error(),
>>> and let upcoming ptrace/set_syscall_info selftest pass on powerpc.
>>>
>>> This reverts commit 1b1a3702a65c ("powerpc: Don't negate error in
>>> syscall_set_return_value()").
>>
>> There is a clear detailed explanation in that commit of why it needs to
>> be done.
>>
>> If you think that commit is wrong you have to explain why with at least
>> the same level of details.
>
> OK, please have a look whether this explanation is clear and detailed enough:
>
> =======
> powerpc: properly negate error in syscall_set_return_value()
>
> When syscall_set_return_value() is used to set an error code, the caller
> specifies it as a negative value in -ERRORCODE form.
>
> In !trap_is_scv case the error code is traditionally stored as follows:
> gpr[3] contains a positive ERRORCODE, and ccr has 0x10000000 flag set.
> Here are a few examples to illustrate this convention. The first one
> is from syscall_get_error():
> /*
> * If the system call failed,
> * regs->gpr[3] contains a positive ERRORCODE.
> */
> return (regs->ccr & 0x10000000UL) ? -regs->gpr[3] : 0;
>
> The second example is from regs_return_value():
> if (is_syscall_success(regs))
> return regs->gpr[3];
> else
> return -regs->gpr[3];
>
> The third example is from check_syscall_restart():
> regs->result = -EINTR;
> regs->gpr[3] = EINTR;
> regs->ccr |= 0x10000000;
>
> Compared with these examples, the failure of syscall_set_return_value()
> to assign a positive ERRORCODE into regs->gpr[3] is clearly visible:
> /*
> * In the general case it's not obvious that we must deal with
> * CCR here, as the syscall exit path will also do that for us.
> * However there are some places, eg. the signal code, which
> * check ccr to decide if the value in r3 is actually an error.
> */
> if (error) {
> regs->ccr |= 0x10000000L;
> regs->gpr[3] = error;
> } else {
> regs->ccr &= ~0x10000000L;
> regs->gpr[3] = val;
> }
>
> This fix brings syscall_set_return_value() in sync with syscall_get_error()
> and lets upcoming ptrace/set_syscall_info selftest pass on powerpc.
>
> Fixes: 1b1a3702a65c ("powerpc: Don't negate error in syscall_set_return_value()").
> =======
>
>
I think there is still something going wrong.
do_seccomp() sets regs->gpr[3] = -ENOSYS; by default.
Then it calls __secure_computing() which returns what __seccomp_filter()
returns.
In case of error, __seccomp_filter() calls syscall_set_return_value()
with a negative value then returns -1
do_seccomp() is called by do_syscall_trace_enter() which returns -1 when
do_seccomp() doesn't return 0.
do_syscall_trace_enter() is called by system_call_exception() and
returns -1, so syscall_exception() returns regs->gpr[3]
In entry_32.S, transfer_to_syscall, syscall_exit_prepare() is then
called with the return of syscall_exception() as first parameter, which
leads to:
if (unlikely(r3 >= (unsigned long)-MAX_ERRNO) && is_not_scv) {
if (likely(!(ti_flags & (_TIF_NOERROR | _TIF_RESTOREALL)))) {
r3 = -r3;
regs->ccr |= 0x10000000; /* Set SO bit in CR */
}
}
By chance, because you have already changed the sign of gpr[3], the
above test fails and nothing is done to r3, and because you have also
already set regs->ccr it works.
But all this looks inconsistent with the fact that do_seccomp sets
-ENOSYS as default value
Also, when do_seccomp() returns 0, do_syscall_trace_enter() check the
syscall number and when it is wrong it goes to skip: which sets
regs->gpr[3] = -ENOSYS;
So really I think it is not in line with your changes to set positive
value in gpr[3].
Maybe your change is still correct but it needs to be handled completely
in that case.
Christophe
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 1/7] powerpc: properly negate error in syscall_set_return_value()
2025-01-20 13:51 ` Christophe Leroy
@ 2025-01-20 17:12 ` Dmitry V. Levin
2025-01-21 11:13 ` Madhavan Srinivasan
2025-01-23 18:28 ` Dmitry V. Levin
1 sibling, 1 reply; 39+ messages in thread
From: Dmitry V. Levin @ 2025-01-20 17:12 UTC (permalink / raw)
To: Christophe Leroy
Cc: Alexey Gladkov, Oleg Nesterov, Michael Ellerman,
Eugene Syromyatnikov, Mike Frysinger, Renzo Davoli,
Davide Berardi, strace-devel, Madhavan Srinivasan,
Nicholas Piggin, Naveen N Rao, linuxppc-dev, linux-kernel
On Mon, Jan 20, 2025 at 02:51:38PM +0100, Christophe Leroy wrote:
> Le 14/01/2025 à 18:04, Dmitry V. Levin a écrit :
> > On Mon, Jan 13, 2025 at 06:34:44PM +0100, Christophe Leroy wrote:
> >> Le 13/01/2025 à 18:10, Dmitry V. Levin a écrit :
> >>> Bring syscall_set_return_value() in sync with syscall_get_error(),
> >>> and let upcoming ptrace/set_syscall_info selftest pass on powerpc.
> >>>
> >>> This reverts commit 1b1a3702a65c ("powerpc: Don't negate error in
> >>> syscall_set_return_value()").
> >>
> >> There is a clear detailed explanation in that commit of why it needs to
> >> be done.
> >>
> >> If you think that commit is wrong you have to explain why with at least
> >> the same level of details.
> >
> > OK, please have a look whether this explanation is clear and detailed enough:
> >
> > =======
> > powerpc: properly negate error in syscall_set_return_value()
> >
> > When syscall_set_return_value() is used to set an error code, the caller
> > specifies it as a negative value in -ERRORCODE form.
> >
> > In !trap_is_scv case the error code is traditionally stored as follows:
> > gpr[3] contains a positive ERRORCODE, and ccr has 0x10000000 flag set.
> > Here are a few examples to illustrate this convention. The first one
> > is from syscall_get_error():
> > /*
> > * If the system call failed,
> > * regs->gpr[3] contains a positive ERRORCODE.
> > */
> > return (regs->ccr & 0x10000000UL) ? -regs->gpr[3] : 0;
> >
> > The second example is from regs_return_value():
> > if (is_syscall_success(regs))
> > return regs->gpr[3];
> > else
> > return -regs->gpr[3];
> >
> > The third example is from check_syscall_restart():
> > regs->result = -EINTR;
> > regs->gpr[3] = EINTR;
> > regs->ccr |= 0x10000000;
> >
> > Compared with these examples, the failure of syscall_set_return_value()
> > to assign a positive ERRORCODE into regs->gpr[3] is clearly visible:
> > /*
> > * In the general case it's not obvious that we must deal with
> > * CCR here, as the syscall exit path will also do that for us.
> > * However there are some places, eg. the signal code, which
> > * check ccr to decide if the value in r3 is actually an error.
> > */
> > if (error) {
> > regs->ccr |= 0x10000000L;
> > regs->gpr[3] = error;
> > } else {
> > regs->ccr &= ~0x10000000L;
> > regs->gpr[3] = val;
> > }
> >
> > This fix brings syscall_set_return_value() in sync with syscall_get_error()
> > and lets upcoming ptrace/set_syscall_info selftest pass on powerpc.
> >
> > Fixes: 1b1a3702a65c ("powerpc: Don't negate error in syscall_set_return_value()").
> > =======
> >
> >
>
> I think there is still something going wrong.
>
> do_seccomp() sets regs->gpr[3] = -ENOSYS; by default.
>
> Then it calls __secure_computing() which returns what __seccomp_filter()
> returns.
>
> In case of error, __seccomp_filter() calls syscall_set_return_value()
> with a negative value then returns -1
>
> do_seccomp() is called by do_syscall_trace_enter() which returns -1 when
> do_seccomp() doesn't return 0.
>
> do_syscall_trace_enter() is called by system_call_exception() and
> returns -1, so syscall_exception() returns regs->gpr[3]
>
> In entry_32.S, transfer_to_syscall, syscall_exit_prepare() is then
> called with the return of syscall_exception() as first parameter, which
> leads to:
>
> if (unlikely(r3 >= (unsigned long)-MAX_ERRNO) && is_not_scv) {
> if (likely(!(ti_flags & (_TIF_NOERROR | _TIF_RESTOREALL)))) {
> r3 = -r3;
> regs->ccr |= 0x10000000; /* Set SO bit in CR */
> }
> }
Note the "unlikely" keyword here reminding us once more that in !scv case
regs->gpr[3] does not normally have -ERRORCODE form.
> By chance, because you have already changed the sign of gpr[3], the
> above test fails and nothing is done to r3, and because you have also
> already set regs->ccr it works.
>
> But all this looks inconsistent with the fact that do_seccomp sets
> -ENOSYS as default value
>
> Also, when do_seccomp() returns 0, do_syscall_trace_enter() check the
> syscall number and when it is wrong it goes to skip: which sets
> regs->gpr[3] = -ENOSYS;
It looks like do_seccomp() and do_syscall_trace_enter() get away by sheer
luck, implicitly relying on syscall_exit_prepare() transparently fixing
regs->gpr[3] for them.
> So really I think it is not in line with your changes to set positive
> value in gpr[3].
>
> Maybe your change is still correct but it needs to be handled completely
> in that case.
By the way, is there any reasons why do_seccomp() and
do_syscall_trace_enter() don't use syscall_set_return_value() yet?
--
ldv
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 1/7] powerpc: properly negate error in syscall_set_return_value()
2025-01-20 17:12 ` Dmitry V. Levin
@ 2025-01-21 11:13 ` Madhavan Srinivasan
2025-01-21 11:28 ` Christophe Leroy
0 siblings, 1 reply; 39+ messages in thread
From: Madhavan Srinivasan @ 2025-01-21 11:13 UTC (permalink / raw)
To: Dmitry V. Levin, Christophe Leroy
Cc: Alexey Gladkov, Oleg Nesterov, Michael Ellerman,
Eugene Syromyatnikov, Mike Frysinger, Renzo Davoli,
Davide Berardi, strace-devel, Nicholas Piggin, Naveen N Rao,
linuxppc-dev, linux-kernel
On 1/20/25 10:42 PM, Dmitry V. Levin wrote:
> On Mon, Jan 20, 2025 at 02:51:38PM +0100, Christophe Leroy wrote:
>> Le 14/01/2025 à 18:04, Dmitry V. Levin a écrit :
>>> On Mon, Jan 13, 2025 at 06:34:44PM +0100, Christophe Leroy wrote:
>>>> Le 13/01/2025 à 18:10, Dmitry V. Levin a écrit :
>>>>> Bring syscall_set_return_value() in sync with syscall_get_error(),
>>>>> and let upcoming ptrace/set_syscall_info selftest pass on powerpc.
>>>>>
Sorry for getting to this thread late.
Tried the series without this patch in
1) power9 PowerNV system and in power10 pSeries lpar
# ./set_syscall_info
TAP version 13
1..1
# Starting 1 tests from 1 test cases.
# RUN global.set_syscall_info ...
# OK global.set_syscall_info
ok 1 global.set_syscall_info
# PASSED: 1 / 1 tests passed.
# Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
and in both case set_syscall_info passes.
Will look at it further.
Maddy
>>>>> This reverts commit 1b1a3702a65c ("powerpc: Don't negate error in
>>>>> syscall_set_return_value()").
>>>>
>>>> There is a clear detailed explanation in that commit of why it needs to
>>>> be done.
>>>>
>>>> If you think that commit is wrong you have to explain why with at least
>>>> the same level of details.
>>>
>>> OK, please have a look whether this explanation is clear and detailed enough:
>>>
>>> =======
>>> powerpc: properly negate error in syscall_set_return_value()
>>>
>>> When syscall_set_return_value() is used to set an error code, the caller
>>> specifies it as a negative value in -ERRORCODE form.
>>>
>>> In !trap_is_scv case the error code is traditionally stored as follows:
>>> gpr[3] contains a positive ERRORCODE, and ccr has 0x10000000 flag set.
>>> Here are a few examples to illustrate this convention. The first one
>>> is from syscall_get_error():
>>> /*
>>> * If the system call failed,
>>> * regs->gpr[3] contains a positive ERRORCODE.
>>> */
>>> return (regs->ccr & 0x10000000UL) ? -regs->gpr[3] : 0;
>>>
>>> The second example is from regs_return_value():
>>> if (is_syscall_success(regs))
>>> return regs->gpr[3];
>>> else
>>> return -regs->gpr[3];
>>>
>>> The third example is from check_syscall_restart():
>>> regs->result = -EINTR;
>>> regs->gpr[3] = EINTR;
>>> regs->ccr |= 0x10000000;
>>>
>>> Compared with these examples, the failure of syscall_set_return_value()
>>> to assign a positive ERRORCODE into regs->gpr[3] is clearly visible:
>>> /*
>>> * In the general case it's not obvious that we must deal with
>>> * CCR here, as the syscall exit path will also do that for us.
>>> * However there are some places, eg. the signal code, which
>>> * check ccr to decide if the value in r3 is actually an error.
>>> */
>>> if (error) {
>>> regs->ccr |= 0x10000000L;
>>> regs->gpr[3] = error;
>>> } else {
>>> regs->ccr &= ~0x10000000L;
>>> regs->gpr[3] = val;
>>> }
>>>
>>> This fix brings syscall_set_return_value() in sync with syscall_get_error()
>>> and lets upcoming ptrace/set_syscall_info selftest pass on powerpc.
>>>
>>> Fixes: 1b1a3702a65c ("powerpc: Don't negate error in syscall_set_return_value()").
>>> =======
>>>
>>>
>>
>> I think there is still something going wrong.
>>
>> do_seccomp() sets regs->gpr[3] = -ENOSYS; by default.
>>
>> Then it calls __secure_computing() which returns what __seccomp_filter()
>> returns.
>>
>> In case of error, __seccomp_filter() calls syscall_set_return_value()
>> with a negative value then returns -1
>>
>> do_seccomp() is called by do_syscall_trace_enter() which returns -1 when
>> do_seccomp() doesn't return 0.
>>
>> do_syscall_trace_enter() is called by system_call_exception() and
>> returns -1, so syscall_exception() returns regs->gpr[3]
>>
>> In entry_32.S, transfer_to_syscall, syscall_exit_prepare() is then
>> called with the return of syscall_exception() as first parameter, which
>> leads to:
>>
>> if (unlikely(r3 >= (unsigned long)-MAX_ERRNO) && is_not_scv) {
>> if (likely(!(ti_flags & (_TIF_NOERROR | _TIF_RESTOREALL)))) {
>> r3 = -r3;
>> regs->ccr |= 0x10000000; /* Set SO bit in CR */
>> }
>> }
>
> Note the "unlikely" keyword here reminding us once more that in !scv case
> regs->gpr[3] does not normally have -ERRORCODE form.
>
>> By chance, because you have already changed the sign of gpr[3], the
>> above test fails and nothing is done to r3, and because you have also
>> already set regs->ccr it works.
>>
>> But all this looks inconsistent with the fact that do_seccomp sets
>> -ENOSYS as default value
>>
>> Also, when do_seccomp() returns 0, do_syscall_trace_enter() check the
>> syscall number and when it is wrong it goes to skip: which sets
>> regs->gpr[3] = -ENOSYS;
>
> It looks like do_seccomp() and do_syscall_trace_enter() get away by sheer
> luck, implicitly relying on syscall_exit_prepare() transparently fixing
> regs->gpr[3] for them.
>
>> So really I think it is not in line with your changes to set positive
>> value in gpr[3].
>>
>> Maybe your change is still correct but it needs to be handled completely
>> in that case.
>
> By the way, is there any reasons why do_seccomp() and
> do_syscall_trace_enter() don't use syscall_set_return_value() yet?
>
>
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 1/7] powerpc: properly negate error in syscall_set_return_value()
2025-01-21 11:13 ` Madhavan Srinivasan
@ 2025-01-21 11:28 ` Christophe Leroy
2025-01-21 12:25 ` Madhavan Srinivasan
0 siblings, 1 reply; 39+ messages in thread
From: Christophe Leroy @ 2025-01-21 11:28 UTC (permalink / raw)
To: Madhavan Srinivasan, Dmitry V. Levin
Cc: Alexey Gladkov, Oleg Nesterov, Michael Ellerman,
Eugene Syromyatnikov, Mike Frysinger, Renzo Davoli,
Davide Berardi, strace-devel, Nicholas Piggin, Naveen N Rao,
linuxppc-dev, linux-kernel
Le 21/01/2025 à 12:13, Madhavan Srinivasan a écrit :
>
>
> On 1/20/25 10:42 PM, Dmitry V. Levin wrote:
>> On Mon, Jan 20, 2025 at 02:51:38PM +0100, Christophe Leroy wrote:
>>> Le 14/01/2025 à 18:04, Dmitry V. Levin a écrit :
>>>> On Mon, Jan 13, 2025 at 06:34:44PM +0100, Christophe Leroy wrote:
>>>>> Le 13/01/2025 à 18:10, Dmitry V. Levin a écrit :
>>>>>> Bring syscall_set_return_value() in sync with syscall_get_error(),
>>>>>> and let upcoming ptrace/set_syscall_info selftest pass on powerpc.
>>>>>>
>
> Sorry for getting to this thread late.
>
> Tried the series without this patch in
>
> 1) power9 PowerNV system and in power10 pSeries lpar
>
> # ./set_syscall_info
> TAP version 13
> 1..1
> # Starting 1 tests from 1 test cases.
> # RUN global.set_syscall_info ...
> # OK global.set_syscall_info
> ok 1 global.set_syscall_info
> # PASSED: 1 / 1 tests passed.
> # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
>
> and in both case set_syscall_info passes.
> Will look at it further.
I guess it works because power9/10 are using scv not sc for system call,
hence using the new ABI ?
Christophe
>
> Maddy
>
>>>>>> This reverts commit 1b1a3702a65c ("powerpc: Don't negate error in
>>>>>> syscall_set_return_value()").
>>>>>
>>>>> There is a clear detailed explanation in that commit of why it needs to
>>>>> be done.
>>>>>
>>>>> If you think that commit is wrong you have to explain why with at least
>>>>> the same level of details.
>>>>
>>>> OK, please have a look whether this explanation is clear and detailed enough:
>>>>
>>>> =======
>>>> powerpc: properly negate error in syscall_set_return_value()
>>>>
>>>> When syscall_set_return_value() is used to set an error code, the caller
>>>> specifies it as a negative value in -ERRORCODE form.
>>>>
>>>> In !trap_is_scv case the error code is traditionally stored as follows:
>>>> gpr[3] contains a positive ERRORCODE, and ccr has 0x10000000 flag set.
>>>> Here are a few examples to illustrate this convention. The first one
>>>> is from syscall_get_error():
>>>> /*
>>>> * If the system call failed,
>>>> * regs->gpr[3] contains a positive ERRORCODE.
>>>> */
>>>> return (regs->ccr & 0x10000000UL) ? -regs->gpr[3] : 0;
>>>>
>>>> The second example is from regs_return_value():
>>>> if (is_syscall_success(regs))
>>>> return regs->gpr[3];
>>>> else
>>>> return -regs->gpr[3];
>>>>
>>>> The third example is from check_syscall_restart():
>>>> regs->result = -EINTR;
>>>> regs->gpr[3] = EINTR;
>>>> regs->ccr |= 0x10000000;
>>>>
>>>> Compared with these examples, the failure of syscall_set_return_value()
>>>> to assign a positive ERRORCODE into regs->gpr[3] is clearly visible:
>>>> /*
>>>> * In the general case it's not obvious that we must deal with
>>>> * CCR here, as the syscall exit path will also do that for us.
>>>> * However there are some places, eg. the signal code, which
>>>> * check ccr to decide if the value in r3 is actually an error.
>>>> */
>>>> if (error) {
>>>> regs->ccr |= 0x10000000L;
>>>> regs->gpr[3] = error;
>>>> } else {
>>>> regs->ccr &= ~0x10000000L;
>>>> regs->gpr[3] = val;
>>>> }
>>>>
>>>> This fix brings syscall_set_return_value() in sync with syscall_get_error()
>>>> and lets upcoming ptrace/set_syscall_info selftest pass on powerpc.
>>>>
>>>> Fixes: 1b1a3702a65c ("powerpc: Don't negate error in syscall_set_return_value()").
>>>> =======
>>>>
>>>>
>>>
>>> I think there is still something going wrong.
>>>
>>> do_seccomp() sets regs->gpr[3] = -ENOSYS; by default.
>>>
>>> Then it calls __secure_computing() which returns what __seccomp_filter()
>>> returns.
>>>
>>> In case of error, __seccomp_filter() calls syscall_set_return_value()
>>> with a negative value then returns -1
>>>
>>> do_seccomp() is called by do_syscall_trace_enter() which returns -1 when
>>> do_seccomp() doesn't return 0.
>>>
>>> do_syscall_trace_enter() is called by system_call_exception() and
>>> returns -1, so syscall_exception() returns regs->gpr[3]
>>>
>>> In entry_32.S, transfer_to_syscall, syscall_exit_prepare() is then
>>> called with the return of syscall_exception() as first parameter, which
>>> leads to:
>>>
>>> if (unlikely(r3 >= (unsigned long)-MAX_ERRNO) && is_not_scv) {
>>> if (likely(!(ti_flags & (_TIF_NOERROR | _TIF_RESTOREALL)))) {
>>> r3 = -r3;
>>> regs->ccr |= 0x10000000; /* Set SO bit in CR */
>>> }
>>> }
>>
>> Note the "unlikely" keyword here reminding us once more that in !scv case
>> regs->gpr[3] does not normally have -ERRORCODE form.
>>
>>> By chance, because you have already changed the sign of gpr[3], the
>>> above test fails and nothing is done to r3, and because you have also
>>> already set regs->ccr it works.
>>>
>>> But all this looks inconsistent with the fact that do_seccomp sets
>>> -ENOSYS as default value
>>>
>>> Also, when do_seccomp() returns 0, do_syscall_trace_enter() check the
>>> syscall number and when it is wrong it goes to skip: which sets
>>> regs->gpr[3] = -ENOSYS;
>>
>> It looks like do_seccomp() and do_syscall_trace_enter() get away by sheer
>> luck, implicitly relying on syscall_exit_prepare() transparently fixing
>> regs->gpr[3] for them.
>>
>>> So really I think it is not in line with your changes to set positive
>>> value in gpr[3].
>>>
>>> Maybe your change is still correct but it needs to be handled completely
>>> in that case.
>>
>> By the way, is there any reasons why do_seccomp() and
>> do_syscall_trace_enter() don't use syscall_set_return_value() yet?
>>
>>
>
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 1/7] powerpc: properly negate error in syscall_set_return_value()
2025-01-21 11:28 ` Christophe Leroy
@ 2025-01-21 12:25 ` Madhavan Srinivasan
2025-01-21 12:42 ` Dmitry V. Levin
0 siblings, 1 reply; 39+ messages in thread
From: Madhavan Srinivasan @ 2025-01-21 12:25 UTC (permalink / raw)
To: Christophe Leroy, Dmitry V. Levin
Cc: Alexey Gladkov, Oleg Nesterov, Michael Ellerman,
Eugene Syromyatnikov, Mike Frysinger, Renzo Davoli,
Davide Berardi, strace-devel, Nicholas Piggin, Naveen N Rao,
linuxppc-dev, linux-kernel
On 1/21/25 4:58 PM, Christophe Leroy wrote:
>
>
> Le 21/01/2025 à 12:13, Madhavan Srinivasan a écrit :
>>
>>
>> On 1/20/25 10:42 PM, Dmitry V. Levin wrote:
>>> On Mon, Jan 20, 2025 at 02:51:38PM +0100, Christophe Leroy wrote:
>>>> Le 14/01/2025 à 18:04, Dmitry V. Levin a écrit :
>>>>> On Mon, Jan 13, 2025 at 06:34:44PM +0100, Christophe Leroy wrote:
>>>>>> Le 13/01/2025 à 18:10, Dmitry V. Levin a écrit :
>>>>>>> Bring syscall_set_return_value() in sync with syscall_get_error(),
>>>>>>> and let upcoming ptrace/set_syscall_info selftest pass on powerpc.
>>>>>>>
>>
>> Sorry for getting to this thread late.
>>
>> Tried the series without this patch in
>>
>> 1) power9 PowerNV system and in power10 pSeries lpar
>>
>> # ./set_syscall_info
>> TAP version 13
>> 1..1
>> # Starting 1 tests from 1 test cases.
>> # RUN global.set_syscall_info ...
>> # OK global.set_syscall_info
>> ok 1 global.set_syscall_info
>> # PASSED: 1 / 1 tests passed.
>> # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
>>
>> and in both case set_syscall_info passes.
>> Will look at it further.
>
> I guess it works because power9/10 are using scv not sc for system call, hence using the new ABI ?
>
yeah, I guess.
This is from the a Power8 pSeries lpar without this patch
# ./set_syscall_info
TAP version 13
1..1
# Starting 1 tests from 1 test cases.
# RUN global.set_syscall_info ...
# set_syscall_info.c:428:set_syscall_info:wait #5: unexpected stop signal 11
# set_syscall_info: Test terminated by assertion
# FAIL global.set_syscall_info
not ok 1 global.set_syscall_info
# FAILED: 0 / 1 tests passed.
# Totals: pass:0 fail:1 xfail:0 xpass:0 skip:0 error:0
Maddy
> Christophe
>
>>
>> Maddy
>>
>>>>>>> This reverts commit 1b1a3702a65c ("powerpc: Don't negate error in
>>>>>>> syscall_set_return_value()").
>>>>>>
>>>>>> There is a clear detailed explanation in that commit of why it needs to
>>>>>> be done.
>>>>>>
>>>>>> If you think that commit is wrong you have to explain why with at least
>>>>>> the same level of details.
>>>>>
>>>>> OK, please have a look whether this explanation is clear and detailed enough:
>>>>>
>>>>> =======
>>>>> powerpc: properly negate error in syscall_set_return_value()
>>>>>
>>>>> When syscall_set_return_value() is used to set an error code, the caller
>>>>> specifies it as a negative value in -ERRORCODE form.
>>>>>
>>>>> In !trap_is_scv case the error code is traditionally stored as follows:
>>>>> gpr[3] contains a positive ERRORCODE, and ccr has 0x10000000 flag set.
>>>>> Here are a few examples to illustrate this convention. The first one
>>>>> is from syscall_get_error():
>>>>> /*
>>>>> * If the system call failed,
>>>>> * regs->gpr[3] contains a positive ERRORCODE.
>>>>> */
>>>>> return (regs->ccr & 0x10000000UL) ? -regs->gpr[3] : 0;
>>>>>
>>>>> The second example is from regs_return_value():
>>>>> if (is_syscall_success(regs))
>>>>> return regs->gpr[3];
>>>>> else
>>>>> return -regs->gpr[3];
>>>>>
>>>>> The third example is from check_syscall_restart():
>>>>> regs->result = -EINTR;
>>>>> regs->gpr[3] = EINTR;
>>>>> regs->ccr |= 0x10000000;
>>>>>
>>>>> Compared with these examples, the failure of syscall_set_return_value()
>>>>> to assign a positive ERRORCODE into regs->gpr[3] is clearly visible:
>>>>> /*
>>>>> * In the general case it's not obvious that we must deal with
>>>>> * CCR here, as the syscall exit path will also do that for us.
>>>>> * However there are some places, eg. the signal code, which
>>>>> * check ccr to decide if the value in r3 is actually an error.
>>>>> */
>>>>> if (error) {
>>>>> regs->ccr |= 0x10000000L;
>>>>> regs->gpr[3] = error;
>>>>> } else {
>>>>> regs->ccr &= ~0x10000000L;
>>>>> regs->gpr[3] = val;
>>>>> }
>>>>>
>>>>> This fix brings syscall_set_return_value() in sync with syscall_get_error()
>>>>> and lets upcoming ptrace/set_syscall_info selftest pass on powerpc.
>>>>>
>>>>> Fixes: 1b1a3702a65c ("powerpc: Don't negate error in syscall_set_return_value()").
>>>>> =======
>>>>>
>>>>>
>>>>
>>>> I think there is still something going wrong.
>>>>
>>>> do_seccomp() sets regs->gpr[3] = -ENOSYS; by default.
>>>>
>>>> Then it calls __secure_computing() which returns what __seccomp_filter()
>>>> returns.
>>>>
>>>> In case of error, __seccomp_filter() calls syscall_set_return_value()
>>>> with a negative value then returns -1
>>>>
>>>> do_seccomp() is called by do_syscall_trace_enter() which returns -1 when
>>>> do_seccomp() doesn't return 0.
>>>>
>>>> do_syscall_trace_enter() is called by system_call_exception() and
>>>> returns -1, so syscall_exception() returns regs->gpr[3]
>>>>
>>>> In entry_32.S, transfer_to_syscall, syscall_exit_prepare() is then
>>>> called with the return of syscall_exception() as first parameter, which
>>>> leads to:
>>>>
>>>> if (unlikely(r3 >= (unsigned long)-MAX_ERRNO) && is_not_scv) {
>>>> if (likely(!(ti_flags & (_TIF_NOERROR | _TIF_RESTOREALL)))) {
>>>> r3 = -r3;
>>>> regs->ccr |= 0x10000000; /* Set SO bit in CR */
>>>> }
>>>> }
>>>
>>> Note the "unlikely" keyword here reminding us once more that in !scv case
>>> regs->gpr[3] does not normally have -ERRORCODE form.
>>>
>>>> By chance, because you have already changed the sign of gpr[3], the
>>>> above test fails and nothing is done to r3, and because you have also
>>>> already set regs->ccr it works.
>>>>
>>>> But all this looks inconsistent with the fact that do_seccomp sets
>>>> -ENOSYS as default value
>>>>
>>>> Also, when do_seccomp() returns 0, do_syscall_trace_enter() check the
>>>> syscall number and when it is wrong it goes to skip: which sets
>>>> regs->gpr[3] = -ENOSYS;
>>>
>>> It looks like do_seccomp() and do_syscall_trace_enter() get away by sheer
>>> luck, implicitly relying on syscall_exit_prepare() transparently fixing
>>> regs->gpr[3] for them.
>>>
>>>> So really I think it is not in line with your changes to set positive
>>>> value in gpr[3].
>>>>
>>>> Maybe your change is still correct but it needs to be handled completely
>>>> in that case.
>>>
>>> By the way, is there any reasons why do_seccomp() and
>>> do_syscall_trace_enter() don't use syscall_set_return_value() yet?
>>>
>>>
>>
>
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 1/7] powerpc: properly negate error in syscall_set_return_value()
2025-01-21 12:25 ` Madhavan Srinivasan
@ 2025-01-21 12:42 ` Dmitry V. Levin
0 siblings, 0 replies; 39+ messages in thread
From: Dmitry V. Levin @ 2025-01-21 12:42 UTC (permalink / raw)
To: Madhavan Srinivasan
Cc: Christophe Leroy, Alexey Gladkov, Oleg Nesterov, Michael Ellerman,
Eugene Syromyatnikov, Mike Frysinger, Renzo Davoli,
Davide Berardi, strace-devel, Nicholas Piggin, Naveen N Rao,
linuxppc-dev, linux-kernel
On Tue, Jan 21, 2025 at 05:55:40PM +0530, Madhavan Srinivasan wrote:
> On 1/21/25 4:58 PM, Christophe Leroy wrote:
> > Le 21/01/2025 à 12:13, Madhavan Srinivasan a écrit :
> >> On 1/20/25 10:42 PM, Dmitry V. Levin wrote:
> >>> On Mon, Jan 20, 2025 at 02:51:38PM +0100, Christophe Leroy wrote:
> >>>> Le 14/01/2025 à 18:04, Dmitry V. Levin a écrit :
> >>>>> On Mon, Jan 13, 2025 at 06:34:44PM +0100, Christophe Leroy wrote:
> >>>>>> Le 13/01/2025 à 18:10, Dmitry V. Levin a écrit :
> >>>>>>> Bring syscall_set_return_value() in sync with syscall_get_error(),
> >>>>>>> and let upcoming ptrace/set_syscall_info selftest pass on powerpc.
> >>
> >> Sorry for getting to this thread late.
> >>
> >> Tried the series without this patch in
> >>
> >> 1) power9 PowerNV system and in power10 pSeries lpar
> >>
> >> # ./set_syscall_info
> >> TAP version 13
> >> 1..1
> >> # Starting 1 tests from 1 test cases.
> >> # RUN global.set_syscall_info ...
> >> # OK global.set_syscall_info
> >> ok 1 global.set_syscall_info
> >> # PASSED: 1 / 1 tests passed.
> >> # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
> >>
> >> and in both case set_syscall_info passes.
> >> Will look at it further.
> >
> > I guess it works because power9/10 are using scv not sc for system call, hence using the new ABI ?
>
> yeah, I guess.
> This is from the a Power8 pSeries lpar without this patch
>
> # ./set_syscall_info
> TAP version 13
> 1..1
> # Starting 1 tests from 1 test cases.
> # RUN global.set_syscall_info ...
> # set_syscall_info.c:428:set_syscall_info:wait #5: unexpected stop signal 11
> # set_syscall_info: Test terminated by assertion
> # FAIL global.set_syscall_info
> not ok 1 global.set_syscall_info
> # FAILED: 0 / 1 tests passed.
> # Totals: pass:0 fail:1 xfail:0 xpass:0 skip:0 error:0
I've enhanced error diagnostics of the test a bit. Inspired by this
powerpc bug, in the next iteration of the patchset the test would also
invoke PTRACE_GET_SYSCALL_INFO right after PTRACE_SET_SYSCALL_INFO to
check whether the changes are applied by the kernel correctly.
Without the fix, in non-svc case the test would complain this way:
# set_syscall_info.c:119:set_syscall_info:Expected exp_exit->rval (-38) == info->exit.rval (38)
# set_syscall_info.c:120:set_syscall_info:wait #4: PTRACE_GET_SYSCALL_INFO #2: exit stop mismatch
--
ldv
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 1/7] powerpc: properly negate error in syscall_set_return_value()
2025-01-20 13:51 ` Christophe Leroy
2025-01-20 17:12 ` Dmitry V. Levin
@ 2025-01-23 18:28 ` Dmitry V. Levin
2025-01-23 19:11 ` Eugene Syromyatnikov
` (3 more replies)
1 sibling, 4 replies; 39+ messages in thread
From: Dmitry V. Levin @ 2025-01-23 18:28 UTC (permalink / raw)
To: Christophe Leroy
Cc: Alexey Gladkov, Oleg Nesterov, Michael Ellerman,
Eugene Syromyatnikov, Mike Frysinger, Renzo Davoli,
Davide Berardi, strace-devel, Madhavan Srinivasan,
Nicholas Piggin, Naveen N Rao, linuxppc-dev, linux-kernel
On Mon, Jan 20, 2025 at 02:51:38PM +0100, Christophe Leroy wrote:
> Le 14/01/2025 à 18:04, Dmitry V. Levin a écrit :
> > On Mon, Jan 13, 2025 at 06:34:44PM +0100, Christophe Leroy wrote:
> >> Le 13/01/2025 à 18:10, Dmitry V. Levin a écrit :
> >>> Bring syscall_set_return_value() in sync with syscall_get_error(),
> >>> and let upcoming ptrace/set_syscall_info selftest pass on powerpc.
> >>>
> >>> This reverts commit 1b1a3702a65c ("powerpc: Don't negate error in
> >>> syscall_set_return_value()").
> >>
> >> There is a clear detailed explanation in that commit of why it needs to
> >> be done.
> >>
> >> If you think that commit is wrong you have to explain why with at least
> >> the same level of details.
> >
> > OK, please have a look whether this explanation is clear and detailed enough:
> >
> > =======
> > powerpc: properly negate error in syscall_set_return_value()
> >
> > When syscall_set_return_value() is used to set an error code, the caller
> > specifies it as a negative value in -ERRORCODE form.
> >
> > In !trap_is_scv case the error code is traditionally stored as follows:
> > gpr[3] contains a positive ERRORCODE, and ccr has 0x10000000 flag set.
> > Here are a few examples to illustrate this convention. The first one
> > is from syscall_get_error():
> > /*
> > * If the system call failed,
> > * regs->gpr[3] contains a positive ERRORCODE.
> > */
> > return (regs->ccr & 0x10000000UL) ? -regs->gpr[3] : 0;
> >
> > The second example is from regs_return_value():
> > if (is_syscall_success(regs))
> > return regs->gpr[3];
> > else
> > return -regs->gpr[3];
> >
> > The third example is from check_syscall_restart():
> > regs->result = -EINTR;
> > regs->gpr[3] = EINTR;
> > regs->ccr |= 0x10000000;
> >
> > Compared with these examples, the failure of syscall_set_return_value()
> > to assign a positive ERRORCODE into regs->gpr[3] is clearly visible:
> > /*
> > * In the general case it's not obvious that we must deal with
> > * CCR here, as the syscall exit path will also do that for us.
> > * However there are some places, eg. the signal code, which
> > * check ccr to decide if the value in r3 is actually an error.
> > */
> > if (error) {
> > regs->ccr |= 0x10000000L;
> > regs->gpr[3] = error;
> > } else {
> > regs->ccr &= ~0x10000000L;
> > regs->gpr[3] = val;
> > }
> >
> > This fix brings syscall_set_return_value() in sync with syscall_get_error()
> > and lets upcoming ptrace/set_syscall_info selftest pass on powerpc.
> >
> > Fixes: 1b1a3702a65c ("powerpc: Don't negate error in syscall_set_return_value()").
> > =======
>
> I think there is still something going wrong.
>
> do_seccomp() sets regs->gpr[3] = -ENOSYS; by default.
>
> Then it calls __secure_computing() which returns what __seccomp_filter()
> returns.
>
> In case of error, __seccomp_filter() calls syscall_set_return_value()
> with a negative value then returns -1
>
> do_seccomp() is called by do_syscall_trace_enter() which returns -1 when
> do_seccomp() doesn't return 0.
>
> do_syscall_trace_enter() is called by system_call_exception() and
> returns -1, so syscall_exception() returns regs->gpr[3]
>
> In entry_32.S, transfer_to_syscall, syscall_exit_prepare() is then
> called with the return of syscall_exception() as first parameter, which
> leads to:
>
> if (unlikely(r3 >= (unsigned long)-MAX_ERRNO) && is_not_scv) {
> if (likely(!(ti_flags & (_TIF_NOERROR | _TIF_RESTOREALL)))) {
> r3 = -r3;
> regs->ccr |= 0x10000000; /* Set SO bit in CR */
> }
> }
>
> By chance, because you have already changed the sign of gpr[3], the
> above test fails and nothing is done to r3, and because you have also
> already set regs->ccr it works.
>
> But all this looks inconsistent with the fact that do_seccomp sets
> -ENOSYS as default value
>
> Also, when do_seccomp() returns 0, do_syscall_trace_enter() check the
> syscall number and when it is wrong it goes to skip: which sets
> regs->gpr[3] = -ENOSYS;
>
> So really I think it is not in line with your changes to set positive
> value in gpr[3].
>
> Maybe your change is still correct but it needs to be handled completely
> in that case.
Indeed, there is an inconsistency in !trap_is_scv case.
In some places such as syscall_get_error() and regs_return_value() the
semantics is as I described earlier: gpr[3] contains a positive ERRORCODE
and ccr has 0x10000000 flag set. This semantics is a part of the ABI and
therefore cannot be changed.
In some other places like do_seccomp() and do_syscall_trace_enter() the
semantics is similar to the trap_is_scv case: gpr[3] contains a negative
ERRORCODE and ccr is unchanged. In addition, system_call_exception()
returns the system call function return value when it is executed, and
gpr[3] otherwise. The value returned by system_call_exception() is passed
on to syscall_exit_prepare() which performs the conversion you mentioned.
What's remarkable is that in those places that are a part of the ABI the
traditional semantics is kept, while in other places the implementation
follows the trap_is_scv-like semantics, while traditional semantics is
also supported there.
The only case where I see some intersection is do_seccomp() where the
tracer would be able to see -ENOSYS in gpr[3]. However, the seccomp stop
is not the place where the tracer *reads* the system call exit status,
so whatever was written in gpr[3] before __secure_computing() is not
really relevant, consequently, selftests/seccomp/seccomp_bpf passes with
this patch applied as well as without it.
After looking at system_call_exception() I doubt this inconsistency can be
easily avoided, so I don't see how this patch could be enhanced further,
and what else could I do with the patch besides dropping it and letting
!trap_is_scv case be unsupported by PTRACE_SET_SYSCALL_INFO API, which
would be unfortunate.
--
ldv
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 1/7] powerpc: properly negate error in syscall_set_return_value()
2025-01-23 18:28 ` Dmitry V. Levin
@ 2025-01-23 19:11 ` Eugene Syromyatnikov
2025-01-23 22:16 ` Dmitry V. Levin
2025-01-23 22:07 ` Christophe Leroy
` (2 subsequent siblings)
3 siblings, 1 reply; 39+ messages in thread
From: Eugene Syromyatnikov @ 2025-01-23 19:11 UTC (permalink / raw)
To: Dmitry V. Levin
Cc: Christophe Leroy, Alexey Gladkov, Oleg Nesterov, Michael Ellerman,
Mike Frysinger, Renzo Davoli, Davide Berardi, strace-devel,
Madhavan Srinivasan, Nicholas Piggin, Naveen N Rao, linuxppc-dev,
linux-kernel
On Thu, Jan 23, 2025 at 7:28 PM Dmitry V. Levin <ldv@strace.io> wrote:
> Indeed, there is an inconsistency in !trap_is_scv case.
>
> In some places such as syscall_get_error() and regs_return_value() the
> semantics is as I described earlier: gpr[3] contains a positive ERRORCODE
> and ccr has 0x10000000 flag set. This semantics is a part of the ABI and
> therefore cannot be changed.
>
> In some other places like do_seccomp() and do_syscall_trace_enter() the
> semantics is similar to the trap_is_scv case: gpr[3] contains a negative
> ERRORCODE and ccr is unchanged. In addition, system_call_exception()
> returns the system call function return value when it is executed, and
> gpr[3] otherwise. The value returned by system_call_exception() is passed
> on to syscall_exit_prepare() which performs the conversion you mentioned.
>
> What's remarkable is that in those places that are a part of the ABI the
> traditional semantics is kept, while in other places the implementation
> follows the trap_is_scv-like semantics, while traditional semantics is
> also supported there.
>
> The only case where I see some intersection is do_seccomp() where the
> tracer would be able to see -ENOSYS in gpr[3]. However, the seccomp stop
> is not the place where the tracer *reads* the system call exit status,
> so whatever was written in gpr[3] before __secure_computing() is not
> really relevant, consequently, selftests/seccomp/seccomp_bpf passes with
> this patch applied as well as without it.
>
> After looking at system_call_exception() I doubt this inconsistency can be
> easily avoided, so I don't see how this patch could be enhanced further,
> and what else could I do with the patch besides dropping it and letting
> !trap_is_scv case be unsupported by PTRACE_SET_SYSCALL_INFO API, which
> would be unfortunate.
The semantics of r3 on syscall return (including the negatedness of
the errno value) is documented in [1] (at least for the 64-bit case,
but I conjecture the 32-bit one is the same, sans the lack of the v2
ABI and scv there), so I would suggest to consider any deviation from
that a kernel programming error to be fixed.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/arch/powerpc/syscall64-abi.rst?id=v6.13#n30
--
Eugene Syromyatnikov
mailto:evgsyr@gmail.com
xmpp:esyr@jabber.{ru|org}
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 1/7] powerpc: properly negate error in syscall_set_return_value()
2025-01-23 18:28 ` Dmitry V. Levin
2025-01-23 19:11 ` Eugene Syromyatnikov
@ 2025-01-23 22:07 ` Christophe Leroy
2025-01-23 22:35 ` Dmitry V. Levin
2025-01-27 11:20 ` Dmitry V. Levin
2025-01-23 23:43 ` Dmitry V. Levin
2025-01-25 12:17 ` Michael Ellerman
3 siblings, 2 replies; 39+ messages in thread
From: Christophe Leroy @ 2025-01-23 22:07 UTC (permalink / raw)
To: Dmitry V. Levin
Cc: Alexey Gladkov, Oleg Nesterov, Michael Ellerman,
Eugene Syromyatnikov, Mike Frysinger, Renzo Davoli,
Davide Berardi, strace-devel, Madhavan Srinivasan,
Nicholas Piggin, Naveen N Rao, linuxppc-dev, linux-kernel
Le 23/01/2025 à 19:28, Dmitry V. Levin a écrit :
> On Mon, Jan 20, 2025 at 02:51:38PM +0100, Christophe Leroy wrote:
>> Le 14/01/2025 à 18:04, Dmitry V. Levin a écrit :
>>> On Mon, Jan 13, 2025 at 06:34:44PM +0100, Christophe Leroy wrote:
>>>> Le 13/01/2025 à 18:10, Dmitry V. Levin a écrit :
>>>>> Bring syscall_set_return_value() in sync with syscall_get_error(),
>>>>> and let upcoming ptrace/set_syscall_info selftest pass on powerpc.
>>>>>
>>>>> This reverts commit 1b1a3702a65c ("powerpc: Don't negate error in
>>>>> syscall_set_return_value()").
>>>>
>>>> There is a clear detailed explanation in that commit of why it needs to
>>>> be done.
>>>>
>>>> If you think that commit is wrong you have to explain why with at least
>>>> the same level of details.
>>>
>>> OK, please have a look whether this explanation is clear and detailed enough:
>>>
>>> =======
>>> powerpc: properly negate error in syscall_set_return_value()
>>>
>>> When syscall_set_return_value() is used to set an error code, the caller
>>> specifies it as a negative value in -ERRORCODE form.
>>>
>>> In !trap_is_scv case the error code is traditionally stored as follows:
>>> gpr[3] contains a positive ERRORCODE, and ccr has 0x10000000 flag set.
>>> Here are a few examples to illustrate this convention. The first one
>>> is from syscall_get_error():
>>> /*
>>> * If the system call failed,
>>> * regs->gpr[3] contains a positive ERRORCODE.
>>> */
>>> return (regs->ccr & 0x10000000UL) ? -regs->gpr[3] : 0;
>>>
>>> The second example is from regs_return_value():
>>> if (is_syscall_success(regs))
>>> return regs->gpr[3];
>>> else
>>> return -regs->gpr[3];
>>>
>>> The third example is from check_syscall_restart():
>>> regs->result = -EINTR;
>>> regs->gpr[3] = EINTR;
>>> regs->ccr |= 0x10000000;
>>>
>>> Compared with these examples, the failure of syscall_set_return_value()
>>> to assign a positive ERRORCODE into regs->gpr[3] is clearly visible:
>>> /*
>>> * In the general case it's not obvious that we must deal with
>>> * CCR here, as the syscall exit path will also do that for us.
>>> * However there are some places, eg. the signal code, which
>>> * check ccr to decide if the value in r3 is actually an error.
>>> */
>>> if (error) {
>>> regs->ccr |= 0x10000000L;
>>> regs->gpr[3] = error;
>>> } else {
>>> regs->ccr &= ~0x10000000L;
>>> regs->gpr[3] = val;
>>> }
>>>
>>> This fix brings syscall_set_return_value() in sync with syscall_get_error()
>>> and lets upcoming ptrace/set_syscall_info selftest pass on powerpc.
>>>
>>> Fixes: 1b1a3702a65c ("powerpc: Don't negate error in syscall_set_return_value()").
>>> =======
>>
>> I think there is still something going wrong.
>>
>> do_seccomp() sets regs->gpr[3] = -ENOSYS; by default.
>>
>> Then it calls __secure_computing() which returns what __seccomp_filter()
>> returns.
>>
>> In case of error, __seccomp_filter() calls syscall_set_return_value()
>> with a negative value then returns -1
>>
>> do_seccomp() is called by do_syscall_trace_enter() which returns -1 when
>> do_seccomp() doesn't return 0.
>>
>> do_syscall_trace_enter() is called by system_call_exception() and
>> returns -1, so syscall_exception() returns regs->gpr[3]
>>
>> In entry_32.S, transfer_to_syscall, syscall_exit_prepare() is then
>> called with the return of syscall_exception() as first parameter, which
>> leads to:
>>
>> if (unlikely(r3 >= (unsigned long)-MAX_ERRNO) && is_not_scv) {
>> if (likely(!(ti_flags & (_TIF_NOERROR | _TIF_RESTOREALL)))) {
>> r3 = -r3;
>> regs->ccr |= 0x10000000; /* Set SO bit in CR */
>> }
>> }
>>
>> By chance, because you have already changed the sign of gpr[3], the
>> above test fails and nothing is done to r3, and because you have also
>> already set regs->ccr it works.
>>
>> But all this looks inconsistent with the fact that do_seccomp sets
>> -ENOSYS as default value
>>
>> Also, when do_seccomp() returns 0, do_syscall_trace_enter() check the
>> syscall number and when it is wrong it goes to skip: which sets
>> regs->gpr[3] = -ENOSYS;
>>
>> So really I think it is not in line with your changes to set positive
>> value in gpr[3].
>>
>> Maybe your change is still correct but it needs to be handled completely
>> in that case.
>
> Indeed, there is an inconsistency in !trap_is_scv case.
>
> In some places such as syscall_get_error() and regs_return_value() the
> semantics is as I described earlier: gpr[3] contains a positive ERRORCODE
> and ccr has 0x10000000 flag set. This semantics is a part of the ABI and
> therefore cannot be changed.
>
> In some other places like do_seccomp() and do_syscall_trace_enter() the
> semantics is similar to the trap_is_scv case: gpr[3] contains a negative
> ERRORCODE and ccr is unchanged. In addition, system_call_exception()
> returns the system call function return value when it is executed, and
> gpr[3] otherwise. The value returned by system_call_exception() is passed
> on to syscall_exit_prepare() which performs the conversion you mentioned.
>
> What's remarkable is that in those places that are a part of the ABI the
> traditional semantics is kept, while in other places the implementation
> follows the trap_is_scv-like semantics, while traditional semantics is
> also supported there.
>
> The only case where I see some intersection is do_seccomp() where the
> tracer would be able to see -ENOSYS in gpr[3]. However, the seccomp stop
> is not the place where the tracer *reads* the system call exit status,
> so whatever was written in gpr[3] before __secure_computing() is not
> really relevant, consequently, selftests/seccomp/seccomp_bpf passes with
> this patch applied as well as without it.
>
> After looking at system_call_exception() I doubt this inconsistency can be
> easily avoided, so I don't see how this patch could be enhanced further,
> and what else could I do with the patch besides dropping it and letting
> !trap_is_scv case be unsupported by PTRACE_SET_SYSCALL_INFO API, which
> would be unfortunate.
>
>
To add a bit more to the confusion, a task can be flagged with
TIF_NOERROR by calling force_successful_syscall_return(), in which case
even if gpr[3] contains a negative between -MAX_ERRNO and -1 the syscall
will be handled as successfull hence CCR[SO] won't be set. But it seems
this is not handled by syscall_set_return_value(). So what will happen
with time() when approaching year 2036 for instance ?
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 1/7] powerpc: properly negate error in syscall_set_return_value()
2025-01-23 19:11 ` Eugene Syromyatnikov
@ 2025-01-23 22:16 ` Dmitry V. Levin
0 siblings, 0 replies; 39+ messages in thread
From: Dmitry V. Levin @ 2025-01-23 22:16 UTC (permalink / raw)
To: Eugene Syromyatnikov
Cc: Christophe Leroy, Alexey Gladkov, Oleg Nesterov, Michael Ellerman,
Mike Frysinger, Renzo Davoli, Davide Berardi, strace-devel,
Madhavan Srinivasan, Nicholas Piggin, Naveen N Rao, linuxppc-dev,
linux-kernel
On Thu, Jan 23, 2025 at 08:11:44PM +0100, Eugene Syromyatnikov wrote:
> On Thu, Jan 23, 2025 at 7:28 PM Dmitry V. Levin <ldv@strace.io> wrote:
> > Indeed, there is an inconsistency in !trap_is_scv case.
> >
> > In some places such as syscall_get_error() and regs_return_value() the
> > semantics is as I described earlier: gpr[3] contains a positive ERRORCODE
> > and ccr has 0x10000000 flag set. This semantics is a part of the ABI and
> > therefore cannot be changed.
> >
> > In some other places like do_seccomp() and do_syscall_trace_enter() the
> > semantics is similar to the trap_is_scv case: gpr[3] contains a negative
> > ERRORCODE and ccr is unchanged. In addition, system_call_exception()
> > returns the system call function return value when it is executed, and
> > gpr[3] otherwise. The value returned by system_call_exception() is passed
> > on to syscall_exit_prepare() which performs the conversion you mentioned.
> >
> > What's remarkable is that in those places that are a part of the ABI the
> > traditional semantics is kept, while in other places the implementation
> > follows the trap_is_scv-like semantics, while traditional semantics is
> > also supported there.
> >
> > The only case where I see some intersection is do_seccomp() where the
> > tracer would be able to see -ENOSYS in gpr[3]. However, the seccomp stop
> > is not the place where the tracer *reads* the system call exit status,
> > so whatever was written in gpr[3] before __secure_computing() is not
> > really relevant, consequently, selftests/seccomp/seccomp_bpf passes with
> > this patch applied as well as without it.
> >
> > After looking at system_call_exception() I doubt this inconsistency can be
> > easily avoided, so I don't see how this patch could be enhanced further,
> > and what else could I do with the patch besides dropping it and letting
> > !trap_is_scv case be unsupported by PTRACE_SET_SYSCALL_INFO API, which
> > would be unfortunate.
>
> The semantics of r3 on syscall return (including the negatedness of
> the errno value) is documented in [1] (at least for the 64-bit case,
> but I conjecture the 32-bit one is the same, sans the lack of the v2
> ABI and scv there), so I would suggest to consider any deviation from
> that a kernel programming error to be fixed.
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/arch/powerpc/syscall64-abi.rst?id=v6.13#n30
The semantics of r3 on syscall return is correct, thanks to
syscall_exit_prepare() that performs necessary manipulations with gpr[3].
What's wrong on powerpc in !trap_is_scv case is that its current
implementation of syscall_set_return_value() follows a different semantics,
making it unusable on syscall return. While syscall_set_return_value() was
used only on entering syscall via do_seccomp(), it was not a problem yet.
It became a problem when we started to use it on syscall return, in the
same state when its sibling syscall_get_error() is used. Note that among
all the architectures in the kernel tree powerpc in !trap_is_scv case is
the only one that has this problem. My patch is intended to address this
without breaking anything else.
--
ldv
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 1/7] powerpc: properly negate error in syscall_set_return_value()
2025-01-23 22:07 ` Christophe Leroy
@ 2025-01-23 22:35 ` Dmitry V. Levin
2025-01-27 11:20 ` Dmitry V. Levin
1 sibling, 0 replies; 39+ messages in thread
From: Dmitry V. Levin @ 2025-01-23 22:35 UTC (permalink / raw)
To: Christophe Leroy
Cc: Alexey Gladkov, Oleg Nesterov, Michael Ellerman,
Eugene Syromyatnikov, Mike Frysinger, Renzo Davoli,
Davide Berardi, strace-devel, Madhavan Srinivasan,
Nicholas Piggin, Naveen N Rao, linuxppc-dev, linux-kernel
On Thu, Jan 23, 2025 at 11:07:21PM +0100, Christophe Leroy wrote:
[...]
> To add a bit more to the confusion, a task can be flagged with
> TIF_NOERROR by calling force_successful_syscall_return(), in which case
> even if gpr[3] contains a negative between -MAX_ERRNO and -1 the syscall
> will be handled as successfull hence CCR[SO] won't be set. But it seems
> this is not handled by syscall_set_return_value(). So what will happen
> with time() when approaching year 2036 for instance ?
syscall_set_return_value() takes both "int error" and "long val"
arguments. It doesn't and shouldn't take TIF_NOERROR into account.
With my patch applied, when it's called by PTRACE_SET_SYSCALL_INFO
from do_syscall_trace_leave(), it will properly update gpr[3] and ccr
regardless of TIF_NOERROR. If tracer wants to set an error status for
a syscall that cannot return an error, it's up to the tracer to face the
consequences. Tracers can do it now via PTRACE_SETREGS* anyway.
--
ldv
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 1/7] powerpc: properly negate error in syscall_set_return_value()
2025-01-23 18:28 ` Dmitry V. Levin
2025-01-23 19:11 ` Eugene Syromyatnikov
2025-01-23 22:07 ` Christophe Leroy
@ 2025-01-23 23:43 ` Dmitry V. Levin
2025-01-24 15:18 ` Alexey Gladkov
2025-01-25 12:17 ` Michael Ellerman
2025-01-25 12:17 ` Michael Ellerman
3 siblings, 2 replies; 39+ messages in thread
From: Dmitry V. Levin @ 2025-01-23 23:43 UTC (permalink / raw)
To: Christophe Leroy
Cc: Alexey Gladkov, Oleg Nesterov, Michael Ellerman,
Eugene Syromyatnikov, Mike Frysinger, Renzo Davoli,
Davide Berardi, strace-devel, Madhavan Srinivasan,
Nicholas Piggin, Naveen N Rao, linuxppc-dev, linux-kernel
On Thu, Jan 23, 2025 at 08:28:15PM +0200, Dmitry V. Levin wrote:
> On Mon, Jan 20, 2025 at 02:51:38PM +0100, Christophe Leroy wrote:
> > Le 14/01/2025 à 18:04, Dmitry V. Levin a écrit :
> > > On Mon, Jan 13, 2025 at 06:34:44PM +0100, Christophe Leroy wrote:
> > >> Le 13/01/2025 à 18:10, Dmitry V. Levin a écrit :
> > >>> Bring syscall_set_return_value() in sync with syscall_get_error(),
> > >>> and let upcoming ptrace/set_syscall_info selftest pass on powerpc.
> > >>>
> > >>> This reverts commit 1b1a3702a65c ("powerpc: Don't negate error in
> > >>> syscall_set_return_value()").
> > >>
> > >> There is a clear detailed explanation in that commit of why it needs to
> > >> be done.
> > >>
> > >> If you think that commit is wrong you have to explain why with at least
> > >> the same level of details.
> > >
> > > OK, please have a look whether this explanation is clear and detailed enough:
> > >
> > > =======
> > > powerpc: properly negate error in syscall_set_return_value()
> > >
> > > When syscall_set_return_value() is used to set an error code, the caller
> > > specifies it as a negative value in -ERRORCODE form.
> > >
> > > In !trap_is_scv case the error code is traditionally stored as follows:
> > > gpr[3] contains a positive ERRORCODE, and ccr has 0x10000000 flag set.
> > > Here are a few examples to illustrate this convention. The first one
> > > is from syscall_get_error():
> > > /*
> > > * If the system call failed,
> > > * regs->gpr[3] contains a positive ERRORCODE.
> > > */
> > > return (regs->ccr & 0x10000000UL) ? -regs->gpr[3] : 0;
> > >
> > > The second example is from regs_return_value():
> > > if (is_syscall_success(regs))
> > > return regs->gpr[3];
> > > else
> > > return -regs->gpr[3];
> > >
> > > The third example is from check_syscall_restart():
> > > regs->result = -EINTR;
> > > regs->gpr[3] = EINTR;
> > > regs->ccr |= 0x10000000;
> > >
> > > Compared with these examples, the failure of syscall_set_return_value()
> > > to assign a positive ERRORCODE into regs->gpr[3] is clearly visible:
> > > /*
> > > * In the general case it's not obvious that we must deal with
> > > * CCR here, as the syscall exit path will also do that for us.
> > > * However there are some places, eg. the signal code, which
> > > * check ccr to decide if the value in r3 is actually an error.
> > > */
> > > if (error) {
> > > regs->ccr |= 0x10000000L;
> > > regs->gpr[3] = error;
> > > } else {
> > > regs->ccr &= ~0x10000000L;
> > > regs->gpr[3] = val;
> > > }
> > >
> > > This fix brings syscall_set_return_value() in sync with syscall_get_error()
> > > and lets upcoming ptrace/set_syscall_info selftest pass on powerpc.
> > >
> > > Fixes: 1b1a3702a65c ("powerpc: Don't negate error in syscall_set_return_value()").
> > > =======
> >
> > I think there is still something going wrong.
> >
> > do_seccomp() sets regs->gpr[3] = -ENOSYS; by default.
> >
> > Then it calls __secure_computing() which returns what __seccomp_filter()
> > returns.
> >
> > In case of error, __seccomp_filter() calls syscall_set_return_value()
> > with a negative value then returns -1
> >
> > do_seccomp() is called by do_syscall_trace_enter() which returns -1 when
> > do_seccomp() doesn't return 0.
> >
> > do_syscall_trace_enter() is called by system_call_exception() and
> > returns -1, so syscall_exception() returns regs->gpr[3]
> >
> > In entry_32.S, transfer_to_syscall, syscall_exit_prepare() is then
> > called with the return of syscall_exception() as first parameter, which
> > leads to:
> >
> > if (unlikely(r3 >= (unsigned long)-MAX_ERRNO) && is_not_scv) {
> > if (likely(!(ti_flags & (_TIF_NOERROR | _TIF_RESTOREALL)))) {
> > r3 = -r3;
> > regs->ccr |= 0x10000000; /* Set SO bit in CR */
> > }
> > }
> >
> > By chance, because you have already changed the sign of gpr[3], the
> > above test fails and nothing is done to r3, and because you have also
> > already set regs->ccr it works.
> >
> > But all this looks inconsistent with the fact that do_seccomp sets
> > -ENOSYS as default value
> >
> > Also, when do_seccomp() returns 0, do_syscall_trace_enter() check the
> > syscall number and when it is wrong it goes to skip: which sets
> > regs->gpr[3] = -ENOSYS;
> >
> > So really I think it is not in line with your changes to set positive
> > value in gpr[3].
> >
> > Maybe your change is still correct but it needs to be handled completely
> > in that case.
>
> Indeed, there is an inconsistency in !trap_is_scv case.
>
> In some places such as syscall_get_error() and regs_return_value() the
> semantics is as I described earlier: gpr[3] contains a positive ERRORCODE
> and ccr has 0x10000000 flag set. This semantics is a part of the ABI and
> therefore cannot be changed.
>
> In some other places like do_seccomp() and do_syscall_trace_enter() the
> semantics is similar to the trap_is_scv case: gpr[3] contains a negative
> ERRORCODE and ccr is unchanged. In addition, system_call_exception()
> returns the system call function return value when it is executed, and
> gpr[3] otherwise. The value returned by system_call_exception() is passed
> on to syscall_exit_prepare() which performs the conversion you mentioned.
>
> What's remarkable is that in those places that are a part of the ABI the
> traditional semantics is kept, while in other places the implementation
> follows the trap_is_scv-like semantics, while traditional semantics is
> also supported there.
>
> The only case where I see some intersection is do_seccomp() where the
> tracer would be able to see -ENOSYS in gpr[3]. However, the seccomp stop
> is not the place where the tracer *reads* the system call exit status,
> so whatever was written in gpr[3] before __secure_computing() is not
> really relevant, consequently, selftests/seccomp/seccomp_bpf passes with
> this patch applied as well as without it.
>
> After looking at system_call_exception() I doubt this inconsistency can be
> easily avoided, so I don't see how this patch could be enhanced further,
> and what else could I do with the patch besides dropping it and letting
> !trap_is_scv case be unsupported by PTRACE_SET_SYSCALL_INFO API, which
> would be unfortunate.
If you say this would bring some consistency, I can extend the patch with
something like this:
diff --git a/arch/powerpc/kernel/ptrace/ptrace.c b/arch/powerpc/kernel/ptrace/ptrace.c
index 727ed4a14545..dda276a934fd 100644
--- a/arch/powerpc/kernel/ptrace/ptrace.c
+++ b/arch/powerpc/kernel/ptrace/ptrace.c
@@ -207,7 +207,7 @@ static int do_seccomp(struct pt_regs *regs)
* syscall parameter. This is different to the ptrace ABI where
* both r3 and orig_gpr3 contain the first syscall parameter.
*/
- regs->gpr[3] = -ENOSYS;
+ syscall_set_return_value(current, regs, -ENOSYS, 0);
/*
* We use the __ version here because we have already checked
@@ -225,7 +225,7 @@ static int do_seccomp(struct pt_regs *regs)
* modify the first syscall parameter (in orig_gpr3) and also
* allow the syscall to proceed.
*/
- regs->gpr[3] = regs->orig_gpr3;
+ syscall_set_return_value(current, regs, 0, regs->orig_gpr3);
return 0;
}
@@ -315,7 +315,7 @@ long do_syscall_trace_enter(struct pt_regs *regs)
* If we are aborting explicitly, or if the syscall number is
* now invalid, set the return value to -ENOSYS.
*/
- regs->gpr[3] = -ENOSYS;
+ syscall_set_return_value(current, regs, -ENOSYS, 0);
return -1;
}
diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
index aa17e62f3754..c921e0cb54b8 100644
--- a/arch/powerpc/kernel/signal.c
+++ b/arch/powerpc/kernel/signal.c
@@ -229,14 +229,8 @@ static void check_syscall_restart(struct pt_regs *regs, struct k_sigaction *ka,
regs_add_return_ip(regs, -4);
regs->result = 0;
} else {
- if (trap_is_scv(regs)) {
- regs->result = -EINTR;
- regs->gpr[3] = -EINTR;
- } else {
- regs->result = -EINTR;
- regs->gpr[3] = EINTR;
- regs->ccr |= 0x10000000;
- }
+ regs->result = -EINTR;
+ syscall_set_return_value(current, regs, -EINTR, 0);
}
}
--
ldv
^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [PATCH v2 1/7] powerpc: properly negate error in syscall_set_return_value()
2025-01-23 23:43 ` Dmitry V. Levin
@ 2025-01-24 15:18 ` Alexey Gladkov
2025-01-25 0:25 ` Dmitry V. Levin
2025-01-25 12:18 ` Michael Ellerman
2025-01-25 12:17 ` Michael Ellerman
1 sibling, 2 replies; 39+ messages in thread
From: Alexey Gladkov @ 2025-01-24 15:18 UTC (permalink / raw)
To: Dmitry V. Levin
Cc: Christophe Leroy, Oleg Nesterov, Michael Ellerman,
Eugene Syromyatnikov, Mike Frysinger, Renzo Davoli,
Davide Berardi, strace-devel, Madhavan Srinivasan,
Nicholas Piggin, Naveen N Rao, linuxppc-dev, linux-kernel
On Fri, Jan 24, 2025 at 01:43:22AM +0200, Dmitry V. Levin wrote:
> On Thu, Jan 23, 2025 at 08:28:15PM +0200, Dmitry V. Levin wrote:
> > On Mon, Jan 20, 2025 at 02:51:38PM +0100, Christophe Leroy wrote:
> > > Le 14/01/2025 à 18:04, Dmitry V. Levin a écrit :
> > > > On Mon, Jan 13, 2025 at 06:34:44PM +0100, Christophe Leroy wrote:
> > > >> Le 13/01/2025 à 18:10, Dmitry V. Levin a écrit :
> > > >>> Bring syscall_set_return_value() in sync with syscall_get_error(),
> > > >>> and let upcoming ptrace/set_syscall_info selftest pass on powerpc.
> > > >>>
> > > >>> This reverts commit 1b1a3702a65c ("powerpc: Don't negate error in
> > > >>> syscall_set_return_value()").
> > > >>
> > > >> There is a clear detailed explanation in that commit of why it needs to
> > > >> be done.
> > > >>
> > > >> If you think that commit is wrong you have to explain why with at least
> > > >> the same level of details.
> > > >
> > > > OK, please have a look whether this explanation is clear and detailed enough:
> > > >
> > > > =======
> > > > powerpc: properly negate error in syscall_set_return_value()
> > > >
> > > > When syscall_set_return_value() is used to set an error code, the caller
> > > > specifies it as a negative value in -ERRORCODE form.
> > > >
> > > > In !trap_is_scv case the error code is traditionally stored as follows:
> > > > gpr[3] contains a positive ERRORCODE, and ccr has 0x10000000 flag set.
> > > > Here are a few examples to illustrate this convention. The first one
> > > > is from syscall_get_error():
> > > > /*
> > > > * If the system call failed,
> > > > * regs->gpr[3] contains a positive ERRORCODE.
> > > > */
> > > > return (regs->ccr & 0x10000000UL) ? -regs->gpr[3] : 0;
> > > >
> > > > The second example is from regs_return_value():
> > > > if (is_syscall_success(regs))
> > > > return regs->gpr[3];
> > > > else
> > > > return -regs->gpr[3];
> > > >
> > > > The third example is from check_syscall_restart():
> > > > regs->result = -EINTR;
> > > > regs->gpr[3] = EINTR;
> > > > regs->ccr |= 0x10000000;
> > > >
> > > > Compared with these examples, the failure of syscall_set_return_value()
> > > > to assign a positive ERRORCODE into regs->gpr[3] is clearly visible:
> > > > /*
> > > > * In the general case it's not obvious that we must deal with
> > > > * CCR here, as the syscall exit path will also do that for us.
> > > > * However there are some places, eg. the signal code, which
> > > > * check ccr to decide if the value in r3 is actually an error.
> > > > */
> > > > if (error) {
> > > > regs->ccr |= 0x10000000L;
> > > > regs->gpr[3] = error;
> > > > } else {
> > > > regs->ccr &= ~0x10000000L;
> > > > regs->gpr[3] = val;
> > > > }
> > > >
> > > > This fix brings syscall_set_return_value() in sync with syscall_get_error()
> > > > and lets upcoming ptrace/set_syscall_info selftest pass on powerpc.
> > > >
> > > > Fixes: 1b1a3702a65c ("powerpc: Don't negate error in syscall_set_return_value()").
> > > > =======
> > >
> > > I think there is still something going wrong.
> > >
> > > do_seccomp() sets regs->gpr[3] = -ENOSYS; by default.
> > >
> > > Then it calls __secure_computing() which returns what __seccomp_filter()
> > > returns.
> > >
> > > In case of error, __seccomp_filter() calls syscall_set_return_value()
> > > with a negative value then returns -1
> > >
> > > do_seccomp() is called by do_syscall_trace_enter() which returns -1 when
> > > do_seccomp() doesn't return 0.
> > >
> > > do_syscall_trace_enter() is called by system_call_exception() and
> > > returns -1, so syscall_exception() returns regs->gpr[3]
> > >
> > > In entry_32.S, transfer_to_syscall, syscall_exit_prepare() is then
> > > called with the return of syscall_exception() as first parameter, which
> > > leads to:
> > >
> > > if (unlikely(r3 >= (unsigned long)-MAX_ERRNO) && is_not_scv) {
> > > if (likely(!(ti_flags & (_TIF_NOERROR | _TIF_RESTOREALL)))) {
> > > r3 = -r3;
> > > regs->ccr |= 0x10000000; /* Set SO bit in CR */
> > > }
> > > }
> > >
> > > By chance, because you have already changed the sign of gpr[3], the
> > > above test fails and nothing is done to r3, and because you have also
> > > already set regs->ccr it works.
> > >
> > > But all this looks inconsistent with the fact that do_seccomp sets
> > > -ENOSYS as default value
> > >
> > > Also, when do_seccomp() returns 0, do_syscall_trace_enter() check the
> > > syscall number and when it is wrong it goes to skip: which sets
> > > regs->gpr[3] = -ENOSYS;
> > >
> > > So really I think it is not in line with your changes to set positive
> > > value in gpr[3].
> > >
> > > Maybe your change is still correct but it needs to be handled completely
> > > in that case.
> >
> > Indeed, there is an inconsistency in !trap_is_scv case.
> >
> > In some places such as syscall_get_error() and regs_return_value() the
> > semantics is as I described earlier: gpr[3] contains a positive ERRORCODE
> > and ccr has 0x10000000 flag set. This semantics is a part of the ABI and
> > therefore cannot be changed.
> >
> > In some other places like do_seccomp() and do_syscall_trace_enter() the
> > semantics is similar to the trap_is_scv case: gpr[3] contains a negative
> > ERRORCODE and ccr is unchanged. In addition, system_call_exception()
> > returns the system call function return value when it is executed, and
> > gpr[3] otherwise. The value returned by system_call_exception() is passed
> > on to syscall_exit_prepare() which performs the conversion you mentioned.
> >
> > What's remarkable is that in those places that are a part of the ABI the
> > traditional semantics is kept, while in other places the implementation
> > follows the trap_is_scv-like semantics, while traditional semantics is
> > also supported there.
> >
> > The only case where I see some intersection is do_seccomp() where the
> > tracer would be able to see -ENOSYS in gpr[3]. However, the seccomp stop
> > is not the place where the tracer *reads* the system call exit status,
> > so whatever was written in gpr[3] before __secure_computing() is not
> > really relevant, consequently, selftests/seccomp/seccomp_bpf passes with
> > this patch applied as well as without it.
> >
> > After looking at system_call_exception() I doubt this inconsistency can be
> > easily avoided, so I don't see how this patch could be enhanced further,
> > and what else could I do with the patch besides dropping it and letting
> > !trap_is_scv case be unsupported by PTRACE_SET_SYSCALL_INFO API, which
> > would be unfortunate.
>
> If you say this would bring some consistency, I can extend the patch with
> something like this:
>
> diff --git a/arch/powerpc/kernel/ptrace/ptrace.c b/arch/powerpc/kernel/ptrace/ptrace.c
> index 727ed4a14545..dda276a934fd 100644
> --- a/arch/powerpc/kernel/ptrace/ptrace.c
> +++ b/arch/powerpc/kernel/ptrace/ptrace.c
> @@ -207,7 +207,7 @@ static int do_seccomp(struct pt_regs *regs)
> * syscall parameter. This is different to the ptrace ABI where
> * both r3 and orig_gpr3 contain the first syscall parameter.
> */
> - regs->gpr[3] = -ENOSYS;
> + syscall_set_return_value(current, regs, -ENOSYS, 0);
>
> /*
> * We use the __ version here because we have already checked
> @@ -225,7 +225,7 @@ static int do_seccomp(struct pt_regs *regs)
> * modify the first syscall parameter (in orig_gpr3) and also
> * allow the syscall to proceed.
> */
> - regs->gpr[3] = regs->orig_gpr3;
> + syscall_set_return_value(current, regs, 0, regs->orig_gpr3);
>
> return 0;
> }
> @@ -315,7 +315,7 @@ long do_syscall_trace_enter(struct pt_regs *regs)
> * If we are aborting explicitly, or if the syscall number is
> * now invalid, set the return value to -ENOSYS.
> */
> - regs->gpr[3] = -ENOSYS;
> + syscall_set_return_value(current, regs, -ENOSYS, 0);
> return -1;
> }
>
> diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
> index aa17e62f3754..c921e0cb54b8 100644
> --- a/arch/powerpc/kernel/signal.c
> +++ b/arch/powerpc/kernel/signal.c
> @@ -229,14 +229,8 @@ static void check_syscall_restart(struct pt_regs *regs, struct k_sigaction *ka,
> regs_add_return_ip(regs, -4);
> regs->result = 0;
> } else {
> - if (trap_is_scv(regs)) {
> - regs->result = -EINTR;
> - regs->gpr[3] = -EINTR;
> - } else {
> - regs->result = -EINTR;
> - regs->gpr[3] = EINTR;
> - regs->ccr |= 0x10000000;
> - }
> + regs->result = -EINTR;
> + syscall_set_return_value(current, regs, -EINTR, 0);
> }
> }
I'm not a powerpc expert but shouldn't be used regs->gpr[3] via a
regs_return_value() in system_call_exception() ?
notrace long system_call_exception(struct pt_regs *regs, unsigned long r0)
{
...
r0 = do_syscall_trace_enter(regs);
if (unlikely(r0 >= NR_syscalls))
return regs->gpr[3];
} else if (unlikely(r0 >= NR_syscalls)) {
if (unlikely(trap_is_unsupported_scv(regs))) {
/* Unsupported scv vector */
_exception(SIGILL, regs, ILL_ILLOPC, regs->nip);
return regs->gpr[3];
}
return -ENOSYS;
}
}
--
Rgrds, legion
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 1/7] powerpc: properly negate error in syscall_set_return_value()
2025-01-24 15:18 ` Alexey Gladkov
@ 2025-01-25 0:25 ` Dmitry V. Levin
2025-01-25 12:18 ` Michael Ellerman
1 sibling, 0 replies; 39+ messages in thread
From: Dmitry V. Levin @ 2025-01-25 0:25 UTC (permalink / raw)
To: Alexey Gladkov
Cc: Christophe Leroy, Oleg Nesterov, Michael Ellerman,
Eugene Syromyatnikov, Mike Frysinger, Renzo Davoli,
Davide Berardi, strace-devel, Madhavan Srinivasan,
Nicholas Piggin, Naveen N Rao, linuxppc-dev, linux-kernel
On Fri, Jan 24, 2025 at 04:18:10PM +0100, Alexey Gladkov wrote:
> On Fri, Jan 24, 2025 at 01:43:22AM +0200, Dmitry V. Levin wrote:
> > On Thu, Jan 23, 2025 at 08:28:15PM +0200, Dmitry V. Levin wrote:
> > > On Mon, Jan 20, 2025 at 02:51:38PM +0100, Christophe Leroy wrote:
> > > > Le 14/01/2025 à 18:04, Dmitry V. Levin a écrit :
> > > > > On Mon, Jan 13, 2025 at 06:34:44PM +0100, Christophe Leroy wrote:
> > > > >> Le 13/01/2025 à 18:10, Dmitry V. Levin a écrit :
> > > > >>> Bring syscall_set_return_value() in sync with syscall_get_error(),
> > > > >>> and let upcoming ptrace/set_syscall_info selftest pass on powerpc.
> > > > >>>
> > > > >>> This reverts commit 1b1a3702a65c ("powerpc: Don't negate error in
> > > > >>> syscall_set_return_value()").
> > > > >>
> > > > >> There is a clear detailed explanation in that commit of why it needs to
> > > > >> be done.
> > > > >>
> > > > >> If you think that commit is wrong you have to explain why with at least
> > > > >> the same level of details.
> > > > >
> > > > > OK, please have a look whether this explanation is clear and detailed enough:
> > > > >
> > > > > =======
> > > > > powerpc: properly negate error in syscall_set_return_value()
> > > > >
> > > > > When syscall_set_return_value() is used to set an error code, the caller
> > > > > specifies it as a negative value in -ERRORCODE form.
> > > > >
> > > > > In !trap_is_scv case the error code is traditionally stored as follows:
> > > > > gpr[3] contains a positive ERRORCODE, and ccr has 0x10000000 flag set.
> > > > > Here are a few examples to illustrate this convention. The first one
> > > > > is from syscall_get_error():
> > > > > /*
> > > > > * If the system call failed,
> > > > > * regs->gpr[3] contains a positive ERRORCODE.
> > > > > */
> > > > > return (regs->ccr & 0x10000000UL) ? -regs->gpr[3] : 0;
> > > > >
> > > > > The second example is from regs_return_value():
> > > > > if (is_syscall_success(regs))
> > > > > return regs->gpr[3];
> > > > > else
> > > > > return -regs->gpr[3];
> > > > >
> > > > > The third example is from check_syscall_restart():
> > > > > regs->result = -EINTR;
> > > > > regs->gpr[3] = EINTR;
> > > > > regs->ccr |= 0x10000000;
> > > > >
> > > > > Compared with these examples, the failure of syscall_set_return_value()
> > > > > to assign a positive ERRORCODE into regs->gpr[3] is clearly visible:
> > > > > /*
> > > > > * In the general case it's not obvious that we must deal with
> > > > > * CCR here, as the syscall exit path will also do that for us.
> > > > > * However there are some places, eg. the signal code, which
> > > > > * check ccr to decide if the value in r3 is actually an error.
> > > > > */
> > > > > if (error) {
> > > > > regs->ccr |= 0x10000000L;
> > > > > regs->gpr[3] = error;
> > > > > } else {
> > > > > regs->ccr &= ~0x10000000L;
> > > > > regs->gpr[3] = val;
> > > > > }
> > > > >
> > > > > This fix brings syscall_set_return_value() in sync with syscall_get_error()
> > > > > and lets upcoming ptrace/set_syscall_info selftest pass on powerpc.
> > > > >
> > > > > Fixes: 1b1a3702a65c ("powerpc: Don't negate error in syscall_set_return_value()").
> > > > > =======
> > > >
> > > > I think there is still something going wrong.
> > > >
> > > > do_seccomp() sets regs->gpr[3] = -ENOSYS; by default.
> > > >
> > > > Then it calls __secure_computing() which returns what __seccomp_filter()
> > > > returns.
> > > >
> > > > In case of error, __seccomp_filter() calls syscall_set_return_value()
> > > > with a negative value then returns -1
> > > >
> > > > do_seccomp() is called by do_syscall_trace_enter() which returns -1 when
> > > > do_seccomp() doesn't return 0.
> > > >
> > > > do_syscall_trace_enter() is called by system_call_exception() and
> > > > returns -1, so syscall_exception() returns regs->gpr[3]
> > > >
> > > > In entry_32.S, transfer_to_syscall, syscall_exit_prepare() is then
> > > > called with the return of syscall_exception() as first parameter, which
> > > > leads to:
> > > >
> > > > if (unlikely(r3 >= (unsigned long)-MAX_ERRNO) && is_not_scv) {
> > > > if (likely(!(ti_flags & (_TIF_NOERROR | _TIF_RESTOREALL)))) {
> > > > r3 = -r3;
> > > > regs->ccr |= 0x10000000; /* Set SO bit in CR */
> > > > }
> > > > }
> > > >
> > > > By chance, because you have already changed the sign of gpr[3], the
> > > > above test fails and nothing is done to r3, and because you have also
> > > > already set regs->ccr it works.
> > > >
> > > > But all this looks inconsistent with the fact that do_seccomp sets
> > > > -ENOSYS as default value
> > > >
> > > > Also, when do_seccomp() returns 0, do_syscall_trace_enter() check the
> > > > syscall number and when it is wrong it goes to skip: which sets
> > > > regs->gpr[3] = -ENOSYS;
> > > >
> > > > So really I think it is not in line with your changes to set positive
> > > > value in gpr[3].
> > > >
> > > > Maybe your change is still correct but it needs to be handled completely
> > > > in that case.
> > >
> > > Indeed, there is an inconsistency in !trap_is_scv case.
> > >
> > > In some places such as syscall_get_error() and regs_return_value() the
> > > semantics is as I described earlier: gpr[3] contains a positive ERRORCODE
> > > and ccr has 0x10000000 flag set. This semantics is a part of the ABI and
> > > therefore cannot be changed.
> > >
> > > In some other places like do_seccomp() and do_syscall_trace_enter() the
> > > semantics is similar to the trap_is_scv case: gpr[3] contains a negative
> > > ERRORCODE and ccr is unchanged. In addition, system_call_exception()
> > > returns the system call function return value when it is executed, and
> > > gpr[3] otherwise. The value returned by system_call_exception() is passed
> > > on to syscall_exit_prepare() which performs the conversion you mentioned.
> > >
> > > What's remarkable is that in those places that are a part of the ABI the
> > > traditional semantics is kept, while in other places the implementation
> > > follows the trap_is_scv-like semantics, while traditional semantics is
> > > also supported there.
> > >
> > > The only case where I see some intersection is do_seccomp() where the
> > > tracer would be able to see -ENOSYS in gpr[3]. However, the seccomp stop
> > > is not the place where the tracer *reads* the system call exit status,
> > > so whatever was written in gpr[3] before __secure_computing() is not
> > > really relevant, consequently, selftests/seccomp/seccomp_bpf passes with
> > > this patch applied as well as without it.
> > >
> > > After looking at system_call_exception() I doubt this inconsistency can be
> > > easily avoided, so I don't see how this patch could be enhanced further,
> > > and what else could I do with the patch besides dropping it and letting
> > > !trap_is_scv case be unsupported by PTRACE_SET_SYSCALL_INFO API, which
> > > would be unfortunate.
> >
> > If you say this would bring some consistency, I can extend the patch with
> > something like this:
> >
> > diff --git a/arch/powerpc/kernel/ptrace/ptrace.c b/arch/powerpc/kernel/ptrace/ptrace.c
> > index 727ed4a14545..dda276a934fd 100644
> > --- a/arch/powerpc/kernel/ptrace/ptrace.c
> > +++ b/arch/powerpc/kernel/ptrace/ptrace.c
> > @@ -207,7 +207,7 @@ static int do_seccomp(struct pt_regs *regs)
> > * syscall parameter. This is different to the ptrace ABI where
> > * both r3 and orig_gpr3 contain the first syscall parameter.
> > */
> > - regs->gpr[3] = -ENOSYS;
> > + syscall_set_return_value(current, regs, -ENOSYS, 0);
> >
> > /*
> > * We use the __ version here because we have already checked
> > @@ -225,7 +225,7 @@ static int do_seccomp(struct pt_regs *regs)
> > * modify the first syscall parameter (in orig_gpr3) and also
> > * allow the syscall to proceed.
> > */
> > - regs->gpr[3] = regs->orig_gpr3;
> > + syscall_set_return_value(current, regs, 0, regs->orig_gpr3);
> >
> > return 0;
> > }
> > @@ -315,7 +315,7 @@ long do_syscall_trace_enter(struct pt_regs *regs)
> > * If we are aborting explicitly, or if the syscall number is
> > * now invalid, set the return value to -ENOSYS.
> > */
> > - regs->gpr[3] = -ENOSYS;
> > + syscall_set_return_value(current, regs, -ENOSYS, 0);
> > return -1;
> > }
> >
> > diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
> > index aa17e62f3754..c921e0cb54b8 100644
> > --- a/arch/powerpc/kernel/signal.c
> > +++ b/arch/powerpc/kernel/signal.c
> > @@ -229,14 +229,8 @@ static void check_syscall_restart(struct pt_regs *regs, struct k_sigaction *ka,
> > regs_add_return_ip(regs, -4);
> > regs->result = 0;
> > } else {
> > - if (trap_is_scv(regs)) {
> > - regs->result = -EINTR;
> > - regs->gpr[3] = -EINTR;
> > - } else {
> > - regs->result = -EINTR;
> > - regs->gpr[3] = EINTR;
> > - regs->ccr |= 0x10000000;
> > - }
> > + regs->result = -EINTR;
> > + syscall_set_return_value(current, regs, -EINTR, 0);
> > }
> > }
>
> I'm not a powerpc expert but shouldn't be used regs->gpr[3] via a
> regs_return_value() in system_call_exception() ?
This would ensure that system_call_exception() returns errors in -ERRORCODE
form, which wouldn't have any practical difference given that the return
code is passed on to syscall_exit_prepare() which performs the conversion.
However, this could bring more consistency when applied along with other
consistency-related changes.
I wish the people responsible for powerpc would be more specific about
the level of consistency they are ready to maintain.
--
ldv
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 1/7] powerpc: properly negate error in syscall_set_return_value()
2025-01-23 18:28 ` Dmitry V. Levin
` (2 preceding siblings ...)
2025-01-23 23:43 ` Dmitry V. Levin
@ 2025-01-25 12:17 ` Michael Ellerman
2025-01-25 21:25 ` Dmitry V. Levin
3 siblings, 1 reply; 39+ messages in thread
From: Michael Ellerman @ 2025-01-25 12:17 UTC (permalink / raw)
To: Dmitry V. Levin, Christophe Leroy
Cc: Alexey Gladkov, Oleg Nesterov, Eugene Syromyatnikov,
Mike Frysinger, Renzo Davoli, Davide Berardi, strace-devel,
Madhavan Srinivasan, Nicholas Piggin, Naveen N Rao, linuxppc-dev,
linux-kernel
"Dmitry V. Levin" <ldv@strace.io> writes:
> On Mon, Jan 20, 2025 at 02:51:38PM +0100, Christophe Leroy wrote:
>> Le 14/01/2025 à 18:04, Dmitry V. Levin a écrit :
>> > On Mon, Jan 13, 2025 at 06:34:44PM +0100, Christophe Leroy wrote:
>> >> Le 13/01/2025 à 18:10, Dmitry V. Levin a écrit :
>> >>> Bring syscall_set_return_value() in sync with syscall_get_error(),
>> >>> and let upcoming ptrace/set_syscall_info selftest pass on powerpc.
>> >>>
>> >>> This reverts commit 1b1a3702a65c ("powerpc: Don't negate error in
>> >>> syscall_set_return_value()").
>> >>
>> >> There is a clear detailed explanation in that commit of why it needs to
>> >> be done.
>> >>
>> >> If you think that commit is wrong you have to explain why with at least
>> >> the same level of details.
>> >
>> > OK, please have a look whether this explanation is clear and detailed enough:
>> >
>> > =======
>> > powerpc: properly negate error in syscall_set_return_value()
>> >
>> > When syscall_set_return_value() is used to set an error code, the caller
>> > specifies it as a negative value in -ERRORCODE form.
>> >
>> > In !trap_is_scv case the error code is traditionally stored as follows:
>> > gpr[3] contains a positive ERRORCODE, and ccr has 0x10000000 flag set.
>> > Here are a few examples to illustrate this convention. The first one
>> > is from syscall_get_error():
>> > /*
>> > * If the system call failed,
>> > * regs->gpr[3] contains a positive ERRORCODE.
>> > */
>> > return (regs->ccr & 0x10000000UL) ? -regs->gpr[3] : 0;
>> >
>> > The second example is from regs_return_value():
>> > if (is_syscall_success(regs))
>> > return regs->gpr[3];
>> > else
>> > return -regs->gpr[3];
>> >
>> > The third example is from check_syscall_restart():
>> > regs->result = -EINTR;
>> > regs->gpr[3] = EINTR;
>> > regs->ccr |= 0x10000000;
>> >
>> > Compared with these examples, the failure of syscall_set_return_value()
>> > to assign a positive ERRORCODE into regs->gpr[3] is clearly visible:
>> > /*
>> > * In the general case it's not obvious that we must deal with
>> > * CCR here, as the syscall exit path will also do that for us.
>> > * However there are some places, eg. the signal code, which
>> > * check ccr to decide if the value in r3 is actually an error.
>> > */
>> > if (error) {
>> > regs->ccr |= 0x10000000L;
>> > regs->gpr[3] = error;
>> > } else {
>> > regs->ccr &= ~0x10000000L;
>> > regs->gpr[3] = val;
>> > }
>> >
>> > This fix brings syscall_set_return_value() in sync with syscall_get_error()
>> > and lets upcoming ptrace/set_syscall_info selftest pass on powerpc.
>> >
>> > Fixes: 1b1a3702a65c ("powerpc: Don't negate error in syscall_set_return_value()").
>> > =======
>>
>> I think there is still something going wrong.
>>
>> do_seccomp() sets regs->gpr[3] = -ENOSYS; by default.
>>
>> Then it calls __secure_computing() which returns what __seccomp_filter()
>> returns.
>>
>> In case of error, __seccomp_filter() calls syscall_set_return_value()
>> with a negative value then returns -1
>>
>> do_seccomp() is called by do_syscall_trace_enter() which returns -1 when
>> do_seccomp() doesn't return 0.
>>
>> do_syscall_trace_enter() is called by system_call_exception() and
>> returns -1, so syscall_exception() returns regs->gpr[3]
>>
>> In entry_32.S, transfer_to_syscall, syscall_exit_prepare() is then
>> called with the return of syscall_exception() as first parameter, which
>> leads to:
>>
>> if (unlikely(r3 >= (unsigned long)-MAX_ERRNO) && is_not_scv) {
>> if (likely(!(ti_flags & (_TIF_NOERROR | _TIF_RESTOREALL)))) {
>> r3 = -r3;
>> regs->ccr |= 0x10000000; /* Set SO bit in CR */
>> }
>> }
>>
>> By chance, because you have already changed the sign of gpr[3], the
>> above test fails and nothing is done to r3, and because you have also
>> already set regs->ccr it works.
>>
>> But all this looks inconsistent with the fact that do_seccomp sets
>> -ENOSYS as default value
>>
>> Also, when do_seccomp() returns 0, do_syscall_trace_enter() check the
>> syscall number and when it is wrong it goes to skip: which sets
>> regs->gpr[3] = -ENOSYS;
>>
>> So really I think it is not in line with your changes to set positive
>> value in gpr[3].
>>
>> Maybe your change is still correct but it needs to be handled completely
>> in that case.
>
> Indeed, there is an inconsistency in !trap_is_scv case.
>
> In some places such as syscall_get_error() and regs_return_value() the
> semantics is as I described earlier: gpr[3] contains a positive ERRORCODE
> and ccr has 0x10000000 flag set. This semantics is a part of the ABI and
> therefore cannot be changed.
>
> In some other places like do_seccomp() and do_syscall_trace_enter() the
> semantics is similar to the trap_is_scv case: gpr[3] contains a negative
> ERRORCODE and ccr is unchanged. In addition, system_call_exception()
> returns the system call function return value when it is executed, and
> gpr[3] otherwise. The value returned by system_call_exception() is passed
> on to syscall_exit_prepare() which performs the conversion you mentioned.
>
> What's remarkable is that in those places that are a part of the ABI the
> traditional semantics is kept, while in other places the implementation
> follows the trap_is_scv-like semantics, while traditional semantics is
> also supported there.
scv didn't exist when the seccomp code was written so that's not really
the right way to look at it.
The distinction was between the in-kernel semantic of negative
ERRORCODE, which is used everywhere, vs the original (non-scv) syscall
ABI which uses positive ERRORCODE and CCR.SO.
The way I wrote it at the time was to try and maintain the negative
ERRORCODE semantic in the kernel, and only flip to positive ERRORCODE
when we actually exit to userspace.
But even back then syscall_set_return_value() needed to set CCR.SO to
make some cases work, so it was probably the wrong design.
> The only case where I see some intersection is do_seccomp() where the
> tracer would be able to see -ENOSYS in gpr[3]. However, the seccomp stop
> is not the place where the tracer *reads* the system call exit status,
> so whatever was written in gpr[3] before __secure_computing() is not
> really relevant, consequently, selftests/seccomp/seccomp_bpf passes with
> this patch applied as well as without it.
IIRC it is important for a tracer that blocks the syscall but doesn't
explicitly set the return value. But it's only important that the
default return value is syscall failure (ie. ENOSYS/-ENOSYS), the actual
sign of the r3 value should be irrelevant to the tracer.
If the selftest still passes then that's probably sufficient.
cheers
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 1/7] powerpc: properly negate error in syscall_set_return_value()
2025-01-23 23:43 ` Dmitry V. Levin
2025-01-24 15:18 ` Alexey Gladkov
@ 2025-01-25 12:17 ` Michael Ellerman
2025-01-25 20:48 ` Dmitry V. Levin
1 sibling, 1 reply; 39+ messages in thread
From: Michael Ellerman @ 2025-01-25 12:17 UTC (permalink / raw)
To: Dmitry V. Levin, Christophe Leroy
Cc: Alexey Gladkov, Oleg Nesterov, Eugene Syromyatnikov,
Mike Frysinger, Renzo Davoli, Davide Berardi, strace-devel,
Madhavan Srinivasan, Nicholas Piggin, Naveen N Rao, linuxppc-dev,
linux-kernel
"Dmitry V. Levin" <ldv@strace.io> writes:
> On Thu, Jan 23, 2025 at 08:28:15PM +0200, Dmitry V. Levin wrote:
...
>> After looking at system_call_exception() I doubt this inconsistency can be
>> easily avoided, so I don't see how this patch could be enhanced further,
>> and what else could I do with the patch besides dropping it and letting
>> !trap_is_scv case be unsupported by PTRACE_SET_SYSCALL_INFO API, which
>> would be unfortunate.
>
> If you say this would bring some consistency, I can extend the patch with
> something like this:
Yes that would improve things IMHO, with one caveat ....
> diff --git a/arch/powerpc/kernel/ptrace/ptrace.c b/arch/powerpc/kernel/ptrace/ptrace.c
> index 727ed4a14545..dda276a934fd 100644
> --- a/arch/powerpc/kernel/ptrace/ptrace.c
> +++ b/arch/powerpc/kernel/ptrace/ptrace.c
> @@ -207,7 +207,7 @@ static int do_seccomp(struct pt_regs *regs)
> * syscall parameter. This is different to the ptrace ABI where
> * both r3 and orig_gpr3 contain the first syscall parameter.
> */
> - regs->gpr[3] = -ENOSYS;
> + syscall_set_return_value(current, regs, -ENOSYS, 0);
>
> /*
> * We use the __ version here because we have already checked
> @@ -225,7 +225,7 @@ static int do_seccomp(struct pt_regs *regs)
> * modify the first syscall parameter (in orig_gpr3) and also
> * allow the syscall to proceed.
> */
> - regs->gpr[3] = regs->orig_gpr3;
> + syscall_set_return_value(current, regs, 0, regs->orig_gpr3);
This case should remain as-is. The orig_gpr3 value here is not a syscall
error code, it's the original r3 value, which is a syscall parameter.
If the tracer wants to fail the syscall it should have set something in
r3, not orig_gpr3.
> return 0;
> }
> @@ -315,7 +315,7 @@ long do_syscall_trace_enter(struct pt_regs *regs)
> * If we are aborting explicitly, or if the syscall number is
> * now invalid, set the return value to -ENOSYS.
> */
> - regs->gpr[3] = -ENOSYS;
> + syscall_set_return_value(current, regs, -ENOSYS, 0);
> return -1;
> }
>
> diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
> index aa17e62f3754..c921e0cb54b8 100644
> --- a/arch/powerpc/kernel/signal.c
> +++ b/arch/powerpc/kernel/signal.c
> @@ -229,14 +229,8 @@ static void check_syscall_restart(struct pt_regs *regs, struct k_sigaction *ka,
> regs_add_return_ip(regs, -4);
> regs->result = 0;
> } else {
> - if (trap_is_scv(regs)) {
> - regs->result = -EINTR;
> - regs->gpr[3] = -EINTR;
> - } else {
> - regs->result = -EINTR;
> - regs->gpr[3] = EINTR;
> - regs->ccr |= 0x10000000;
> - }
> + regs->result = -EINTR;
> + syscall_set_return_value(current, regs, -EINTR, 0);
> }
> }
cheers
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 1/7] powerpc: properly negate error in syscall_set_return_value()
2025-01-24 15:18 ` Alexey Gladkov
2025-01-25 0:25 ` Dmitry V. Levin
@ 2025-01-25 12:18 ` Michael Ellerman
2025-01-27 11:13 ` Dmitry V. Levin
1 sibling, 1 reply; 39+ messages in thread
From: Michael Ellerman @ 2025-01-25 12:18 UTC (permalink / raw)
To: Alexey Gladkov, Dmitry V. Levin
Cc: Christophe Leroy, Oleg Nesterov, Eugene Syromyatnikov,
Mike Frysinger, Renzo Davoli, Davide Berardi, strace-devel,
Madhavan Srinivasan, Nicholas Piggin, Naveen N Rao, linuxppc-dev,
linux-kernel
Alexey Gladkov <legion@kernel.org> writes:
>
...
> I'm not a powerpc expert but shouldn't be used regs->gpr[3] via a
> regs_return_value() in system_call_exception() ?
Yes I agree.
> notrace long system_call_exception(struct pt_regs *regs, unsigned long r0)
> {
> ...
> r0 = do_syscall_trace_enter(regs);
> if (unlikely(r0 >= NR_syscalls))
> return regs->gpr[3];
This is the case where we're expecting the r3 value to be a negative
error code, to match the in-kernel semantics. But after this change it
would be a positive error value. It is probably harmless with the
current code structure, but that's just luck.
cheers
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 1/7] powerpc: properly negate error in syscall_set_return_value()
2025-01-25 12:17 ` Michael Ellerman
@ 2025-01-25 20:48 ` Dmitry V. Levin
0 siblings, 0 replies; 39+ messages in thread
From: Dmitry V. Levin @ 2025-01-25 20:48 UTC (permalink / raw)
To: Michael Ellerman
Cc: Christophe Leroy, Alexey Gladkov, Oleg Nesterov,
Eugene Syromyatnikov, Mike Frysinger, Renzo Davoli,
Davide Berardi, strace-devel, Madhavan Srinivasan,
Nicholas Piggin, Naveen N Rao, linuxppc-dev, linux-kernel
On Sat, Jan 25, 2025 at 11:17:58PM +1100, Michael Ellerman wrote:
> "Dmitry V. Levin" <ldv@strace.io> writes:
> > On Thu, Jan 23, 2025 at 08:28:15PM +0200, Dmitry V. Levin wrote:
> ...
> >> After looking at system_call_exception() I doubt this inconsistency can be
> >> easily avoided, so I don't see how this patch could be enhanced further,
> >> and what else could I do with the patch besides dropping it and letting
> >> !trap_is_scv case be unsupported by PTRACE_SET_SYSCALL_INFO API, which
> >> would be unfortunate.
> >
> > If you say this would bring some consistency, I can extend the patch with
> > something like this:
>
> Yes that would improve things IMHO, with one caveat ....
>
> > diff --git a/arch/powerpc/kernel/ptrace/ptrace.c b/arch/powerpc/kernel/ptrace/ptrace.c
> > index 727ed4a14545..dda276a934fd 100644
> > --- a/arch/powerpc/kernel/ptrace/ptrace.c
> > +++ b/arch/powerpc/kernel/ptrace/ptrace.c
> > @@ -207,7 +207,7 @@ static int do_seccomp(struct pt_regs *regs)
> > * syscall parameter. This is different to the ptrace ABI where
> > * both r3 and orig_gpr3 contain the first syscall parameter.
> > */
> > - regs->gpr[3] = -ENOSYS;
> > + syscall_set_return_value(current, regs, -ENOSYS, 0);
> >
> > /*
> > * We use the __ version here because we have already checked
> > @@ -225,7 +225,7 @@ static int do_seccomp(struct pt_regs *regs)
> > * modify the first syscall parameter (in orig_gpr3) and also
> > * allow the syscall to proceed.
> > */
> > - regs->gpr[3] = regs->orig_gpr3;
> > + syscall_set_return_value(current, regs, 0, regs->orig_gpr3);
>
> This case should remain as-is. The orig_gpr3 value here is not a syscall
> error code, it's the original r3 value, which is a syscall parameter.
I agree, but shouldn't CCR.SO be cleared somehow after it was set earlier by
syscall_set_return_value(current, regs, -ENOSYS, 0);
?
--
ldv
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 1/7] powerpc: properly negate error in syscall_set_return_value()
2025-01-25 12:17 ` Michael Ellerman
@ 2025-01-25 21:25 ` Dmitry V. Levin
0 siblings, 0 replies; 39+ messages in thread
From: Dmitry V. Levin @ 2025-01-25 21:25 UTC (permalink / raw)
To: Michael Ellerman
Cc: Christophe Leroy, Alexey Gladkov, Oleg Nesterov,
Eugene Syromyatnikov, Mike Frysinger, Renzo Davoli,
Davide Berardi, strace-devel, Madhavan Srinivasan,
Nicholas Piggin, Naveen N Rao, linuxppc-dev, linux-kernel
On Sat, Jan 25, 2025 at 11:17:45PM +1100, Michael Ellerman wrote:
> "Dmitry V. Levin" <ldv@strace.io> writes:
[...]
> > The only case where I see some intersection is do_seccomp() where the
> > tracer would be able to see -ENOSYS in gpr[3]. However, the seccomp stop
> > is not the place where the tracer *reads* the system call exit status,
> > so whatever was written in gpr[3] before __secure_computing() is not
> > really relevant, consequently, selftests/seccomp/seccomp_bpf passes with
> > this patch applied as well as without it.
>
> IIRC it is important for a tracer that blocks the syscall but doesn't
> explicitly set the return value. But it's only important that the
> default return value is syscall failure (ie. ENOSYS/-ENOSYS), the actual
> sign of the r3 value should be irrelevant to the tracer.
>
> If the selftest still passes then that's probably sufficient.
Yes, I failed to explain this properly, thanks for correcting me.
With the current implementation, both -ENOSYS and ENOSYS/cr0.SO semantics
of the error code at __secure_computing() stage lead to the same result,
this is the reason why seccomp_bpf selftest passes regardless of the patch.
At any point where the tracer is entitled to interpret gpr[3] as a syscall
return value, the semantics of gpr[3] is well-defined (-ERRORCODE/cr0.SO
in non-scv case) and is a part of the ABI.
However, since we have to provide backwards compatibility with the current
inconsistent implementation, in the non-scv case we have to continue
supporting both -ENOSYS and ENOSYS/cr0.SO semantics of the syscall return
value set by the tracer at __secure_computing() stage.
--
ldv
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 1/7] powerpc: properly negate error in syscall_set_return_value()
2025-01-25 12:18 ` Michael Ellerman
@ 2025-01-27 11:13 ` Dmitry V. Levin
0 siblings, 0 replies; 39+ messages in thread
From: Dmitry V. Levin @ 2025-01-27 11:13 UTC (permalink / raw)
To: Michael Ellerman
Cc: Alexey Gladkov, Christophe Leroy, Oleg Nesterov,
Eugene Syromyatnikov, Mike Frysinger, Renzo Davoli,
Davide Berardi, strace-devel, Madhavan Srinivasan,
Nicholas Piggin, Naveen N Rao, linuxppc-dev, linux-kernel
On Sat, Jan 25, 2025 at 11:18:06PM +1100, Michael Ellerman wrote:
> Alexey Gladkov <legion@kernel.org> writes:
> >
> ...
> > I'm not a powerpc expert but shouldn't be used regs->gpr[3] via a
> > regs_return_value() in system_call_exception() ?
>
> Yes I agree.
>
> > notrace long system_call_exception(struct pt_regs *regs, unsigned long r0)
> > {
> > ...
> > r0 = do_syscall_trace_enter(regs);
> > if (unlikely(r0 >= NR_syscalls))
> > return regs->gpr[3];
>
> This is the case where we're expecting the r3 value to be a negative
> error code, to match the in-kernel semantics. But after this change it
> would be a positive error value. It is probably harmless with the
> current code structure, but that's just luck.
I'm afraid that's not just luck. do_seccomp() from the very beginning
supports both the generic kernel -ERRORCODE return value ABI and the
powerpc sc syscall return ABI, thanks to syscall_exit_prepare() that
converts the former to the latter. Given that this inconsistency was
exposed to user space via PTRACE_EVENT_SECCOMP tracers for so many years,
I suppose backwards compatibility has to be provided. Consequently, since
the point of __secure_computing() invocation and up to the point of
conversion in syscall_exit_prepare(), gpr[3] may be set according to
either of these two ABIs. Unfortunately, this means any future attempt
to avoid the inconsistency would be inherently incomplete.
For this reason, I doubt it would make sense to include into the patch
any changes that are needed only to address this consistency issue.
--
ldv
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 1/7] powerpc: properly negate error in syscall_set_return_value()
2025-01-23 22:07 ` Christophe Leroy
2025-01-23 22:35 ` Dmitry V. Levin
@ 2025-01-27 11:20 ` Dmitry V. Levin
2025-01-27 11:36 ` Christophe Leroy
1 sibling, 1 reply; 39+ messages in thread
From: Dmitry V. Levin @ 2025-01-27 11:20 UTC (permalink / raw)
To: Christophe Leroy
Cc: Alexey Gladkov, Oleg Nesterov, Michael Ellerman,
Eugene Syromyatnikov, Mike Frysinger, Renzo Davoli,
Davide Berardi, strace-devel, Madhavan Srinivasan,
Nicholas Piggin, Naveen N Rao, linuxppc-dev, linux-kernel
On Thu, Jan 23, 2025 at 11:07:21PM +0100, Christophe Leroy wrote:
[...]
> To add a bit more to the confusion,
Looks like there is no end to it:
static inline long regs_return_value(struct pt_regs *regs)
{
if (trap_is_scv(regs))
return regs->gpr[3];
if (is_syscall_success(regs))
return regs->gpr[3];
else
return -regs->gpr[3];
}
static inline void regs_set_return_value(struct pt_regs *regs, unsigned long rc)
{
regs->gpr[3] = rc;
}
This doesn't look consistent, does it?
--
ldv
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 1/7] powerpc: properly negate error in syscall_set_return_value()
2025-01-27 11:20 ` Dmitry V. Levin
@ 2025-01-27 11:36 ` Christophe Leroy
2025-01-27 11:44 ` Dmitry V. Levin
0 siblings, 1 reply; 39+ messages in thread
From: Christophe Leroy @ 2025-01-27 11:36 UTC (permalink / raw)
To: Dmitry V. Levin
Cc: Alexey Gladkov, Oleg Nesterov, Michael Ellerman,
Eugene Syromyatnikov, Mike Frysinger, Renzo Davoli,
Davide Berardi, strace-devel, Madhavan Srinivasan,
Nicholas Piggin, Naveen N Rao, linuxppc-dev, linux-kernel
Le 27/01/2025 à 12:20, Dmitry V. Levin a écrit :
> On Thu, Jan 23, 2025 at 11:07:21PM +0100, Christophe Leroy wrote:
> [...]
>> To add a bit more to the confusion,
>
> Looks like there is no end to it:
>
> static inline long regs_return_value(struct pt_regs *regs)
> {
> if (trap_is_scv(regs))
> return regs->gpr[3];
>
> if (is_syscall_success(regs))
> return regs->gpr[3];
> else
> return -regs->gpr[3];
> }
>
> static inline void regs_set_return_value(struct pt_regs *regs, unsigned long rc)
> {
> regs->gpr[3] = rc;
> }
>
> This doesn't look consistent, does it?
>
>
That regs_set_return_value() looks pretty similar to
syscall_get_return_value().
regs_set_return_value() documentation in asm-generic/syscall.h
explicitely says: This value is meaningless if syscall_get_error()
returned nonzero
Is it the same with regs_set_return_value(), only meaningfull where
there is no error ?
By the way, why have two very similar APIs, one in syscall.h one in
ptrace.h ?
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 1/7] powerpc: properly negate error in syscall_set_return_value()
2025-01-27 11:36 ` Christophe Leroy
@ 2025-01-27 11:44 ` Dmitry V. Levin
2025-01-27 12:04 ` Christophe Leroy
0 siblings, 1 reply; 39+ messages in thread
From: Dmitry V. Levin @ 2025-01-27 11:44 UTC (permalink / raw)
To: Christophe Leroy
Cc: Alexey Gladkov, Oleg Nesterov, Michael Ellerman,
Eugene Syromyatnikov, Mike Frysinger, Renzo Davoli,
Davide Berardi, strace-devel, Madhavan Srinivasan,
Nicholas Piggin, Naveen N Rao, linuxppc-dev, linux-kernel
On Mon, Jan 27, 2025 at 12:36:53PM +0100, Christophe Leroy wrote:
> Le 27/01/2025 à 12:20, Dmitry V. Levin a écrit :
> > On Thu, Jan 23, 2025 at 11:07:21PM +0100, Christophe Leroy wrote:
> > [...]
> >> To add a bit more to the confusion,
> >
> > Looks like there is no end to it:
> >
> > static inline long regs_return_value(struct pt_regs *regs)
> > {
> > if (trap_is_scv(regs))
> > return regs->gpr[3];
> >
> > if (is_syscall_success(regs))
> > return regs->gpr[3];
> > else
> > return -regs->gpr[3];
> > }
> >
> > static inline void regs_set_return_value(struct pt_regs *regs, unsigned long rc)
> > {
> > regs->gpr[3] = rc;
> > }
> >
> > This doesn't look consistent, does it?
> >
> >
>
> That regs_set_return_value() looks pretty similar to
> syscall_get_return_value().
Yes, but here similarities end, and differences begin.
> regs_set_return_value() documentation in asm-generic/syscall.h
> explicitely says: This value is meaningless if syscall_get_error()
> returned nonzero
>
> Is it the same with regs_set_return_value(), only meaningfull where
> there is no error ?
Did you mean syscall_set_return_value? No, it explicitly has two
arguments, "int error" and "long val", so it can be used to either
clear or set the error condition as specified by the caller.
> By the way, why have two very similar APIs, one in syscall.h one in
> ptrace.h ?
I have no polite answer to this, sorry.
--
ldv
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 1/7] powerpc: properly negate error in syscall_set_return_value()
2025-01-27 11:44 ` Dmitry V. Levin
@ 2025-01-27 12:04 ` Christophe Leroy
2025-01-27 12:26 ` Dmitry V. Levin
0 siblings, 1 reply; 39+ messages in thread
From: Christophe Leroy @ 2025-01-27 12:04 UTC (permalink / raw)
To: Dmitry V. Levin
Cc: Alexey Gladkov, Oleg Nesterov, Michael Ellerman,
Eugene Syromyatnikov, Mike Frysinger, Renzo Davoli,
Davide Berardi, strace-devel, Madhavan Srinivasan,
Nicholas Piggin, Naveen N Rao, linuxppc-dev, linux-kernel
Le 27/01/2025 à 12:44, Dmitry V. Levin a écrit :
> On Mon, Jan 27, 2025 at 12:36:53PM +0100, Christophe Leroy wrote:
>> Le 27/01/2025 à 12:20, Dmitry V. Levin a écrit :
>>> On Thu, Jan 23, 2025 at 11:07:21PM +0100, Christophe Leroy wrote:
>>> [...]
>>>> To add a bit more to the confusion,
>>>
>>> Looks like there is no end to it:
>>>
>>> static inline long regs_return_value(struct pt_regs *regs)
>>> {
>>> if (trap_is_scv(regs))
>>> return regs->gpr[3];
>>>
>>> if (is_syscall_success(regs))
>>> return regs->gpr[3];
>>> else
>>> return -regs->gpr[3];
>>> }
>>>
>>> static inline void regs_set_return_value(struct pt_regs *regs, unsigned long rc)
>>> {
>>> regs->gpr[3] = rc;
>>> }
>>>
>>> This doesn't look consistent, does it?
>>>
>>>
>>
>> That regs_set_return_value() looks pretty similar to
>> syscall_get_return_value().
>
> Yes, but here similarities end, and differences begin.
>
>> regs_set_return_value() documentation in asm-generic/syscall.h
>> explicitely says: This value is meaningless if syscall_get_error()
>> returned nonzero
>>
>> Is it the same with regs_set_return_value(), only meaningfull where
>> there is no error ?
>
> Did you mean syscall_set_return_value? No, it explicitly has two
> arguments, "int error" and "long val", so it can be used to either
> clear or set the error condition as specified by the caller.
Sorry, I mean syscall_get_return_value() here.
static inline long syscall_get_return_value(struct task_struct *task,
struct pt_regs *regs)
{
return regs->gpr[3];
}
Versus
static inline void regs_set_return_value(struct pt_regs *regs, unsigned
long rc)
{
regs->gpr[3] = rc;
}
>
>> By the way, why have two very similar APIs, one in syscall.h one in
>> ptrace.h ?
>
> I have no polite answer to this, sorry.
>
>
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2 1/7] powerpc: properly negate error in syscall_set_return_value()
2025-01-27 12:04 ` Christophe Leroy
@ 2025-01-27 12:26 ` Dmitry V. Levin
0 siblings, 0 replies; 39+ messages in thread
From: Dmitry V. Levin @ 2025-01-27 12:26 UTC (permalink / raw)
To: Christophe Leroy
Cc: Alexey Gladkov, Oleg Nesterov, Michael Ellerman,
Eugene Syromyatnikov, Mike Frysinger, Renzo Davoli,
Davide Berardi, strace-devel, Madhavan Srinivasan,
Nicholas Piggin, Naveen N Rao, linuxppc-dev, linux-kernel
On Mon, Jan 27, 2025 at 01:04:27PM +0100, Christophe Leroy wrote:
>
>
> Le 27/01/2025 à 12:44, Dmitry V. Levin a écrit :
> > On Mon, Jan 27, 2025 at 12:36:53PM +0100, Christophe Leroy wrote:
> >> Le 27/01/2025 à 12:20, Dmitry V. Levin a écrit :
> >>> On Thu, Jan 23, 2025 at 11:07:21PM +0100, Christophe Leroy wrote:
> >>> [...]
> >>>> To add a bit more to the confusion,
> >>>
> >>> Looks like there is no end to it:
> >>>
> >>> static inline long regs_return_value(struct pt_regs *regs)
> >>> {
> >>> if (trap_is_scv(regs))
> >>> return regs->gpr[3];
> >>>
> >>> if (is_syscall_success(regs))
> >>> return regs->gpr[3];
> >>> else
> >>> return -regs->gpr[3];
> >>> }
> >>>
> >>> static inline void regs_set_return_value(struct pt_regs *regs, unsigned long rc)
> >>> {
> >>> regs->gpr[3] = rc;
> >>> }
> >>>
> >>> This doesn't look consistent, does it?
> >>>
> >>>
> >>
> >> That regs_set_return_value() looks pretty similar to
> >> syscall_get_return_value().
> >
> > Yes, but here similarities end, and differences begin.
> >
> >> regs_set_return_value() documentation in asm-generic/syscall.h
> >> explicitely says: This value is meaningless if syscall_get_error()
> >> returned nonzero
> >>
> >> Is it the same with regs_set_return_value(), only meaningfull where
> >> there is no error ?
> >
> > Did you mean syscall_set_return_value? No, it explicitly has two
> > arguments, "int error" and "long val", so it can be used to either
> > clear or set the error condition as specified by the caller.
>
> Sorry, I mean syscall_get_return_value() here.
>
> static inline long syscall_get_return_value(struct task_struct *task,
> struct pt_regs *regs)
> {
> return regs->gpr[3];
> }
>
> Versus
>
> static inline void regs_set_return_value(struct pt_regs *regs, unsigned
> long rc)
> {
> regs->gpr[3] = rc;
> }
The asm/syscall.h API provides two functions to obtain the return value:
syscall_get_error() and syscall_get_return_value(). The first one is used
to obtain the error code when the error condition is set. When the error
condition is not set, it returns 0. The second function is used to obtain
the return value when the error condition is not set. When the error
condition is set, its return value is undefined.
--
ldv
^ permalink raw reply [flat|nested] 39+ messages in thread
end of thread, other threads:[~2025-01-27 12:26 UTC | newest]
Thread overview: 39+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20250113170925.GA392@strace.io>
2025-01-13 17:10 ` [PATCH v2 1/7] powerpc: properly negate error in syscall_set_return_value() Dmitry V. Levin
2025-01-13 17:34 ` Christophe Leroy
2025-01-13 17:54 ` Dmitry V. Levin
2025-01-14 17:04 ` Dmitry V. Levin
2025-01-20 13:51 ` Christophe Leroy
2025-01-20 17:12 ` Dmitry V. Levin
2025-01-21 11:13 ` Madhavan Srinivasan
2025-01-21 11:28 ` Christophe Leroy
2025-01-21 12:25 ` Madhavan Srinivasan
2025-01-21 12:42 ` Dmitry V. Levin
2025-01-23 18:28 ` Dmitry V. Levin
2025-01-23 19:11 ` Eugene Syromyatnikov
2025-01-23 22:16 ` Dmitry V. Levin
2025-01-23 22:07 ` Christophe Leroy
2025-01-23 22:35 ` Dmitry V. Levin
2025-01-27 11:20 ` Dmitry V. Levin
2025-01-27 11:36 ` Christophe Leroy
2025-01-27 11:44 ` Dmitry V. Levin
2025-01-27 12:04 ` Christophe Leroy
2025-01-27 12:26 ` Dmitry V. Levin
2025-01-23 23:43 ` Dmitry V. Levin
2025-01-24 15:18 ` Alexey Gladkov
2025-01-25 0:25 ` Dmitry V. Levin
2025-01-25 12:18 ` Michael Ellerman
2025-01-27 11:13 ` Dmitry V. Levin
2025-01-25 12:17 ` Michael Ellerman
2025-01-25 20:48 ` Dmitry V. Levin
2025-01-25 12:17 ` Michael Ellerman
2025-01-25 21:25 ` Dmitry V. Levin
2025-01-14 13:00 ` Alexey Gladkov
2025-01-14 13:48 ` Dmitry V. Levin
2025-01-14 14:53 ` Alexey Gladkov
2025-01-13 17:11 ` [PATCH v2 3/7] syscall.h: add syscall_set_arguments() and syscall_set_return_value() Dmitry V. Levin
2025-01-16 2:20 ` Charlie Jenkins
2025-01-17 0:59 ` H. Peter Anvin
2025-01-17 15:45 ` Eugene Syromyatnikov
2025-01-18 4:34 ` H. Peter Anvin
2025-01-13 17:11 ` [PATCH v2 4/7] syscall.h: introduce syscall_set_nr() Dmitry V. Levin
2025-01-16 2:20 ` Charlie Jenkins
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).