[PATCH] coredump: reduce stack usage in vfs

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] coredump: reduce stack usage in vfs_coredump()
@ 2025-06-20 11:21 Arnd Bergmann
  2025-06-20 16:49 ` Alexander Mikhalitsyn
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Arnd Bergmann @ 2025-06-20 11:21 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner
  Cc: Arnd Bergmann, Jan Kara, Alexander Mikhalitsyn, Jann Horn,
	Luca Boccassi, Jeff Layton, Roman Kisel, linux-fsdevel,
	linux-kernel

From: Arnd Bergmann <arnd@arndb.de>

The newly added socket coredump code runs into some corner cases
with KASAN that end up needing a lot of stack space:

fs/coredump.c:1206:1: error: the frame size of 1680 bytes is larger than 1280 bytes [-Werror=frame-larger-than=]

Mark the socket helper function as noinline_for_stack so its stack
usage does not leak out to the other code paths. This also seems to
help with register pressure, and the resulting combined stack usage of
vfs_coredump() and coredump_socket() is actually lower than the inlined
version.

Moving the core_state variable into coredump_wait() helps reduce the
stack usage further and simplifies the code, though it is not sufficient
to avoid the warning by itself.

Fixes: 6a7a50e5f1ac ("coredump: use a single helper for the socket")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 fs/coredump.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/fs/coredump.c b/fs/coredump.c
index e2611fb1f254..c46e3996ff91 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -518,27 +518,28 @@ static int zap_threads(struct task_struct *tsk,
 	return nr;
 }
 
-static int coredump_wait(int exit_code, struct core_state *core_state)
+static int coredump_wait(int exit_code)
 {
 	struct task_struct *tsk = current;
+	struct core_state core_state;
 	int core_waiters = -EBUSY;
 
-	init_completion(&core_state->startup);
-	core_state->dumper.task = tsk;
-	core_state->dumper.next = NULL;
+	init_completion(&core_state.startup);
+	core_state.dumper.task = tsk;
+	core_state.dumper.next = NULL;
 
-	core_waiters = zap_threads(tsk, core_state, exit_code);
+	core_waiters = zap_threads(tsk, &core_state, exit_code);
 	if (core_waiters > 0) {
 		struct core_thread *ptr;
 
-		wait_for_completion_state(&core_state->startup,
+		wait_for_completion_state(&core_state.startup,
 					  TASK_UNINTERRUPTIBLE|TASK_FREEZABLE);
 		/*
 		 * Wait for all the threads to become inactive, so that
 		 * all the thread context (extended register state, like
 		 * fpu etc) gets copied to the memory.
 		 */
-		ptr = core_state->dumper.next;
+		ptr = core_state.dumper.next;
 		while (ptr != NULL) {
 			wait_task_inactive(ptr->task, TASK_ANY);
 			ptr = ptr->next;
@@ -858,7 +859,7 @@ static bool coredump_sock_request(struct core_name *cn, struct coredump_params *
 	return coredump_sock_mark(cprm->file, COREDUMP_MARK_REQACK);
 }
 
-static bool coredump_socket(struct core_name *cn, struct coredump_params *cprm)
+static noinline_for_stack bool coredump_socket(struct core_name *cn, struct coredump_params *cprm)
 {
 	if (!coredump_sock_connect(cn, cprm))
 		return false;
@@ -1095,7 +1096,6 @@ void vfs_coredump(const kernel_siginfo_t *siginfo)
 {
 	struct cred *cred __free(put_cred) = NULL;
 	size_t *argv __free(kfree) = NULL;
-	struct core_state core_state;
 	struct core_name cn;
 	struct mm_struct *mm = current->mm;
 	struct linux_binfmt *binfmt = mm->binfmt;
@@ -1131,7 +1131,7 @@ void vfs_coredump(const kernel_siginfo_t *siginfo)
 	if (coredump_force_suid_safe(&cprm))
 		cred->fsuid = GLOBAL_ROOT_UID;
 
-	if (coredump_wait(siginfo->si_signo, &core_state) < 0)
+	if (coredump_wait(siginfo->si_signo) < 0)
 		return;
 
 	old_cred = override_creds(cred);
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] coredump: reduce stack usage in vfs_coredump()
  2025-06-20 11:21 [PATCH] coredump: reduce stack usage in vfs_coredump() Arnd Bergmann
@ 2025-06-20 16:49 ` Alexander Mikhalitsyn
  2025-06-23 10:36 ` Christian Brauner
       [not found] ` <CGME20250625114152eucas1p250b0d9a60a030e0eca6adf4d50794ebd@eucas1p2.samsung.com>
  2 siblings, 0 replies; 8+ messages in thread
From: Alexander Mikhalitsyn @ 2025-06-20 16:49 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Alexander Viro, Christian Brauner, Arnd Bergmann, Jan Kara,
	Jann Horn, Luca Boccassi, Jeff Layton, Roman Kisel, linux-fsdevel,
	linux-kernel

Am Fr., 20. Juni 2025 um 13:21 Uhr schrieb Arnd Bergmann <arnd@kernel.org>:
>
> From: Arnd Bergmann <arnd@arndb.de>
>
> The newly added socket coredump code runs into some corner cases
> with KASAN that end up needing a lot of stack space:
>
> fs/coredump.c:1206:1: error: the frame size of 1680 bytes is larger than 1280 bytes [-Werror=frame-larger-than=]
>
> Mark the socket helper function as noinline_for_stack so its stack
> usage does not leak out to the other code paths. This also seems to
> help with register pressure, and the resulting combined stack usage of
> vfs_coredump() and coredump_socket() is actually lower than the inlined
> version.
>
> Moving the core_state variable into coredump_wait() helps reduce the
> stack usage further and simplifies the code, though it is not sufficient
> to avoid the warning by itself.
>
> Fixes: 6a7a50e5f1ac ("coredump: use a single helper for the socket")
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>

> ---
>  fs/coredump.c | 20 ++++++++++----------
>  1 file changed, 10 insertions(+), 10 deletions(-)
>
> diff --git a/fs/coredump.c b/fs/coredump.c
> index e2611fb1f254..c46e3996ff91 100644
> --- a/fs/coredump.c
> +++ b/fs/coredump.c
> @@ -518,27 +518,28 @@ static int zap_threads(struct task_struct *tsk,
>         return nr;
>  }
>
> -static int coredump_wait(int exit_code, struct core_state *core_state)
> +static int coredump_wait(int exit_code)
>  {
>         struct task_struct *tsk = current;
> +       struct core_state core_state;
>         int core_waiters = -EBUSY;
>
> -       init_completion(&core_state->startup);
> -       core_state->dumper.task = tsk;
> -       core_state->dumper.next = NULL;
> +       init_completion(&core_state.startup);
> +       core_state.dumper.task = tsk;
> +       core_state.dumper.next = NULL;
>
> -       core_waiters = zap_threads(tsk, core_state, exit_code);
> +       core_waiters = zap_threads(tsk, &core_state, exit_code);
>         if (core_waiters > 0) {
>                 struct core_thread *ptr;
>
> -               wait_for_completion_state(&core_state->startup,
> +               wait_for_completion_state(&core_state.startup,
>                                           TASK_UNINTERRUPTIBLE|TASK_FREEZABLE);
>                 /*
>                  * Wait for all the threads to become inactive, so that
>                  * all the thread context (extended register state, like
>                  * fpu etc) gets copied to the memory.
>                  */
> -               ptr = core_state->dumper.next;
> +               ptr = core_state.dumper.next;
>                 while (ptr != NULL) {
>                         wait_task_inactive(ptr->task, TASK_ANY);
>                         ptr = ptr->next;
> @@ -858,7 +859,7 @@ static bool coredump_sock_request(struct core_name *cn, struct coredump_params *
>         return coredump_sock_mark(cprm->file, COREDUMP_MARK_REQACK);
>  }
>
> -static bool coredump_socket(struct core_name *cn, struct coredump_params *cprm)
> +static noinline_for_stack bool coredump_socket(struct core_name *cn, struct coredump_params *cprm)
>  {
>         if (!coredump_sock_connect(cn, cprm))
>                 return false;
> @@ -1095,7 +1096,6 @@ void vfs_coredump(const kernel_siginfo_t *siginfo)
>  {
>         struct cred *cred __free(put_cred) = NULL;
>         size_t *argv __free(kfree) = NULL;
> -       struct core_state core_state;
>         struct core_name cn;
>         struct mm_struct *mm = current->mm;
>         struct linux_binfmt *binfmt = mm->binfmt;
> @@ -1131,7 +1131,7 @@ void vfs_coredump(const kernel_siginfo_t *siginfo)
>         if (coredump_force_suid_safe(&cprm))
>                 cred->fsuid = GLOBAL_ROOT_UID;
>
> -       if (coredump_wait(siginfo->si_signo, &core_state) < 0)
> +       if (coredump_wait(siginfo->si_signo) < 0)
>                 return;
>
>         old_cred = override_creds(cred);
> --
> 2.39.5
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] coredump: reduce stack usage in vfs_coredump()
  2025-06-20 11:21 [PATCH] coredump: reduce stack usage in vfs_coredump() Arnd Bergmann
  2025-06-20 16:49 ` Alexander Mikhalitsyn
@ 2025-06-23 10:36 ` Christian Brauner
       [not found] ` <CGME20250625114152eucas1p250b0d9a60a030e0eca6adf4d50794ebd@eucas1p2.samsung.com>
  2 siblings, 0 replies; 8+ messages in thread
From: Christian Brauner @ 2025-06-23 10:36 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Christian Brauner, Arnd Bergmann, Jan Kara, Alexander Mikhalitsyn,
	Jann Horn, Luca Boccassi, Jeff Layton, Roman Kisel, linux-fsdevel,
	linux-kernel, Alexander Viro

On Fri, 20 Jun 2025 13:21:01 +0200, Arnd Bergmann wrote:
> The newly added socket coredump code runs into some corner cases
> with KASAN that end up needing a lot of stack space:
> 
> fs/coredump.c:1206:1: error: the frame size of 1680 bytes is larger than 1280 bytes [-Werror=frame-larger-than=]
> 
> Mark the socket helper function as noinline_for_stack so its stack
> usage does not leak out to the other code paths. This also seems to
> help with register pressure, and the resulting combined stack usage of
> vfs_coredump() and coredump_socket() is actually lower than the inlined
> version.
> 
> [...]

Applied to the vfs-6.17.coredump branch of the vfs/vfs.git tree.
Patches in the vfs-6.17.coredump branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs-6.17.coredump

[1/1] coredump: reduce stack usage in vfs_coredump()
      https://git.kernel.org/vfs/vfs/c/fb82645d3f72

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] coredump: reduce stack usage in vfs_coredump()
       [not found] ` <CGME20250625114152eucas1p250b0d9a60a030e0eca6adf4d50794ebd@eucas1p2.samsung.com>
@ 2025-06-25 11:41   ` Marek Szyprowski
       [not found]     ` <CGME20250625115426eucas1p17398cfcd215befcd3eafe0cac44b33a7@eucas1p1.samsung.com>
  0 siblings, 1 reply; 8+ messages in thread
From: Marek Szyprowski @ 2025-06-25 11:41 UTC (permalink / raw)
  To: Arnd Bergmann, Alexander Viro, Christian Brauner
  Cc: Arnd Bergmann, Jan Kara, Alexander Mikhalitsyn, Jann Horn,
	Luca Boccassi, Jeff Layton, Roman Kisel, linux-fsdevel,
	linux-kernel

Hi,

On 20.06.2025 13:21, Arnd Bergmann wrote:
> From: Arnd Bergmann <arnd@arndb.de>
>
> The newly added socket coredump code runs into some corner cases
> with KASAN that end up needing a lot of stack space:
>
> fs/coredump.c:1206:1: error: the frame size of 1680 bytes is larger than 1280 bytes [-Werror=frame-larger-than=]
>
> Mark the socket helper function as noinline_for_stack so its stack
> usage does not leak out to the other code paths. This also seems to
> help with register pressure, and the resulting combined stack usage of
> vfs_coredump() and coredump_socket() is actually lower than the inlined
> version.
>
> Moving the core_state variable into coredump_wait() helps reduce the
> stack usage further and simplifies the code, though it is not sufficient
> to avoid the warning by itself.
>
> Fixes: 6a7a50e5f1ac ("coredump: use a single helper for the socket")
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>

This change appears in today's linux-next (next-20250625) as commit 
fb82645d3f72 ("coredump: reduce stack usage in vfs_coredump()"). In my 
tests I found that it causes a kernel oops on some of my ARM 32bit 
Exynos based boards. This is really strange, because I don't see any 
obvious problem in this patch. Reverting $subject on top of linux-next 
hides/fixes the oops. I suspect some kind of use-after-free issue, but I 
cannot point anything related. Here is the kernel log from one of the 
affected boards (I've intentionally kept the register and stack dumps):

8<--- cut here ---
Unable to handle kernel paging request at virtual address c10ba370 when 
write
[c10ba370] *pgd=4101941e(bad)
Internal error: Oops: 80d [#1] SMP ARM
Modules linked in: cmac bnep mwifiex_sdio mwifiex btmrvl_sdio btmrvl 
sha256 libsha256_generic bluetooth cfg80211 exynos_gsc v4l2_mem2mem 
s5p_mfc videobuf2_dma_contig videobuf2_memops videobuf2_v4l2 
videobuf2_common ecdh_generic ecc videodev mc s5p_cec
CPU: 1 UID: 0 PID: 1367 Comm: cgm-release-age Not tainted 
6.16.0-rc3-next-20250625 #10627 PREEMPT
Hardware name: Samsung Exynos (Flattened Device Tree)
PC is at vfs_coredump+0x294/0x17bc
LR is at _raw_spin_unlock_irq+0x20/0x50
pc : [<c03c4534>]    lr : [<c0cf096c>]    psr: a0000013
sp : f1249dd0  ip : 00000000  fp : f1249e68
r10: c1ae41e4  r9 : 00000000  r8 : c13aaa74
r7 : 00000000  r6 : 63726f46  r5 : c2a3c000  r4 : c3852080
r3 : c10ba370  r2 : c3852080  r1 : ffffffff  r0 : 00006325
Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 10c5387d  Table: 4324c06a  DAC: 00000051
Register r0 information: non-paged memory
Register r1 information: non-paged memory
Register r2 information: slab task_struct start c3852080 pointer offset 
0 size 4160
Register r3 information: non-slab/vmalloc memory
Register r4 information: slab task_struct start c3852080 pointer offset 
0 size 4160
Register r5 information: slab kmalloc-128 start c2a3c000 pointer offset 
0 size 128
Register r6 information: non-paged memory
Register r7 information: NULL pointer
Register r8 information: non-slab/vmalloc memory
Register r9 information: NULL pointer
Register r10 information: non-slab/vmalloc memory
Register r11 information: 2-page vmalloc region starting at 0xf1248000 
allocated at kernel_clone+0x58/0x3f4
Register r12 information: NULL pointer
Process cgm-release-age (pid: 1367, stack limit = 0x4909c75e)
Stack: (0xf1249dd0 to 0xf124a000)
9dc0:                                     c3852830 00000000 c3852080 
c01ae7fc
9de0: c13095a8 c45c8600 c2a3c000 00000000 00000000 c014ab28 c102c84c 
c104be90
9e00: c13095a8 c1496b58 ffffffff 00000081 00000000 6fda5e52 c127e2ec 
00000000
9e20: c2a3c280 00000004 00000080 00000000 c3852800 00000001 00000001 
00000000
9e40: c2855d90 00000000 c3852830 c0ce2874 c13133c8 c3852080 c13133e0 
c1472911
9e60: c1cc6480 ef0484e0 f1249f60 00000000 00000000 800000cd 00000001 
00000000
9e80: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
00000000
9ea0: 00000000 00000000 00000006 c01450a8 f1249f4c 00000001 c12849dc 
6fda5e52
9ec0: c0cf096c c3852080 00000005 00000006 400004d8 0001e000 c1309154 
c2855d80
9ee0: f1249f60 c014ab28 00000000 c02eeb48 00000000 c3852830 00000000 
ce4d1c00
9f00: f1249f4c c13095a8 c127e2ec c2855d90 00000000 6fda5e52 c12849dc 
c3852080
9f20: f1249fb0 00000000 f1249f4c 5ac3c35a b6d21668 c3852730 b6d2166c 
c011a430
9f40: 00000000 00000002 f1249f74 c0cf096c bee9b9d4 c0148328 f1249f7c 
00000000
9f60: 00000006 00000000 fffffffa 00000557 00000000 00000000 00000000 
00000000
9f80: 00000000 6fda5e52 00000000 00000000 b6f43968 bee9b9d4 000000af 
c0100290
9fa0: c3852080 000000af 00000000 c0100088 00000000 bee9b9d4 00000000 
00000008
9fc0: 00000000 b6f43968 bee9b9d4 000000af bee9bf70 b6ebdc94 00440f90 
00000000
9fe0: 00000020 bee9b9d0 ffffffff b6d2166c 00000010 00000002 00000000 
00000000
Call trace:
  vfs_coredump from get_signal+0x990/0xd9c
  get_signal from do_work_pending+0x118/0x588
  do_work_pending from slow_work_pending+0xc/0x24
Exception stack(0xf1249fb0 to 0xf1249ff8)
9fa0:                                     00000000 bee9b9d4 00000000 
00000008
9fc0: 00000000 b6f43968 bee9b9d4 000000af bee9bf70 b6ebdc94 00440f90 
00000000
9fe0: 00000020 bee9b9d0 ffffffff b6d2166c 00000010 00000002
Code: e1a03006 e5966004 e5930000 f57ff05b (e5837000)
---[ end trace 0000000000000000 ]---
note: cgm-release-age[1367] exited with irqs disabled


> ---
>   fs/coredump.c | 20 ++++++++++----------
>   1 file changed, 10 insertions(+), 10 deletions(-)
>
> diff --git a/fs/coredump.c b/fs/coredump.c
> index e2611fb1f254..c46e3996ff91 100644
> --- a/fs/coredump.c
> +++ b/fs/coredump.c
> @@ -518,27 +518,28 @@ static int zap_threads(struct task_struct *tsk,
>   	return nr;
>   }
>   
> -static int coredump_wait(int exit_code, struct core_state *core_state)
> +static int coredump_wait(int exit_code)
>   {
>   	struct task_struct *tsk = current;
> +	struct core_state core_state;
>   	int core_waiters = -EBUSY;
>   
> -	init_completion(&core_state->startup);
> -	core_state->dumper.task = tsk;
> -	core_state->dumper.next = NULL;
> +	init_completion(&core_state.startup);
> +	core_state.dumper.task = tsk;
> +	core_state.dumper.next = NULL;
>   
> -	core_waiters = zap_threads(tsk, core_state, exit_code);
> +	core_waiters = zap_threads(tsk, &core_state, exit_code);
>   	if (core_waiters > 0) {
>   		struct core_thread *ptr;
>   
> -		wait_for_completion_state(&core_state->startup,
> +		wait_for_completion_state(&core_state.startup,
>   					  TASK_UNINTERRUPTIBLE|TASK_FREEZABLE);
>   		/*
>   		 * Wait for all the threads to become inactive, so that
>   		 * all the thread context (extended register state, like
>   		 * fpu etc) gets copied to the memory.
>   		 */
> -		ptr = core_state->dumper.next;
> +		ptr = core_state.dumper.next;
>   		while (ptr != NULL) {
>   			wait_task_inactive(ptr->task, TASK_ANY);
>   			ptr = ptr->next;
> @@ -858,7 +859,7 @@ static bool coredump_sock_request(struct core_name *cn, struct coredump_params *
>   	return coredump_sock_mark(cprm->file, COREDUMP_MARK_REQACK);
>   }
>   
> -static bool coredump_socket(struct core_name *cn, struct coredump_params *cprm)
> +static noinline_for_stack bool coredump_socket(struct core_name *cn, struct coredump_params *cprm)
>   {
>   	if (!coredump_sock_connect(cn, cprm))
>   		return false;
> @@ -1095,7 +1096,6 @@ void vfs_coredump(const kernel_siginfo_t *siginfo)
>   {
>   	struct cred *cred __free(put_cred) = NULL;
>   	size_t *argv __free(kfree) = NULL;
> -	struct core_state core_state;
>   	struct core_name cn;
>   	struct mm_struct *mm = current->mm;
>   	struct linux_binfmt *binfmt = mm->binfmt;
> @@ -1131,7 +1131,7 @@ void vfs_coredump(const kernel_siginfo_t *siginfo)
>   	if (coredump_force_suid_safe(&cprm))
>   		cred->fsuid = GLOBAL_ROOT_UID;
>   
> -	if (coredump_wait(siginfo->si_signo, &core_state) < 0)
> +	if (coredump_wait(siginfo->si_signo) < 0)
>   		return;
>   
>   	old_cred = override_creds(cred);

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] coredump: reduce stack usage in vfs_coredump()
       [not found]     ` <CGME20250625115426eucas1p17398cfcd215befcd3eafe0cac44b33a7@eucas1p1.samsung.com>
@ 2025-06-25 11:54       ` Marek Szyprowski
  2025-06-25 13:29         ` Arnd Bergmann
  0 siblings, 1 reply; 8+ messages in thread
From: Marek Szyprowski @ 2025-06-25 11:54 UTC (permalink / raw)
  To: Arnd Bergmann, Alexander Viro, Christian Brauner
  Cc: Arnd Bergmann, Jan Kara, Alexander Mikhalitsyn, Jann Horn,
	Luca Boccassi, Jeff Layton, Roman Kisel, linux-fsdevel,
	linux-kernel

On 25.06.2025 13:41, Marek Szyprowski wrote:
>
> On 20.06.2025 13:21, Arnd Bergmann wrote:
>> From: Arnd Bergmann <arnd@arndb.de>
>>
>> The newly added socket coredump code runs into some corner cases
>> with KASAN that end up needing a lot of stack space:
>>
>> fs/coredump.c:1206:1: error: the frame size of 1680 bytes is larger 
>> than 1280 bytes [-Werror=frame-larger-than=]
>>
>> Mark the socket helper function as noinline_for_stack so its stack
>> usage does not leak out to the other code paths. This also seems to
>> help with register pressure, and the resulting combined stack usage of
>> vfs_coredump() and coredump_socket() is actually lower than the inlined
>> version.
>>
>> Moving the core_state variable into coredump_wait() helps reduce the
>> stack usage further and simplifies the code, though it is not sufficient
>> to avoid the warning by itself.
>>
>> Fixes: 6a7a50e5f1ac ("coredump: use a single helper for the socket")
>> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
>
> This change appears in today's linux-next (next-20250625) as commit 
> fb82645d3f72 ("coredump: reduce stack usage in vfs_coredump()"). In my 
> tests I found that it causes a kernel oops on some of my ARM 32bit 
> Exynos based boards. This is really strange, because I don't see any 
> obvious problem in this patch. Reverting $subject on top of linux-next 
> hides/fixes the oops. I suspect some kind of use-after-free issue, but 
> I cannot point anything related. Here is the kernel log from one of 
> the affected boards (I've intentionally kept the register and stack 
> dumps):

I've just checked once again and found the source of the issue. 
vfs_coredump() calls coredump_cleanup(), which calls coredump_finish(), 
which performs the following dereference:

next = current->signal->core_state->dumper.next

of the core_state assigned in zap_threads() called from coredump_wait(). 
It looks that core_state cannot be moved into coredump_wait() without 
refactoring/cleaning this first.


>
> 8<--- cut here ---
> Unable to handle kernel paging request at virtual address c10ba370 
> when write
> [c10ba370] *pgd=4101941e(bad)
> Internal error: Oops: 80d [#1] SMP ARM
> Modules linked in: cmac bnep mwifiex_sdio mwifiex btmrvl_sdio btmrvl 
> sha256 libsha256_generic bluetooth cfg80211 exynos_gsc v4l2_mem2mem 
> s5p_mfc videobuf2_dma_contig videobuf2_memops videobuf2_v4l2 
> videobuf2_common ecdh_generic ecc videodev mc s5p_cec
> CPU: 1 UID: 0 PID: 1367 Comm: cgm-release-age Not tainted 
> 6.16.0-rc3-next-20250625 #10627 PREEMPT
> Hardware name: Samsung Exynos (Flattened Device Tree)
> PC is at vfs_coredump+0x294/0x17bc
> LR is at _raw_spin_unlock_irq+0x20/0x50
> pc : [<c03c4534>]    lr : [<c0cf096c>]    psr: a0000013
> sp : f1249dd0  ip : 00000000  fp : f1249e68
> r10: c1ae41e4  r9 : 00000000  r8 : c13aaa74
> r7 : 00000000  r6 : 63726f46  r5 : c2a3c000  r4 : c3852080
> r3 : c10ba370  r2 : c3852080  r1 : ffffffff  r0 : 00006325
> Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
> Control: 10c5387d  Table: 4324c06a  DAC: 00000051
> Register r0 information: non-paged memory
> Register r1 information: non-paged memory
> Register r2 information: slab task_struct start c3852080 pointer 
> offset 0 size 4160
> Register r3 information: non-slab/vmalloc memory
> Register r4 information: slab task_struct start c3852080 pointer 
> offset 0 size 4160
> Register r5 information: slab kmalloc-128 start c2a3c000 pointer 
> offset 0 size 128
> Register r6 information: non-paged memory
> Register r7 information: NULL pointer
> Register r8 information: non-slab/vmalloc memory
> Register r9 information: NULL pointer
> Register r10 information: non-slab/vmalloc memory
> Register r11 information: 2-page vmalloc region starting at 0xf1248000 
> allocated at kernel_clone+0x58/0x3f4
> Register r12 information: NULL pointer
> Process cgm-release-age (pid: 1367, stack limit = 0x4909c75e)
> Stack: (0xf1249dd0 to 0xf124a000)
> 9dc0:                                     c3852830 00000000 c3852080 
> c01ae7fc
> 9de0: c13095a8 c45c8600 c2a3c000 00000000 00000000 c014ab28 c102c84c 
> c104be90
> 9e00: c13095a8 c1496b58 ffffffff 00000081 00000000 6fda5e52 c127e2ec 
> 00000000
> 9e20: c2a3c280 00000004 00000080 00000000 c3852800 00000001 00000001 
> 00000000
> 9e40: c2855d90 00000000 c3852830 c0ce2874 c13133c8 c3852080 c13133e0 
> c1472911
> 9e60: c1cc6480 ef0484e0 f1249f60 00000000 00000000 800000cd 00000001 
> 00000000
> 9e80: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
> 00000000
> 9ea0: 00000000 00000000 00000006 c01450a8 f1249f4c 00000001 c12849dc 
> 6fda5e52
> 9ec0: c0cf096c c3852080 00000005 00000006 400004d8 0001e000 c1309154 
> c2855d80
> 9ee0: f1249f60 c014ab28 00000000 c02eeb48 00000000 c3852830 00000000 
> ce4d1c00
> 9f00: f1249f4c c13095a8 c127e2ec c2855d90 00000000 6fda5e52 c12849dc 
> c3852080
> 9f20: f1249fb0 00000000 f1249f4c 5ac3c35a b6d21668 c3852730 b6d2166c 
> c011a430
> 9f40: 00000000 00000002 f1249f74 c0cf096c bee9b9d4 c0148328 f1249f7c 
> 00000000
> 9f60: 00000006 00000000 fffffffa 00000557 00000000 00000000 00000000 
> 00000000
> 9f80: 00000000 6fda5e52 00000000 00000000 b6f43968 bee9b9d4 000000af 
> c0100290
> 9fa0: c3852080 000000af 00000000 c0100088 00000000 bee9b9d4 00000000 
> 00000008
> 9fc0: 00000000 b6f43968 bee9b9d4 000000af bee9bf70 b6ebdc94 00440f90 
> 00000000
> 9fe0: 00000020 bee9b9d0 ffffffff b6d2166c 00000010 00000002 00000000 
> 00000000
> Call trace:
>  vfs_coredump from get_signal+0x990/0xd9c
>  get_signal from do_work_pending+0x118/0x588
>  do_work_pending from slow_work_pending+0xc/0x24
> Exception stack(0xf1249fb0 to 0xf1249ff8)
> 9fa0:                                     00000000 bee9b9d4 00000000 
> 00000008
> 9fc0: 00000000 b6f43968 bee9b9d4 000000af bee9bf70 b6ebdc94 00440f90 
> 00000000
> 9fe0: 00000020 bee9b9d0 ffffffff b6d2166c 00000010 00000002
> Code: e1a03006 e5966004 e5930000 f57ff05b (e5837000)
> ---[ end trace 0000000000000000 ]---
> note: cgm-release-age[1367] exited with irqs disabled
>
>
>> ---
>>   fs/coredump.c | 20 ++++++++++----------
>>   1 file changed, 10 insertions(+), 10 deletions(-)
>>
>> diff --git a/fs/coredump.c b/fs/coredump.c
>> index e2611fb1f254..c46e3996ff91 100644
>> --- a/fs/coredump.c
>> +++ b/fs/coredump.c
>> @@ -518,27 +518,28 @@ static int zap_threads(struct task_struct *tsk,
>>       return nr;
>>   }
>>   -static int coredump_wait(int exit_code, struct core_state 
>> *core_state)
>> +static int coredump_wait(int exit_code)
>>   {
>>       struct task_struct *tsk = current;
>> +    struct core_state core_state;
>>       int core_waiters = -EBUSY;
>>   -    init_completion(&core_state->startup);
>> -    core_state->dumper.task = tsk;
>> -    core_state->dumper.next = NULL;
>> +    init_completion(&core_state.startup);
>> +    core_state.dumper.task = tsk;
>> +    core_state.dumper.next = NULL;
>>   -    core_waiters = zap_threads(tsk, core_state, exit_code);
>> +    core_waiters = zap_threads(tsk, &core_state, exit_code);
>>       if (core_waiters > 0) {
>>           struct core_thread *ptr;
>>   - wait_for_completion_state(&core_state->startup,
>> +        wait_for_completion_state(&core_state.startup,
>>                         TASK_UNINTERRUPTIBLE|TASK_FREEZABLE);
>>           /*
>>            * Wait for all the threads to become inactive, so that
>>            * all the thread context (extended register state, like
>>            * fpu etc) gets copied to the memory.
>>            */
>> -        ptr = core_state->dumper.next;
>> +        ptr = core_state.dumper.next;
>>           while (ptr != NULL) {
>>               wait_task_inactive(ptr->task, TASK_ANY);
>>               ptr = ptr->next;
>> @@ -858,7 +859,7 @@ static bool coredump_sock_request(struct 
>> core_name *cn, struct coredump_params *
>>       return coredump_sock_mark(cprm->file, COREDUMP_MARK_REQACK);
>>   }
>>   -static bool coredump_socket(struct core_name *cn, struct 
>> coredump_params *cprm)
>> +static noinline_for_stack bool coredump_socket(struct core_name *cn, 
>> struct coredump_params *cprm)
>>   {
>>       if (!coredump_sock_connect(cn, cprm))
>>           return false;
>> @@ -1095,7 +1096,6 @@ void vfs_coredump(const kernel_siginfo_t *siginfo)
>>   {
>>       struct cred *cred __free(put_cred) = NULL;
>>       size_t *argv __free(kfree) = NULL;
>> -    struct core_state core_state;
>>       struct core_name cn;
>>       struct mm_struct *mm = current->mm;
>>       struct linux_binfmt *binfmt = mm->binfmt;
>> @@ -1131,7 +1131,7 @@ void vfs_coredump(const kernel_siginfo_t *siginfo)
>>       if (coredump_force_suid_safe(&cprm))
>>           cred->fsuid = GLOBAL_ROOT_UID;
>>   -    if (coredump_wait(siginfo->si_signo, &core_state) < 0)
>> +    if (coredump_wait(siginfo->si_signo) < 0)
>>           return;
>>         old_cred = override_creds(cred);
>
> Best regards

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] coredump: reduce stack usage in vfs_coredump()
  2025-06-25 11:54       ` Marek Szyprowski
@ 2025-06-25 13:29         ` Arnd Bergmann
  2025-06-26  6:22           ` Venkat Rao Bagalkote
  2025-06-26  8:19           ` Christian Brauner
  0 siblings, 2 replies; 8+ messages in thread
From: Arnd Bergmann @ 2025-06-25 13:29 UTC (permalink / raw)
  To: Marek Szyprowski, Arnd Bergmann, Alexander Viro,
	Christian Brauner
  Cc: Jan Kara, Alexander Mikhalitsyn, Jann Horn, Luca Boccassi,
	Jeff Layton, Roman Kisel, linux-fsdevel, linux-kernel

On Wed, Jun 25, 2025, at 13:54, Marek Szyprowski wrote:
> On 25.06.2025 13:41, Marek Szyprowski wrote:
>>
>> This change appears in today's linux-next (next-20250625) as commit 
>> fb82645d3f72 ("coredump: reduce stack usage in vfs_coredump()"). In my 
>> tests I found that it causes a kernel oops on some of my ARM 32bit 
>> Exynos based boards. This is really strange, because I don't see any 
>> obvious problem in this patch. Reverting $subject on top of linux-next 
>> hides/fixes the oops. I suspect some kind of use-after-free issue, but 
>> I cannot point anything related. Here is the kernel log from one of 
>> the affected boards (I've intentionally kept the register and stack 
>> dumps):
>
> I've just checked once again and found the source of the issue. 
> vfs_coredump() calls coredump_cleanup(), which calls coredump_finish(), 
> which performs the following dereference:
>
> next = current->signal->core_state->dumper.next
>
> of the core_state assigned in zap_threads() called from coredump_wait(). 
> It looks that core_state cannot be moved into coredump_wait() without 
> refactoring/cleaning this first.

Thanks for the analysis, I agree that this can't work and my patch
just needs to be dropped. The 'noinline_for_stack' change on
its own is probably sufficient to avoid the warning, and I can
respin a new version after more build testing.

     Arnd

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] coredump: reduce stack usage in vfs_coredump()
  2025-06-25 13:29         ` Arnd Bergmann
@ 2025-06-26  6:22           ` Venkat Rao Bagalkote
  2025-06-26  8:19           ` Christian Brauner
  1 sibling, 0 replies; 8+ messages in thread
From: Venkat Rao Bagalkote @ 2025-06-26  6:22 UTC (permalink / raw)
  To: Arnd Bergmann, Marek Szyprowski, Arnd Bergmann, Alexander Viro,
	Christian Brauner
  Cc: Jan Kara, Alexander Mikhalitsyn, Jann Horn, Luca Boccassi,
	Jeff Layton, Roman Kisel, linux-fsdevel, linux-kernel


On 25/06/25 6:59 pm, Arnd Bergmann wrote:
> On Wed, Jun 25, 2025, at 13:54, Marek Szyprowski wrote:
>> On 25.06.2025 13:41, Marek Szyprowski wrote:
>>> This change appears in today's linux-next (next-20250625) as commit
>>> fb82645d3f72 ("coredump: reduce stack usage in vfs_coredump()"). In my
>>> tests I found that it causes a kernel oops on some of my ARM 32bit
>>> Exynos based boards. This is really strange, because I don't see any
>>> obvious problem in this patch. Reverting $subject on top of linux-next
>>> hides/fixes the oops. I suspect some kind of use-after-free issue, but
>>> I cannot point anything related. Here is the kernel log from one of
>>> the affected boards (I've intentionally kept the register and stack
>>> dumps):
>> I've just checked once again and found the source of the issue.
>> vfs_coredump() calls coredump_cleanup(), which calls coredump_finish(),
>> which performs the following dereference:
>>
>> next = current->signal->core_state->dumper.next
>>
>> of the core_state assigned in zap_threads() called from coredump_wait().
>> It looks that core_state cannot be moved into coredump_wait() without
>> refactoring/cleaning this first.


IBM CI has also reported the similar crash, while running ./check 
tests/generic/228 from xfstests. This issue is observed on both xfs and 
ext4.


Traces:


[28956.438544] run fstests generic/228 at 2025-06-26 01:02:28
[28956.806452] coredump: 4746(sysctl): Unsafe core_pattern used with 
fs.suid_dumpable=2: pipe handler or fully qualified core dump path 
required. Set kernel.core_pattern before fs.suid_dumpable.
[28956.809279] BUG: Unable to handle kernel data access at 
0x3437342e65727d2f
[28956.809287] Faulting instruction address: 0xc0000000010fe718
[28956.809292] Oops: Kernel access of bad area, sig: 11 [#1]
[28956.809297] LE PAGE_SIZE=64K MMU=Hash  SMP NR_CPUS=8192 NUMA pSeries
[28956.809303] Modules linked in: loop nft_fib_inet nft_fib_ipv4 
nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 
nft_reject nft_ct nft_chain_nat nf_nat bonding nf_conntrack 
nf_defrag_ipv6 tls nf_defrag_ipv4 rfkill ip_set nf_tables nfnetlink 
pseries_rng vmx_crypto xfs sr_mod cdrom sd_mod sg ibmvscsi ibmveth 
scsi_transport_srp fuse
[28956.809347] CPU: 25 UID: 0 PID: 4748 Comm: xfs_io Kdump: loaded Not 
tainted 6.16.0-rc3-next-20250625 #1 VOLUNTARY
[28956.809355] Hardware name: IBM,8375-42A POWER9 (architected) 0x4e0202 
0xf000005 of:IBM,FW950.80 (VL950_131) hv:phyp pSeries
[28956.809360] NIP:  c0000000010fe718 LR: c0000000001d0d20 CTR: 
0000000000000000
[28956.809365] REGS: c00000009a80f720 TRAP: 0380   Not tainted 
(6.16.0-rc3-next-20250625)
[28956.809370] MSR:  800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  
CR: 88008844  XER: 20040000
[28956.809385] CFAR: c0000000001d0d1c IRQMASK: 1
[28956.809385] GPR00: c0000000001d0d20 c00000009a80f9c0 c000000001648100 
3437342e65727d2f
[28956.809385] GPR04: 0000000000000003 0000000000000000 0000000000000000 
fffffffffffe0000
[28956.809385] GPR08: c0000000c97baa00 0000000000000033 c0000000b9039000 
0000000000008000
[28956.809385] GPR12: c0000000001dd158 c000000017f91300 0000000000000000 
0000000000000000
[28956.809385] GPR16: 0000000000000000 0000000000000018 c0000000b9039000 
c0000000b9039d60
[28956.809385] GPR20: c0000000b9039080 c0000000b9039d48 0000000000040100 
0000000000000001
[28956.809385] GPR24: 0000000008430000 c00000009a80fd30 c0000000c97baa00 
c000000002baf820
[28956.809385] GPR28: 3437342e65727d2f 0000000000000000 0000000000000003 
0000000000000000
[28956.809444] NIP [c0000000010fe718] _raw_spin_lock_irqsave+0x34/0xb0
[28956.809452] LR [c0000000001d0d20] try_to_wake_up+0x6c/0x828
[28956.809459] Call Trace:
[28956.809462] [c00000009a80f9c0] [c00000009a80fa10] 0xc00000009a80fa10 
(unreliable)
[28956.809469] [c00000009a80f9f0] [0000000000000000] 0x0
[28956.809474] [c00000009a80fa80] [c0000000006f1958] 
vfs_coredump+0x254/0x5c8
[28956.809481] [c00000009a80fbf0] [c00000000018cf3c] get_signal+0x454/0xb64
[28956.809488] [c00000009a80fcf0] [c00000000002188c] do_signal+0x7c/0x324
[28956.809496] [c00000009a80fd90] [c000000000022a00] 
do_notify_resume+0xb0/0x13c
[28956.809502] [c00000009a80fdc0] [c000000000032508] 
interrupt_exit_user_prepare_main+0x1ac/0x264
[28956.809510] [c00000009a80fe20] [c000000000032710] 
syscall_exit_prepare+0x150/0x178
[28956.809516] [c00000009a80fe50] [c00000000000d068] 
system_call_vectored_common+0x168/0x2ec
[28956.809525] ---- interrupt: 3000 at 0x7fff82b24bf4
[28956.809529] NIP:  00007fff82b24bf4 LR: 00007fff82b24bf4 CTR: 
0000000000000000
[28956.809534] REGS: c00000009a80fe80 TRAP: 3000   Not tainted 
(6.16.0-rc3-next-20250625)
[28956.809538] MSR:  800000000280f033 
<SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE>  CR: 48004802  XER: 00000000
[28956.809554] IRQMASK: 0
[28956.809554] GPR00: 0000000000000135 00007ffffe2ecf50 00007fff82c37200 
ffffffffffffffe5
[28956.809554] GPR04: 0000000000000000 0000000000000000 0000000006500000 
00007fff82e3e120
[28956.809554] GPR08: 00007fff82e369e8 0000000000000000 0000000000000000 
0000000000000000
[28956.809554] GPR12: 0000000000000000 00007fff82e3e120 0000000000000000 
0000000000000000
[28956.809554] GPR16: 0000000000000000 0000000000000000 0000000000000000 
0000000000000000
[28956.809554] GPR20: 0000000000000000 0000000000000000 0000000000000000 
0000000000000001
[28956.809554] GPR24: 0000010009812f10 0000000000000000 0000000000000001 
0000000123099fe8
[28956.809554] GPR28: 0000000000000000 0000000000000003 0000000000000000 
0000000006500000
[28956.809610] NIP [00007fff82b24bf4] 0x7fff82b24bf4
[28956.809614] LR [00007fff82b24bf4] 0x7fff82b24bf4
[28956.809618] ---- interrupt: 3000
[28956.809621] Code: 38429a1c 7c0802a6 60000000 fbe1fff8 f821ffd1 
8bed0932 63e90001 992d0932 a12d0008 3ce0fffe 5529083c 61290001 
<7d001829> 7d063879 40c20018 7d063838
[28956.809641] ---[ end trace 0000000000000000 ]---
[28956.812734] pstore: backend (nvram) writing error (-1)


If you happen to fix this, please add below tag.


Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>


Regards,

Venkat.

> Thanks for the analysis, I agree that this can't work and my patch
> just needs to be dropped. The 'noinline_for_stack' change on
> its own is probably sufficient to avoid the warning, and I can
> respin a new version after more build testing.
>
>       Arnd
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] coredump: reduce stack usage in vfs_coredump()
  2025-06-25 13:29         ` Arnd Bergmann
  2025-06-26  6:22           ` Venkat Rao Bagalkote
@ 2025-06-26  8:19           ` Christian Brauner
  1 sibling, 0 replies; 8+ messages in thread
From: Christian Brauner @ 2025-06-26  8:19 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Marek Szyprowski, Arnd Bergmann, Alexander Viro, Jan Kara,
	Alexander Mikhalitsyn, Jann Horn, Luca Boccassi, Jeff Layton,
	Roman Kisel, linux-fsdevel, linux-kernel

On Wed, Jun 25, 2025 at 03:29:50PM +0200, Arnd Bergmann wrote:
> On Wed, Jun 25, 2025, at 13:54, Marek Szyprowski wrote:
> > On 25.06.2025 13:41, Marek Szyprowski wrote:
> >>
> >> This change appears in today's linux-next (next-20250625) as commit 
> >> fb82645d3f72 ("coredump: reduce stack usage in vfs_coredump()"). In my 
> >> tests I found that it causes a kernel oops on some of my ARM 32bit 
> >> Exynos based boards. This is really strange, because I don't see any 
> >> obvious problem in this patch. Reverting $subject on top of linux-next 
> >> hides/fixes the oops. I suspect some kind of use-after-free issue, but 
> >> I cannot point anything related. Here is the kernel log from one of 
> >> the affected boards (I've intentionally kept the register and stack 
> >> dumps):
> >
> > I've just checked once again and found the source of the issue. 
> > vfs_coredump() calls coredump_cleanup(), which calls coredump_finish(), 
> > which performs the following dereference:
> >
> > next = current->signal->core_state->dumper.next
> >
> > of the core_state assigned in zap_threads() called from coredump_wait(). 
> > It looks that core_state cannot be moved into coredump_wait() without 
> > refactoring/cleaning this first.
> 
> Thanks for the analysis, I agree that this can't work and my patch
> just needs to be dropped. The 'noinline_for_stack' change on
> its own is probably sufficient to avoid the warning, and I can
> respin a new version after more build testing.

@Arnd, I've dropped the previous patch. I'll wait for you to respin.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2025-06-26  8:19 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-20 11:21 [PATCH] coredump: reduce stack usage in vfs_coredump() Arnd Bergmann
2025-06-20 16:49 ` Alexander Mikhalitsyn
2025-06-23 10:36 ` Christian Brauner
     [not found] ` <CGME20250625114152eucas1p250b0d9a60a030e0eca6adf4d50794ebd@eucas1p2.samsung.com>
2025-06-25 11:41   ` Marek Szyprowski
     [not found]     ` <CGME20250625115426eucas1p17398cfcd215befcd3eafe0cac44b33a7@eucas1p1.samsung.com>
2025-06-25 11:54       ` Marek Szyprowski
2025-06-25 13:29         ` Arnd Bergmann
2025-06-26  6:22           ` Venkat Rao Bagalkote
2025-06-26  8:19           ` Christian Brauner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).