From: Marek Szyprowski <m.szyprowski@samsung.com>
To: Arnd Bergmann <arnd@kernel.org>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Christian Brauner <brauner@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>, Jan Kara <jack@suse.cz>,
Alexander Mikhalitsyn <alexander@mihalicyn.com>,
Jann Horn <jannh@google.com>,
Luca Boccassi <luca.boccassi@gmail.com>,
Jeff Layton <jlayton@kernel.org>,
Roman Kisel <romank@linux.microsoft.com>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] coredump: reduce stack usage in vfs_coredump()
Date: Wed, 25 Jun 2025 13:54:25 +0200 [thread overview]
Message-ID: <8f080dc3-ef13-4d9a-8964-0c2b3395072e@samsung.com> (raw)
In-Reply-To: <404dfe9a-1f4f-4776-863a-d8bbe08335e2@samsung.com>
On 25.06.2025 13:41, Marek Szyprowski wrote:
>
> On 20.06.2025 13:21, Arnd Bergmann wrote:
>> From: Arnd Bergmann <arnd@arndb.de>
>>
>> The newly added socket coredump code runs into some corner cases
>> with KASAN that end up needing a lot of stack space:
>>
>> fs/coredump.c:1206:1: error: the frame size of 1680 bytes is larger
>> than 1280 bytes [-Werror=frame-larger-than=]
>>
>> Mark the socket helper function as noinline_for_stack so its stack
>> usage does not leak out to the other code paths. This also seems to
>> help with register pressure, and the resulting combined stack usage of
>> vfs_coredump() and coredump_socket() is actually lower than the inlined
>> version.
>>
>> Moving the core_state variable into coredump_wait() helps reduce the
>> stack usage further and simplifies the code, though it is not sufficient
>> to avoid the warning by itself.
>>
>> Fixes: 6a7a50e5f1ac ("coredump: use a single helper for the socket")
>> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
>
> This change appears in today's linux-next (next-20250625) as commit
> fb82645d3f72 ("coredump: reduce stack usage in vfs_coredump()"). In my
> tests I found that it causes a kernel oops on some of my ARM 32bit
> Exynos based boards. This is really strange, because I don't see any
> obvious problem in this patch. Reverting $subject on top of linux-next
> hides/fixes the oops. I suspect some kind of use-after-free issue, but
> I cannot point anything related. Here is the kernel log from one of
> the affected boards (I've intentionally kept the register and stack
> dumps):
I've just checked once again and found the source of the issue.
vfs_coredump() calls coredump_cleanup(), which calls coredump_finish(),
which performs the following dereference:
next = current->signal->core_state->dumper.next
of the core_state assigned in zap_threads() called from coredump_wait().
It looks that core_state cannot be moved into coredump_wait() without
refactoring/cleaning this first.
>
> 8<--- cut here ---
> Unable to handle kernel paging request at virtual address c10ba370
> when write
> [c10ba370] *pgd=4101941e(bad)
> Internal error: Oops: 80d [#1] SMP ARM
> Modules linked in: cmac bnep mwifiex_sdio mwifiex btmrvl_sdio btmrvl
> sha256 libsha256_generic bluetooth cfg80211 exynos_gsc v4l2_mem2mem
> s5p_mfc videobuf2_dma_contig videobuf2_memops videobuf2_v4l2
> videobuf2_common ecdh_generic ecc videodev mc s5p_cec
> CPU: 1 UID: 0 PID: 1367 Comm: cgm-release-age Not tainted
> 6.16.0-rc3-next-20250625 #10627 PREEMPT
> Hardware name: Samsung Exynos (Flattened Device Tree)
> PC is at vfs_coredump+0x294/0x17bc
> LR is at _raw_spin_unlock_irq+0x20/0x50
> pc : [<c03c4534>] lr : [<c0cf096c>] psr: a0000013
> sp : f1249dd0 ip : 00000000 fp : f1249e68
> r10: c1ae41e4 r9 : 00000000 r8 : c13aaa74
> r7 : 00000000 r6 : 63726f46 r5 : c2a3c000 r4 : c3852080
> r3 : c10ba370 r2 : c3852080 r1 : ffffffff r0 : 00006325
> Flags: NzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
> Control: 10c5387d Table: 4324c06a DAC: 00000051
> Register r0 information: non-paged memory
> Register r1 information: non-paged memory
> Register r2 information: slab task_struct start c3852080 pointer
> offset 0 size 4160
> Register r3 information: non-slab/vmalloc memory
> Register r4 information: slab task_struct start c3852080 pointer
> offset 0 size 4160
> Register r5 information: slab kmalloc-128 start c2a3c000 pointer
> offset 0 size 128
> Register r6 information: non-paged memory
> Register r7 information: NULL pointer
> Register r8 information: non-slab/vmalloc memory
> Register r9 information: NULL pointer
> Register r10 information: non-slab/vmalloc memory
> Register r11 information: 2-page vmalloc region starting at 0xf1248000
> allocated at kernel_clone+0x58/0x3f4
> Register r12 information: NULL pointer
> Process cgm-release-age (pid: 1367, stack limit = 0x4909c75e)
> Stack: (0xf1249dd0 to 0xf124a000)
> 9dc0: c3852830 00000000 c3852080
> c01ae7fc
> 9de0: c13095a8 c45c8600 c2a3c000 00000000 00000000 c014ab28 c102c84c
> c104be90
> 9e00: c13095a8 c1496b58 ffffffff 00000081 00000000 6fda5e52 c127e2ec
> 00000000
> 9e20: c2a3c280 00000004 00000080 00000000 c3852800 00000001 00000001
> 00000000
> 9e40: c2855d90 00000000 c3852830 c0ce2874 c13133c8 c3852080 c13133e0
> c1472911
> 9e60: c1cc6480 ef0484e0 f1249f60 00000000 00000000 800000cd 00000001
> 00000000
> 9e80: 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 00000000
> 9ea0: 00000000 00000000 00000006 c01450a8 f1249f4c 00000001 c12849dc
> 6fda5e52
> 9ec0: c0cf096c c3852080 00000005 00000006 400004d8 0001e000 c1309154
> c2855d80
> 9ee0: f1249f60 c014ab28 00000000 c02eeb48 00000000 c3852830 00000000
> ce4d1c00
> 9f00: f1249f4c c13095a8 c127e2ec c2855d90 00000000 6fda5e52 c12849dc
> c3852080
> 9f20: f1249fb0 00000000 f1249f4c 5ac3c35a b6d21668 c3852730 b6d2166c
> c011a430
> 9f40: 00000000 00000002 f1249f74 c0cf096c bee9b9d4 c0148328 f1249f7c
> 00000000
> 9f60: 00000006 00000000 fffffffa 00000557 00000000 00000000 00000000
> 00000000
> 9f80: 00000000 6fda5e52 00000000 00000000 b6f43968 bee9b9d4 000000af
> c0100290
> 9fa0: c3852080 000000af 00000000 c0100088 00000000 bee9b9d4 00000000
> 00000008
> 9fc0: 00000000 b6f43968 bee9b9d4 000000af bee9bf70 b6ebdc94 00440f90
> 00000000
> 9fe0: 00000020 bee9b9d0 ffffffff b6d2166c 00000010 00000002 00000000
> 00000000
> Call trace:
> vfs_coredump from get_signal+0x990/0xd9c
> get_signal from do_work_pending+0x118/0x588
> do_work_pending from slow_work_pending+0xc/0x24
> Exception stack(0xf1249fb0 to 0xf1249ff8)
> 9fa0: 00000000 bee9b9d4 00000000
> 00000008
> 9fc0: 00000000 b6f43968 bee9b9d4 000000af bee9bf70 b6ebdc94 00440f90
> 00000000
> 9fe0: 00000020 bee9b9d0 ffffffff b6d2166c 00000010 00000002
> Code: e1a03006 e5966004 e5930000 f57ff05b (e5837000)
> ---[ end trace 0000000000000000 ]---
> note: cgm-release-age[1367] exited with irqs disabled
>
>
>> ---
>> fs/coredump.c | 20 ++++++++++----------
>> 1 file changed, 10 insertions(+), 10 deletions(-)
>>
>> diff --git a/fs/coredump.c b/fs/coredump.c
>> index e2611fb1f254..c46e3996ff91 100644
>> --- a/fs/coredump.c
>> +++ b/fs/coredump.c
>> @@ -518,27 +518,28 @@ static int zap_threads(struct task_struct *tsk,
>> return nr;
>> }
>> -static int coredump_wait(int exit_code, struct core_state
>> *core_state)
>> +static int coredump_wait(int exit_code)
>> {
>> struct task_struct *tsk = current;
>> + struct core_state core_state;
>> int core_waiters = -EBUSY;
>> - init_completion(&core_state->startup);
>> - core_state->dumper.task = tsk;
>> - core_state->dumper.next = NULL;
>> + init_completion(&core_state.startup);
>> + core_state.dumper.task = tsk;
>> + core_state.dumper.next = NULL;
>> - core_waiters = zap_threads(tsk, core_state, exit_code);
>> + core_waiters = zap_threads(tsk, &core_state, exit_code);
>> if (core_waiters > 0) {
>> struct core_thread *ptr;
>> - wait_for_completion_state(&core_state->startup,
>> + wait_for_completion_state(&core_state.startup,
>> TASK_UNINTERRUPTIBLE|TASK_FREEZABLE);
>> /*
>> * Wait for all the threads to become inactive, so that
>> * all the thread context (extended register state, like
>> * fpu etc) gets copied to the memory.
>> */
>> - ptr = core_state->dumper.next;
>> + ptr = core_state.dumper.next;
>> while (ptr != NULL) {
>> wait_task_inactive(ptr->task, TASK_ANY);
>> ptr = ptr->next;
>> @@ -858,7 +859,7 @@ static bool coredump_sock_request(struct
>> core_name *cn, struct coredump_params *
>> return coredump_sock_mark(cprm->file, COREDUMP_MARK_REQACK);
>> }
>> -static bool coredump_socket(struct core_name *cn, struct
>> coredump_params *cprm)
>> +static noinline_for_stack bool coredump_socket(struct core_name *cn,
>> struct coredump_params *cprm)
>> {
>> if (!coredump_sock_connect(cn, cprm))
>> return false;
>> @@ -1095,7 +1096,6 @@ void vfs_coredump(const kernel_siginfo_t *siginfo)
>> {
>> struct cred *cred __free(put_cred) = NULL;
>> size_t *argv __free(kfree) = NULL;
>> - struct core_state core_state;
>> struct core_name cn;
>> struct mm_struct *mm = current->mm;
>> struct linux_binfmt *binfmt = mm->binfmt;
>> @@ -1131,7 +1131,7 @@ void vfs_coredump(const kernel_siginfo_t *siginfo)
>> if (coredump_force_suid_safe(&cprm))
>> cred->fsuid = GLOBAL_ROOT_UID;
>> - if (coredump_wait(siginfo->si_signo, &core_state) < 0)
>> + if (coredump_wait(siginfo->si_signo) < 0)
>> return;
>> old_cred = override_creds(cred);
>
> Best regards
Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland
next prev parent reply other threads:[~2025-06-25 11:54 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-06-20 11:21 [PATCH] coredump: reduce stack usage in vfs_coredump() Arnd Bergmann
2025-06-20 16:49 ` Alexander Mikhalitsyn
2025-06-23 10:36 ` Christian Brauner
[not found] ` <CGME20250625114152eucas1p250b0d9a60a030e0eca6adf4d50794ebd@eucas1p2.samsung.com>
2025-06-25 11:41 ` Marek Szyprowski
[not found] ` <CGME20250625115426eucas1p17398cfcd215befcd3eafe0cac44b33a7@eucas1p1.samsung.com>
2025-06-25 11:54 ` Marek Szyprowski [this message]
2025-06-25 13:29 ` Arnd Bergmann
2025-06-26 6:22 ` Venkat Rao Bagalkote
2025-06-26 8:19 ` Christian Brauner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=8f080dc3-ef13-4d9a-8964-0c2b3395072e@samsung.com \
--to=m.szyprowski@samsung.com \
--cc=alexander@mihalicyn.com \
--cc=arnd@arndb.de \
--cc=arnd@kernel.org \
--cc=brauner@kernel.org \
--cc=jack@suse.cz \
--cc=jannh@google.com \
--cc=jlayton@kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=luca.boccassi@gmail.com \
--cc=romank@linux.microsoft.com \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).