Re: [PATCH] coredump: reduce stack usage in vfs_coredump()

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Marek Szyprowski <m.szyprowski@samsung.com>
To: Arnd Bergmann <arnd@kernel.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Christian Brauner <brauner@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>, Jan Kara <jack@suse.cz>,
	Alexander Mikhalitsyn <alexander@mihalicyn.com>,
	Jann Horn <jannh@google.com>,
	Luca Boccassi <luca.boccassi@gmail.com>,
	Jeff Layton <jlayton@kernel.org>,
	Roman Kisel <romank@linux.microsoft.com>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] coredump: reduce stack usage in vfs_coredump()
Date: Wed, 25 Jun 2025 13:54:25 +0200	[thread overview]
Message-ID: <8f080dc3-ef13-4d9a-8964-0c2b3395072e@samsung.com> (raw)
In-Reply-To: <404dfe9a-1f4f-4776-863a-d8bbe08335e2@samsung.com>

On 25.06.2025 13:41, Marek Szyprowski wrote:
>
> On 20.06.2025 13:21, Arnd Bergmann wrote:
>> From: Arnd Bergmann <arnd@arndb.de>
>>
>> The newly added socket coredump code runs into some corner cases
>> with KASAN that end up needing a lot of stack space:
>>
>> fs/coredump.c:1206:1: error: the frame size of 1680 bytes is larger 
>> than 1280 bytes [-Werror=frame-larger-than=]
>>
>> Mark the socket helper function as noinline_for_stack so its stack
>> usage does not leak out to the other code paths. This also seems to
>> help with register pressure, and the resulting combined stack usage of
>> vfs_coredump() and coredump_socket() is actually lower than the inlined
>> version.
>>
>> Moving the core_state variable into coredump_wait() helps reduce the
>> stack usage further and simplifies the code, though it is not sufficient
>> to avoid the warning by itself.
>>
>> Fixes: 6a7a50e5f1ac ("coredump: use a single helper for the socket")
>> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
>
> This change appears in today's linux-next (next-20250625) as commit 
> fb82645d3f72 ("coredump: reduce stack usage in vfs_coredump()"). In my 
> tests I found that it causes a kernel oops on some of my ARM 32bit 
> Exynos based boards. This is really strange, because I don't see any 
> obvious problem in this patch. Reverting $subject on top of linux-next 
> hides/fixes the oops. I suspect some kind of use-after-free issue, but 
> I cannot point anything related. Here is the kernel log from one of 
> the affected boards (I've intentionally kept the register and stack 
> dumps):

I've just checked once again and found the source of the issue. 
vfs_coredump() calls coredump_cleanup(), which calls coredump_finish(), 
which performs the following dereference:

next = current->signal->core_state->dumper.next

of the core_state assigned in zap_threads() called from coredump_wait(). 
It looks that core_state cannot be moved into coredump_wait() without 
refactoring/cleaning this first.


>
> 8<--- cut here ---
> Unable to handle kernel paging request at virtual address c10ba370 
> when write
> [c10ba370] *pgd=4101941e(bad)
> Internal error: Oops: 80d [#1] SMP ARM
> Modules linked in: cmac bnep mwifiex_sdio mwifiex btmrvl_sdio btmrvl 
> sha256 libsha256_generic bluetooth cfg80211 exynos_gsc v4l2_mem2mem 
> s5p_mfc videobuf2_dma_contig videobuf2_memops videobuf2_v4l2 
> videobuf2_common ecdh_generic ecc videodev mc s5p_cec
> CPU: 1 UID: 0 PID: 1367 Comm: cgm-release-age Not tainted 
> 6.16.0-rc3-next-20250625 #10627 PREEMPT
> Hardware name: Samsung Exynos (Flattened Device Tree)
> PC is at vfs_coredump+0x294/0x17bc
> LR is at _raw_spin_unlock_irq+0x20/0x50
> pc : [<c03c4534>]    lr : [<c0cf096c>]    psr: a0000013
> sp : f1249dd0  ip : 00000000  fp : f1249e68
> r10: c1ae41e4  r9 : 00000000  r8 : c13aaa74
> r7 : 00000000  r6 : 63726f46  r5 : c2a3c000  r4 : c3852080
> r3 : c10ba370  r2 : c3852080  r1 : ffffffff  r0 : 00006325
> Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
> Control: 10c5387d  Table: 4324c06a  DAC: 00000051
> Register r0 information: non-paged memory
> Register r1 information: non-paged memory
> Register r2 information: slab task_struct start c3852080 pointer 
> offset 0 size 4160
> Register r3 information: non-slab/vmalloc memory
> Register r4 information: slab task_struct start c3852080 pointer 
> offset 0 size 4160
> Register r5 information: slab kmalloc-128 start c2a3c000 pointer 
> offset 0 size 128
> Register r6 information: non-paged memory
> Register r7 information: NULL pointer
> Register r8 information: non-slab/vmalloc memory
> Register r9 information: NULL pointer
> Register r10 information: non-slab/vmalloc memory
> Register r11 information: 2-page vmalloc region starting at 0xf1248000 
> allocated at kernel_clone+0x58/0x3f4
> Register r12 information: NULL pointer
> Process cgm-release-age (pid: 1367, stack limit = 0x4909c75e)
> Stack: (0xf1249dd0 to 0xf124a000)
> 9dc0:                                     c3852830 00000000 c3852080 
> c01ae7fc
> 9de0: c13095a8 c45c8600 c2a3c000 00000000 00000000 c014ab28 c102c84c 
> c104be90
> 9e00: c13095a8 c1496b58 ffffffff 00000081 00000000 6fda5e52 c127e2ec 
> 00000000
> 9e20: c2a3c280 00000004 00000080 00000000 c3852800 00000001 00000001 
> 00000000
> 9e40: c2855d90 00000000 c3852830 c0ce2874 c13133c8 c3852080 c13133e0 
> c1472911
> 9e60: c1cc6480 ef0484e0 f1249f60 00000000 00000000 800000cd 00000001 
> 00000000
> 9e80: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
> 00000000
> 9ea0: 00000000 00000000 00000006 c01450a8 f1249f4c 00000001 c12849dc 
> 6fda5e52
> 9ec0: c0cf096c c3852080 00000005 00000006 400004d8 0001e000 c1309154 
> c2855d80
> 9ee0: f1249f60 c014ab28 00000000 c02eeb48 00000000 c3852830 00000000 
> ce4d1c00
> 9f00: f1249f4c c13095a8 c127e2ec c2855d90 00000000 6fda5e52 c12849dc 
> c3852080
> 9f20: f1249fb0 00000000 f1249f4c 5ac3c35a b6d21668 c3852730 b6d2166c 
> c011a430
> 9f40: 00000000 00000002 f1249f74 c0cf096c bee9b9d4 c0148328 f1249f7c 
> 00000000
> 9f60: 00000006 00000000 fffffffa 00000557 00000000 00000000 00000000 
> 00000000
> 9f80: 00000000 6fda5e52 00000000 00000000 b6f43968 bee9b9d4 000000af 
> c0100290
> 9fa0: c3852080 000000af 00000000 c0100088 00000000 bee9b9d4 00000000 
> 00000008
> 9fc0: 00000000 b6f43968 bee9b9d4 000000af bee9bf70 b6ebdc94 00440f90 
> 00000000
> 9fe0: 00000020 bee9b9d0 ffffffff b6d2166c 00000010 00000002 00000000 
> 00000000
> Call trace:
>  vfs_coredump from get_signal+0x990/0xd9c
>  get_signal from do_work_pending+0x118/0x588
>  do_work_pending from slow_work_pending+0xc/0x24
> Exception stack(0xf1249fb0 to 0xf1249ff8)
> 9fa0:                                     00000000 bee9b9d4 00000000 
> 00000008
> 9fc0: 00000000 b6f43968 bee9b9d4 000000af bee9bf70 b6ebdc94 00440f90 
> 00000000
> 9fe0: 00000020 bee9b9d0 ffffffff b6d2166c 00000010 00000002
> Code: e1a03006 e5966004 e5930000 f57ff05b (e5837000)
> ---[ end trace 0000000000000000 ]---
> note: cgm-release-age[1367] exited with irqs disabled
>
>
>> ---
>>   fs/coredump.c | 20 ++++++++++----------
>>   1 file changed, 10 insertions(+), 10 deletions(-)
>>
>> diff --git a/fs/coredump.c b/fs/coredump.c
>> index e2611fb1f254..c46e3996ff91 100644
>> --- a/fs/coredump.c
>> +++ b/fs/coredump.c
>> @@ -518,27 +518,28 @@ static int zap_threads(struct task_struct *tsk,
>>       return nr;
>>   }
>>   -static int coredump_wait(int exit_code, struct core_state 
>> *core_state)
>> +static int coredump_wait(int exit_code)
>>   {
>>       struct task_struct *tsk = current;
>> +    struct core_state core_state;
>>       int core_waiters = -EBUSY;
>>   -    init_completion(&core_state->startup);
>> -    core_state->dumper.task = tsk;
>> -    core_state->dumper.next = NULL;
>> +    init_completion(&core_state.startup);
>> +    core_state.dumper.task = tsk;
>> +    core_state.dumper.next = NULL;
>>   -    core_waiters = zap_threads(tsk, core_state, exit_code);
>> +    core_waiters = zap_threads(tsk, &core_state, exit_code);
>>       if (core_waiters > 0) {
>>           struct core_thread *ptr;
>>   - wait_for_completion_state(&core_state->startup,
>> +        wait_for_completion_state(&core_state.startup,
>>                         TASK_UNINTERRUPTIBLE|TASK_FREEZABLE);
>>           /*
>>            * Wait for all the threads to become inactive, so that
>>            * all the thread context (extended register state, like
>>            * fpu etc) gets copied to the memory.
>>            */
>> -        ptr = core_state->dumper.next;
>> +        ptr = core_state.dumper.next;
>>           while (ptr != NULL) {
>>               wait_task_inactive(ptr->task, TASK_ANY);
>>               ptr = ptr->next;
>> @@ -858,7 +859,7 @@ static bool coredump_sock_request(struct 
>> core_name *cn, struct coredump_params *
>>       return coredump_sock_mark(cprm->file, COREDUMP_MARK_REQACK);
>>   }
>>   -static bool coredump_socket(struct core_name *cn, struct 
>> coredump_params *cprm)
>> +static noinline_for_stack bool coredump_socket(struct core_name *cn, 
>> struct coredump_params *cprm)
>>   {
>>       if (!coredump_sock_connect(cn, cprm))
>>           return false;
>> @@ -1095,7 +1096,6 @@ void vfs_coredump(const kernel_siginfo_t *siginfo)
>>   {
>>       struct cred *cred __free(put_cred) = NULL;
>>       size_t *argv __free(kfree) = NULL;
>> -    struct core_state core_state;
>>       struct core_name cn;
>>       struct mm_struct *mm = current->mm;
>>       struct linux_binfmt *binfmt = mm->binfmt;
>> @@ -1131,7 +1131,7 @@ void vfs_coredump(const kernel_siginfo_t *siginfo)
>>       if (coredump_force_suid_safe(&cprm))
>>           cred->fsuid = GLOBAL_ROOT_UID;
>>   -    if (coredump_wait(siginfo->si_signo, &core_state) < 0)
>> +    if (coredump_wait(siginfo->si_signo) < 0)
>>           return;
>>         old_cred = override_creds(cred);
>
> Best regards

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland

next prev parent reply	other threads:[~2025-06-25 11:54 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-06-20 11:21 [PATCH] coredump: reduce stack usage in vfs_coredump() Arnd Bergmann
2025-06-20 16:49 ` Alexander Mikhalitsyn
2025-06-23 10:36 ` Christian Brauner
     [not found] ` <CGME20250625114152eucas1p250b0d9a60a030e0eca6adf4d50794ebd@eucas1p2.samsung.com>
2025-06-25 11:41   ` Marek Szyprowski
     [not found]     ` <CGME20250625115426eucas1p17398cfcd215befcd3eafe0cac44b33a7@eucas1p1.samsung.com>
2025-06-25 11:54       ` Marek Szyprowski [this message]
2025-06-25 13:29         ` Arnd Bergmann
2025-06-26  6:22           ` Venkat Rao Bagalkote
2025-06-26  8:19           ` Christian Brauner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8f080dc3-ef13-4d9a-8964-0c2b3395072e@samsung.com \
    --to=m.szyprowski@samsung.com \
    --cc=alexander@mihalicyn.com \
    --cc=arnd@arndb.de \
    --cc=arnd@kernel.org \
    --cc=brauner@kernel.org \
    --cc=jack@suse.cz \
    --cc=jannh@google.com \
    --cc=jlayton@kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luca.boccassi@gmail.com \
    --cc=romank@linux.microsoft.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).