From: Valentin Schneider <valentin.schneider@arm.com>
To: Mark Rutland <mark.rutland@arm.com>,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org
Cc: catalin.marinas@arm.com, dvyukov@google.com,
mark.rutland@arm.com, peterz@infradead.org,
quic_qiancai@quicinc.com, will@kernel.org, woodylin@google.com
Subject: Re: [PATCH] Reset task stack state in bringup_cpu()
Date: Mon, 15 Nov 2021 12:16:14 +0000 [thread overview]
Message-ID: <87pmr1r4vl.mognet@arm.com> (raw)
In-Reply-To: <20211115113310.35693-1-mark.rutland@arm.com>
Hi Mark,
Thanks for tackling this and glueing the pieces back together. LGTM, though
I couldn't stop myself from playing changelog police - I also have a
question/comment wrt the BP.
On 15/11/21 11:33, Mark Rutland wrote:
> To hot unplug a CPU, the idle task on that CPU calls a few layers of C
> code before finally leaving the kernel. When KASAN is in use, poisoned
> shadow is left around for each of the active stack frames, and when
> shadow call stacks are in use. When shadow call stacks are in use the
> task's SCS SP is left pointing at an arbitrary point within the task's
> shadow call stack.
>
> When an offlines CPU is hotlpugged back into the kernel, this stale
^^^^^^^^ ^^^^^^^^^^
offlined? hotplugged
> state can adversely affect the newly onlined CPU. Stale KASAN shadow can
> alias new stackframes and result in bogus KASAN warnings. A stale SCS SP
> is effectively a memory leak, and prevents a portion of the shadow call
> stack being used. Across a number of hotplug cycles the task's entire
> shadow call stack can become unusable.
>
> We previously fixed the KASAN issue in commit:
>
> e1b77c92981a5222 ("sched/kasan: remove stale KASAN poison after hotplug")
>
> In commit:
>
> f1a0a376ca0c4ef1 ("sched/core: Initialize the idle task with preemption disabled")
>
> ... we broke both KASAN and SCS, with SCS being fixed up in commit:
>
> 63acd42c0d4942f7 ("sched/scs: Reset the shadow stack when idle_task_exit")
>
> ... but as this runs in the context of the idle task being offlines it's
^^^^^^^^
offlined
> potentially fragile.
>
> Fix both of these consistently and more robustly by resetting the SCS SP
> and KASAN shadow immediately before we online a CPU. This ensures the
> idle task always has a consistent state, and removes the need to do so
> when initializing an idle task or when unplugging an idle task.
>
> I've tested this with both GCC and clang, with reelvant options enabled,
^^^^^^^^
relevant
> offlining and online CPUs with:
^^^^^^
onlining
>
> | while true; do
> | for C in /sys/devices/system/cpu/cpu*/online; do
> | echo 0 > $C;
> | echo 1 > $C;
> | done
> | done
>
> Link: https://lore.kernel.org/lkml/20211012083521.973587-1-woodylin@google.com/
> Link: https://lore.kernel.org/linux-arm-kernel/YY9ECKyPtDbD9q8q@qian-HP-Z2-SFF-G5-Workstation/
> Fixes: 1a0a376ca0c4ef1 ("sched/core: Initialize the idle task with preemption disabled")
> Reported-by: Qian Cai <quic_qiancai@quicinc.com>
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Dmitry Vyukov <dvyukov@google.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Valentin Schneider <valentin.schneider@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Woody Lin <woodylin@google.com>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
> ---
> kernel/cpu.c | 7 +++++++
> kernel/sched/core.c | 4 ----
> 2 files changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 192e43a87407..407a2568f35e 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -31,6 +31,7 @@
> #include <linux/smpboot.h>
> #include <linux/relay.h>
> #include <linux/slab.h>
> +#include <linux/scs.h>
> #include <linux/percpu-rwsem.h>
> #include <linux/cpuset.h>
>
> @@ -588,6 +589,12 @@ static int bringup_cpu(unsigned int cpu)
> int ret;
>
> /*
> + * Reset stale stack state from the last time this CPU was online.
> + */
> + scs_task_reset(idle);
> + kasan_unpoison_task_stack(idle);
> +
> + /*
> * Some architectures have to walk the irq descriptors to
> * setup the vector space for the cpu which comes online.
> * Prevent irq alloc/free across the bringup.
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 3c9b0fda64ac..76f9deeaa942 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -8619,9 +8619,6 @@ void __init init_idle(struct task_struct *idle, int cpu)
> idle->flags |= PF_IDLE | PF_KTHREAD | PF_NO_SETAFFINITY;
> kthread_set_per_cpu(idle, cpu);
>
> - scs_task_reset(idle);
> - kasan_unpoison_task_stack(idle);
> -
So those are no longer invoked for the BP during bootup (via sched_init());
that looks OK for KASAN per:
e1b77c92981a ("sched/kasan: remove stale KASAN poison after hotplug")
I didn't find any explicit commit for SCS but from the looks of
arm64/include/asm/thread_info.h we seem to be initializing things
correctly, so IIUC the removed hunk wasn't actually necessary for the BP's
first boot.
> #ifdef CONFIG_SMP
> /*
> * It's possible that init_idle() gets called multiple times on a task,
> @@ -8777,7 +8774,6 @@ void idle_task_exit(void)
> finish_arch_post_lock_switch();
> }
>
> - scs_task_reset(current);
> /* finish_cpu(), as ran on the BP, will clean up the active_mm state */
> }
>
> --
> 2.11.0
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
WARNING: multiple messages have this Message-ID (diff)
From: Valentin Schneider <valentin.schneider@arm.com>
To: Mark Rutland <mark.rutland@arm.com>,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org
Cc: catalin.marinas@arm.com, dvyukov@google.com,
mark.rutland@arm.com, peterz@infradead.org,
quic_qiancai@quicinc.com, will@kernel.org, woodylin@google.com
Subject: Re: [PATCH] Reset task stack state in bringup_cpu()
Date: Mon, 15 Nov 2021 12:16:14 +0000 [thread overview]
Message-ID: <87pmr1r4vl.mognet@arm.com> (raw)
In-Reply-To: <20211115113310.35693-1-mark.rutland@arm.com>
Hi Mark,
Thanks for tackling this and glueing the pieces back together. LGTM, though
I couldn't stop myself from playing changelog police - I also have a
question/comment wrt the BP.
On 15/11/21 11:33, Mark Rutland wrote:
> To hot unplug a CPU, the idle task on that CPU calls a few layers of C
> code before finally leaving the kernel. When KASAN is in use, poisoned
> shadow is left around for each of the active stack frames, and when
> shadow call stacks are in use. When shadow call stacks are in use the
> task's SCS SP is left pointing at an arbitrary point within the task's
> shadow call stack.
>
> When an offlines CPU is hotlpugged back into the kernel, this stale
^^^^^^^^ ^^^^^^^^^^
offlined? hotplugged
> state can adversely affect the newly onlined CPU. Stale KASAN shadow can
> alias new stackframes and result in bogus KASAN warnings. A stale SCS SP
> is effectively a memory leak, and prevents a portion of the shadow call
> stack being used. Across a number of hotplug cycles the task's entire
> shadow call stack can become unusable.
>
> We previously fixed the KASAN issue in commit:
>
> e1b77c92981a5222 ("sched/kasan: remove stale KASAN poison after hotplug")
>
> In commit:
>
> f1a0a376ca0c4ef1 ("sched/core: Initialize the idle task with preemption disabled")
>
> ... we broke both KASAN and SCS, with SCS being fixed up in commit:
>
> 63acd42c0d4942f7 ("sched/scs: Reset the shadow stack when idle_task_exit")
>
> ... but as this runs in the context of the idle task being offlines it's
^^^^^^^^
offlined
> potentially fragile.
>
> Fix both of these consistently and more robustly by resetting the SCS SP
> and KASAN shadow immediately before we online a CPU. This ensures the
> idle task always has a consistent state, and removes the need to do so
> when initializing an idle task or when unplugging an idle task.
>
> I've tested this with both GCC and clang, with reelvant options enabled,
^^^^^^^^
relevant
> offlining and online CPUs with:
^^^^^^
onlining
>
> | while true; do
> | for C in /sys/devices/system/cpu/cpu*/online; do
> | echo 0 > $C;
> | echo 1 > $C;
> | done
> | done
>
> Link: https://lore.kernel.org/lkml/20211012083521.973587-1-woodylin@google.com/
> Link: https://lore.kernel.org/linux-arm-kernel/YY9ECKyPtDbD9q8q@qian-HP-Z2-SFF-G5-Workstation/
> Fixes: 1a0a376ca0c4ef1 ("sched/core: Initialize the idle task with preemption disabled")
> Reported-by: Qian Cai <quic_qiancai@quicinc.com>
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Dmitry Vyukov <dvyukov@google.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Valentin Schneider <valentin.schneider@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Woody Lin <woodylin@google.com>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
> ---
> kernel/cpu.c | 7 +++++++
> kernel/sched/core.c | 4 ----
> 2 files changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 192e43a87407..407a2568f35e 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -31,6 +31,7 @@
> #include <linux/smpboot.h>
> #include <linux/relay.h>
> #include <linux/slab.h>
> +#include <linux/scs.h>
> #include <linux/percpu-rwsem.h>
> #include <linux/cpuset.h>
>
> @@ -588,6 +589,12 @@ static int bringup_cpu(unsigned int cpu)
> int ret;
>
> /*
> + * Reset stale stack state from the last time this CPU was online.
> + */
> + scs_task_reset(idle);
> + kasan_unpoison_task_stack(idle);
> +
> + /*
> * Some architectures have to walk the irq descriptors to
> * setup the vector space for the cpu which comes online.
> * Prevent irq alloc/free across the bringup.
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 3c9b0fda64ac..76f9deeaa942 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -8619,9 +8619,6 @@ void __init init_idle(struct task_struct *idle, int cpu)
> idle->flags |= PF_IDLE | PF_KTHREAD | PF_NO_SETAFFINITY;
> kthread_set_per_cpu(idle, cpu);
>
> - scs_task_reset(idle);
> - kasan_unpoison_task_stack(idle);
> -
So those are no longer invoked for the BP during bootup (via sched_init());
that looks OK for KASAN per:
e1b77c92981a ("sched/kasan: remove stale KASAN poison after hotplug")
I didn't find any explicit commit for SCS but from the looks of
arm64/include/asm/thread_info.h we seem to be initializing things
correctly, so IIUC the removed hunk wasn't actually necessary for the BP's
first boot.
> #ifdef CONFIG_SMP
> /*
> * It's possible that init_idle() gets called multiple times on a task,
> @@ -8777,7 +8774,6 @@ void idle_task_exit(void)
> finish_arch_post_lock_switch();
> }
>
> - scs_task_reset(current);
> /* finish_cpu(), as ran on the BP, will clean up the active_mm state */
> }
>
> --
> 2.11.0
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
next prev parent reply other threads:[~2021-11-15 12:17 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-11-15 11:33 [PATCH] Reset task stack state in bringup_cpu() Mark Rutland
2021-11-15 11:33 ` Mark Rutland
2021-11-15 12:16 ` Valentin Schneider [this message]
2021-11-15 12:16 ` Valentin Schneider
2021-11-15 14:09 ` Mark Rutland
2021-11-15 14:09 ` Mark Rutland
2021-11-15 18:16 ` Mark Rutland
2021-11-15 18:16 ` Mark Rutland
2021-11-16 16:31 ` Qian Cai
2021-11-16 16:31 ` Qian Cai
2021-11-17 11:52 ` Mark Rutland
2021-11-17 11:52 ` Mark Rutland
2021-11-17 23:39 ` Qian Cai
2021-11-17 23:39 ` Qian Cai
2023-03-11 10:52 ` David Woodhouse
2023-03-11 10:52 ` David Woodhouse
2023-03-11 10:52 ` David Woodhouse
2023-03-11 20:29 ` Johannes Berg
2023-03-11 20:29 ` Johannes Berg
2023-03-11 20:29 ` Johannes Berg
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87pmr1r4vl.mognet@arm.com \
--to=valentin.schneider@arm.com \
--cc=catalin.marinas@arm.com \
--cc=dvyukov@google.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mark.rutland@arm.com \
--cc=peterz@infradead.org \
--cc=quic_qiancai@quicinc.com \
--cc=will@kernel.org \
--cc=woodylin@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.