* + lazy-tlb-fix-hotplug-exit-race-with-mmu_lazy_tlb_shootdown.patch added to mm-hotfixes-unstable branch
From: Andrew Morton @ 2023-05-25 20:52 UTC
To: mm-commits, torvalds, peterz, npiggin, akpm
The patch titled
Subject: lazy tlb: fix hotplug exit race with MMU_LAZY_TLB_SHOOTDOWN
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
lazy-tlb-fix-hotplug-exit-race-with-mmu_lazy_tlb_shootdown.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/lazy-tlb-fix-hotplug-exit-race-with-mmu_lazy_tlb_shootdown.patch
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Nicholas Piggin <npiggin@gmail.com>
Subject: lazy tlb: fix hotplug exit race with MMU_LAZY_TLB_SHOOTDOWN
Date: Wed, 24 May 2023 16:04:54 +1000
CPU unplug first calls __cpu_disable(), and that's where powerpc calls
cleanup_cpu_mmu_context(), which clears this CPU from mm_cpumask() of all
mms in the system.
However, this CPU may still be using a lazy tlb mm whose mm_cpumask bit
has now been cleared. The CPU does not switch away from the lazy tlb
mm until arch_cpu_idle_dead() calls idle_task_exit().
If that user mm exits in this window, it will not be subject to the lazy
tlb mm shootdown and may be freed while in use as a lazy mm by the CPU
that is being unplugged.
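A hypothetical user-space model of this window follows. `struct model_mm`, `shoot_lazies()`, `clear_cpu_from_mm()` and `unplug_scenario()` are invented for illustration and are not kernel APIs; the model only assumes, as described above, that the exit-time lazy tlb shootdown reaches the CPUs recorded in mm_cpumask(). Passing `switch_first = true` mimics switching the CPU to init_mm before its bit is cleared.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Hypothetical stand-in for an mm_struct: only what the model needs. */
struct model_mm {
	unsigned int cpumask;	/* bit n set: CPU n may hold this mm */
	bool freed;
};

static struct model_mm init_mm_model;
static struct model_mm *active_mm[NR_CPUS];	/* per-CPU active_mm */

/* Model of the exit-time shootdown: only CPUs set in cpumask are
 * asked to switch away from the dying mm to init_mm. */
static void shoot_lazies(struct model_mm *mm)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if ((mm->cpumask & (1u << cpu)) && active_mm[cpu] == mm)
			active_mm[cpu] = &init_mm_model;
	}
}

/* Model of cleanup_cpu_mmu_context(): clear this CPU from the mask. */
static void clear_cpu_from_mm(struct model_mm *mm, int cpu)
{
	mm->cpumask &= ~(1u << cpu);
}

/*
 * CPU 1 is unplugged while lazily using user_mm, and user_mm exits in
 * the window. Returns true if the CPU is left pointing at freed memory.
 */
static bool unplug_scenario(bool switch_first)
{
	struct model_mm user_mm = { .cpumask = 0, .freed = false };
	int cpu = 1;

	active_mm[cpu] = &user_mm;		/* lazy use of user_mm */
	user_mm.cpumask |= 1u << cpu;

	if (switch_first)
		active_mm[cpu] = &init_mm_model;	/* switch away first */

	clear_cpu_from_mm(&user_mm, cpu);	/* __cpu_disable() analogue */

	shoot_lazies(&user_mm);			/* user mm exits in the window */
	user_mm.freed = true;

	bool dangling = (active_mm[cpu] == &user_mm) && user_mm.freed;

	active_mm[cpu] = &init_mm_model;	/* don't keep a stack pointer */
	return dangling;
}
```

In this sketch `unplug_scenario(false)` reports a CPU left holding a freed mm, while `unplug_scenario(true)` does not.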
cleanup_cpu_mmu_context() could be moved later, but it looks better to
move the lazy tlb mm switching earlier. The problem with doing the lazy
mm switching in idle_task_exit() is explained in commit bf2c59fce4074
("sched/core: Fix illegal RCU from offline CPUs"), which added a wart to
switch away from the mm but leave it set in active_mm to be cleaned up
later.
So instead, switch away from the lazy tlb mm on the stopper kthread before
the CPU is taken down. This CPU will never switch to a user thread from
this point, so it has no chance to pick up a new lazy tlb mm. This
removes the lazy tlb mm handling wart in CPU unplug.
idle_task_exit() remains to reduce churn in the patch. It could be
removed entirely after this because finish_cpu() makes a similar check.
finish_cpu() itself is not strictly needed because init_mm will never have
its refcount drop to zero. But it is conceptually nicer to keep it rather
than have the idle thread drop the reference on the mm it is using.
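The reference handoff in the patch can be sketched with plain counters. This is a hypothetical user-space model: `struct rc_mm`, `grab()`, `drop()`, `prepare_exit_model()` and `finish_cpu_model()` are illustrative names, not kernel functions. It treats mmgrab_lazy_tlb()/mmdrop_lazy_tlb() as real counts, which is an assumption (configurations using MMU_LAZY_TLB_SHOOTDOWN need not count lazy references), and gives init_mm a permanent base reference so the final drop can never free it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an mm with a lazy-tlb reference count. */
struct rc_mm {
	int count;
	bool freed;
};

static void grab(struct rc_mm *mm) { mm->count++; }

static void drop(struct rc_mm *mm)
{
	assert(mm->count > 0);
	if (--mm->count == 0)
		mm->freed = true;
}

/* Assumed: init_mm carries a base reference that is never dropped. */
static struct rc_mm init_mm_rc = { .count = 1, .freed = false };

/* active_mm of the idle task on the outgoing CPU. */
static struct rc_mm *idle_active_mm;

/* Same ordering as idle_task_prepare_exit(): take the new reference,
 * switch, and only then drop the old lazy mm. */
static void prepare_exit_model(void)
{
	struct rc_mm *mm = idle_active_mm;

	if (mm != &init_mm_rc) {
		grab(&init_mm_rc);
		idle_active_mm = &init_mm_rc;
		/* switch_mm_irqs_off() would happen here */
		drop(mm);
	}
}

/* Same steps as finish_cpu(): check, clear active_mm, drop init_mm. */
static void finish_cpu_model(void)
{
	struct rc_mm *mm = idle_active_mm;

	assert(mm == &init_mm_rc);	/* WARN_ON() in the patch */
	idle_active_mm = NULL;
	drop(mm);
}
```

With a user mm holding a single lazy reference, `prepare_exit_model()` frees it only after the CPU has moved to init_mm, and `finish_cpu_model()` returns init_mm's count to its base value.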
Link: https://lkml.kernel.org/r/20230524060455.147699-1-npiggin@gmail.com
Fixes: 2655421ae69fa ("lazy tlb: shoot lazies, non-refcounting lazy tlb mm reference handling scheme")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/sched/hotplug.h | 2 ++
kernel/cpu.c | 11 +++++++----
kernel/sched/core.c | 24 +++++++++++++++++++-----
3 files changed, 28 insertions(+), 9 deletions(-)
--- a/include/linux/sched/hotplug.h~lazy-tlb-fix-hotplug-exit-race-with-mmu_lazy_tlb_shootdown
+++ a/include/linux/sched/hotplug.h
@@ -19,8 +19,10 @@ extern int sched_cpu_dying(unsigned int
#endif
#ifdef CONFIG_HOTPLUG_CPU
+extern void idle_task_prepare_exit(void);
extern void idle_task_exit(void);
#else
+static inline void idle_task_prepare_exit(void) {}
static inline void idle_task_exit(void) {}
#endif
--- a/kernel/cpu.c~lazy-tlb-fix-hotplug-exit-race-with-mmu_lazy_tlb_shootdown
+++ a/kernel/cpu.c
@@ -618,12 +618,13 @@ static int finish_cpu(unsigned int cpu)
struct mm_struct *mm = idle->active_mm;
/*
- * idle_task_exit() will have switched to &init_mm, now
- * clean up any remaining active_mm state.
+ * idle_task_prepare_exit() ensured the idle task was using
+ * &init_mm. Now that the CPU has stopped, drop that refcount.
*/
- if (mm != &init_mm)
- idle->active_mm = &init_mm;
+ WARN_ON(mm != &init_mm);
+ idle->active_mm = NULL;
mmdrop_lazy_tlb(mm);
+
return 0;
}
@@ -1030,6 +1031,8 @@ static int take_cpu_down(void *_param)
enum cpuhp_state target = max((int)st->target, CPUHP_AP_OFFLINE);
int err, cpu = smp_processor_id();
+ idle_task_prepare_exit();
+
/* Ensure this CPU doesn't handle any more interrupts. */
err = __cpu_disable();
if (err < 0)
--- a/kernel/sched/core.c~lazy-tlb-fix-hotplug-exit-race-with-mmu_lazy_tlb_shootdown
+++ a/kernel/sched/core.c
@@ -9373,19 +9373,33 @@ void sched_setnuma(struct task_struct *p
* Ensure that the idle task is using init_mm right before its CPU goes
* offline.
*/
-void idle_task_exit(void)
+void idle_task_prepare_exit(void)
{
struct mm_struct *mm = current->active_mm;
- BUG_ON(cpu_online(smp_processor_id()));
- BUG_ON(current != this_rq()->idle);
+ WARN_ON(!irqs_disabled());
if (mm != &init_mm) {
- switch_mm(mm, &init_mm, current);
+ mmgrab_lazy_tlb(&init_mm);
+ current->active_mm = &init_mm;
+ switch_mm_irqs_off(mm, &init_mm, current);
finish_arch_post_lock_switch();
+ mmdrop_lazy_tlb(mm);
}
+ /* finish_cpu() will mmdrop the init_mm ref after this CPU stops */
+}
+
+/*
+ * After the CPU is offline, double check that it was previously switched to
+ * init_mm. This call can be removed because the condition is caught in
+ * finish_cpu() as well.
+ */
+void idle_task_exit(void)
+{
+ BUG_ON(cpu_online(smp_processor_id()));
+ BUG_ON(current != this_rq()->idle);
- /* finish_cpu(), as ran on the BP, will clean up the active_mm state */
+ WARN_ON_ONCE(current->active_mm != &init_mm);
}
static int __balance_push_cpu_stop(void *arg)
_
Patches currently in -mm which might be from npiggin@gmail.com are
lazy-tlb-fix-hotplug-exit-race-with-mmu_lazy_tlb_shootdown.patch
lazy-tlb-consolidate-lazy-tlb-mm-switching.patch
* Re: + lazy-tlb-fix-hotplug-exit-race-with-mmu_lazy_tlb_shootdown.patch added to mm-hotfixes-unstable branch
From: Thomas Gleixner @ 2023-06-10 19:29 UTC
To: linux-kernel, mm-commits, torvalds, peterz, npiggin, akpm
On Thu, May 25 2023 at 13:52, Andrew Morton wrote:
Replying here as I wasn't cc'ed on the patch.
> @@ -1030,6 +1031,8 @@ static int take_cpu_down(void *_param)
> enum cpuhp_state target = max((int)st->target, CPUHP_AP_OFFLINE);
> int err, cpu = smp_processor_id();
>
> + idle_task_prepare_exit();
> +
> /* Ensure this CPU doesn't handle any more interrupts. */
> err = __cpu_disable();
> if (err < 0)
> --- a/kernel/sched/core.c~lazy-tlb-fix-hotplug-exit-race-with-mmu_lazy_tlb_shootdown
> +++ a/kernel/sched/core.c
> @@ -9373,19 +9373,33 @@ void sched_setnuma(struct task_struct *p
> * Ensure that the idle task is using init_mm right before its CPU goes
> * offline.
> */
> -void idle_task_exit(void)
> +void idle_task_prepare_exit(void)
This function name along with the above comment is completely
misleading. It suggests this is about the idle task itself instead of
making it clear that this ensures that the kernel threads of the
outgoing CPU are no longer using an mm which is not init_mm.
The callsite is arbitrarily chosen too. Why does this have to be done
from stomp machine context?
There is zero reason to do so. The last hotplug state before teardown is
CPUHP_AP_SCHED_WAIT_EMPTY. It invokes sched_cpu_wait_empty() in the
context of the CPU hotplug thread of the outgoing CPU.
sched_cpu_wait_empty() guarantees that there are no temporarily pinned
(via migrate disable) user space tasks on that CPU anymore. The
scheduler guarantees that there won't be user space tasks woken up on or
migrated to that CPU because the CPU is not in the cpu_active mask.
The stopper thread has absolutely nothing to do with that.
So sched_cpu_wait_empty() is the obvious place to handle that:
int sched_cpu_wait_empty(unsigned int cpu)
{
balance_hotplug_wait();
+ sched_force_init_mm();
return 0;
}
And then have:
/*
* Invoked on the outgoing CPU in context of the CPU hotplug thread
* after ensuring that there are no user space tasks left on the CPU.
*
* If there is a lazy mm in use on the hotplug thread, drop it and
* switch to init_mm.
*
* The reference count on init_mm is dropped in finish_cpu().
*/
static void sched_force_init_mm(void)
{
No?
> {
> struct mm_struct *mm = current->active_mm;
>
> - BUG_ON(cpu_online(smp_processor_id()));
> - BUG_ON(current != this_rq()->idle);
> + WARN_ON(!irqs_disabled());
>
> if (mm != &init_mm) {
> - switch_mm(mm, &init_mm, current);
> + mmgrab_lazy_tlb(&init_mm);
> + current->active_mm = &init_mm;
> + switch_mm_irqs_off(mm, &init_mm, current);
> finish_arch_post_lock_switch();
> + mmdrop_lazy_tlb(mm);
> }
> + /* finish_cpu() will mmdrop the init_mm ref after this CPU stops */
> +}
> +
> +/*
> + * After the CPU is offline, double check that it was previously switched to
> + * init_mm. This call can be removed because the condition is caught in
> + * finish_cpu() as well.
So why add it in the first place?
The changelog mumbles something about reducing churn, but I fail to see
that reduction. This adds 10 lines of pointless code and comments for
zero value.
Thanks,
tglx
* Re: + lazy-tlb-fix-hotplug-exit-race-with-mmu_lazy_tlb_shootdown.patch added to mm-hotfixes-unstable branch
From: Nicholas Piggin @ 2023-06-20 6:02 UTC
To: Thomas Gleixner, linux-kernel, mm-commits, torvalds, peterz, akpm
On Sun Jun 11, 2023 at 5:29 AM AEST, Thomas Gleixner wrote:
> On Thu, May 25 2023 at 13:52, Andrew Morton wrote:
>
> Replying here as I wasn't cc'ed on the patch.
>
> > @@ -1030,6 +1031,8 @@ static int take_cpu_down(void *_param)
> > enum cpuhp_state target = max((int)st->target, CPUHP_AP_OFFLINE);
> > int err, cpu = smp_processor_id();
> >
> > + idle_task_prepare_exit();
> > +
> > /* Ensure this CPU doesn't handle any more interrupts. */
> > err = __cpu_disable();
> > if (err < 0)
> > --- a/kernel/sched/core.c~lazy-tlb-fix-hotplug-exit-race-with-mmu_lazy_tlb_shootdown
> > +++ a/kernel/sched/core.c
> > @@ -9373,19 +9373,33 @@ void sched_setnuma(struct task_struct *p
> > * Ensure that the idle task is using init_mm right before its CPU goes
> > * offline.
> > */
> > -void idle_task_exit(void)
> > +void idle_task_prepare_exit(void)
>
> This function name along with the above comment is completely
> misleading. It suggests this is about the idle task itself instead of
> making it clear that this ensures that the kernel threads of the
> outgoing CPU are no longer using an mm which is not init_mm.
>
> The callsite is arbitrarily chosen too. Why does this have to be done
> from stomp machine context?
It's the minimalish fix. My patch didn't change what that idle_task_exit
is attempting to do.
> There is zero reason to do so. The last hotplug state before teardown is
> CPUHP_AP_SCHED_WAIT_EMPTY. It invokes sched_cpu_wait_empty() in the
> context of the CPU hotplug thread of the outgoing CPU.
>
> sched_cpu_wait_empty() guarantees that there are no temporarily pinned
> (via migrate disable) user space tasks on that CPU anymore. The
> scheduler guarantees that there won't be user space tasks woken up on or
> migrated to that CPU because the CPU is not in the cpu_active mask.
>
> The stopper thread has absolutely nothing to do with that.
>
> So sched_cpu_wait_empty() is the obvious place to handle that:
>
> int sched_cpu_wait_empty(unsigned int cpu)
> {
> balance_hotplug_wait();
> + sched_force_init_mm();
> return 0;
> }
>
> And then have:
>
> /*
> * Invoked on the outgoing CPU in context of the CPU hotplug thread
> * after ensuring that there are no user space tasks left on the CPU.
> *
> * If there is a lazy mm in use on the hotplug thread, drop it and
> * switch to init_mm.
> *
> * The reference count on init_mm is dropped in finish_cpu().
> */
> static void sched_force_init_mm(void)
> {
>
> No?
It could be done in many places. Peter touched it last and it's
been in the tree since prehistoric times.
> > {
> > struct mm_struct *mm = current->active_mm;
> >
> > - BUG_ON(cpu_online(smp_processor_id()));
> > - BUG_ON(current != this_rq()->idle);
> > + WARN_ON(!irqs_disabled());
> >
> > if (mm != &init_mm) {
> > - switch_mm(mm, &init_mm, current);
> > + mmgrab_lazy_tlb(&init_mm);
> > + current->active_mm = &init_mm;
> > + switch_mm_irqs_off(mm, &init_mm, current);
> > finish_arch_post_lock_switch();
> > + mmdrop_lazy_tlb(mm);
> > }
> > + /* finish_cpu() will mmdrop the init_mm ref after this CPU stops */
> > +}
> > +
> > +/*
> > + * After the CPU is offline, double check that it was previously switched to
> > + * init_mm. This call can be removed because the condition is caught in
> > + * finish_cpu() as well.
>
> So why add it in the first place?
>
> The changelog mumbles something about reducing churn, but I fail to see
> that reduction. This adds 10 lines of pointless code and comments for
> zero value.
Not sure what you're talking about. The patch didn't add it. Removing it
requires removing it from all archs, which is the churn.
Thanks,
Nick
* Re: + lazy-tlb-fix-hotplug-exit-race-with-mmu_lazy_tlb_shootdown.patch added to mm-hotfixes-unstable branch
From: Thomas Gleixner @ 2023-06-20 6:32 UTC
To: Nicholas Piggin, linux-kernel, mm-commits, torvalds, peterz, akpm
On Tue, Jun 20 2023 at 16:02, Nicholas Piggin wrote:
> On Sun Jun 11, 2023 at 5:29 AM AEST, Thomas Gleixner wrote:
>> /*
>> * Invoked on the outgoing CPU in context of the CPU hotplug thread
>> * after ensuring that there are no user space tasks left on the CPU.
>> *
>> * If there is a lazy mm in use on the hotplug thread, drop it and
>> * switch to init_mm.
>> *
>> * The reference count on init_mm is dropped in finish_cpu().
>> */
>> static void sched_force_init_mm(void)
>> {
>>
>> No?
>
> It could be done in many places. Peter touched it last and it's
> been in the tree since prehistoric times.
That's an argument for slapping it into some randomly chosen place and
being done with it, right?
>> > +/*
>> > + * After the CPU is offline, double check that it was previously switched to
>> > + * init_mm. This call can be removed because the condition is caught in
>> > + * finish_cpu() as well.
>>
>> So why add it in the first place?
>>
>> The changelog mumbles something about reducing churn, but I fail to see
>> that reduction. This adds 10 lines of pointless code and comments for
>> zero value.
>
> Not sure what you're talking about. The patch didn't add it. Removing it
> requires removing it from all archs, which is the churn.
Sure. That's left as an exercise for others, right?
Oh well.
tglx
* Re: + lazy-tlb-fix-hotplug-exit-race-with-mmu_lazy_tlb_shootdown.patch added to mm-hotfixes-unstable branch
From: Nicholas Piggin @ 2023-06-20 8:10 UTC
To: Thomas Gleixner, linux-kernel, mm-commits, torvalds, peterz, akpm
On Tue Jun 20, 2023 at 4:32 PM AEST, Thomas Gleixner wrote:
> On Tue, Jun 20 2023 at 16:02, Nicholas Piggin wrote:
> > On Sun Jun 11, 2023 at 5:29 AM AEST, Thomas Gleixner wrote:
> >> /*
> >> * Invoked on the outgoing CPU in context of the CPU hotplug thread
> >> * after ensuring that there are no user space tasks left on the CPU.
> >> *
> >> * If there is a lazy mm in use on the hotplug thread, drop it and
> >> * switch to init_mm.
> >> *
> >> * The reference count on init_mm is dropped in finish_cpu().
> >> */
> >> static void sched_force_init_mm(void)
> >> {
> >>
> >> No?
> >
> > It could be done in many places. Peter touched it last and it's
> > been in the tree since prehistoric times.
>
> That's an argument for slapping it into some randomly chosen place and
> being done with it, right?
Ah, not exactly, but I did misremember: I did have to change where I
added it, so it does turn out to be more arbitrary than I thought.
If it goes in wait empty, then that state is no longer just "wait empty",
it's "wait empty and switch mm". I can put it there; should I also rename
the state?
> >> > +/*
> >> > + * After the CPU is offline, double check that it was previously switched to
> >> > + * init_mm. This call can be removed because the condition is caught in
> >> > + * finish_cpu() as well.
> >>
> >> So why add it in the first place?
> >>
> >> The changelog mumbles something about reducing churn, but I fail to see
> >> that reduction. This adds 10 lines of pointless code and comments for
> >> zero value.
> >
> > Not sure what you're talking about. The patch didn't add it. Removing it
> > requires removing it from all archs, which is the churn.
>
> Sure. That's left as an exercise for others, right?
No, I'm telling you why I left the function in. Did not want to gate a
fix behind herding the arch cats. I will send the trivial patch to arch
trees after it's upstream. This is how such API changes are typically
done.
Thanks,
Nick