public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 00/12] Allow preemption during IPI completion waiting to improve real-time performance
@ 2026-03-02  7:52 Chuyi Zhou
  2026-03-02  7:52 ` [PATCH v2 01/12] smp: Disable preemption explicitly in __csd_lock_wait Chuyi Zhou
                   ` (11 more replies)
  0 siblings, 12 replies; 22+ messages in thread
From: Chuyi Zhou @ 2026-03-02  7:52 UTC (permalink / raw)
  To: tglx, mingo, luto, peterz, paulmck, muchun.song, bp, dave.hansen,
	pbonzini, bigeasy, clrkwllms, rostedt
  Cc: linux-kernel, Chuyi Zhou

Changes in v2:
 - Simplify the code comments in [PATCH v2 2/12] (pointed out by Peter and
   Muchun).
 - Adjust the preemption disabling logic in smp_call_function_any() in
   [PATCH v2 3/12] (suggested by Peter).
 - Use an on-stack cpumask only when !CONFIG_CPUMASK_OFFSTACK in [PATCH v2
   4/12] (pointed out by Peter).
 - Add [PATCH v2 5/12] to replace migrate_disable() with the RCU mechanism.
 - Adjust the preemption disabling logic to allow flush_tlb_multi() to be
   preemptible and migratable in [PATCH v2 11/12].
 - Collect Acked-bys and Reviewed-bys.

Introduction
============

The vast majority of smp_call_function*() callers block until remote CPUs
complete the IPI function execution. As smp_call_function*() runs with
preemption disabled throughout, scheduling latency increases dramatically
with the number of remote CPUs and other factors (such as interrupts being
disabled).

On x86-64 architectures, TLB flushes are performed via IPIs; thus, during
process exit or when process-mapped pages are reclaimed, numerous IPI
operations must be awaited, leading to increased scheduling latency for
other threads on the current CPU. In our production environment, we
observed IPI wait-induced scheduling latency reaching up to 16ms on a
16-core machine. Our goal is to allow preemption during IPI completion
waiting to improve real-time performance.

Background
==========

In our production environments, latency-sensitive workloads (DPDK) are
configured with the highest priority so that they can preempt
lower-priority tasks at any time. We discovered that DPDK's wake-up latency
is primarily caused by the current CPU having preemption disabled.
Therefore, we collected the maximum preemption-disabled duration within
every 30-second interval and then calculated the P50/P99 of these maxima:

                        p50(ns)               p99(ns)
cpu0                   254956                 5465050
cpu1                   115801                 120782
cpu2                   43324                  72957
cpu3                   256637                 16723307
cpu4                   58979                  87237
cpu5                   47464                  79815
cpu6                   48881                  81371
cpu7                   52263                  82294
cpu8                   263555                 4657713
cpu9                   44935                  73962
cpu10                  37659                  65026
cpu11                  257008                 2706878
cpu12                  49669                  90006
cpu13                  45186                  74666
cpu14                  60705                  83866
cpu15                  51311                  86885

Meanwhile, we collected the distribution of preemption-disabled sections
exceeding 1ms across different CPUs over several hours (CPUs whose counts
were all zero are omitted):

CPU        1~10ms   10~50ms   50~100ms
cpu0         29        5         0
cpu3         38       13         0
cpu8         34        6         0
cpu11        24       10         0

Preemption-disabled sections lasting several milliseconds, or even more
than 10ms, mostly originate from TLB flushes:

@stack[
    trace_preempt_on+143
    trace_preempt_on+143
    preempt_count_sub+67
    arch_tlbbatch_flush/flush_tlb_mm_range
    task_exit/page_reclaim/...
]

Further analysis confirms that the majority of the time is consumed in
csd_lock_wait().

Now smp_call*() always needs to disable preemption, mainly to protect its
internal per-CPU data structures and to synchronize with CPU offline
operations. This patchset attempts to make csd_lock_wait() preemptible,
thereby reducing the preemption-disabled critical section and improving
kernel real-time performance.

Effect
======

After applying this patchset, we no longer observe preemption disabled for
more than 1ms on the arch_tlbbatch_flush/flush_tlb_mm_range path. The
overall P99 of the max preemption-disabled duration in every 30-second
interval is reduced to around 1.5ms (the remaining latency is primarily due
to lock contention).

                     before patch    after patch    reduced by
                     ------------    -----------    ----------
p99(ns)                  16723307        1556034       ~90.70%

Chuyi Zhou (12):
  smp: Disable preemption explicitly in __csd_lock_wait
  smp: Enable preemption early in smp_call_function_single
  smp: Remove get_cpu from smp_call_function_any
  smp: Use on-stack cpumask in smp_call_function_many_cond
  smp: Free call_function_data via RCU in smpcfd_dead_cpu
  smp: Enable preemption early in smp_call_function_many_cond
  smp: Remove preempt_disable from smp_call_function
  smp: Remove preempt_disable from on_each_cpu_cond_mask
  scftorture: Remove preempt_disable in scftorture_invoke_one
  x86/mm: Move flush_tlb_info back to the stack
  x86/mm: Enable preemption during native_flush_tlb_multi
  x86/mm: Enable preemption during flush_tlb_kernel_range

 arch/x86/kernel/kvm.c |   4 +-
 arch/x86/mm/tlb.c     | 137 ++++++++++++++++++------------------------
 kernel/scftorture.c   |   9 +--
 kernel/smp.c          |  81 +++++++++++++++++++------
 4 files changed, 125 insertions(+), 106 deletions(-)

-- 
2.20.1

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v2 01/12] smp: Disable preemption explicitly in __csd_lock_wait
  2026-03-02  7:52 [PATCH v2 00/12] Allow preemption during IPI completion waiting to improve real-time performance Chuyi Zhou
@ 2026-03-02  7:52 ` Chuyi Zhou
  2026-03-02  7:52 ` [PATCH v2 02/12] smp: Enable preemption early in smp_call_function_single Chuyi Zhou
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 22+ messages in thread
From: Chuyi Zhou @ 2026-03-02  7:52 UTC (permalink / raw)
  To: tglx, mingo, luto, peterz, paulmck, muchun.song, bp, dave.hansen,
	pbonzini, bigeasy, clrkwllms, rostedt
  Cc: linux-kernel, Chuyi Zhou

Later patches will enable preemption before csd_lock_wait(), which could
break csdlock_debug, because the time slices of other tasks on the CPU may
be accounted between the ktime_get_mono_fast_ns() calls. Disable preemption
explicitly in __csd_lock_wait(). This is a preparation for the following
patches.

Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
Acked-by: Muchun Song <muchun.song@linux.dev>
---
 kernel/smp.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/smp.c b/kernel/smp.c
index f349960f79ca..fc1f7a964616 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -323,6 +323,8 @@ static void __csd_lock_wait(call_single_data_t *csd)
 	int bug_id = 0;
 	u64 ts0, ts1;
 
+	guard(preempt)();
+
 	ts1 = ts0 = ktime_get_mono_fast_ns();
 	for (;;) {
 		if (csd_lock_wait_toolong(csd, ts0, &ts1, &bug_id, &nmessages))
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 02/12] smp: Enable preemption early in smp_call_function_single
  2026-03-02  7:52 [PATCH v2 00/12] Allow preemption during IPI completion waiting to improve real-time performance Chuyi Zhou
  2026-03-02  7:52 ` [PATCH v2 01/12] smp: Disable preemption explicitly in __csd_lock_wait Chuyi Zhou
@ 2026-03-02  7:52 ` Chuyi Zhou
  2026-03-02  7:52 ` [PATCH v2 03/12] smp: Remove get_cpu from smp_call_function_any Chuyi Zhou
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 22+ messages in thread
From: Chuyi Zhou @ 2026-03-02  7:52 UTC (permalink / raw)
  To: tglx, mingo, luto, peterz, paulmck, muchun.song, bp, dave.hansen,
	pbonzini, bigeasy, clrkwllms, rostedt
  Cc: linux-kernel, Chuyi Zhou

Now smp_call_function_single() disables preemption mainly for the following
reasons:

- To protect the per-cpu csd_data from concurrent modification by other
tasks on the current CPU in the !wait case. For the wait case,
synchronization is not a concern as an on-stack csd is used.

- To prevent the remote online CPU from being offlined. Specifically, we
want to ensure that no new IPIs are queued after smpcfd_dying_cpu() has
finished.

Disabling preemption for the entire execution is unnecessary; in
particular, the csd_lock_wait() part does not require preemption
protection. This patch enables preemption before csd_lock_wait() to reduce
the preemption-disabled critical section.

Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
Reviewed-by: Muchun Song <muchun.song@linux.dev>
---
 kernel/smp.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index fc1f7a964616..b603d4229f95 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -685,11 +685,16 @@ int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
 
 	err = generic_exec_single(cpu, csd);
 
+	/*
+	 * @csd is stack-allocated when @wait is true. No concurrent access
+	 * except from the IPI completion path, so we can re-enable preemption
+	 * early to reduce latency.
+	 */
+	put_cpu();
+
 	if (wait)
 		csd_lock_wait(csd);
 
-	put_cpu();
-
 	return err;
 }
 EXPORT_SYMBOL(smp_call_function_single);
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 03/12] smp: Remove get_cpu from smp_call_function_any
  2026-03-02  7:52 [PATCH v2 00/12] Allow preemption during IPI completion waiting to improve real-time performance Chuyi Zhou
  2026-03-02  7:52 ` [PATCH v2 01/12] smp: Disable preemption explicitly in __csd_lock_wait Chuyi Zhou
  2026-03-02  7:52 ` [PATCH v2 02/12] smp: Enable preemption early in smp_call_function_single Chuyi Zhou
@ 2026-03-02  7:52 ` Chuyi Zhou
  2026-03-02  7:52 ` [PATCH v2 04/12] smp: Use on-stack cpumask in smp_call_function_many_cond Chuyi Zhou
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 22+ messages in thread
From: Chuyi Zhou @ 2026-03-02  7:52 UTC (permalink / raw)
  To: tglx, mingo, luto, peterz, paulmck, muchun.song, bp, dave.hansen,
	pbonzini, bigeasy, clrkwllms, rostedt
  Cc: linux-kernel, Chuyi Zhou

Now smp_call_function_single() enables preemption before csd_lock_wait()
to reduce the critical section. To allow callers of smp_call_function_any()
to also benefit from this optimization, remove get_cpu()/put_cpu() from
around the smp_call_function_single() call in smp_call_function_any().

Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
Reviewed-by: Muchun Song <muchun.song@linux.dev>
---
 kernel/smp.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index b603d4229f95..80daf9dd4a25 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -761,16 +761,26 @@ EXPORT_SYMBOL_GPL(smp_call_function_single_async);
 int smp_call_function_any(const struct cpumask *mask,
 			  smp_call_func_t func, void *info, int wait)
 {
+	bool local = true;
 	unsigned int cpu;
 	int ret;
 
-	/* Try for same CPU (cheapest) */
+	/*
+	 * Prevent migration to another CPU after selecting the current CPU
+	 * as the target.
+	 */
 	cpu = get_cpu();
-	if (!cpumask_test_cpu(cpu, mask))
+
+	/* Try for same CPU (cheapest) */
+	if (!cpumask_test_cpu(cpu, mask)) {
 		cpu = sched_numa_find_nth_cpu(mask, 0, cpu_to_node(cpu));
+		local = false;
+		put_cpu();
+	}
 
 	ret = smp_call_function_single(cpu, func, info, wait);
-	put_cpu();
+	if (local)
+		put_cpu();
 	return ret;
 }
 EXPORT_SYMBOL_GPL(smp_call_function_any);
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 04/12] smp: Use on-stack cpumask in smp_call_function_many_cond
  2026-03-02  7:52 [PATCH v2 00/12] Allow preemption during IPI completion waiting to improve real-time performance Chuyi Zhou
                   ` (2 preceding siblings ...)
  2026-03-02  7:52 ` [PATCH v2 03/12] smp: Remove get_cpu from smp_call_function_any Chuyi Zhou
@ 2026-03-02  7:52 ` Chuyi Zhou
  2026-03-10  7:12   ` Muchun Song
  2026-03-02  7:52 ` [PATCH v2 05/12] smp: Free call_function_data via RCU in smpcfd_dead_cpu Chuyi Zhou
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 22+ messages in thread
From: Chuyi Zhou @ 2026-03-02  7:52 UTC (permalink / raw)
  To: tglx, mingo, luto, peterz, paulmck, muchun.song, bp, dave.hansen,
	pbonzini, bigeasy, clrkwllms, rostedt
  Cc: linux-kernel, Chuyi Zhou

This patch uses an on-stack cpumask to replace the percpu cfd cpumask in
smp_call_function_many_cond(). Note that when both CONFIG_CPUMASK_OFFSTACK
and PREEMPT_RT are enabled, a dynamic allocation inside a preempt-disabled
section would break RT. Therefore, only do this when
CONFIG_CPUMASK_OFFSTACK=n. This is a preparation for enabling preemption
during csd_lock_wait() in smp_call_function_many_cond().

Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
---
 kernel/smp.c | 25 +++++++++++++++++++------
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index 80daf9dd4a25..9728ba55944d 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -799,14 +799,25 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 					unsigned int scf_flags,
 					smp_cond_func_t cond_func)
 {
+	bool preemptible_wait = !IS_ENABLED(CONFIG_CPUMASK_OFFSTACK);
 	int cpu, last_cpu, this_cpu = smp_processor_id();
 	struct call_function_data *cfd;
 	bool wait = scf_flags & SCF_WAIT;
+	cpumask_var_t cpumask_stack;
+	struct cpumask *cpumask;
 	int nr_cpus = 0;
 	bool run_remote = false;
 
 	lockdep_assert_preemption_disabled();
 
+	cfd = this_cpu_ptr(&cfd_data);
+	cpumask = cfd->cpumask;
+
+	if (preemptible_wait) {
+		BUILD_BUG_ON(!alloc_cpumask_var(&cpumask_stack, GFP_ATOMIC));
+		cpumask = cpumask_stack;
+	}
+
 	/*
 	 * Can deadlock when called with interrupts disabled.
 	 * We allow cpu's that are not yet online though, as no one else can
@@ -827,16 +838,15 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 
 	/* Check if we need remote execution, i.e., any CPU excluding this one. */
 	if (cpumask_any_and_but(mask, cpu_online_mask, this_cpu) < nr_cpu_ids) {
-		cfd = this_cpu_ptr(&cfd_data);
-		cpumask_and(cfd->cpumask, mask, cpu_online_mask);
-		__cpumask_clear_cpu(this_cpu, cfd->cpumask);
+		cpumask_and(cpumask, mask, cpu_online_mask);
+		__cpumask_clear_cpu(this_cpu, cpumask);
 
 		cpumask_clear(cfd->cpumask_ipi);
-		for_each_cpu(cpu, cfd->cpumask) {
+		for_each_cpu(cpu, cpumask) {
 			call_single_data_t *csd = per_cpu_ptr(cfd->csd, cpu);
 
 			if (cond_func && !cond_func(cpu, info)) {
-				__cpumask_clear_cpu(cpu, cfd->cpumask);
+				__cpumask_clear_cpu(cpu, cpumask);
 				continue;
 			}
 
@@ -887,13 +897,16 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 	}
 
 	if (run_remote && wait) {
-		for_each_cpu(cpu, cfd->cpumask) {
+		for_each_cpu(cpu, cpumask) {
 			call_single_data_t *csd;
 
 			csd = per_cpu_ptr(cfd->csd, cpu);
 			csd_lock_wait(csd);
 		}
 	}
+
+	if (preemptible_wait)
+		free_cpumask_var(cpumask_stack);
 }
 
 /**
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 05/12] smp: Free call_function_data via RCU in smpcfd_dead_cpu
  2026-03-02  7:52 [PATCH v2 00/12] Allow preemption during IPI completion waiting to improve real-time performance Chuyi Zhou
                   ` (3 preceding siblings ...)
  2026-03-02  7:52 ` [PATCH v2 04/12] smp: Use on-stack cpumask in smp_call_function_many_cond Chuyi Zhou
@ 2026-03-02  7:52 ` Chuyi Zhou
  2026-03-10  7:05   ` Muchun Song
  2026-03-02  7:52 ` [PATCH v2 06/12] smp: Enable preemption early in smp_call_function_many_cond Chuyi Zhou
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 22+ messages in thread
From: Chuyi Zhou @ 2026-03-02  7:52 UTC (permalink / raw)
  To: tglx, mingo, luto, peterz, paulmck, muchun.song, bp, dave.hansen,
	pbonzini, bigeasy, clrkwllms, rostedt
  Cc: linux-kernel, Chuyi Zhou

Use rcu_read_lock() to protect the whole scope of
smp_call_function_many_cond() and wait for all read-side critical sections
to exit before releasing the percpu csd data. This is a preparation for
enabling preemption during csd_lock_wait().

Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
---
 kernel/smp.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/smp.c b/kernel/smp.c
index 9728ba55944d..ad6073b71bbd 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -77,6 +77,7 @@ int smpcfd_dead_cpu(unsigned int cpu)
 {
 	struct call_function_data *cfd = &per_cpu(cfd_data, cpu);
 
+	synchronize_rcu();
 	free_cpumask_var(cfd->cpumask);
 	free_cpumask_var(cfd->cpumask_ipi);
 	free_percpu(cfd->csd);
@@ -810,6 +811,7 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 
 	lockdep_assert_preemption_disabled();
 
+	rcu_read_lock();
 	cfd = this_cpu_ptr(&cfd_data);
 	cpumask = cfd->cpumask;
 
@@ -907,6 +909,7 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 
 	if (preemptible_wait)
 		free_cpumask_var(cpumask_stack);
+	rcu_read_unlock();
 }
 
 /**
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 06/12] smp: Enable preemption early in smp_call_function_many_cond
  2026-03-02  7:52 [PATCH v2 00/12] Allow preemption during IPI completion waiting to improve real-time performance Chuyi Zhou
                   ` (4 preceding siblings ...)
  2026-03-02  7:52 ` [PATCH v2 05/12] smp: Free call_function_data via RCU in smpcfd_dead_cpu Chuyi Zhou
@ 2026-03-02  7:52 ` Chuyi Zhou
  2026-03-02  7:52 ` [PATCH v2 07/12] smp: Remove preempt_disable from smp_call_function Chuyi Zhou
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 22+ messages in thread
From: Chuyi Zhou @ 2026-03-02  7:52 UTC (permalink / raw)
  To: tglx, mingo, luto, peterz, paulmck, muchun.song, bp, dave.hansen,
	pbonzini, bigeasy, clrkwllms, rostedt
  Cc: linux-kernel, Chuyi Zhou

Now smp_call_function_many_cond() disables preemption mainly for the
following reasons:

- To prevent the remote online CPU from going offline. Specifically, we
want to ensure that no new csds are queued after smpcfd_dying_cpu() has
finished. Therefore, preemption must be disabled until all necessary IPIs
are sent.

- To prevent migration to another CPU, which also implicitly prevents the
current CPU from going offline (since stop_machine requires preempting the
current task to execute offline callbacks).

- To protect the per-cpu cfd_data from concurrent modification by other
smp_call_*() calls on the current CPU. cfd_data contains cpumasks and
per-cpu csds. Before enqueueing a csd, we block on csd_lock() to ensure the
previous async csd->func() has completed, and then initialize csd->func and
csd->info. After sending the IPI, we spin-wait for the remote CPU to call
csd_unlock(). So the csd_lock mechanism already guarantees csd
serialization. If preemption occurs during csd_lock_wait(), other
concurrent smp_call_function_many_cond() calls will simply block until the
previous csd->func() completes:

task A                    task B

csd->func = func_a
send ipis

                preempted by B
               --------------->
                        csd_lock(csd); // block until last
                                       // func_a finished

                        csd->func = func_b;
                        csd->info = info;
                            ...
                        send ipis

                switch back to A
                <---------------

csd_lock_wait(csd); // block until remote finish func_*

This patch enables preemption before csd_lock_wait(), which makes the
potentially unpredictable csd_lock_wait() preemptible and migratable.
Note that being migrated to another CPU while still in csd_lock_wait()
could cause a UAF if smpcfd_dead_cpu() frees the csd data as the original
CPU goes offline. The previous patch used the RCU mechanism to synchronize
csd_lock_wait() with smpcfd_dead_cpu() and prevent this UAF issue.

Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
---
 kernel/smp.c | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index ad6073b71bbd..18e7e4a8f1b6 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -801,7 +801,7 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 					smp_cond_func_t cond_func)
 {
 	bool preemptible_wait = !IS_ENABLED(CONFIG_CPUMASK_OFFSTACK);
-	int cpu, last_cpu, this_cpu = smp_processor_id();
+	int cpu, last_cpu, this_cpu;
 	struct call_function_data *cfd;
 	bool wait = scf_flags & SCF_WAIT;
 	cpumask_var_t cpumask_stack;
@@ -809,9 +809,9 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 	int nr_cpus = 0;
 	bool run_remote = false;
 
-	lockdep_assert_preemption_disabled();
-
 	rcu_read_lock();
+	this_cpu = get_cpu();
+
 	cfd = this_cpu_ptr(&cfd_data);
 	cpumask = cfd->cpumask;
 
@@ -898,6 +898,19 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 		local_irq_restore(flags);
 	}
 
+	/*
+	 * We may block in csd_lock_wait() for a significant amount of time,
+	 * especially when interrupts are disabled or with a large number of
+	 * remote CPUs. Try to enable preemption before csd_lock_wait().
+	 *
+	 * Use the cpumask_stack instead of cfd->cpumask to avoid concurrency
+	 * modification from tasks on the same cpu. If preemption occurs during
+	 * csd_lock_wait, other concurrent smp_call_function_many_cond() calls
+	 * will simply block until the previous csd->func() complete.
+	 */
+	if (preemptible_wait)
+		put_cpu();
+
 	if (run_remote && wait) {
 		for_each_cpu(cpu, cpumask) {
 			call_single_data_t *csd;
@@ -907,7 +920,9 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 		}
 	}
 
-	if (preemptible_wait)
+	if (!preemptible_wait)
+		put_cpu();
+	else
 		free_cpumask_var(cpumask_stack);
 	rcu_read_unlock();
 }
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 07/12] smp: Remove preempt_disable from smp_call_function
  2026-03-02  7:52 [PATCH v2 00/12] Allow preemption during IPI completion waiting to improve real-time performance Chuyi Zhou
                   ` (5 preceding siblings ...)
  2026-03-02  7:52 ` [PATCH v2 06/12] smp: Enable preemption early in smp_call_function_many_cond Chuyi Zhou
@ 2026-03-02  7:52 ` Chuyi Zhou
  2026-03-10  7:06   ` Muchun Song
  2026-03-02  7:52 ` [PATCH v2 08/12] smp: Remove preempt_disable from on_each_cpu_cond_mask Chuyi Zhou
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 22+ messages in thread
From: Chuyi Zhou @ 2026-03-02  7:52 UTC (permalink / raw)
  To: tglx, mingo, luto, peterz, paulmck, muchun.song, bp, dave.hansen,
	pbonzini, bigeasy, clrkwllms, rostedt
  Cc: linux-kernel, Chuyi Zhou

Now smp_call_function_many_cond() internally handles the preemption logic,
so smp_call_function() does not need to explicitly disable preemption.
Remove preempt_{disable,enable}() from smp_call_function().

Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
---
 kernel/smp.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index 18e7e4a8f1b6..f9c0028968ef 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -966,9 +966,8 @@ EXPORT_SYMBOL(smp_call_function_many);
  */
 void smp_call_function(smp_call_func_t func, void *info, int wait)
 {
-	preempt_disable();
-	smp_call_function_many(cpu_online_mask, func, info, wait);
-	preempt_enable();
+	smp_call_function_many_cond(cpu_online_mask, func, info,
+			wait ? SCF_WAIT : 0, NULL);
 }
 EXPORT_SYMBOL(smp_call_function);
 
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 08/12] smp: Remove preempt_disable from on_each_cpu_cond_mask
  2026-03-02  7:52 [PATCH v2 00/12] Allow preemption during IPI completion waiting to improve real-time performance Chuyi Zhou
                   ` (6 preceding siblings ...)
  2026-03-02  7:52 ` [PATCH v2 07/12] smp: Remove preempt_disable from smp_call_function Chuyi Zhou
@ 2026-03-02  7:52 ` Chuyi Zhou
  2026-03-10  7:06   ` Muchun Song
  2026-03-02  7:52 ` [PATCH v2 09/12] scftorture: Remove preempt_disable in scftorture_invoke_one Chuyi Zhou
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 22+ messages in thread
From: Chuyi Zhou @ 2026-03-02  7:52 UTC (permalink / raw)
  To: tglx, mingo, luto, peterz, paulmck, muchun.song, bp, dave.hansen,
	pbonzini, bigeasy, clrkwllms, rostedt
  Cc: linux-kernel, Chuyi Zhou

Now smp_call_function_many_cond() internally handles the preemption logic,
so on_each_cpu_cond_mask() does not need to explicitly disable preemption.
Remove preempt_{disable,enable}() from on_each_cpu_cond_mask().

Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
---
 kernel/smp.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index f9c0028968ef..47c3b057f57f 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -1086,9 +1086,7 @@ void on_each_cpu_cond_mask(smp_cond_func_t cond_func, smp_call_func_t func,
 	if (wait)
 		scf_flags |= SCF_WAIT;
 
-	preempt_disable();
 	smp_call_function_many_cond(mask, func, info, scf_flags, cond_func);
-	preempt_enable();
 }
 EXPORT_SYMBOL(on_each_cpu_cond_mask);
 
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 09/12] scftorture: Remove preempt_disable in scftorture_invoke_one
  2026-03-02  7:52 [PATCH v2 00/12] Allow preemption during IPI completion waiting to improve real-time performance Chuyi Zhou
                   ` (7 preceding siblings ...)
  2026-03-02  7:52 ` [PATCH v2 08/12] smp: Remove preempt_disable from on_each_cpu_cond_mask Chuyi Zhou
@ 2026-03-02  7:52 ` Chuyi Zhou
  2026-03-02  7:52 ` [PATCH v2 10/12] x86/mm: Move flush_tlb_info back to the stack Chuyi Zhou
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 22+ messages in thread
From: Chuyi Zhou @ 2026-03-02  7:52 UTC (permalink / raw)
  To: tglx, mingo, luto, peterz, paulmck, muchun.song, bp, dave.hansen,
	pbonzini, bigeasy, clrkwllms, rostedt
  Cc: linux-kernel, Chuyi Zhou

Previous patches made smp_call*() handle the preemption logic internally,
so the preempt_disable() in most callers is now unnecessary and can be
removed. Remove preempt_{disable,enable}() from scftorture_invoke_one().

Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
---
 kernel/scftorture.c | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/kernel/scftorture.c b/kernel/scftorture.c
index 327c315f411c..b87215e40be5 100644
--- a/kernel/scftorture.c
+++ b/kernel/scftorture.c
@@ -364,8 +364,6 @@ static void scftorture_invoke_one(struct scf_statistics *scfp, struct torture_ra
 	}
 	if (use_cpus_read_lock)
 		cpus_read_lock();
-	else
-		preempt_disable();
 	switch (scfsp->scfs_prim) {
 	case SCF_PRIM_RESCHED:
 		if (IS_BUILTIN(CONFIG_SCF_TORTURE_TEST)) {
@@ -411,13 +409,10 @@ static void scftorture_invoke_one(struct scf_statistics *scfp, struct torture_ra
 		if (!ret) {
 			if (use_cpus_read_lock)
 				cpus_read_unlock();
-			else
-				preempt_enable();
+
 			wait_for_completion(&scfcp->scfc_completion);
 			if (use_cpus_read_lock)
 				cpus_read_lock();
-			else
-				preempt_disable();
 		} else {
 			scfp->n_single_rpc_ofl++;
 			scf_add_to_free_list(scfcp);
@@ -463,8 +458,6 @@ static void scftorture_invoke_one(struct scf_statistics *scfp, struct torture_ra
 	}
 	if (use_cpus_read_lock)
 		cpus_read_unlock();
-	else
-		preempt_enable();
 	if (allocfail)
 		schedule_timeout_idle((1 + longwait) * HZ);  // Let no-wait handlers complete.
 	else if (!(torture_random(trsp) & 0xfff))
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 10/12] x86/mm: Move flush_tlb_info back to the stack
  2026-03-02  7:52 [PATCH v2 00/12] Allow preemption during IPI completion waiting to improve real-time performance Chuyi Zhou
                   ` (8 preceding siblings ...)
  2026-03-02  7:52 ` [PATCH v2 09/12] scftorture: Remove preempt_disable in scftorture_invoke_one Chuyi Zhou
@ 2026-03-02  7:52 ` Chuyi Zhou
  2026-03-02 14:58   ` Peter Zijlstra
  2026-03-02  7:52 ` [PATCH v2 11/12] x86/mm: Enable preemption during native_flush_tlb_multi Chuyi Zhou
  2026-03-02  7:52 ` [PATCH v2 12/12] x86/mm: Enable preemption during flush_tlb_kernel_range Chuyi Zhou
  11 siblings, 1 reply; 22+ messages in thread
From: Chuyi Zhou @ 2026-03-02  7:52 UTC (permalink / raw)
  To: tglx, mingo, luto, peterz, paulmck, muchun.song, bp, dave.hansen,
	pbonzini, bigeasy, clrkwllms, rostedt
  Cc: linux-kernel, Chuyi Zhou

Commit 3db6d5a5ecaf ("x86/mm/tlb: Remove 'struct flush_tlb_info' from the
stack") converted flush_tlb_info from a stack variable to a per-CPU
variable. This brought a performance improvement of around 3% in an extreme
test. However, it also required that all flush_tlb* operations keep
preemption disabled throughout to prevent concurrent modification of
flush_tlb_info. flush_tlb* needs to send IPIs to remote CPUs and
synchronously wait for all remote CPUs to complete their local TLB flushes.
This process can take tens of milliseconds when interrupts are disabled or
with a large number of remote CPUs.

From the perspective of improving kernel real-time performance, this patch
moves flush_tlb_info back to the stack. This is a preparation for enabling
preemption during the TLB flush in the next patch.

Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
---
 arch/x86/mm/tlb.c | 124 ++++++++++++++++++----------------------------
 1 file changed, 49 insertions(+), 75 deletions(-)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 621e09d049cb..91a0fb389303 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -1394,71 +1394,30 @@ void flush_tlb_multi(const struct cpumask *cpumask,
  */
 unsigned long tlb_single_page_flush_ceiling __read_mostly = 33;
 
-static DEFINE_PER_CPU_SHARED_ALIGNED(struct flush_tlb_info, flush_tlb_info);
-
-#ifdef CONFIG_DEBUG_VM
-static DEFINE_PER_CPU(unsigned int, flush_tlb_info_idx);
-#endif
-
-static struct flush_tlb_info *get_flush_tlb_info(struct mm_struct *mm,
-			unsigned long start, unsigned long end,
-			unsigned int stride_shift, bool freed_tables,
-			u64 new_tlb_gen)
+void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
+				unsigned long end, unsigned int stride_shift,
+				bool freed_tables)
 {
-	struct flush_tlb_info *info = this_cpu_ptr(&flush_tlb_info);
+	int cpu = get_cpu();
 
-#ifdef CONFIG_DEBUG_VM
-	/*
-	 * Ensure that the following code is non-reentrant and flush_tlb_info
-	 * is not overwritten. This means no TLB flushing is initiated by
-	 * interrupt handlers and machine-check exception handlers.
-	 */
-	BUG_ON(this_cpu_inc_return(flush_tlb_info_idx) != 1);
-#endif
+	struct flush_tlb_info info = {
+		.mm = mm,
+		.stride_shift = stride_shift,
+		.freed_tables = freed_tables,
+		.trim_cpumask = 0,
+		.initiating_cpu = cpu,
+	};
 
-	/*
-	 * If the number of flushes is so large that a full flush
-	 * would be faster, do a full flush.
-	 */
 	if ((end - start) >> stride_shift > tlb_single_page_flush_ceiling) {
 		start = 0;
 		end = TLB_FLUSH_ALL;
 	}
 
-	info->start		= start;
-	info->end		= end;
-	info->mm		= mm;
-	info->stride_shift	= stride_shift;
-	info->freed_tables	= freed_tables;
-	info->new_tlb_gen	= new_tlb_gen;
-	info->initiating_cpu	= smp_processor_id();
-	info->trim_cpumask	= 0;
-
-	return info;
-}
-
-static void put_flush_tlb_info(void)
-{
-#ifdef CONFIG_DEBUG_VM
-	/* Complete reentrancy prevention checks */
-	barrier();
-	this_cpu_dec(flush_tlb_info_idx);
-#endif
-}
-
-void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
-				unsigned long end, unsigned int stride_shift,
-				bool freed_tables)
-{
-	struct flush_tlb_info *info;
-	int cpu = get_cpu();
-	u64 new_tlb_gen;
-
 	/* This is also a barrier that synchronizes with switch_mm(). */
-	new_tlb_gen = inc_mm_tlb_gen(mm);
+	info.new_tlb_gen = inc_mm_tlb_gen(mm);
 
-	info = get_flush_tlb_info(mm, start, end, stride_shift, freed_tables,
-				  new_tlb_gen);
+	info.start = start;
+	info.end = end;
 
 	/*
 	 * flush_tlb_multi() is not optimized for the common case in which only
@@ -1466,19 +1425,18 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
 	 * flush_tlb_func_local() directly in this case.
 	 */
 	if (mm_global_asid(mm)) {
-		broadcast_tlb_flush(info);
+		broadcast_tlb_flush(&info);
 	} else if (cpumask_any_but(mm_cpumask(mm), cpu) < nr_cpu_ids) {
-		info->trim_cpumask = should_trim_cpumask(mm);
-		flush_tlb_multi(mm_cpumask(mm), info);
+		info.trim_cpumask = should_trim_cpumask(mm);
+		flush_tlb_multi(mm_cpumask(mm), &info);
 		consider_global_asid(mm);
 	} else if (mm == this_cpu_read(cpu_tlbstate.loaded_mm)) {
 		lockdep_assert_irqs_enabled();
 		local_irq_disable();
-		flush_tlb_func(info);
+		flush_tlb_func(&info);
 		local_irq_enable();
 	}
 
-	put_flush_tlb_info();
 	put_cpu();
 	mmu_notifier_arch_invalidate_secondary_tlbs(mm, start, end);
 }
@@ -1548,19 +1506,29 @@ static void kernel_tlb_flush_range(struct flush_tlb_info *info)
 
 void flush_tlb_kernel_range(unsigned long start, unsigned long end)
 {
-	struct flush_tlb_info *info;
+	struct flush_tlb_info info = {
+		.mm = NULL,
+		.stride_shift = PAGE_SHIFT,
+		.freed_tables = false,
+		.trim_cpumask = 0,
+		.new_tlb_gen = TLB_GENERATION_INVALID
+	};
 
 	guard(preempt)();
 
-	info = get_flush_tlb_info(NULL, start, end, PAGE_SHIFT, false,
-				  TLB_GENERATION_INVALID);
+	if ((end - start) >> PAGE_SHIFT > tlb_single_page_flush_ceiling) {
+		start = 0;
+		end = TLB_FLUSH_ALL;
+	}
 
-	if (info->end == TLB_FLUSH_ALL)
-		kernel_tlb_flush_all(info);
-	else
-		kernel_tlb_flush_range(info);
+	info.initiating_cpu = smp_processor_id();
+	info.start = start;
+	info.end = end;
 
-	put_flush_tlb_info();
+	if (info.end == TLB_FLUSH_ALL)
+		kernel_tlb_flush_all(&info);
+	else
+		kernel_tlb_flush_range(&info);
 }
 
 /*
@@ -1728,12 +1696,19 @@ EXPORT_SYMBOL_FOR_KVM(__flush_tlb_all);
 
 void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
 {
-	struct flush_tlb_info *info;
-
 	int cpu = get_cpu();
 
-	info = get_flush_tlb_info(NULL, 0, TLB_FLUSH_ALL, 0, false,
-				  TLB_GENERATION_INVALID);
+	struct flush_tlb_info info = {
+		.start = 0,
+		.end = TLB_FLUSH_ALL,
+		.mm = NULL,
+		.stride_shift = 0,
+		.freed_tables = false,
+		.new_tlb_gen = TLB_GENERATION_INVALID,
+		.initiating_cpu = cpu,
+		.trim_cpumask = 0,
+	};
+
 	/*
 	 * flush_tlb_multi() is not optimized for the common case in which only
 	 * a local TLB flush is needed. Optimize this use-case by calling
@@ -1743,17 +1718,16 @@ void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
 		invlpgb_flush_all_nonglobals();
 		batch->unmapped_pages = false;
 	} else if (cpumask_any_but(&batch->cpumask, cpu) < nr_cpu_ids) {
-		flush_tlb_multi(&batch->cpumask, info);
+		flush_tlb_multi(&batch->cpumask, &info);
 	} else if (cpumask_test_cpu(cpu, &batch->cpumask)) {
 		lockdep_assert_irqs_enabled();
 		local_irq_disable();
-		flush_tlb_func(info);
+		flush_tlb_func(&info);
 		local_irq_enable();
 	}
 
 	cpumask_clear(&batch->cpumask);
 
-	put_flush_tlb_info();
 	put_cpu();
 }
 
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 11/12] x86/mm: Enable preemption during native_flush_tlb_multi
  2026-03-02  7:52 [PATCH v2 00/12] Allow preemption during IPI completion waiting to improve real-time performance Chuyi Zhou
                   ` (9 preceding siblings ...)
  2026-03-02  7:52 ` [PATCH v2 10/12] x86/mm: Move flush_tlb_info back to the stack Chuyi Zhou
@ 2026-03-02  7:52 ` Chuyi Zhou
  2026-03-02  7:52 ` [PATCH v2 12/12] x86/mm: Enable preemption during flush_tlb_kernel_range Chuyi Zhou
  11 siblings, 0 replies; 22+ messages in thread
From: Chuyi Zhou @ 2026-03-02  7:52 UTC (permalink / raw)
  To: tglx, mingo, luto, peterz, paulmck, muchun.song, bp, dave.hansen,
	pbonzini, bigeasy, clrkwllms, rostedt
  Cc: linux-kernel, Chuyi Zhou

flush_tlb_mm_range()/arch_tlbbatch_flush() -> native_flush_tlb_multi() is a
common path in real production environments. When pages are reclaimed or a
process exits, native_flush_tlb_multi() sends IPIs to remote CPUs and waits
for all remote CPUs to complete their local TLB flushes. The overall
latency may reach tens of milliseconds due to a large number of remote CPUs
and other factors (such as interrupts being disabled). Since
flush_tlb_mm_range()/arch_tlbbatch_flush() always disable preemption, this
may increase scheduling latency for other threads on the current CPU.

The previous patch converted flush_tlb_info from a per-CPU variable to an
on-stack variable. Additionally, it is no longer necessary to explicitly
disable preemption before calling smp_call*() since they handle the
preemption logic internally. It is now safe to enable preemption during
native_flush_tlb_multi().

Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
---
 arch/x86/kernel/kvm.c | 4 +++-
 arch/x86/mm/tlb.c     | 9 +++++++--
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 3bc062363814..4f7f4c1149b9 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -668,8 +668,10 @@ static void kvm_flush_tlb_multi(const struct cpumask *cpumask,
 	u8 state;
 	int cpu;
 	struct kvm_steal_time *src;
-	struct cpumask *flushmask = this_cpu_cpumask_var_ptr(__pv_cpu_mask);
+	struct cpumask *flushmask;
 
+	guard(preempt)();
+	flushmask = this_cpu_cpumask_var_ptr(__pv_cpu_mask);
 	cpumask_copy(flushmask, cpumask);
 	/*
 	 * We have to call flush only on online vCPUs. And
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 91a0fb389303..86d9c208e424 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -1427,9 +1427,11 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
 	if (mm_global_asid(mm)) {
 		broadcast_tlb_flush(&info);
 	} else if (cpumask_any_but(mm_cpumask(mm), cpu) < nr_cpu_ids) {
+		put_cpu();
 		info.trim_cpumask = should_trim_cpumask(mm);
 		flush_tlb_multi(mm_cpumask(mm), &info);
 		consider_global_asid(mm);
+		goto invalidate;
 	} else if (mm == this_cpu_read(cpu_tlbstate.loaded_mm)) {
 		lockdep_assert_irqs_enabled();
 		local_irq_disable();
@@ -1438,6 +1440,7 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
 	}
 
 	put_cpu();
+invalidate:
 	mmu_notifier_arch_invalidate_secondary_tlbs(mm, start, end);
 }
 
@@ -1718,7 +1721,9 @@ void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
 		invlpgb_flush_all_nonglobals();
 		batch->unmapped_pages = false;
 	} else if (cpumask_any_but(&batch->cpumask, cpu) < nr_cpu_ids) {
+		put_cpu();
 		flush_tlb_multi(&batch->cpumask, &info);
+		goto clear;
 	} else if (cpumask_test_cpu(cpu, &batch->cpumask)) {
 		lockdep_assert_irqs_enabled();
 		local_irq_disable();
@@ -1726,9 +1731,9 @@ void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
 		local_irq_enable();
 	}
 
-	cpumask_clear(&batch->cpumask);
-
 	put_cpu();
+clear:
+	cpumask_clear(&batch->cpumask);
 }
 
 /*
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 12/12] x86/mm: Enable preemption during flush_tlb_kernel_range
  2026-03-02  7:52 [PATCH v2 00/12] Allow preemption during IPI completion waiting to improve real-time performance Chuyi Zhou
                   ` (10 preceding siblings ...)
  2026-03-02  7:52 ` [PATCH v2 11/12] x86/mm: Enable preemption during native_flush_tlb_multi Chuyi Zhou
@ 2026-03-02  7:52 ` Chuyi Zhou
  2026-03-10  6:35   ` kernel test robot
  11 siblings, 1 reply; 22+ messages in thread
From: Chuyi Zhou @ 2026-03-02  7:52 UTC (permalink / raw)
  To: tglx, mingo, luto, peterz, paulmck, muchun.song, bp, dave.hansen,
	pbonzini, bigeasy, clrkwllms, rostedt
  Cc: linux-kernel, Chuyi Zhou

flush_tlb_kernel_range() is invoked when kernel memory mappings change.
On x86 platforms without the INVLPGB feature, we need to send IPIs to
every online CPU and synchronously wait for them to complete
do_kernel_range_flush(). This process can be time-consuming due to factors
such as a large number of CPUs or other issues (like interrupts being
disabled). flush_tlb_kernel_range() always disables preemption, which may
affect the scheduling latency of other tasks on the current CPU.

The previous patch converted flush_tlb_info from a per-CPU variable to an
on-stack variable. Additionally, it is no longer necessary to explicitly
disable preemption before calling smp_call*() since they handle the
preemption logic internally. It is now safe to enable preemption during
flush_tlb_kernel_range().

Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
---
 arch/x86/mm/tlb.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 86d9c208e424..48371eb36773 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -1467,6 +1467,8 @@ static void invlpgb_kernel_range_flush(struct flush_tlb_info *info)
 {
 	unsigned long addr, nr;
 
+	guard(preempt)();
+
 	for (addr = info->start; addr < info->end; addr += nr << PAGE_SHIFT) {
 		nr = (info->end - addr) >> PAGE_SHIFT;
 
@@ -1517,8 +1519,6 @@ void flush_tlb_kernel_range(unsigned long start, unsigned long end)
 		.new_tlb_gen = TLB_GENERATION_INVALID
 	};
 
-	guard(preempt)();
-
 	if ((end - start) >> PAGE_SHIFT > tlb_single_page_flush_ceiling) {
 		start = 0;
 		end = TLB_FLUSH_ALL;
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 10/12] x86/mm: Move flush_tlb_info back to the stack
  2026-03-02  7:52 ` [PATCH v2 10/12] x86/mm: Move flush_tlb_info back to the stack Chuyi Zhou
@ 2026-03-02 14:58   ` Peter Zijlstra
  2026-03-03  3:20     ` Chuyi Zhou
  2026-03-05  7:01     ` Chuyi Zhou
  0 siblings, 2 replies; 22+ messages in thread
From: Peter Zijlstra @ 2026-03-02 14:58 UTC (permalink / raw)
  To: Chuyi Zhou
  Cc: tglx, mingo, luto, paulmck, muchun.song, bp, dave.hansen,
	pbonzini, bigeasy, clrkwllms, rostedt, linux-kernel

On Mon, Mar 02, 2026 at 03:52:14PM +0800, Chuyi Zhou wrote:
> Commit 3db6d5a5ecaf ("x86/mm/tlb: Remove 'struct flush_tlb_info' from the
> stack") converted flush_tlb_info from stack variable to per-CPU variable.
> This brought about a performance improvement of around 3% in extreme test.
> However, it also required that all flush_tlb* operations keep preemption
> disabled entirely to prevent concurrent modifications of flush_tlb_info.
> flush_tlb* needs to send IPIs to remote CPUs and synchronously wait for
> all remote CPUs to complete their local TLB flushes. The process could
> take tens of milliseconds when interrupts are disabled or with a large
> number of remote CPUs.
> 
> From the perspective of improving kernel real-time performance, this patch
> reverts flush_tlb_info back to stack variables. This is a preparation for
> enabling preemption during TLB flush in next patch.

This isn't properly justified. You've got to show that 'most' workloads
are not adversely affected by this.

Most people still swing towards performance most of the time.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 10/12] x86/mm: Move flush_tlb_info back to the stack
  2026-03-02 14:58   ` Peter Zijlstra
@ 2026-03-03  3:20     ` Chuyi Zhou
  2026-03-05  7:01     ` Chuyi Zhou
  1 sibling, 0 replies; 22+ messages in thread
From: Chuyi Zhou @ 2026-03-03  3:20 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, mingo, luto, paulmck, muchun.song, bp, dave.hansen,
	pbonzini, bigeasy, clrkwllms, rostedt, linux-kernel

Hi Peter

在 2026/3/2 22:58, Peter Zijlstra 写道:
> On Mon, Mar 02, 2026 at 03:52:14PM +0800, Chuyi Zhou wrote:
>> Commit 3db6d5a5ecaf ("x86/mm/tlb: Remove 'struct flush_tlb_info' from the
>> stack") converted flush_tlb_info from stack variable to per-CPU variable.
>> This brought about a performance improvement of around 3% in extreme test.
>> However, it also required that all flush_tlb* operations keep preemption
>> disabled entirely to prevent concurrent modifications of flush_tlb_info.
>> flush_tlb* needs to send IPIs to remote CPUs and synchronously wait for
>> all remote CPUs to complete their local TLB flushes. The process could
>> take tens of milliseconds when interrupts are disabled or with a large
>> number of remote CPUs.
>>
>>  From the perspective of improving kernel real-time performance, this patch
>> reverts flush_tlb_info back to stack variables. This is a preparation for
>> enabling preemption during TLB flush in next patch.
> 
> This isn't properly justified. You've got to show that 'most' workloads
> are not adversely affected by this.
> 
> Most people still swing towards performance most of the time.

I will try to reproduce the microbenchmarks mentioned in Commit 
3db6d5a5ecaf ("x86/mm/tlb: Remove 'struct flush_tlb_info' from the stack").

In addition, I will check whether there are suitable general memory 
tests available in the mmtests framework.

I will send the results afterwards.

Thanks.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 10/12] x86/mm: Move flush_tlb_info back to the stack
  2026-03-02 14:58   ` Peter Zijlstra
  2026-03-03  3:20     ` Chuyi Zhou
@ 2026-03-05  7:01     ` Chuyi Zhou
  1 sibling, 0 replies; 22+ messages in thread
From: Chuyi Zhou @ 2026-03-05  7:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, mingo, luto, paulmck, muchun.song, bp, dave.hansen,
	pbonzini, bigeasy, clrkwllms, rostedt, linux-kernel

Hi Peter,

在 2026/3/2 22:58, Peter Zijlstra 写道:
> On Mon, Mar 02, 2026 at 03:52:14PM +0800, Chuyi Zhou wrote:
>> Commit 3db6d5a5ecaf ("x86/mm/tlb: Remove 'struct flush_tlb_info' from the
>> stack") converted flush_tlb_info from stack variable to per-CPU variable.
>> This brought about a performance improvement of around 3% in extreme test.
>> However, it also required that all flush_tlb* operations keep preemption
>> disabled entirely to prevent concurrent modifications of flush_tlb_info.
>> flush_tlb* needs to send IPIs to remote CPUs and synchronously wait for
>> all remote CPUs to complete their local TLB flushes. The process could
>> take tens of milliseconds when interrupts are disabled or with a large
>> number of remote CPUs.
>>
>>  From the perspective of improving kernel real-time performance, this patch
>> reverts flush_tlb_info back to stack variables. This is a preparation for
>> enabling preemption during TLB flush in next patch.
> 
> This isn't properly justified. You've got to show that 'most' workloads
> are not adversely affected by this.
> 
> Most people still swing towards performance most of the time.

I attempted to reproduce the microbenchmark mentioned in Commit 
3db6d5a5ecaf ("x86/mm/tlb: Remove 'struct flush_tlb_info' from the 
stack") using the script below.

The baseline was tip/sched/core: f74d204baf9f (sched/hrtick: Mark
hrtick_clear() as always used).

The test environment was an Ice Lake system (Intel(R) Xeon(R) Platinum
8336C) with 128 CPUs and 2 NUMA nodes.

Using the per-CPU flush_tlb_info showed only a very marginal performance
advantage, approximately 1%.

                             base            on-stack
                             ----            ---------
       avg (usec/op)         5.9362           5.9956   (+1%)
       stddev                0.0240           0.0096

I also tested with mmtest/stress-ng-madvise, which randomly calls 
madvise on pages within a mmap range and triggers a large number of 
high-frequency TLB flushes. However, I did not observe any significant 
difference.

				 baseline              on-stack

Amean     bops-madvise-1        13.64 (   0.00%)      13.56 (   0.59%)
Amean     bops-madvise-2        27.32 (   0.00%)      27.26 (   0.24%)
Amean     bops-madvise-4        53.35 (   0.00%)      53.54 (  -0.35%)
Amean     bops-madvise-8        103.09 (   0.00%)     103.30 (  -0.20%)
Amean     bops-madvise-16       191.88 (   0.00%)     191.75 (   0.07%)
Amean     bops-madvise-32       287.98 (   0.00%)     291.01 *  -1.05%*
Amean     bops-madvise-64       365.84 (   0.00%)     368.09 *  -0.61%*
Amean     bops-madvise-128      422.72 (   0.00%)     423.47 (  -0.18%)
Amean     bops-madvise-256      435.61 (   0.00%)     435.63 (  -0.01%)


Thanks.

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <sys/mman.h>
#include <sys/time.h>
#include <unistd.h>

#define NUM_OPS 1000000
#define NUM_THREADS 3
#define NUM_RUNS 5
#define PAGE_SIZE 4096

volatile int stop_threads = 0;


void *busy_wait_thread(void *arg) {
     while (!stop_threads) {
         __asm__ volatile ("nop");
     }
     return NULL;
}

long long get_usec() {
     struct timeval tv;
     gettimeofday(&tv, NULL);
     return tv.tv_sec * 1000000LL + tv.tv_usec;
}

int main() {
     pthread_t threads[NUM_THREADS];
     char *addr;
     int i, r;


     addr = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
     if (addr == MAP_FAILED) {
         perror("mmap");
         exit(1);
     }


     for (i = 0; i < NUM_THREADS; i++) {
         if (pthread_create(&threads[i], NULL, busy_wait_thread, NULL) != 0) {
             perror("pthread_create");
             exit(1);
         }
     }

     printf("Running benchmark: %d runs, %d ops each, %d background threads\n",
            NUM_RUNS, NUM_OPS, NUM_THREADS);

     for (r = 0; r < NUM_RUNS; r++) {
         long long start, end;

         start = get_usec();
         for (i = 0; i < NUM_OPS; i++) {

             addr[0] = 1;

             if (madvise(addr, PAGE_SIZE, MADV_DONTNEED) != 0) {
                 perror("madvise");
                 exit(1);
             }
         }
         end = get_usec();

         double duration = (double)(end - start);
         double avg_lat = duration / NUM_OPS;
         printf("Run %d: Total time %.2f us, Avg latency %.4f us/op\n",
                r + 1, duration, avg_lat);
     }


     stop_threads = 1;
     for (i = 0; i < NUM_THREADS; i++) {
         pthread_join(threads[i], NULL);
     }

     munmap(addr, PAGE_SIZE);
     return 0;
}

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 12/12] x86/mm: Enable preemption during flush_tlb_kernel_range
  2026-03-02  7:52 ` [PATCH v2 12/12] x86/mm: Enable preemption during flush_tlb_kernel_range Chuyi Zhou
@ 2026-03-10  6:35   ` kernel test robot
  0 siblings, 0 replies; 22+ messages in thread
From: kernel test robot @ 2026-03-10  6:35 UTC (permalink / raw)
  To: Chuyi Zhou
  Cc: oe-lkp, lkp, linux-kernel, tglx, mingo, luto, peterz, paulmck,
	muchun.song, bp, dave.hansen, pbonzini, bigeasy, clrkwllms,
	rostedt, Chuyi Zhou, oliver.sang



Hello,

kernel test robot noticed "BUG:using_smp_processor_id()in_preemptible" on:

commit: 71316421085260ada767336b7d506cf68cd54501 ("[PATCH v2 12/12] x86/mm: Enable preemption during flush_tlb_kernel_range")
url: https://github.com/intel-lab-lkp/linux/commits/Chuyi-Zhou/smp-Disable-preemption-explicitly-in-__csd_lock_wait/20260302-155954
base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git 07fff0001f755de9063652eb09f9c46a108fd7e2
patch link: https://lore.kernel.org/all/20260302075216.2170675-13-zhouchuyi@bytedance.com/
patch subject: [PATCH v2 12/12] x86/mm: Enable preemption during flush_tlb_kernel_range

in testcase: boot

config: i386-randconfig-012-20250528
compiler: gcc-14
test machine: qemu-system-i386 -enable-kvm -cpu SandyBridge -smp 2 -m 4G

(please refer to attached dmesg/kmsg for entire log/backtrace)



If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202603101347.ef3b298c-lkp@intel.com



[    0.311796][    T1] BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1
[    0.312917][    T1] caller is debug_smp_processor_id (lib/smp_processor_id.c:59)
[    0.313674][    T1] CPU: 1 UID: 0 PID: 1 Comm: swapper/0 Tainted: G                T   7.0.0-rc1-00532-g713164210852 #1 PREEMPT(full)
[    0.313680][    T1] Tainted: [T]=RANDSTRUCT
[    0.313682][    T1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[    0.313684][    T1] Call Trace:
[    0.313686][    T1]  ? show_stack (arch/x86/kernel/dumpstack.c:338)
[    0.313701][    T1]  dump_stack_lvl (lib/dump_stack.c:122)
[    0.313715][    T1]  dump_stack (lib/dump_stack.c:130)
[    0.313723][    T1]  check_preemption_disabled (lib/smp_processor_id.c:47)
[    0.313737][    T1]  debug_smp_processor_id (lib/smp_processor_id.c:59)
[    0.313743][    T1]  flush_tlb_kernel_range (arch/x86/mm/tlb.c:1528)
[    0.313764][    T1]  cpa_flush (arch/x86/mm/pat/set_memory.c:455)
[    0.313774][    T1]  change_page_attr_set_clr (arch/x86/mm/pat/set_memory.c:2054)
[    0.313802][    T1]  set_memory_ro (arch/x86/mm/pat/set_memory.c:2305)
[    0.313815][    T1]  bpf_prog_select_runtime (kernel/bpf/core.c:2560 (discriminator 1))
[    0.313829][    T1]  bpf_prepare_filter (net/core/filter.c:1323 net/core/filter.c:1371)
[    0.313847][    T1]  bpf_prog_create (net/core/filter.c:1412 (discriminator 1))
[    0.313858][    T1]  ptp_classifier_init (net/core/ptp_classifier.c:227 (discriminator 2))
[    0.313863][    T1]  ? find_next_fd (include/linux/find.h:192 fs/file.c:564)
[    0.313871][    T1]  sock_init (net/socket.c:3314)
[    0.313882][    T1]  ? sock_alloc_inode (net/socket.c:347)
[    0.313888][    T1]  ? damon_sample_wsse_init (net/socket.c:3268)
[    0.313895][    T1]  do_one_initcall (init/main.c:1382)
[    0.313904][    T1]  ? parse_args (kernel/params.c:130 (discriminator 2) kernel/params.c:186 (discriminator 2))
[    0.313923][    T1]  ? debug_smp_processor_id (lib/smp_processor_id.c:59)
[    0.313930][    T1]  ? rcu_is_watching (include/linux/context_tracking.h:128 kernel/rcu/tree.c:752)
[    0.313948][    T1]  do_initcalls (init/main.c:1443 (discriminator 3) init/main.c:1460 (discriminator 3))
[    0.313964][    T1]  kernel_init_freeable (init/main.c:1694)
[    0.313969][    T1]  ? rest_init (init/main.c:1574)
[    0.313973][    T1]  kernel_init (init/main.c:1584)
[    0.313978][    T1]  ret_from_fork (arch/x86/kernel/process.c:164)
[    0.313982][    T1]  ? rest_init (init/main.c:1574)
[    0.313989][    T1]  ret_from_fork_asm (arch/x86/entry/entry_32.S:737)
[    0.313996][    T1]  entry_INT80_32 (arch/x86/entry/entry_32.S:945)


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20260310/202603101347.ef3b298c-lkp@intel.com



-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 05/12] smp: Free call_function_data via RCU in smpcfd_dead_cpu
  2026-03-02  7:52 ` [PATCH v2 05/12] smp: Free call_function_data via RCU in smpcfd_dead_cpu Chuyi Zhou
@ 2026-03-10  7:05   ` Muchun Song
  2026-03-10  7:26     ` Chuyi Zhou
  0 siblings, 1 reply; 22+ messages in thread
From: Muchun Song @ 2026-03-10  7:05 UTC (permalink / raw)
  To: Chuyi Zhou
  Cc: tglx, mingo, luto, peterz, paulmck, bp, dave.hansen, pbonzini,
	bigeasy, clrkwllms, rostedt, linux-kernel



> On Mar 2, 2026, at 15:52, Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
> 
> Use rcu_read_lock to protect the hole scope of smp_call_function_many_cond
                                  ^
                                 whole

BTW, the protection we often refer to actually pertains to data, not code.
Therefore, the rcu_read_lock() here is actually preventing smp_call_function_many_cond()
from accessing percpu csd data that has already been freed (which is
what smpcfd_dead_cpu() does).

> and wait for all read critical sections to exit before releasing percpu csd
> data. This is a preparation for enabling preemption during csd_lock_wait().
> 
> Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
> ---
> kernel/smp.c | 3 +++
> 1 file changed, 3 insertions(+)
> 
> diff --git a/kernel/smp.c b/kernel/smp.c
> index 9728ba55944d..ad6073b71bbd 100644
> --- a/kernel/smp.c
> +++ b/kernel/smp.c
> @@ -77,6 +77,7 @@ int smpcfd_dead_cpu(unsigned int cpu)
> {
> 	struct call_function_data *cfd = &per_cpu(cfd_data, cpu);
> 
> + 	synchronize_rcu();
> 	free_cpumask_var(cfd->cpumask);
> 	free_cpumask_var(cfd->cpumask_ipi);
> 	free_percpu(cfd->csd);
> @@ -810,6 +811,7 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
> 
> 	lockdep_assert_preemption_disabled();
> 
> + 	rcu_read_lock();
> 	cfd = this_cpu_ptr(&cfd_data);
> 	cpumask = cfd->cpumask;
> 
> @@ -907,6 +909,7 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
> 
> 	if (preemptible_wait)
> 		free_cpumask_var(cpumask_stack);
> + 	rcu_read_unlock();

We could call rcu_read_unlock() before free_cpumask_var() above.

> }
> 
> /**
> -- 
> 2.20.1


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 07/12] smp: Remove preempt_disable from smp_call_function
  2026-03-02  7:52 ` [PATCH v2 07/12] smp: Remove preempt_disable from smp_call_function Chuyi Zhou
@ 2026-03-10  7:06   ` Muchun Song
  0 siblings, 0 replies; 22+ messages in thread
From: Muchun Song @ 2026-03-10  7:06 UTC (permalink / raw)
  To: Chuyi Zhou
  Cc: tglx, mingo, luto, peterz, paulmck, bp, dave.hansen, pbonzini,
	bigeasy, clrkwllms, rostedt, linux-kernel



> On Mar 2, 2026, at 15:52, Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
> 
> Now smp_call_function_many_cond() internally handles the preemption logic,
> so smp_call_function() does not need to explicitly disable preemption.
> Remove preempt_{enable, disable} from smp_call_function().
> 
> Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>

Reviewed-by: Muchun Song <muchun.song@linux.dev>

Thanks.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 08/12] smp: Remove preempt_disable from on_each_cpu_cond_mask
  2026-03-02  7:52 ` [PATCH v2 08/12] smp: Remove preempt_disable from on_each_cpu_cond_mask Chuyi Zhou
@ 2026-03-10  7:06   ` Muchun Song
  0 siblings, 0 replies; 22+ messages in thread
From: Muchun Song @ 2026-03-10  7:06 UTC (permalink / raw)
  To: Chuyi Zhou
  Cc: tglx, mingo, luto, peterz, paulmck, bp, dave.hansen, pbonzini,
	bigeasy, clrkwllms, rostedt, linux-kernel



> On Mar 2, 2026, at 15:52, Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
> 
> Now smp_call_function_many_cond() internally handles the preemption logic,
> so on_each_cpu_cond_mask does not need to explicitly disable preemption.
> Remove preempt_{enable, disable} from on_each_cpu_cond_mask().
> 
> Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>

Reviewed-by: Muchun Song <muchun.song@linux.dev>

Thanks.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 04/12] smp: Use on-stack cpumask in smp_call_function_many_cond
  2026-03-02  7:52 ` [PATCH v2 04/12] smp: Use on-stack cpumask in smp_call_function_many_cond Chuyi Zhou
@ 2026-03-10  7:12   ` Muchun Song
  0 siblings, 0 replies; 22+ messages in thread
From: Muchun Song @ 2026-03-10  7:12 UTC (permalink / raw)
  To: Chuyi Zhou
  Cc: tglx, mingo, luto, peterz, paulmck, bp, dave.hansen, pbonzini,
	bigeasy, clrkwllms, rostedt, linux-kernel



> On Mar 2, 2026, at 15:52, Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
> 
> This patch use on-stack cpumask to replace percpu cfd cpumask in
> smp_call_function_many_cond(). Note that when both CONFIG_CPUMASK_OFFSTACK
> and PREEMPT_RT are enabled, allocation during preempt-disabled section
> would break RT. Therefore, only do this when CONFIG_CPUMASK_OFFSTACK=n.
> This is a preparation for enabling preemption during csd_lock_wait() in
> smp_call_function_many_cond().
> 
> Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>

Reviewed-by: Muchun Song <muchun.song@linux.dev>

Thanks.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 05/12] smp: Free call_function_data via RCU in smpcfd_dead_cpu
  2026-03-10  7:05   ` Muchun Song
@ 2026-03-10  7:26     ` Chuyi Zhou
  0 siblings, 0 replies; 22+ messages in thread
From: Chuyi Zhou @ 2026-03-10  7:26 UTC (permalink / raw)
  To: Muchun Song
  Cc: tglx, mingo, luto, peterz, paulmck, bp, dave.hansen, pbonzini,
	bigeasy, clrkwllms, rostedt, linux-kernel

Hi Muchun,

在 2026/3/10 15:05, Muchun Song 写道:
> 
> 
>> On Mar 2, 2026, at 15:52, Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
>>
>> Use rcu_read_lock to protect the hole scope of smp_call_function_many_cond
>                                    ^
>                                   whole
> 
> BTW, The protection we often refer to actually pertains to data, not code.
> Therefore, the rcu_read_lock() here is actually preventing smp_call_function_many_cond()
> from accessing percpu csd data that has already been freed (which is
> what smpcfd_dead_cpu() does).
> 

OK.

>> and wait for all read critical sections to exit before releasing percpu csd
>> data. This is a preparation for enabling preemption during csd_lock_wait().
>>
>> Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
>> ---
>> kernel/smp.c | 3 +++
>> 1 file changed, 3 insertions(+)
>>
>> diff --git a/kernel/smp.c b/kernel/smp.c
>> index 9728ba55944d..ad6073b71bbd 100644
>> --- a/kernel/smp.c
>> +++ b/kernel/smp.c
>> @@ -77,6 +77,7 @@ int smpcfd_dead_cpu(unsigned int cpu)
>> {
>> 	struct call_function_data *cfd = &per_cpu(cfd_data, cpu);
>>
>> + 	synchronize_rcu();
>> 	free_cpumask_var(cfd->cpumask);
>> 	free_cpumask_var(cfd->cpumask_ipi);
>> 	free_percpu(cfd->csd);
>> @@ -810,6 +811,7 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
>>
>> 	lockdep_assert_preemption_disabled();
>>
>> + 	rcu_read_lock();
>> 	cfd = this_cpu_ptr(&cfd_data);
>> 	cpumask = cfd->cpumask;
>>
>> @@ -907,6 +909,7 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
>>
>> 	if (preemptible_wait)
>> 		free_cpumask_var(cpumask_stack);
>> + 	rcu_read_unlock();
> 
> We could call rcu_read_unlock() before free_cpumask_var() above.

I will update in the next version.

Thanks.

> 
>> }
>>
>> /**
>> -- 
>> 2.20.1
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2026-03-10  7:27 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-02  7:52 [PATCH v2 00/12] Allow preemption during IPI completion waiting to improve real-time performance Chuyi Zhou
2026-03-02  7:52 ` [PATCH v2 01/12] smp: Disable preemption explicitly in __csd_lock_wait Chuyi Zhou
2026-03-02  7:52 ` [PATCH v2 02/12] smp: Enable preemption early in smp_call_function_single Chuyi Zhou
2026-03-02  7:52 ` [PATCH v2 03/12] smp: Remove get_cpu from smp_call_function_any Chuyi Zhou
2026-03-02  7:52 ` [PATCH v2 04/12] smp: Use on-stack cpumask in smp_call_function_many_cond Chuyi Zhou
2026-03-10  7:12   ` Muchun Song
2026-03-02  7:52 ` [PATCH v2 05/12] smp: Free call_function_data via RCU in smpcfd_dead_cpu Chuyi Zhou
2026-03-10  7:05   ` Muchun Song
2026-03-10  7:26     ` Chuyi Zhou
2026-03-02  7:52 ` [PATCH v2 06/12] smp: Enable preemption early in smp_call_function_many_cond Chuyi Zhou
2026-03-02  7:52 ` [PATCH v2 07/12] smp: Remove preempt_disable from smp_call_function Chuyi Zhou
2026-03-10  7:06   ` Muchun Song
2026-03-02  7:52 ` [PATCH v2 08/12] smp: Remove preempt_disable from on_each_cpu_cond_mask Chuyi Zhou
2026-03-10  7:06   ` Muchun Song
2026-03-02  7:52 ` [PATCH v2 09/12] scftorture: Remove preempt_disable in scftorture_invoke_one Chuyi Zhou
2026-03-02  7:52 ` [PATCH v2 10/12] x86/mm: Move flush_tlb_info back to the stack Chuyi Zhou
2026-03-02 14:58   ` Peter Zijlstra
2026-03-03  3:20     ` Chuyi Zhou
2026-03-05  7:01     ` Chuyi Zhou
2026-03-02  7:52 ` [PATCH v2 11/12] x86/mm: Enable preemption during native_flush_tlb_multi Chuyi Zhou
2026-03-02  7:52 ` [PATCH v2 12/12] x86/mm: Enable preemption during flush_tlb_kernel_range Chuyi Zhou
2026-03-10  6:35   ` kernel test robot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox