* [RFC PATCH v1 0/4] riscv: mm: Defer tlb flush to context_switch
@ 2025-10-30 13:56 Xu Lu
2025-10-30 13:56 ` [RFC PATCH v1 1/4] riscv: mm: Introduce percpu loaded_asid Xu Lu
` (5 more replies)
0 siblings, 6 replies; 10+ messages in thread
From: Xu Lu @ 2025-10-30 13:56 UTC (permalink / raw)
To: pjw, palmer, aou, alex, apatel, guoren; +Cc: linux-riscv, linux-kernel, Xu Lu
When we need to flush the TLB of a remote cpu, there is no need to send
an IPI if the target cpu is not currently using the asid we want to
flush. Instead, we can cache the tlb flush info in a percpu buffer and
defer the tlb flush to the next context_switch.
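The deferral decision can be sketched in user space as follows. Everything here (struct cpu_state, defer_or_ipi(), the field names) is a hypothetical model of the per-CPU state this series adds in arch/riscv/mm/tlbflush.c, not the kernel code itself:

```c
#include <assert.h>
#include <stdbool.h>

#define QUEUE_SIZE 16

/* One deferred flush request (mirrors struct flush_tlb_range_data). */
struct flush_req {
	unsigned long start, size, stride, asid;
};

/* Hypothetical per-CPU state: the ASID loaded in satp plus a
 * fixed-size queue of flushes deferred to the next context switch. */
struct cpu_state {
	unsigned long loaded_asid;
	struct flush_req queue[QUEUE_SIZE];
	unsigned int len;
};

/* Returns true when an IPI is still required, false when the request
 * was queued for the target CPU to drain at its next context switch. */
static bool defer_or_ipi(struct cpu_state *cpu, const struct flush_req *req)
{
	if (cpu->loaded_asid == req->asid)
		return true;		/* target is using this ASID: IPI now */
	if (cpu->len == QUEUE_SIZE)
		return true;		/* queue full: fall back to an IPI */
	cpu->queue[cpu->len++] = *req;	/* defer to next context switch */
	return false;
}
```

In the real series the enqueue path must also recheck loaded_asid after queuing (with appropriate fences), since the target CPU may switch to the ASID concurrently; the sketch omits that race handling.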
This reduces the number of IPIs due to tlb flushes:
* ltp - mmapstress01
Before: ~108k
After: ~46k
Future plan in the next version:
- This patch series reduces IPIs by deferring tlb flushes to
context_switch; it does not clear the mm_cpumask of the target
mm_struct. In the next version, I will apply a threshold to the number
of ASIDs maintained in each cpu's tlb. Once the threshold is exceeded,
the ASID that has not been used for the longest time will be flushed
out, and the current cpu will be cleared from the mm_cpumask.
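The planned threshold could look roughly like the following least-recently-used sketch. This only illustrates the cover letter's description; the data structure, names, and eviction policy are assumptions:

```c
#include <assert.h>

#define ASID_LIMIT 4	/* assumed per-CPU threshold */

struct asid_slot {
	unsigned long asid;
	unsigned long last_used;	/* logical timestamp */
};

struct asid_set {
	struct asid_slot slots[ASID_LIMIT];
	unsigned int len;
};

/* Record use of @asid at time @now. Returns the evicted ASID when the
 * set was already full (the caller would then flush that ASID locally
 * and clear this CPU from its mm_cpumask), or 0 when nothing was
 * evicted. */
static unsigned long asid_touch(struct asid_set *s, unsigned long asid,
				unsigned long now)
{
	unsigned int i, oldest = 0;

	for (i = 0; i < s->len; i++) {
		if (s->slots[i].asid == asid) {
			s->slots[i].last_used = now;	/* refresh */
			return 0;
		}
		if (s->slots[i].last_used < s->slots[oldest].last_used)
			oldest = i;
	}
	if (s->len < ASID_LIMIT) {
		s->slots[s->len++] = (struct asid_slot){ asid, now };
		return 0;
	}
	/* Full: evict the least recently used ASID. */
	unsigned long evicted = s->slots[oldest].asid;
	s->slots[oldest] = (struct asid_slot){ asid, now };
	return evicted;
}
```

Once a CPU is cleared from an mm's mm_cpumask, future remote flushes for that mm can skip the CPU entirely, neither sending an IPI nor queuing work.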
Thanks in advance for your comments.
Xu Lu (4):
riscv: mm: Introduce percpu loaded_asid
riscv: mm: Introduce percpu tlb flush queue
riscv: mm: Enqueue tlbflush info if task is not running on target cpu
riscv: mm: Perform tlb flush during context_switch
arch/riscv/include/asm/mmu_context.h | 1 +
arch/riscv/include/asm/tlbflush.h | 4 ++
arch/riscv/mm/context.c | 10 ++++
arch/riscv/mm/tlbflush.c | 76 +++++++++++++++++++++++++++-
4 files changed, 90 insertions(+), 1 deletion(-)
--
2.20.1
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
* [RFC PATCH v1 1/4] riscv: mm: Introduce percpu loaded_asid
2025-10-30 13:56 [RFC PATCH v1 0/4] riscv: mm: Defer tlb flush to context_switch Xu Lu
@ 2025-10-30 13:56 ` Xu Lu
2025-10-30 13:56 ` [RFC PATCH v1 2/4] riscv: mm: Introduce percpu tlb flush queue Xu Lu
` (4 subsequent siblings)
5 siblings, 0 replies; 10+ messages in thread
From: Xu Lu @ 2025-10-30 13:56 UTC (permalink / raw)
To: pjw, palmer, aou, alex, apatel, guoren; +Cc: linux-riscv, linux-kernel, Xu Lu
The percpu loaded_asid records the asid currently used by each CPU.
Signed-off-by: Xu Lu <luxu.kernel@bytedance.com>
---
arch/riscv/include/asm/mmu_context.h | 1 +
arch/riscv/mm/context.c | 4 ++++
2 files changed, 5 insertions(+)
diff --git a/arch/riscv/include/asm/mmu_context.h b/arch/riscv/include/asm/mmu_context.h
index 8c4bc49a3a0f5..fd532f8e8d057 100644
--- a/arch/riscv/include/asm/mmu_context.h
+++ b/arch/riscv/include/asm/mmu_context.h
@@ -39,6 +39,7 @@ static inline int init_new_context(struct task_struct *tsk,
}
DECLARE_STATIC_KEY_FALSE(use_asid_allocator);
+DECLARE_PER_CPU(unsigned long, loaded_asid);
#ifdef CONFIG_RISCV_ISA_SUPM
#define mm_untag_mask mm_untag_mask
diff --git a/arch/riscv/mm/context.c b/arch/riscv/mm/context.c
index 55c20ad1f7444..4d5792c3a8c19 100644
--- a/arch/riscv/mm/context.c
+++ b/arch/riscv/mm/context.c
@@ -32,6 +32,8 @@ static unsigned long *context_asid_map;
static DEFINE_PER_CPU(atomic_long_t, active_context);
static DEFINE_PER_CPU(unsigned long, reserved_context);
+DEFINE_PER_CPU(unsigned long, loaded_asid) = 0;
+
static bool check_update_reserved_context(unsigned long cntx,
unsigned long newcntx)
{
@@ -193,6 +195,8 @@ static void set_mm_asid(struct mm_struct *mm, unsigned int cpu)
(cntx2asid(cntx) << SATP_ASID_SHIFT) |
satp_mode);
+ this_cpu_write(loaded_asid, cntx2asid(cntx));
+
if (need_flush_tlb)
local_flush_tlb_all();
}
--
2.20.1
* [RFC PATCH v1 2/4] riscv: mm: Introduce percpu tlb flush queue
2025-10-30 13:56 [RFC PATCH v1 0/4] riscv: mm: Defer tlb flush to context_switch Xu Lu
2025-10-30 13:56 ` [RFC PATCH v1 1/4] riscv: mm: Introduce percpu loaded_asid Xu Lu
@ 2025-10-30 13:56 ` Xu Lu
2025-10-30 13:56 ` [RFC PATCH v1 3/4] riscv: mm: Enqueue tlbflush info if task is not running on target cpu Xu Lu
` (3 subsequent siblings)
5 siblings, 0 replies; 10+ messages in thread
From: Xu Lu @ 2025-10-30 13:56 UTC (permalink / raw)
To: pjw, palmer, aou, alex, apatel, guoren; +Cc: linux-riscv, linux-kernel, Xu Lu
The percpu tlb flush queue is used to buffer the tlb flush tasks that
each cpu needs to process.
Signed-off-by: Xu Lu <luxu.kernel@bytedance.com>
---
arch/riscv/mm/tlbflush.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
index 8404530ec00f9..aa8f1304ae5c4 100644
--- a/arch/riscv/mm/tlbflush.c
+++ b/arch/riscv/mm/tlbflush.c
@@ -103,6 +103,18 @@ struct flush_tlb_range_data {
unsigned long stride;
};
+#define TLB_FLUSH_QUEUE_SIZE 16
+struct tlb_flush_queue {
+ struct flush_tlb_range_data tasks[TLB_FLUSH_QUEUE_SIZE];
+ raw_spinlock_t lock;
+ unsigned int len;
+};
+
+DEFINE_PER_CPU(struct tlb_flush_queue, tlb_flush_queue) = {
+ .lock = __RAW_SPIN_LOCK_UNLOCKED(tlb_flush_queue.lock),
+ .len = 0,
+};
+
static void __ipi_flush_tlb_range_asid(void *info)
{
struct flush_tlb_range_data *d = info;
--
2.20.1
* [RFC PATCH v1 3/4] riscv: mm: Enqueue tlbflush info if task is not running on target cpu
2025-10-30 13:56 [RFC PATCH v1 0/4] riscv: mm: Defer tlb flush to context_switch Xu Lu
2025-10-30 13:56 ` [RFC PATCH v1 1/4] riscv: mm: Introduce percpu loaded_asid Xu Lu
2025-10-30 13:56 ` [RFC PATCH v1 2/4] riscv: mm: Introduce percpu tlb flush queue Xu Lu
@ 2025-10-30 13:56 ` Xu Lu
2025-10-30 13:56 ` [RFC PATCH v1 4/4] riscv: mm: Perform tlb flush during context_switch Xu Lu
` (2 subsequent siblings)
5 siblings, 0 replies; 10+ messages in thread
From: Xu Lu @ 2025-10-30 13:56 UTC (permalink / raw)
To: pjw, palmer, aou, alex, apatel, guoren; +Cc: linux-riscv, linux-kernel, Xu Lu
When we need to flush the TLB of a remote cpu, we only send an IPI to
the target cpu if the task is currently running on it. Otherwise, we
just enqueue the tlb flush info in the target cpu's tlb flush queue.
Signed-off-by: Xu Lu <luxu.kernel@bytedance.com>
---
arch/riscv/mm/tlbflush.c | 30 +++++++++++++++++++++++++++++-
1 file changed, 29 insertions(+), 1 deletion(-)
diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
index aa8f1304ae5c4..f4333c3a6d251 100644
--- a/arch/riscv/mm/tlbflush.c
+++ b/arch/riscv/mm/tlbflush.c
@@ -115,6 +115,32 @@ DEFINE_PER_CPU(struct tlb_flush_queue, tlb_flush_queue) = {
.len = 0,
};
+static bool should_ipi_flush(int cpu, void *info)
+{
+ struct tlb_flush_queue *queue = per_cpu_ptr(&tlb_flush_queue, cpu);
+ struct flush_tlb_range_data *d = info;
+ unsigned long flags;
+
+ if (per_cpu(loaded_asid, cpu) == d->asid)
+ return true;
+
+ raw_spin_lock_irqsave(&queue->lock, flags);
+ if (queue->len < TLB_FLUSH_QUEUE_SIZE) {
+ queue->tasks[queue->len] = *d;
+ queue->len++;
+ } else {
+ raw_spin_unlock_irqrestore(&queue->lock, flags);
+ return true;
+ }
+ raw_spin_unlock_irqrestore(&queue->lock, flags);
+
+ /* Recheck whether loaded_asid changed during enqueueing task */
+ if (per_cpu(loaded_asid, cpu) == d->asid)
+ return true;
+
+ return false;
+}
+
static void __ipi_flush_tlb_range_asid(void *info)
{
struct flush_tlb_range_data *d = info;
@@ -152,7 +178,9 @@ static void __flush_tlb_range(struct mm_struct *mm,
ftd.start = start;
ftd.size = size;
ftd.stride = stride;
- on_each_cpu_mask(cmask, __ipi_flush_tlb_range_asid, &ftd, 1);
+ on_each_cpu_cond_mask(should_ipi_flush,
+ __ipi_flush_tlb_range_asid,
+ &ftd, 1, cmask);
}
put_cpu();
--
2.20.1
* [RFC PATCH v1 4/4] riscv: mm: Perform tlb flush during context_switch
2025-10-30 13:56 [RFC PATCH v1 0/4] riscv: mm: Defer tlb flush to context_switch Xu Lu
` (2 preceding siblings ...)
2025-10-30 13:56 ` [RFC PATCH v1 3/4] riscv: mm: Enqueue tlbflush info if task is not running on target cpu Xu Lu
@ 2025-10-30 13:56 ` Xu Lu
2025-11-03 3:44 ` Guo Ren
2025-11-03 3:44 ` [RFC PATCH v1 0/4] riscv: mm: Defer tlb flush to context_switch Guo Ren
2025-11-07 1:55 ` Guo Ren
5 siblings, 1 reply; 10+ messages in thread
From: Xu Lu @ 2025-10-30 13:56 UTC (permalink / raw)
To: pjw, palmer, aou, alex, apatel, guoren; +Cc: linux-riscv, linux-kernel, Xu Lu
During context_switch, check the percpu tlb flush queue and lazily
perform tlb flush.
Signed-off-by: Xu Lu <luxu.kernel@bytedance.com>
---
arch/riscv/include/asm/tlbflush.h | 4 ++++
arch/riscv/mm/context.c | 6 ++++++
arch/riscv/mm/tlbflush.c | 34 +++++++++++++++++++++++++++++++
3 files changed, 44 insertions(+)
diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h
index eed0abc405143..7735c36f13d9f 100644
--- a/arch/riscv/include/asm/tlbflush.h
+++ b/arch/riscv/include/asm/tlbflush.h
@@ -66,6 +66,10 @@ void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch);
extern unsigned long tlb_flush_all_threshold;
+
+DECLARE_PER_CPU(bool, need_tlb_flush);
+void local_tlb_flush_queue_drain(void);
+
#else /* CONFIG_MMU */
#define local_flush_tlb_all() do { } while (0)
#endif /* CONFIG_MMU */
diff --git a/arch/riscv/mm/context.c b/arch/riscv/mm/context.c
index 4d5792c3a8c19..82b743bc81e4c 100644
--- a/arch/riscv/mm/context.c
+++ b/arch/riscv/mm/context.c
@@ -199,6 +199,12 @@ static void set_mm_asid(struct mm_struct *mm, unsigned int cpu)
if (need_flush_tlb)
local_flush_tlb_all();
+
+ /* Paired with RISCV_FENCE in should_ipi_flush() */
+ RISCV_FENCE(w, r);
+
+ if (this_cpu_read(need_tlb_flush))
+ local_tlb_flush_queue_drain();
}
static void set_mm_noasid(struct mm_struct *mm)
diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
index f4333c3a6d251..6592f72354df9 100644
--- a/arch/riscv/mm/tlbflush.c
+++ b/arch/riscv/mm/tlbflush.c
@@ -115,6 +115,8 @@ DEFINE_PER_CPU(struct tlb_flush_queue, tlb_flush_queue) = {
.len = 0,
};
+DEFINE_PER_CPU(bool, need_tlb_flush) = false;
+
static bool should_ipi_flush(int cpu, void *info)
{
struct tlb_flush_queue *queue = per_cpu_ptr(&tlb_flush_queue, cpu);
@@ -134,6 +136,14 @@ static bool should_ipi_flush(int cpu, void *info)
}
raw_spin_unlock_irqrestore(&queue->lock, flags);
+ /* Ensure tlb flush info is queued before setting need_tlb_flush flag */
+ smp_wmb();
+
+ per_cpu(need_tlb_flush, cpu) = true;
+
+ /* Paired with RISCV_FENCE in set_mm_asid() */
+ RISCV_FENCE(w, r);
+
/* Recheck whether loaded_asid changed during enqueueing task */
if (per_cpu(loaded_asid, cpu) == d->asid)
return true;
@@ -146,6 +156,9 @@ static void __ipi_flush_tlb_range_asid(void *info)
struct flush_tlb_range_data *d = info;
local_flush_tlb_range_asid(d->start, d->size, d->stride, d->asid);
+
+ if (this_cpu_read(need_tlb_flush))
+ local_tlb_flush_queue_drain();
}
static inline unsigned long get_mm_asid(struct mm_struct *mm)
@@ -280,3 +293,24 @@ void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
0, FLUSH_TLB_MAX_SIZE, PAGE_SIZE);
cpumask_clear(&batch->cpumask);
}
+
+void local_tlb_flush_queue_drain(void)
+{
+ struct tlb_flush_queue *queue = this_cpu_ptr(&tlb_flush_queue);
+ struct flush_tlb_range_data *d;
+ unsigned int i;
+
+ this_cpu_write(need_tlb_flush, false);
+
+ /* Ensure clearing the need_tlb_flush flags before real tlb flush */
+ smp_wmb();
+
+ raw_spin_lock(&queue->lock);
+ for (i = 0; i < queue->len; i++) {
+ d = &queue->tasks[i];
+ local_flush_tlb_range_asid(d->start, d->size, d->stride,
+ d->asid);
+ }
+ queue->len = 0;
+ raw_spin_unlock(&queue->lock);
+}
--
2.20.1
* Re: [RFC PATCH v1 4/4] riscv: mm: Perform tlb flush during context_switch
2025-10-30 13:56 ` [RFC PATCH v1 4/4] riscv: mm: Perform tlb flush during context_switch Xu Lu
@ 2025-11-03 3:44 ` Guo Ren
0 siblings, 0 replies; 10+ messages in thread
From: Guo Ren @ 2025-11-03 3:44 UTC (permalink / raw)
To: Xu Lu; +Cc: pjw, palmer, aou, alex, apatel, linux-riscv, linux-kernel
On Thu, Oct 30, 2025 at 9:57 PM Xu Lu <luxu.kernel@bytedance.com> wrote:
>
> During context_switch, check the percpu tlb flush queue and lazily
> perform tlb flush.
>
> Signed-off-by: Xu Lu <luxu.kernel@bytedance.com>
> ---
> arch/riscv/include/asm/tlbflush.h | 4 ++++
> arch/riscv/mm/context.c | 6 ++++++
> arch/riscv/mm/tlbflush.c | 34 +++++++++++++++++++++++++++++++
> 3 files changed, 44 insertions(+)
>
> diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h
> index eed0abc405143..7735c36f13d9f 100644
> --- a/arch/riscv/include/asm/tlbflush.h
> +++ b/arch/riscv/include/asm/tlbflush.h
> @@ -66,6 +66,10 @@ void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
> void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch);
>
> extern unsigned long tlb_flush_all_threshold;
> +
> +DECLARE_PER_CPU(bool, need_tlb_flush);
> +void local_tlb_flush_queue_drain(void);
> +
> #else /* CONFIG_MMU */
> #define local_flush_tlb_all() do { } while (0)
> #endif /* CONFIG_MMU */
> diff --git a/arch/riscv/mm/context.c b/arch/riscv/mm/context.c
> index 4d5792c3a8c19..82b743bc81e4c 100644
> --- a/arch/riscv/mm/context.c
> +++ b/arch/riscv/mm/context.c
> @@ -199,6 +199,12 @@ static void set_mm_asid(struct mm_struct *mm, unsigned int cpu)
>
> if (need_flush_tlb)
> local_flush_tlb_all();
> +
> + /* Paired with RISCV_FENCE in should_ipi_flush() */
> + RISCV_FENCE(w, r);
> +
> + if (this_cpu_read(need_tlb_flush))
> + local_tlb_flush_queue_drain();
> }
>
> static void set_mm_noasid(struct mm_struct *mm)
> diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
> index f4333c3a6d251..6592f72354df9 100644
> --- a/arch/riscv/mm/tlbflush.c
> +++ b/arch/riscv/mm/tlbflush.c
> @@ -115,6 +115,8 @@ DEFINE_PER_CPU(struct tlb_flush_queue, tlb_flush_queue) = {
> .len = 0,
> };
>
> +DEFINE_PER_CPU(bool, need_tlb_flush) = false;
> +
> static bool should_ipi_flush(int cpu, void *info)
> {
> struct tlb_flush_queue *queue = per_cpu_ptr(&tlb_flush_queue, cpu);
> @@ -134,6 +136,14 @@ static bool should_ipi_flush(int cpu, void *info)
> }
> raw_spin_unlock_irqrestore(&queue->lock, flags);
>
> + /* Ensure tlb flush info is queued before setting need_tlb_flush flag */
> + smp_wmb();
> +
> + per_cpu(need_tlb_flush, cpu) = true;
> +
> + /* Paired with RISCV_FENCE in set_mm_asid() */
> + RISCV_FENCE(w, r);
> +
> /* Recheck whether loaded_asid changed during enqueueing task */
> if (per_cpu(loaded_asid, cpu) == d->asid)
> return true;
> @@ -146,6 +156,9 @@ static void __ipi_flush_tlb_range_asid(void *info)
> struct flush_tlb_range_data *d = info;
>
> local_flush_tlb_range_asid(d->start, d->size, d->stride, d->asid);
> +
> + if (this_cpu_read(need_tlb_flush))
> + local_tlb_flush_queue_drain();
> }
>
> static inline unsigned long get_mm_asid(struct mm_struct *mm)
> @@ -280,3 +293,24 @@ void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
> 0, FLUSH_TLB_MAX_SIZE, PAGE_SIZE);
> cpumask_clear(&batch->cpumask);
> }
> +
> +void local_tlb_flush_queue_drain(void)
> +{
> + struct tlb_flush_queue *queue = this_cpu_ptr(&tlb_flush_queue);
> + struct flush_tlb_range_data *d;
> + unsigned int i;
> +
> + this_cpu_write(need_tlb_flush, false);
> +
> + /* Ensure clearing the need_tlb_flush flags before real tlb flush */
> + smp_wmb();
> +
> + raw_spin_lock(&queue->lock);
> + for (i = 0; i < queue->len; i++) {
> + d = &queue->tasks[i];
> + local_flush_tlb_range_asid(d->start, d->size, d->stride,
> + d->asid);
Here, do we need an accurate flush for a deferred flush?
> + }
> + queue->len = 0;
> + raw_spin_unlock(&queue->lock);
> +}
> --
> 2.20.1
>
--
Best Regards
Guo Ren
* Re: [RFC PATCH v1 0/4] riscv: mm: Defer tlb flush to context_switch
2025-10-30 13:56 [RFC PATCH v1 0/4] riscv: mm: Defer tlb flush to context_switch Xu Lu
` (3 preceding siblings ...)
2025-10-30 13:56 ` [RFC PATCH v1 4/4] riscv: mm: Perform tlb flush during context_switch Xu Lu
@ 2025-11-03 3:44 ` Guo Ren
2025-11-03 7:06 ` [External] " Xu Lu
2025-11-07 1:55 ` Guo Ren
5 siblings, 1 reply; 10+ messages in thread
From: Guo Ren @ 2025-11-03 3:44 UTC (permalink / raw)
To: Xu Lu; +Cc: pjw, palmer, aou, alex, apatel, linux-riscv, linux-kernel
On Thu, Oct 30, 2025 at 9:57 PM Xu Lu <luxu.kernel@bytedance.com> wrote:
>
> When need to flush tlb of a remote cpu, there is no need to send an IPI
> if the target cpu is not using the asid we want to flush. Instead, we
> can cache the tlb flush info in percpu buffer, and defer the tlb flush
> to the next context_switch.
>
> This reduces the number of IPI due to tlb flush:
>
> * ltp - mmapstress01
> Before: ~108k
> After: ~46k
Great result!
I have some questions:
1. Do we need an accurate address flush via a new queue of
flush_tlb_range_data? Why not flush the whole ASID?
2. If we reuse the context_tlb_flush_pending mechanism, could
mmapstress01 gain a better result than ~46k?
3. If we hit the kernel address space, we must use an IPI flush
immediately, but I didn't see your patch consider that case, or am I
wrong?
>
> Future plan in the next version:
>
> - This patch series reduces IPI by deferring tlb flush to
> context_switch. It does not clear the mm_cpumask of target mm_struct. In
> the next version, I will apply a threshold to the number of ASIDs
> maintained by each cpu's tlb. Once the threshold is exceeded, ASID that
> has not been used for the longest time will be flushed out. And current
> cpu will be cleared in the mm_cpumask.
>
> Thanks in advance for your comments.
>
> Xu Lu (4):
> riscv: mm: Introduce percpu loaded_asid
> riscv: mm: Introduce percpu tlb flush queue
> riscv: mm: Enqueue tlbflush info if task is not running on target cpu
> riscv: mm: Perform tlb flush during context_switch
>
> arch/riscv/include/asm/mmu_context.h | 1 +
> arch/riscv/include/asm/tlbflush.h | 4 ++
> arch/riscv/mm/context.c | 10 ++++
> arch/riscv/mm/tlbflush.c | 76 +++++++++++++++++++++++++++-
> 4 files changed, 90 insertions(+), 1 deletion(-)
>
> --
> 2.20.1
>
--
Best Regards
Guo Ren
* Re: [External] Re: [RFC PATCH v1 0/4] riscv: mm: Defer tlb flush to context_switch
2025-11-03 3:44 ` [RFC PATCH v1 0/4] riscv: mm: Defer tlb flush to context_switch Guo Ren
@ 2025-11-03 7:06 ` Xu Lu
0 siblings, 0 replies; 10+ messages in thread
From: Xu Lu @ 2025-11-03 7:06 UTC (permalink / raw)
To: Guo Ren; +Cc: pjw, palmer, aou, alex, apatel, linux-riscv, linux-kernel
Hi Guo Ren,
On Mon, Nov 3, 2025 at 11:44 AM Guo Ren <guoren@kernel.org> wrote:
>
> On Thu, Oct 30, 2025 at 9:57 PM Xu Lu <luxu.kernel@bytedance.com> wrote:
> >
> > When need to flush tlb of a remote cpu, there is no need to send an IPI
> > if the target cpu is not using the asid we want to flush. Instead, we
> > can cache the tlb flush info in percpu buffer, and defer the tlb flush
> > to the next context_switch.
> >
> > This reduces the number of IPI due to tlb flush:
> >
> > * ltp - mmapstress01
> > Before: ~108k
> > After: ~46k
> Great result!
>
> I've some questions:
> 1. Do we need an accurate address flush by a new queue of
> flush_tlb_range_data? Why not flush the whole asid?
Flushing the whole address space may cause subsequent TLB misses.
Consider this case: there is only one user-mode thread frequently
running on the target hart. When the user thread falls asleep and the
CPU context-switches to the idle thread, another thread of the same
process, running on another hart, modifies the mapping and needs to
perform a TLB flush. The first user-mode thread will then encounter a
large number of TLB misses when it resumes. I want to balance the IPI
count against TLB misses.
> 2. If we reuse the context_tlb_flush_pending mechanism, could
> mmapstress01 gain the result better than ~46k?
Besides lazy TLB flush, another way to reduce IPI overhead is to clear
mm_cpumask, and it does gain a better result for mmapstress01. I have
sent a patch [1] which clears mm_cpumask whenever all TLB entries of a
certain ASID are flushed; it reduces the IPI count from ~98k to 268.
As mentioned in the previous email, in the next version I will add the
mm_cpumask clearing procedure. Specifically, I will flush all TLB
entries of an ASID and clear the CPU from mm_cpumask whenever the ASID
hasn't been scheduled after enough context switches.
[1] https://lore.kernel.org/all/20250827131444.23893-3-luxu.kernel@bytedance.com/
> 3. If we meet the kernel address space, we must use IPI flush
> immediately, but I didn't see your patch consider that case, or am I
> wrong?
Nice catch! I forgot to add the kernel ASID check in the
should_ipi_flush() function; I will add it in the next version.
I have considered skipping the IPI and deferring the TLB flush to the
next time the target hart enters S-mode, if the target hart is
currently running in user mode. But there are too many kernel entry
points to consider, especially now that we have SSE. For kernel TLB
flushes, it is safer to send the IPI. Thanks.
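A minimal sketch of the missing check follows. The ASID value and the function name are assumptions for illustration (the kernel reserves an ASID for its own mappings, and a flush request against it must never be deferred):

```c
#include <assert.h>
#include <stdbool.h>

#define KERNEL_ASID 0UL	/* assumption: ASID 0 reserved for the kernel */

/* Kernel-space flushes must IPI immediately; only user ASIDs that are
 * not loaded on the target CPU are safe to defer. */
static bool must_ipi(unsigned long req_asid, unsigned long loaded_asid)
{
	if (req_asid == KERNEL_ASID)
		return true;
	return req_asid == loaded_asid;
}
```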
Best Regards,
Xu Lu
>
> >
> > Future plan in the next version:
> >
> > - This patch series reduces IPI by deferring tlb flush to
> > context_switch. It does not clear the mm_cpumask of target mm_struct. In
> > the next version, I will apply a threshold to the number of ASIDs
> > maintained by each cpu's tlb. Once the threshold is exceeded, ASID that
> > has not been used for the longest time will be flushed out. And current
> > cpu will be cleared in the mm_cpumask.
> >
> > Thanks in advance for your comments.
> >
> > Xu Lu (4):
> > riscv: mm: Introduce percpu loaded_asid
> > riscv: mm: Introduce percpu tlb flush queue
> > riscv: mm: Enqueue tlbflush info if task is not running on target cpu
> > riscv: mm: Perform tlb flush during context_switch
> >
> > arch/riscv/include/asm/mmu_context.h | 1 +
> > arch/riscv/include/asm/tlbflush.h | 4 ++
> > arch/riscv/mm/context.c | 10 ++++
> > arch/riscv/mm/tlbflush.c | 76 +++++++++++++++++++++++++++-
> > 4 files changed, 90 insertions(+), 1 deletion(-)
> >
> > --
> > 2.20.1
> >
>
>
> --
> Best Regards
> Guo Ren
* Re: [RFC PATCH v1 0/4] riscv: mm: Defer tlb flush to context_switch
2025-10-30 13:56 [RFC PATCH v1 0/4] riscv: mm: Defer tlb flush to context_switch Xu Lu
` (4 preceding siblings ...)
2025-11-03 3:44 ` [RFC PATCH v1 0/4] riscv: mm: Defer tlb flush to context_switch Guo Ren
@ 2025-11-07 1:55 ` Guo Ren
2025-11-07 3:03 ` [External] " Xu Lu
5 siblings, 1 reply; 10+ messages in thread
From: Guo Ren @ 2025-11-07 1:55 UTC (permalink / raw)
To: Xu Lu; +Cc: pjw, palmer, aou, alex, apatel, linux-riscv, linux-kernel
On Thu, Oct 30, 2025 at 9:57 PM Xu Lu <luxu.kernel@bytedance.com> wrote:
>
> When need to flush tlb of a remote cpu, there is no need to send an IPI
> if the target cpu is not using the asid we want to flush. Instead, we
> can cache the tlb flush info in percpu buffer, and defer the tlb flush
> to the next context_switch.
>
> This reduces the number of IPI due to tlb flush:
>
> * ltp - mmapstress01
> Before: ~108k
> After: ~46k
Could you add the results for these two test cases to the next version?
* lmbench - lat_pagefault
* lmbench - lat_mmap
Thank you!
>
> Future plan in the next version:
>
> - This patch series reduces IPI by deferring tlb flush to
> context_switch. It does not clear the mm_cpumask of target mm_struct. In
> the next version, I will apply a threshold to the number of ASIDs
> maintained by each cpu's tlb. Once the threshold is exceeded, ASID that
> has not been used for the longest time will be flushed out. And current
> cpu will be cleared in the mm_cpumask.
>
> Thanks in advance for your comments.
>
> Xu Lu (4):
> riscv: mm: Introduce percpu loaded_asid
> riscv: mm: Introduce percpu tlb flush queue
> riscv: mm: Enqueue tlbflush info if task is not running on target cpu
> riscv: mm: Perform tlb flush during context_switch
>
> arch/riscv/include/asm/mmu_context.h | 1 +
> arch/riscv/include/asm/tlbflush.h | 4 ++
> arch/riscv/mm/context.c | 10 ++++
> arch/riscv/mm/tlbflush.c | 76 +++++++++++++++++++++++++++-
> 4 files changed, 90 insertions(+), 1 deletion(-)
>
> --
> 2.20.1
>
--
Best Regards
Guo Ren
* Re: [External] Re: [RFC PATCH v1 0/4] riscv: mm: Defer tlb flush to context_switch
2025-11-07 1:55 ` Guo Ren
@ 2025-11-07 3:03 ` Xu Lu
0 siblings, 0 replies; 10+ messages in thread
From: Xu Lu @ 2025-11-07 3:03 UTC (permalink / raw)
To: Guo Ren; +Cc: pjw, palmer, aou, alex, apatel, linux-riscv, linux-kernel
On Fri, Nov 7, 2025 at 9:56 AM Guo Ren <guoren@kernel.org> wrote:
>
> On Thu, Oct 30, 2025 at 9:57 PM Xu Lu <luxu.kernel@bytedance.com> wrote:
> >
> > When need to flush tlb of a remote cpu, there is no need to send an IPI
> > if the target cpu is not using the asid we want to flush. Instead, we
> > can cache the tlb flush info in percpu buffer, and defer the tlb flush
> > to the next context_switch.
> >
> > This reduces the number of IPI due to tlb flush:
> >
> > * ltp - mmapstress01
> > Before: ~108k
> > After: ~46k
>
> Could you add the results for these two test cases to the next version?
>
> * lmbench - lat_pagefault
> * lmbench - lat_mmap
Roger that. Thanks for the suggestion.
>
> Thank you!
>
> >
> > Future plan in the next version:
> >
> > - This patch series reduces IPI by deferring tlb flush to
> > context_switch. It does not clear the mm_cpumask of target mm_struct. In
> > the next version, I will apply a threshold to the number of ASIDs
> > maintained by each cpu's tlb. Once the threshold is exceeded, ASID that
> > has not been used for the longest time will be flushed out. And current
> > cpu will be cleared in the mm_cpumask.
> >
> > Thanks in advance for your comments.
> >
> > Xu Lu (4):
> > riscv: mm: Introduce percpu loaded_asid
> > riscv: mm: Introduce percpu tlb flush queue
> > riscv: mm: Enqueue tlbflush info if task is not running on target cpu
> > riscv: mm: Perform tlb flush during context_switch
> >
> > arch/riscv/include/asm/mmu_context.h | 1 +
> > arch/riscv/include/asm/tlbflush.h | 4 ++
> > arch/riscv/mm/context.c | 10 ++++
> > arch/riscv/mm/tlbflush.c | 76 +++++++++++++++++++++++++++-
> > 4 files changed, 90 insertions(+), 1 deletion(-)
> >
> > --
> > 2.20.1
> >
>
>
> --
> Best Regards
> Guo Ren