* Re: [PATCH] x86/mm: Revert INVLPGB optimization for set_memory code
2026-04-21 15:19 [PATCH] x86/mm: Revert INVLPGB optimization for set_memory code Dave Hansen
@ 2026-04-21 15:20 ` Dave Hansen
2026-04-21 15:25 ` Hellstrom, Thomas
` (3 subsequent siblings)
4 siblings, 0 replies; 9+ messages in thread
From: Dave Hansen @ 2026-04-21 15:20 UTC (permalink / raw)
To: Dave Hansen, linux-kernel
Cc: Andy Lutomirski, Borislav Petkov, Cui Ling, Hellstrom Thomas,
Ingo Molnar, Peter Zijlstra, Rik van Riel, Thomas Gleixner, x86,
Yu-cheng Yu
On 4/21/26 08:19, Dave Hansen wrote:
> tl;dr: Revert an INVLPGB optimization that did not properly handle
> discontiguous virtual addresses.
Oh, and this obviously needs:
Cc: stable@vger.kernel.org
if/when it gets committed.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] x86/mm: Revert INVLPGB optimization for set_memory code
2026-04-21 15:19 [PATCH] x86/mm: Revert INVLPGB optimization for set_memory code Dave Hansen
2026-04-21 15:20 ` Dave Hansen
@ 2026-04-21 15:25 ` Hellstrom, Thomas
2026-04-21 18:42 ` Edgecombe, Rick P
` (2 subsequent siblings)
4 siblings, 0 replies; 9+ messages in thread
From: Hellstrom, Thomas @ 2026-04-21 15:25 UTC (permalink / raw)
To: linux-kernel@vger.kernel.org, dave.hansen@linux.intel.com
Cc: Yu, Yu-cheng, riel@surriel.com, luto@kernel.org, x86@kernel.org,
Cui, Ling, peterz@infradead.org, bp@alien8.de, mingo@redhat.com,
tglx@kernel.org
On Tue, 2026-04-21 at 08:19 -0700, Dave Hansen wrote:
>
> From: Dave Hansen <dave.hansen@linux.intel.com>
>
> tl;dr: Revert an INVLPGB optimization that did not properly handle
> discontiguous virtual addresses.
>
> Full story:
>
> I got a report from some graphics (i915) folks that bisected a
> regression in their test suite to 86e6815b316e ("x86/mm: Change
> cpa_flush() to call flush_kernel_range() directly"). There was a bit
> of flip-flopping on the exact bisect, but the code here does seem
> wrong to me. The i915 folks were calling set_pages_array_wc(), so
> using the CPA_PAGES_ARRAY mode.
>
> Basically, the 'struct cpa_data' can wrap up all kinds of page table
> changes. Some of these are virtually contiguous, but some are very
> much not, which is one reason why there are ->vaddr and ->pages
> arrays.
>
> 86e6815b316e made the mistake of assuming that the virtual addresses
> in the cpa_data are always contiguous. It got things right when neither
> CPA_ARRAY/CPA_PAGES_ARRAY is used, but theoretically wrong when either
> of those is used.
>
> In the i915 case, it probably failed to flush some WB TLB entries and
> install WC ones, leaving some data in the caches and not flushing it
> out to where the device could see it. That eventually caused graphics
> problems.
>
> Revert the INVLPGB optimization. It can be reintroduced later, but it
> will need to be a bit careful about the array modes.
>
> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
> Reported-by: Cui, Ling <ling.cui@intel.com>
> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
> Cc: Rik van Riel <riel@surriel.com>
> Cc: Hellstrom, Thomas <thomas.hellstrom@intel.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@kernel.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: x86@kernel.org
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Thanks,
Thomas
> ---
>
> b/arch/x86/mm/pat/set_memory.c | 20 +++++++++++++-------
> 1 file changed, 13 insertions(+), 7 deletions(-)
>
> diff -puN arch/x86/mm/pat/set_memory.c~revert-86e6815b316ec0ea8 arch/x86/mm/pat/set_memory.c
> --- a/arch/x86/mm/pat/set_memory.c~revert-86e6815b316ec0ea8	2026-04-21 06:56:29.208686362 -0700
> +++ b/arch/x86/mm/pat/set_memory.c	2026-04-21 06:57:54.946011984 -0700
> @@ -399,6 +399,15 @@ static void cpa_flush_all(unsigned long
> on_each_cpu(__cpa_flush_all, (void *) cache, 1);
> }
>
> +static void __cpa_flush_tlb(void *data)
> +{
> + struct cpa_data *cpa = data;
> + unsigned int i;
> +
> + for (i = 0; i < cpa->numpages; i++)
> + flush_tlb_one_kernel(fix_addr(__cpa_addr(cpa, i)));
> +}
> +
> static int collapse_large_pages(unsigned long addr, struct list_head
> *pgtables);
>
> static void cpa_collapse_large_pages(struct cpa_data *cpa)
> @@ -435,7 +444,6 @@ static void cpa_collapse_large_pages(str
>
> static void cpa_flush(struct cpa_data *cpa, int cache)
> {
> - unsigned long start, end;
> unsigned int i;
>
> BUG_ON(irqs_disabled() && !early_boot_irqs_disabled);
> @@ -445,12 +453,10 @@ static void cpa_flush(struct cpa_data *c
> goto collapse_large_pages;
> }
>
> - start = fix_addr(__cpa_addr(cpa, 0));
> - end = start + cpa->numpages * PAGE_SIZE;
> - if (cpa->force_flush_all)
> - end = TLB_FLUSH_ALL;
> -
> - flush_tlb_kernel_range(start, end);
> + if (cpa->force_flush_all || cpa->numpages > tlb_single_page_flush_ceiling)
> + flush_tlb_all();
> + else
> + on_each_cpu(__cpa_flush_tlb, cpa, 1);
>
> if (!cache)
> goto collapse_large_pages;
> _
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] x86/mm: Revert INVLPGB optimization for set_memory code
2026-04-21 15:19 [PATCH] x86/mm: Revert INVLPGB optimization for set_memory code Dave Hansen
2026-04-21 15:20 ` Dave Hansen
2026-04-21 15:25 ` Hellstrom, Thomas
@ 2026-04-21 18:42 ` Edgecombe, Rick P
2026-04-21 18:46 ` Dave Hansen
2026-04-22 19:15 ` [tip: x86/urgent] " tip-bot2 for Dave Hansen
2026-04-24 13:46 ` tip-bot2 for Dave Hansen
4 siblings, 1 reply; 9+ messages in thread
From: Edgecombe, Rick P @ 2026-04-21 18:42 UTC (permalink / raw)
To: linux-kernel@vger.kernel.org, dave.hansen@linux.intel.com
Cc: Yu, Yu-cheng, riel@surriel.com, luto@kernel.org,
Hellstrom, Thomas, Cui, Ling, peterz@infradead.org, bp@alien8.de,
mingo@redhat.com, x86@kernel.org, tglx@kernel.org
On Tue, 2026-04-21 at 08:19 -0700, Dave Hansen wrote:
> 86e6815b316e made the mistake of assuming that the virtual addresses
> in the cpa_data are always contiguous. It got things right when neither
> CPA_ARRAY/CPA_PAGES_ARRAY is used, but theoretically wrong when either
> of those is used.
>
> In the i915 case, it probably failed to flush some WB TLB entries and
> install WC ones, leaving some data in the caches and not flushing it
> out to where the device could see it. That eventually caused graphics
> problems.
Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Makes sense. And I see the merit in just trying to revert the change. But I
think a change to fix the optimization is also temptingly small:
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 40581a720fe82..2a7ee1848f7ed 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -2089,6 +2089,7 @@ static int change_page_attr_set_clr(unsigned long *addr,
int numpages,
cpa.flags = in_flag;
cpa.curpage = 0;
cpa.force_split = force_split;
+ cpa.force_flush_all = in_flag & (CPA_PAGES_ARRAY | CPA_ARRAY);
ret = __change_page_attr_set_clr(&cpa, 1);
Also, could add:
Fixes: 86e6815b316ec ("x86/mm: Change cpa_flush() to call flush_kernel_range() directly")
^ permalink raw reply related [flat|nested] 9+ messages in thread* Re: [PATCH] x86/mm: Revert INVLPGB optimization for set_memory code
2026-04-21 18:42 ` Edgecombe, Rick P
@ 2026-04-21 18:46 ` Dave Hansen
2026-04-22 7:59 ` Hellstrom, Thomas
0 siblings, 1 reply; 9+ messages in thread
From: Dave Hansen @ 2026-04-21 18:46 UTC (permalink / raw)
To: Edgecombe, Rick P, linux-kernel@vger.kernel.org,
dave.hansen@linux.intel.com
Cc: Yu, Yu-cheng, riel@surriel.com, luto@kernel.org,
Hellstrom, Thomas, Cui, Ling, peterz@infradead.org, bp@alien8.de,
mingo@redhat.com, x86@kernel.org, tglx@kernel.org
On 4/21/26 11:42, Edgecombe, Rick P wrote:
> Makes sense. And I see the merit in just trying to revert the change. But I
> think a change to fix the optimization is also temptingly small:
Yeah, it is tempting. It's probably what I would have done if this
wasn't easy to revert, or if it wasn't _just_ an optimization.
But once -rc1 hits, we should definitely revisit the optimization.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] x86/mm: Revert INVLPGB optimization for set_memory code
2026-04-21 18:46 ` Dave Hansen
@ 2026-04-22 7:59 ` Hellstrom, Thomas
2026-04-22 14:01 ` Dave Hansen
0 siblings, 1 reply; 9+ messages in thread
From: Hellstrom, Thomas @ 2026-04-22 7:59 UTC (permalink / raw)
To: Hansen, Dave, linux-kernel@vger.kernel.org, Edgecombe, Rick P,
dave.hansen@linux.intel.com
Cc: Yu, Yu-cheng, riel@surriel.com, luto@kernel.org, x86@kernel.org,
Cui, Ling, peterz@infradead.org, bp@alien8.de, mingo@redhat.com,
tglx@kernel.org
On Tue, 2026-04-21 at 11:46 -0700, Dave Hansen wrote:
> On 4/21/26 11:42, Edgecombe, Rick P wrote:
> > Makes sense. And I see the merit in just trying to revert the
> > change. But I
> > think a change to fix the optimization is also temptingly small:
>
> Yeah, it is tempting. It's probably what I would have done if this
> wasn't easy to revert, or if it wasn't _just_ an optimization.
>
> But once -rc1 hits, we should definitely revisit the optimization.
Are there any timings available for how badly a global TLB flush affects
system performance vs. a single IPI invalidating a limited set of TLB
entries that aren't likely to be re-populated soon?
An uneducated guess would probably always favor the latter.
set_pages_array_wc() is unfortunately a rather common operation when
allocating integrated graphics buffer objects, at least until a pool of
WC pages has been established by the graphics drivers. And I think when
this is happening it's reasonable to accept a predictable allocation
delay rather than have the full TLB invalidated across all cores
repeatedly?
Thanks,
Thomas
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] x86/mm: Revert INVLPGB optimization for set_memory code
2026-04-22 7:59 ` Hellstrom, Thomas
@ 2026-04-22 14:01 ` Dave Hansen
0 siblings, 0 replies; 9+ messages in thread
From: Dave Hansen @ 2026-04-22 14:01 UTC (permalink / raw)
To: Hellstrom, Thomas, linux-kernel@vger.kernel.org,
Edgecombe, Rick P, dave.hansen@linux.intel.com
Cc: Yu, Yu-cheng, riel@surriel.com, luto@kernel.org, x86@kernel.org,
Cui, Ling, peterz@infradead.org, bp@alien8.de, mingo@redhat.com,
tglx@kernel.org
On 4/22/26 00:59, Hellstrom, Thomas wrote:
> On Tue, 2026-04-21 at 11:46 -0700, Dave Hansen wrote:
>> On 4/21/26 11:42, Edgecombe, Rick P wrote:
>>> Makes sense. And I see the merit in just trying to revert the
>>> change. But I
>>> think a change to fix the optimization is also temptingly small:
>>
>> Yeah, it is tempting. It's probably what I would have done if this
>> wasn't easy to revert, or if it wasn't _just_ an optimization.
>>
>> But once -rc1 hits, we should definitely revisit the optimization.
>
> Are there any timings available for how badly a global TLB flush affects
> system performance vs. a single IPI invalidating a limited set of TLB
> entries that aren't likely to be re-populated soon?
> An uneducated guess would probably always favor the latter.
Not in a while. I ran a bunch of numbers a decade or so ago and that's
where the /sys/kernel/debug/x86/tlb_single_page_flush_ceiling default
came from. But that was mostly focused on userspace and before PCIDs if
I remember right.
Rik also looked at some things recently on the AMD side when he was
trying to figure out when to use INVLPGB versus IPIs.
> The set_pages_array_wc() is unfortunately a rather common operation
> when allocating integrated graphics buffer objects. At least until a
> pool of WC pages has been established by the graphics drivers. And I
> think when this is happening it's reasonable to accept a predictable
> allocation delay vs to have the full TLB invalidated across all cores
> repeatedly?
Heh, I'd worry about shattering the direct map first. That costs a
performance penalty on everything, all the time, forever, before I
worried about a few measly one-off global TLB flushes.
There are a million ways to make all of this better. For instance,
during boot, the drm_gem_get_pages() are *probably* physically
contiguous, despite being stored in a ->pages[] array. The cpa code
could look for that. It could also be more careful about INVLPG versus
full flushes when shattering large pages.
But it would need an actual investigation with actual data before we
could make reasonable progress.
^ permalink raw reply [flat|nested] 9+ messages in thread
* [tip: x86/urgent] x86/mm: Revert INVLPGB optimization for set_memory code
2026-04-21 15:19 [PATCH] x86/mm: Revert INVLPGB optimization for set_memory code Dave Hansen
` (2 preceding siblings ...)
2026-04-21 18:42 ` Edgecombe, Rick P
@ 2026-04-22 19:15 ` tip-bot2 for Dave Hansen
2026-04-24 13:46 ` tip-bot2 for Dave Hansen
4 siblings, 0 replies; 9+ messages in thread
From: tip-bot2 for Dave Hansen @ 2026-04-22 19:15 UTC (permalink / raw)
To: linux-tip-commits
Cc: Cui, Ling, Dave Hansen, Rick Edgecombe, thomas.hellstrom, x86,
linux-kernel
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: 8e3705911af255ac12af7bc73140e885b61c4666
Gitweb: https://git.kernel.org/tip/8e3705911af255ac12af7bc73140e885b61c4666
Author: Dave Hansen <dave.hansen@linux.intel.com>
AuthorDate: Tue, 21 Apr 2026 08:19:09 -07:00
Committer: Dave Hansen <dave.hansen@linux.intel.com>
CommitterDate: Wed, 22 Apr 2026 12:11:55 -07:00
x86/mm: Revert INVLPGB optimization for set_memory code
tl;dr: Revert an INVLPGB optimization that did not properly handle
discontiguous virtual addresses.
Full story:
I got a report from some graphics (i915) folks that bisected a
regression in their test suite to 86e6815b316e ("x86/mm: Change
cpa_flush() to call flush_kernel_range() directly"). There was a bit
of flip-flopping on the exact bisect, but the code here does seem
wrong to me. The i915 folks were calling set_pages_array_wc(), so
using the CPA_PAGES_ARRAY mode.
Basically, the 'struct cpa_data' can wrap up all kinds of page table
changes. Some of these are virtually contiguous, but some are very
much not, which is one reason why there are ->vaddr and ->pages arrays.
86e6815b316e made the mistake of assuming that the virtual addresses
in the cpa_data are always contiguous. It got things right when neither
CPA_ARRAY/CPA_PAGES_ARRAY is used, but theoretically wrong when either
of those is used.
In the i915 case, it probably failed to flush some WB TLB entries and
install WC ones, leaving some data in the caches and not flushing it
out to where the device could see it. That eventually caused graphics
problems.
Revert the INVLPGB optimization. It can be reintroduced later, but it
will need to be a bit careful about the array modes.
Fixes: 86e6815b316ec ("x86/mm: Change cpa_flush() to call flush_kernel_range() directly")
Reported-by: Cui, Ling <ling.cui@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patch.msgid.link/20260421151909.6B3281C6@davehans-spike.ostc.intel.com
---
arch/x86/mm/pat/set_memory.c | 20 +++++++++++++-------
1 file changed, 13 insertions(+), 7 deletions(-)
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index cba907c..d023a40 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -399,6 +399,15 @@ static void cpa_flush_all(unsigned long cache)
on_each_cpu(__cpa_flush_all, (void *) cache, 1);
}
+static void __cpa_flush_tlb(void *data)
+{
+ struct cpa_data *cpa = data;
+ unsigned int i;
+
+ for (i = 0; i < cpa->numpages; i++)
+ flush_tlb_one_kernel(fix_addr(__cpa_addr(cpa, i)));
+}
+
static int collapse_large_pages(unsigned long addr, struct list_head *pgtables);
static void cpa_collapse_large_pages(struct cpa_data *cpa)
@@ -435,7 +444,6 @@ static void cpa_collapse_large_pages(struct cpa_data *cpa)
static void cpa_flush(struct cpa_data *cpa, int cache)
{
- unsigned long start, end;
unsigned int i;
BUG_ON(irqs_disabled() && !early_boot_irqs_disabled);
@@ -445,12 +453,10 @@ static void cpa_flush(struct cpa_data *cpa, int cache)
goto collapse_large_pages;
}
- start = fix_addr(__cpa_addr(cpa, 0));
- end = start + cpa->numpages * PAGE_SIZE;
- if (cpa->force_flush_all)
- end = TLB_FLUSH_ALL;
-
- flush_tlb_kernel_range(start, end);
+ if (cpa->force_flush_all || cpa->numpages > tlb_single_page_flush_ceiling)
+ flush_tlb_all();
+ else
+ on_each_cpu(__cpa_flush_tlb, cpa, 1);
if (!cache)
goto collapse_large_pages;
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [tip: x86/urgent] x86/mm: Revert INVLPGB optimization for set_memory code
2026-04-21 15:19 [PATCH] x86/mm: Revert INVLPGB optimization for set_memory code Dave Hansen
` (3 preceding siblings ...)
2026-04-22 19:15 ` [tip: x86/urgent] " tip-bot2 for Dave Hansen
@ 2026-04-24 13:46 ` tip-bot2 for Dave Hansen
4 siblings, 0 replies; 9+ messages in thread
From: tip-bot2 for Dave Hansen @ 2026-04-24 13:46 UTC (permalink / raw)
To: linux-tip-commits
Cc: Cui, Ling, Dave Hansen, Ingo Molnar, Rick Edgecombe,
thomas.hellstrom, x86, linux-kernel
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: a39a7014825bd8d10b94fa4f953141b9473c25b4
Gitweb: https://git.kernel.org/tip/a39a7014825bd8d10b94fa4f953141b9473c25b4
Author: Dave Hansen <dave.hansen@linux.intel.com>
AuthorDate: Tue, 21 Apr 2026 08:19:09 -07:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Fri, 24 Apr 2026 15:42:48 +02:00
x86/mm: Revert INVLPGB optimization for set_memory code
tl;dr: Revert an INVLPGB optimization that did not properly handle
discontiguous virtual addresses.
Full story:
I got a report from some graphics (i915) folks that bisected a
regression in their test suite to 86e6815b316e ("x86/mm: Change
cpa_flush() to call flush_kernel_range() directly"). There was a bit
of flip-flopping on the exact bisect, but the code here does seem
wrong to me. The i915 folks were calling set_pages_array_wc(), so
using the CPA_PAGES_ARRAY mode.
Basically, the 'struct cpa_data' can wrap up all kinds of page table
changes. Some of these are virtually contiguous, but some are very
much not, which is one reason why there are ->vaddr and ->pages arrays.
86e6815b316e made the mistake of assuming that the virtual addresses
in the cpa_data are always contiguous. It got things right when neither
CPA_ARRAY/CPA_PAGES_ARRAY is used, but theoretically wrong when either
of those is used.
In the i915 case, it probably failed to flush some WB TLB entries and
install WC ones, leaving some data in the caches and not flushing it
out to where the device could see it. That eventually caused graphics
problems.
Revert the INVLPGB optimization. It can be reintroduced later, but it
will need to be a bit careful about the array modes.
Fixes: 86e6815b316ec ("x86/mm: Change cpa_flush() to call flush_kernel_range() directly")
Reported-by: Cui, Ling <ling.cui@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patch.msgid.link/20260421151909.6B3281C6@davehans-spike.ostc.intel.com
---
arch/x86/mm/pat/set_memory.c | 20 +++++++++++++-------
1 file changed, 13 insertions(+), 7 deletions(-)
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index cba907c..d023a40 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -399,6 +399,15 @@ static void cpa_flush_all(unsigned long cache)
on_each_cpu(__cpa_flush_all, (void *) cache, 1);
}
+static void __cpa_flush_tlb(void *data)
+{
+ struct cpa_data *cpa = data;
+ unsigned int i;
+
+ for (i = 0; i < cpa->numpages; i++)
+ flush_tlb_one_kernel(fix_addr(__cpa_addr(cpa, i)));
+}
+
static int collapse_large_pages(unsigned long addr, struct list_head *pgtables);
static void cpa_collapse_large_pages(struct cpa_data *cpa)
@@ -435,7 +444,6 @@ static void cpa_collapse_large_pages(struct cpa_data *cpa)
static void cpa_flush(struct cpa_data *cpa, int cache)
{
- unsigned long start, end;
unsigned int i;
BUG_ON(irqs_disabled() && !early_boot_irqs_disabled);
@@ -445,12 +453,10 @@ static void cpa_flush(struct cpa_data *cpa, int cache)
goto collapse_large_pages;
}
- start = fix_addr(__cpa_addr(cpa, 0));
- end = start + cpa->numpages * PAGE_SIZE;
- if (cpa->force_flush_all)
- end = TLB_FLUSH_ALL;
-
- flush_tlb_kernel_range(start, end);
+ if (cpa->force_flush_all || cpa->numpages > tlb_single_page_flush_ceiling)
+ flush_tlb_all();
+ else
+ on_each_cpu(__cpa_flush_tlb, cpa, 1);
if (!cache)
goto collapse_large_pages;
^ permalink raw reply related [flat|nested] 9+ messages in thread