* [PATCH v2 1/6] x86: support cache-writeback in flush_area_local() et al
2023-05-03 9:43 [PATCH v2 0/6] x86: reduce cache flushing overhead Jan Beulich
@ 2023-05-03 9:44 ` Jan Beulich
2025-05-13 13:01 ` Roger Pau Monné
2023-05-03 9:45 ` [PATCH v2 2/6] x86/HVM: restrict guest-induced WBINVD to cache writeback Jan Beulich
` (4 subsequent siblings)
5 siblings, 1 reply; 29+ messages in thread
From: Jan Beulich @ 2023-05-03 9:44 UTC (permalink / raw)
To: xen-devel@lists.xenproject.org
Cc: Andrew Cooper, Wei Liu, Roger Pau Monné
The majority of the present callers really aren't after invalidating
cache contents, but only after writeback. Make this available by simply
extending the FLUSH_CACHE handling accordingly. No feature checks are
required here: cache_writeback() falls back to cache_flush() as
necessary, while WBNOINVD degenerates to WBINVD on older hardware.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: FLUSH_WRITEBACK -> FLUSH_CACHE_WRITEBACK.
--- a/xen/arch/x86/flushtlb.c
+++ b/xen/arch/x86/flushtlb.c
@@ -232,7 +232,7 @@ unsigned int flush_area_local(const void
if ( flags & FLUSH_HVM_ASID_CORE )
hvm_flush_guest_tlbs();
- if ( flags & FLUSH_CACHE )
+ if ( flags & (FLUSH_CACHE | FLUSH_CACHE_WRITEBACK) )
{
const struct cpuinfo_x86 *c = ¤t_cpu_data;
unsigned long sz = 0;
@@ -245,13 +245,16 @@ unsigned int flush_area_local(const void
c->x86_clflush_size && c->x86_cache_size && sz &&
((sz >> 10) < c->x86_cache_size) )
{
- cache_flush(va, sz);
- flags &= ~FLUSH_CACHE;
+ if ( flags & FLUSH_CACHE )
+ cache_flush(va, sz);
+ else
+ cache_writeback(va, sz);
+ flags &= ~(FLUSH_CACHE | FLUSH_CACHE_WRITEBACK);
}
- else
- {
+ else if ( flags & FLUSH_CACHE )
wbinvd();
- }
+ else
+ wbnoinvd();
}
if ( flags & FLUSH_ROOT_PGTBL )
--- a/xen/arch/x86/include/asm/flushtlb.h
+++ b/xen/arch/x86/include/asm/flushtlb.h
@@ -135,6 +135,8 @@ void switch_cr3_cr4(unsigned long cr3, u
#else
# define FLUSH_NO_ASSIST 0
#endif
+ /* Write back data cache contents */
+#define FLUSH_CACHE_WRITEBACK 0x10000
/* Flush local TLBs/caches. */
unsigned int flush_area_local(const void *va, unsigned int flags);
@@ -194,7 +196,11 @@ static inline int clean_and_invalidate_d
}
static inline int clean_dcache_va_range(const void *p, unsigned long size)
{
- return clean_and_invalidate_dcache_va_range(p, size);
+ unsigned int order = get_order_from_bytes(size);
+
+ /* sub-page granularity support needs to be added if necessary */
+ flush_area_local(p, FLUSH_CACHE_WRITEBACK | FLUSH_ORDER(order));
+ return 0;
}
unsigned int guest_flush_tlb_flags(const struct domain *d);
^ permalink raw reply [flat|nested] 29+ messages in thread* Re: [PATCH v2 1/6] x86: support cache-writeback in flush_area_local() et al
2023-05-03 9:44 ` [PATCH v2 1/6] x86: support cache-writeback in flush_area_local() et al Jan Beulich
@ 2025-05-13 13:01 ` Roger Pau Monné
0 siblings, 0 replies; 29+ messages in thread
From: Roger Pau Monné @ 2025-05-13 13:01 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel@lists.xenproject.org, Andrew Cooper, Wei Liu
On Wed, May 03, 2023 at 11:44:39AM +0200, Jan Beulich wrote:
> The majority of the present callers really aren't after invalidating
> cache contents, but only after writeback. Make this available by simply
> extending the FLUSH_CACHE handling accordingly. No feature checks are
> required here: cache_writeback() falls back to cache_flush() as
> necessary, while WBNOINVD degenerates to WBINVD on older hardware.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
> v2: FLUSH_WRITEBACK -> FLUSH_CACHE_WRITEBACK.
>
> --- a/xen/arch/x86/flushtlb.c
> +++ b/xen/arch/x86/flushtlb.c
> @@ -232,7 +232,7 @@ unsigned int flush_area_local(const void
> if ( flags & FLUSH_HVM_ASID_CORE )
> hvm_flush_guest_tlbs();
>
> - if ( flags & FLUSH_CACHE )
> + if ( flags & (FLUSH_CACHE | FLUSH_CACHE_WRITEBACK) )
> {
> const struct cpuinfo_x86 *c = ¤t_cpu_data;
> unsigned long sz = 0;
> @@ -245,13 +245,16 @@ unsigned int flush_area_local(const void
> c->x86_clflush_size && c->x86_cache_size && sz &&
> ((sz >> 10) < c->x86_cache_size) )
> {
> - cache_flush(va, sz);
> - flags &= ~FLUSH_CACHE;
> + if ( flags & FLUSH_CACHE )
> + cache_flush(va, sz);
> + else
> + cache_writeback(va, sz);
> + flags &= ~(FLUSH_CACHE | FLUSH_CACHE_WRITEBACK);
> }
> - else
> - {
> + else if ( flags & FLUSH_CACHE )
> wbinvd();
> - }
> + else
> + wbnoinvd();
> }
>
> if ( flags & FLUSH_ROOT_PGTBL )
> --- a/xen/arch/x86/include/asm/flushtlb.h
> +++ b/xen/arch/x86/include/asm/flushtlb.h
> @@ -135,6 +135,8 @@ void switch_cr3_cr4(unsigned long cr3, u
> #else
> # define FLUSH_NO_ASSIST 0
> #endif
> + /* Write back data cache contents */
> +#define FLUSH_CACHE_WRITEBACK 0x10000
>
> /* Flush local TLBs/caches. */
> unsigned int flush_area_local(const void *va, unsigned int flags);
> @@ -194,7 +196,11 @@ static inline int clean_and_invalidate_d
> }
> static inline int clean_dcache_va_range(const void *p, unsigned long size)
> {
> - return clean_and_invalidate_dcache_va_range(p, size);
> + unsigned int order = get_order_from_bytes(size);
> +
> + /* sub-page granularity support needs to be added if necessary */
> + flush_area_local(p, FLUSH_CACHE_WRITEBACK | FLUSH_ORDER(order));
> + return 0;
> }
I'm planning to get rid of the clean_dcache_va_range() helper on x86,
but I don't want to force you to rebase on top of that.
Thanks, Roger.
^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH v2 2/6] x86/HVM: restrict guest-induced WBINVD to cache writeback
2023-05-03 9:43 [PATCH v2 0/6] x86: reduce cache flushing overhead Jan Beulich
2023-05-03 9:44 ` [PATCH v2 1/6] x86: support cache-writeback in flush_area_local() et al Jan Beulich
@ 2023-05-03 9:45 ` Jan Beulich
2025-05-13 13:12 ` Andrew Cooper
2025-05-13 13:41 ` Roger Pau Monné
2023-05-03 9:45 ` [PATCH v2 3/6] x86/PV: restrict guest-induced WBINVD (or alike) " Jan Beulich
` (3 subsequent siblings)
5 siblings, 2 replies; 29+ messages in thread
From: Jan Beulich @ 2023-05-03 9:45 UTC (permalink / raw)
To: xen-devel@lists.xenproject.org
Cc: Andrew Cooper, Wei Liu, Roger Pau Monné, Kevin Tian,
Jun Nakajima
We allow its use for writeback purposes only anyway, so let's also carry
these out that way on capable hardware.
With it now known that WBNOINVD uses the same VM exit code as WBINVD for
both SVM and VT-x, we can now also expose the feature that way without
further distinguishing the specific cases of those VM exits. Note that
on SVM this builds upon INSTR_WBINVD also covering WBNOINVD, as the
decoder won't set prefix related bits for this encoding in the resulting
canonicalized opcode.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: FLUSH_WRITEBACK -> FLUSH_CACHE_WRITEBACK.
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -2364,7 +2364,7 @@ static void svm_vmexit_mce_intercept(
static void cf_check svm_wbinvd_intercept(void)
{
if ( cache_flush_permitted(current->domain) )
- flush_all(FLUSH_CACHE);
+ flush_all(FLUSH_CACHE_WRITEBACK);
}
static void svm_vmexit_do_invalidate_cache(struct cpu_user_regs *regs,
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -1881,12 +1881,12 @@ void cf_check vmx_do_resume(void)
{
/*
* For pass-through domain, guest PCI-E device driver may leverage the
- * "Non-Snoop" I/O, and explicitly WBINVD or CLFLUSH to a RAM space.
- * Since migration may occur before WBINVD or CLFLUSH, we need to
- * maintain data consistency either by:
- * 1: flushing cache (wbinvd) when the guest is scheduled out if
+ * "Non-Snoop" I/O, and explicitly WB{NO,}INVD or CL{WB,FLUSH} RAM space.
+ * Since migration may occur before WB{NO,}INVD or CL{WB,FLUSH}, we need
+ * to maintain data consistency either by:
+ * 1: flushing cache (wbnoinvd) when the guest is scheduled out if
* there is no wbinvd exit, or
- * 2: execute wbinvd on all dirty pCPUs when guest wbinvd exits.
+ * 2: execute wbnoinvd on all dirty pCPUs when guest wbinvd exits.
* If VT-d engine can force snooping, we don't need to do these.
*/
if ( has_arch_pdevs(v->domain) && !iommu_snoop
@@ -1894,7 +1894,7 @@ void cf_check vmx_do_resume(void)
{
int cpu = v->arch.hvm.vmx.active_cpu;
if ( cpu != -1 )
- flush_mask(cpumask_of(cpu), FLUSH_CACHE);
+ flush_mask(cpumask_of(cpu), FLUSH_CACHE_WRITEBACK);
}
vmx_clear_vmcs(v);
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -3714,9 +3714,9 @@ static void cf_check vmx_wbinvd_intercep
return;
if ( cpu_has_wbinvd_exiting )
- flush_all(FLUSH_CACHE);
+ flush_all(FLUSH_CACHE_WRITEBACK);
else
- wbinvd();
+ wbnoinvd();
}
static void ept_handle_violation(ept_qual_t q, paddr_t gpa)
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -238,7 +238,7 @@ XEN_CPUFEATURE(EFRO, 7*32+10) /
/* AMD-defined CPU features, CPUID level 0x80000008.ebx, word 8 */
XEN_CPUFEATURE(CLZERO, 8*32+ 0) /*A CLZERO instruction */
XEN_CPUFEATURE(RSTR_FP_ERR_PTRS, 8*32+ 2) /*A (F)X{SAVE,RSTOR} always saves/restores FPU Error pointers */
-XEN_CPUFEATURE(WBNOINVD, 8*32+ 9) /* WBNOINVD instruction */
+XEN_CPUFEATURE(WBNOINVD, 8*32+ 9) /*S WBNOINVD instruction */
XEN_CPUFEATURE(IBPB, 8*32+12) /*A IBPB support only (no IBRS, used by AMD) */
XEN_CPUFEATURE(IBRS, 8*32+14) /*S MSR_SPEC_CTRL.IBRS */
XEN_CPUFEATURE(AMD_STIBP, 8*32+15) /*S MSR_SPEC_CTRL.STIBP */
^ permalink raw reply [flat|nested] 29+ messages in thread* Re: [PATCH v2 2/6] x86/HVM: restrict guest-induced WBINVD to cache writeback
2023-05-03 9:45 ` [PATCH v2 2/6] x86/HVM: restrict guest-induced WBINVD to cache writeback Jan Beulich
@ 2025-05-13 13:12 ` Andrew Cooper
2025-05-13 13:24 ` Jan Beulich
2025-05-13 13:41 ` Roger Pau Monné
1 sibling, 1 reply; 29+ messages in thread
From: Andrew Cooper @ 2025-05-13 13:12 UTC (permalink / raw)
To: Jan Beulich, xen-devel@lists.xenproject.org; +Cc: Wei Liu, Roger Pau Monné
On 03/05/2023 10:45 am, Jan Beulich wrote:
> We allow its use for writeback purposes only anyway, so let's also carry
> these out that way on capable hardware.
>
> With it now known that WBNOINVD uses the same VM exit code as WBINVD for
> both SVM and VT-x, we can now also expose the feature that way without
> further distinguishing the specific cases of those VM exits. Note that
> on SVM this builds upon INSTR_WBINVD also covering WBNOINVD, as the
> decoder won't set prefix related bits for this encoding in the resulting
> canonicalized opcode.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
I realise this is an old series, but this patch is unsafe with AMD
SEV-SNP, and with CLX devices.
Both have cases needing "genuinely evicted from the cache" semantics.
~Andrew
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v2 2/6] x86/HVM: restrict guest-induced WBINVD to cache writeback
2025-05-13 13:12 ` Andrew Cooper
@ 2025-05-13 13:24 ` Jan Beulich
0 siblings, 0 replies; 29+ messages in thread
From: Jan Beulich @ 2025-05-13 13:24 UTC (permalink / raw)
To: Andrew Cooper
Cc: Wei Liu, Roger Pau Monné, xen-devel@lists.xenproject.org
On 13.05.2025 15:12, Andrew Cooper wrote:
> On 03/05/2023 10:45 am, Jan Beulich wrote:
>> We allow its use for writeback purposes only anyway, so let's also carry
>> these out that way on capable hardware.
By implication of what you say ...
>> With it now known that WBNOINVD uses the same VM exit code as WBINVD for
>> both SVM and VT-x, we can now also expose the feature that way without
>> further distinguishing the specific cases of those VM exits. Note that
>> on SVM this builds upon INSTR_WBINVD also covering WBNOINVD, as the
>> decoder won't set prefix related bits for this encoding in the resulting
>> canonicalized opcode.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>
> I realise this is an old series,
Well, it's old because no-one cared to look at the v2 posting in over 2
years.
> but this patch is unsafe with AMD SEV-SNP, and with CLX devices.
>
> Both have cases needing "genuinely evicted from the cache" semantics.
... here, are you suggesting that our present behavior is wrong? We don't
have SEV-SNP support yet, so there it would only be "latently wrong" (and
hence the change here would still be correct for now). What CLX is in
this context I'm afraid I don't know (since surely you're not talking
about vapes, the closest I can spot in search results is something gaming
related, yet still not becoming very clear). I might have guessed
Cascade Lake X, but then why would just that one model be a problem?
Jan
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v2 2/6] x86/HVM: restrict guest-induced WBINVD to cache writeback
2023-05-03 9:45 ` [PATCH v2 2/6] x86/HVM: restrict guest-induced WBINVD to cache writeback Jan Beulich
2025-05-13 13:12 ` Andrew Cooper
@ 2025-05-13 13:41 ` Roger Pau Monné
2025-05-13 13:54 ` Jan Beulich
1 sibling, 1 reply; 29+ messages in thread
From: Roger Pau Monné @ 2025-05-13 13:41 UTC (permalink / raw)
To: Jan Beulich
Cc: xen-devel@lists.xenproject.org, Andrew Cooper, Wei Liu,
Kevin Tian, Jun Nakajima
On Wed, May 03, 2023 at 11:45:22AM +0200, Jan Beulich wrote:
> We allow its use for writeback purposes only anyway, so let's also carry
> these out that way on capable hardware.
"for writeback purposes only" > is such the case because we cannot
guarantee the guest in which state the cache will be when resuming
from a wbinvd operation, and hence won't be "clean"?
It's my understanding that the same is possible on native, as the CPU
might speculatively pull lines into the cache. So there's no reason
for an OS to use wbinvd if wbnoinvd is available?
Thanks, Roger.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v2 2/6] x86/HVM: restrict guest-induced WBINVD to cache writeback
2025-05-13 13:41 ` Roger Pau Monné
@ 2025-05-13 13:54 ` Jan Beulich
2025-05-14 11:30 ` Roger Pau Monné
0 siblings, 1 reply; 29+ messages in thread
From: Jan Beulich @ 2025-05-13 13:54 UTC (permalink / raw)
To: Roger Pau Monné
Cc: xen-devel@lists.xenproject.org, Andrew Cooper, Wei Liu,
Kevin Tian, Jun Nakajima
On 13.05.2025 15:41, Roger Pau Monné wrote:
> On Wed, May 03, 2023 at 11:45:22AM +0200, Jan Beulich wrote:
>> We allow its use for writeback purposes only anyway, so let's also carry
>> these out that way on capable hardware.
>
> "for writeback purposes only" > is such the case because we cannot
> guarantee the guest in which state the cache will be when resuming
> from a wbinvd operation, and hence won't be "clean"?
Not really, no (see below). This is inferred from us limiting flushing
operations to domains with physical hardware assigned plus the fact
that we avoid the overhead in vmx_do_resume() when the IOMMU is snoop-
capable. (Plus I did all this work over 2 years ago, and hence I now
don't recall what other commentary I may have come across saying
things along these lines.)
Which, thinking about it while writing this reply (and also taking
into consideration Andrew's earlier reply), may be all wrong.
> It's my understanding that the same is possible on native, as the CPU
> might speculatively pull lines into the cache. So there's no reason
> for an OS to use wbinvd if wbnoinvd is available?
Speculatively pulling data into the cache is possible only when page
table entries permit caching. Hence after changing all mappings of a
certain page to UC, an OS may have a need to ensure that no data of
this page is left in any cache (and it can't be pulled back in
speculatively then).
IOW if the goal of the OS is write-back, then indeed it should prefer
WBNOINVD if available. Yet the same "replacement" isn't possible when
the goal is pruning of the cache(s).
Jan
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v2 2/6] x86/HVM: restrict guest-induced WBINVD to cache writeback
2025-05-13 13:54 ` Jan Beulich
@ 2025-05-14 11:30 ` Roger Pau Monné
2025-05-14 12:46 ` Jan Beulich
0 siblings, 1 reply; 29+ messages in thread
From: Roger Pau Monné @ 2025-05-14 11:30 UTC (permalink / raw)
To: Jan Beulich
Cc: xen-devel@lists.xenproject.org, Andrew Cooper, Wei Liu,
Kevin Tian, Jun Nakajima
On Tue, May 13, 2025 at 03:54:56PM +0200, Jan Beulich wrote:
> On 13.05.2025 15:41, Roger Pau Monné wrote:
> > On Wed, May 03, 2023 at 11:45:22AM +0200, Jan Beulich wrote:
> >> We allow its use for writeback purposes only anyway, so let's also carry
> >> these out that way on capable hardware.
> >
> > "for writeback purposes only" > is such the case because we cannot
> > guarantee the guest in which state the cache will be when resuming
> > from a wbinvd operation, and hence won't be "clean"?
>
> Not really, no (see below). This is inferred from us limiting flushing
> operations to domains with physical hardware assigned plus the fact
> that we avoid the overhead in vmx_do_resume() when the IOMMU is snoop-
> capable. (Plus I did all this work over 2 years ago, and hence I now
> don't recall what other commentary I may have come across saying
> things along these lines.)
>
> Which, thinking about it while writing this reply (and also taking
> into consideration Andrew's earlier reply), may be all wrong.
As part of my series I was introducing a new option to allow guests
cache control even when no devices are assigned. My intention was
that whether a guest needs cache control or not is known at creation
time and stays immutable, and to also allow easier testing of the
cache control code without a physical device assigned.
> > It's my understanding that the same is possible on native, as the CPU
> > might speculatively pull lines into the cache. So there's no reason
> > for an OS to use wbinvd if wbnoinvd is available?
>
> Speculatively pulling data into the cache is possible only when page
> table entries permit caching. Hence after changing all mappings of a
> certain page to UC, an OS may have a need to ensure that no data of
> this page is left in any cache (and it can't be pulled back in
> speculatively then).
Is this realistic taking into account the OS is running virtualized?
At least with Xen there's the direct map, so once context is switched
back to Xen (for example to execute the wbinvd itself) there's no
guarantee the CPU won't speculatively populate the cache with entries
from the direct map.
Thanks, Roger.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v2 2/6] x86/HVM: restrict guest-induced WBINVD to cache writeback
2025-05-14 11:30 ` Roger Pau Monné
@ 2025-05-14 12:46 ` Jan Beulich
2025-05-14 14:42 ` Roger Pau Monné
0 siblings, 1 reply; 29+ messages in thread
From: Jan Beulich @ 2025-05-14 12:46 UTC (permalink / raw)
To: Roger Pau Monné; +Cc: xen-devel@lists.xenproject.org, Andrew Cooper
On 14.05.2025 13:30, Roger Pau Monné wrote:
> On Tue, May 13, 2025 at 03:54:56PM +0200, Jan Beulich wrote:
>> On 13.05.2025 15:41, Roger Pau Monné wrote:
>>> It's my understanding that the same is possible on native, as the CPU
>>> might speculatively pull lines into the cache. So there's no reason
>>> for an OS to use wbinvd if wbnoinvd is available?
>>
>> Speculatively pulling data into the cache is possible only when page
>> table entries permit caching. Hence after changing all mappings of a
>> certain page to UC, an OS may have a need to ensure that no data of
>> this page is left in any cache (and it can't be pulled back in
>> speculatively then).
>
> Is this realistic taking into account the OS is running virtualized?
>
> At least with Xen there's the direct map, so once context is switched
> back to Xen (for example to execute the wbinvd itself) there's no
> guarantee the CPU won't speculatively populate the cache with entries
> from the direct map.
Well, we've been knowing for a long time that we're not doing things fully
correctly there. Once a guest has changed all mappings of a page it owns,
we ought to make the direct map one follow suit (or simply unmap it from
there).
Jan
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v2 2/6] x86/HVM: restrict guest-induced WBINVD to cache writeback
2025-05-14 12:46 ` Jan Beulich
@ 2025-05-14 14:42 ` Roger Pau Monné
2025-05-14 14:53 ` Jan Beulich
0 siblings, 1 reply; 29+ messages in thread
From: Roger Pau Monné @ 2025-05-14 14:42 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel@lists.xenproject.org, Andrew Cooper
On Wed, May 14, 2025 at 02:46:32PM +0200, Jan Beulich wrote:
> On 14.05.2025 13:30, Roger Pau Monné wrote:
> > On Tue, May 13, 2025 at 03:54:56PM +0200, Jan Beulich wrote:
> >> On 13.05.2025 15:41, Roger Pau Monné wrote:
> >>> It's my understanding that the same is possible on native, as the CPU
> >>> might speculatively pull lines into the cache. So there's no reason
> >>> for an OS to use wbinvd if wbnoinvd is available?
> >>
> >> Speculatively pulling data into the cache is possible only when page
> >> table entries permit caching. Hence after changing all mappings of a
> >> certain page to UC, an OS may have a need to ensure that no data of
> >> this page is left in any cache (and it can't be pulled back in
> >> speculatively then).
> >
> > Is this realistic taking into account the OS is running virtualized?
> >
> > At least with Xen there's the direct map, so once context is switched
> > back to Xen (for example to execute the wbinvd itself) there's no
> > guarantee the CPU won't speculatively populate the cache with entries
> > from the direct map.
>
> Well, we've been knowing for a long time that we're not doing things fully
> correctly there. Once a guest has changed all mappings of a page it owns,
> we ought to make the direct map one follow suit (or simply unmap it from
> there).
Keeping track of guests mappings seems extremely complicated - maybe
doable for PV, but not for HVM with HAP I would think?
Also we would need to do something similar if guest enables CR0.CD and
switch all the direct map entries to uncached?
Address Space Isolation (and the removal of the direct map) might
solve part of this, but still I don't think we can fully guarantee Xen
won't have any mapping of guest pages with a different set of cache
attributes.
Thanks, Roger.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v2 2/6] x86/HVM: restrict guest-induced WBINVD to cache writeback
2025-05-14 14:42 ` Roger Pau Monné
@ 2025-05-14 14:53 ` Jan Beulich
0 siblings, 0 replies; 29+ messages in thread
From: Jan Beulich @ 2025-05-14 14:53 UTC (permalink / raw)
To: Roger Pau Monné; +Cc: xen-devel@lists.xenproject.org, Andrew Cooper
On 14.05.2025 16:42, Roger Pau Monné wrote:
> On Wed, May 14, 2025 at 02:46:32PM +0200, Jan Beulich wrote:
>> On 14.05.2025 13:30, Roger Pau Monné wrote:
>>> On Tue, May 13, 2025 at 03:54:56PM +0200, Jan Beulich wrote:
>>>> On 13.05.2025 15:41, Roger Pau Monné wrote:
>>>>> It's my understanding that the same is possible on native, as the CPU
>>>>> might speculatively pull lines into the cache. So there's no reason
>>>>> for an OS to use wbinvd if wbnoinvd is available?
>>>>
>>>> Speculatively pulling data into the cache is possible only when page
>>>> table entries permit caching. Hence after changing all mappings of a
>>>> certain page to UC, an OS may have a need to ensure that no data of
>>>> this page is left in any cache (and it can't be pulled back in
>>>> speculatively then).
>>>
>>> Is this realistic taking into account the OS is running virtualized?
>>>
>>> At least with Xen there's the direct map, so once context is switched
>>> back to Xen (for example to execute the wbinvd itself) there's no
>>> guarantee the CPU won't speculatively populate the cache with entries
>>> from the direct map.
>>
>> Well, we've been knowing for a long time that we're not doing things fully
>> correctly there. Once a guest has changed all mappings of a page it owns,
>> we ought to make the direct map one follow suit (or simply unmap it from
>> there).
>
> Keeping track of guests mappings seems extremely complicated - maybe
> doable for PV, but not for HVM with HAP I would think?
Indeed.
> Also we would need to do something similar if guest enables CR0.CD and
> switch all the direct map entries to uncached?
Likely, yes.
> Address Space Isolation (and the removal of the direct map) might
> solve part of this, but still I don't think we can fully guarantee Xen
> won't have any mapping of guest pages with a different set of cache
> attributes.
Yet for correctness we ought to, I fear.
Jan
^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH v2 3/6] x86/PV: restrict guest-induced WBINVD (or alike) to cache writeback
2023-05-03 9:43 [PATCH v2 0/6] x86: reduce cache flushing overhead Jan Beulich
2023-05-03 9:44 ` [PATCH v2 1/6] x86: support cache-writeback in flush_area_local() et al Jan Beulich
2023-05-03 9:45 ` [PATCH v2 2/6] x86/HVM: restrict guest-induced WBINVD to cache writeback Jan Beulich
@ 2023-05-03 9:45 ` Jan Beulich
2023-05-03 9:46 ` [PATCH v2 4/6] VT-d: restrict iommu_flush_all() " Jan Beulich
` (2 subsequent siblings)
5 siblings, 0 replies; 29+ messages in thread
From: Jan Beulich @ 2023-05-03 9:45 UTC (permalink / raw)
To: xen-devel@lists.xenproject.org
Cc: Andrew Cooper, Wei Liu, Roger Pau Monné
We allow its use for writeback purposes only anyway, so let's also carry
these out that way on capable hardware.
We can then also expose the WBNOINVD feature, as there's no difference
to WBINVD anymore. Note that respective emulation logic has already been
in place since ad3abc47dd23 ("x86emul: support WBNOINVD").
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: FLUSH_WRITEBACK -> FLUSH_CACHE_WRITEBACK.
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -3772,7 +3772,7 @@ long do_mmuext_op(
else if ( unlikely(!cache_flush_permitted(currd)) )
rc = -EACCES;
else
- wbinvd();
+ wbnoinvd();
break;
case MMUEXT_FLUSH_CACHE_GLOBAL:
@@ -3788,7 +3788,7 @@ long do_mmuext_op(
if ( !cpumask_intersects(mask,
per_cpu(cpu_sibling_mask, cpu)) )
__cpumask_set_cpu(cpu, mask);
- flush_mask(mask, FLUSH_CACHE);
+ flush_mask(mask, FLUSH_CACHE_WRITEBACK);
}
else
rc = -EINVAL;
--- a/xen/arch/x86/pv/emul-priv-op.c
+++ b/xen/arch/x86/pv/emul-priv-op.c
@@ -1196,10 +1196,8 @@ static int cf_check cache_op(
* newer linux uses this in some start-of-day timing loops.
*/
;
- else if ( op == x86emul_wbnoinvd /* && cpu_has_wbnoinvd */ )
- wbnoinvd();
else
- wbinvd();
+ wbnoinvd();
return X86EMUL_OKAY;
}
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -238,7 +238,7 @@ XEN_CPUFEATURE(EFRO, 7*32+10) /
/* AMD-defined CPU features, CPUID level 0x80000008.ebx, word 8 */
XEN_CPUFEATURE(CLZERO, 8*32+ 0) /*A CLZERO instruction */
XEN_CPUFEATURE(RSTR_FP_ERR_PTRS, 8*32+ 2) /*A (F)X{SAVE,RSTOR} always saves/restores FPU Error pointers */
-XEN_CPUFEATURE(WBNOINVD, 8*32+ 9) /*S WBNOINVD instruction */
+XEN_CPUFEATURE(WBNOINVD, 8*32+ 9) /*A WBNOINVD instruction */
XEN_CPUFEATURE(IBPB, 8*32+12) /*A IBPB support only (no IBRS, used by AMD) */
XEN_CPUFEATURE(IBRS, 8*32+14) /*S MSR_SPEC_CTRL.IBRS */
XEN_CPUFEATURE(AMD_STIBP, 8*32+15) /*S MSR_SPEC_CTRL.STIBP */
^ permalink raw reply [flat|nested] 29+ messages in thread* [PATCH v2 4/6] VT-d: restrict iommu_flush_all() to cache writeback
2023-05-03 9:43 [PATCH v2 0/6] x86: reduce cache flushing overhead Jan Beulich
` (2 preceding siblings ...)
2023-05-03 9:45 ` [PATCH v2 3/6] x86/PV: restrict guest-induced WBINVD (or alike) " Jan Beulich
@ 2023-05-03 9:46 ` Jan Beulich
2025-05-14 11:40 ` Roger Pau Monné
2025-05-14 14:44 ` Roger Pau Monné
2023-05-03 9:46 ` [PATCH v2 5/6] x86: FLUSH_CACHE -> FLUSH_CACHE_EVICT Jan Beulich
2023-05-03 9:47 ` [PATCH v2 6/6] x86/HVM: limit cache writeback overhead Jan Beulich
5 siblings, 2 replies; 29+ messages in thread
From: Jan Beulich @ 2023-05-03 9:46 UTC (permalink / raw)
To: xen-devel@lists.xenproject.org
Cc: Andrew Cooper, Wei Liu, Roger Pau Monné, Kevin Tian
We don't need to invalidate caches here; all we're after is that earlier
writes have made it to main memory (and aiui even that just in case).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
This, aiui, being an analogue to uses of iommu_sync_cache() (just not
range restricted), I wonder whether it shouldn't be conditional upon
iommu_non_coherent. Then again I'm vaguely under the impression that
we had been here before, possibly even as far as questioning the need
for this call altogether.
---
v2: FLUSH_WRITEBACK -> FLUSH_CACHE_WRITEBACK.
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -693,7 +693,7 @@ static int __must_check iommu_flush_all(
bool_t flush_dev_iotlb;
int rc = 0;
- flush_local(FLUSH_CACHE);
+ flush_local(FLUSH_CACHE_WRITEBACK);
for_each_drhd_unit ( drhd )
{
^ permalink raw reply [flat|nested] 29+ messages in thread* Re: [PATCH v2 4/6] VT-d: restrict iommu_flush_all() to cache writeback
2023-05-03 9:46 ` [PATCH v2 4/6] VT-d: restrict iommu_flush_all() " Jan Beulich
@ 2025-05-14 11:40 ` Roger Pau Monné
2025-05-14 12:49 ` Jan Beulich
2025-05-14 14:44 ` Roger Pau Monné
1 sibling, 1 reply; 29+ messages in thread
From: Roger Pau Monné @ 2025-05-14 11:40 UTC (permalink / raw)
To: Jan Beulich
Cc: xen-devel@lists.xenproject.org, Andrew Cooper, Wei Liu,
Kevin Tian
On Wed, May 03, 2023 at 11:46:11AM +0200, Jan Beulich wrote:
> We don't need to invalidate caches here; all we're after is that earlier
> writes have made it to main memory (and aiui even that just in case).
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> This, aiui, being an analogue to uses of iommu_sync_cache() (just not
> range restricted), I wonder whether it shouldn't be conditional upon
> iommu_non_coherent. Then again I'm vaguely under the impression that
> we had been here before, possibly even as far as questioning the need
> for this call altogether.
I think yes, it would better be only done for iommu_non_coherent. Yet
in that case I wonder why we need this wide flush. In principle all
accesses should already have their own write-back calls if the IOMMU
is non-coherent?
There's maybe the call from vtd_crash_shutdown() which I guess could
trigger in the middle of some interaction with the IOMMU, but at that
point do we really care to flush anyway if Xen is going to crash?
Otherwise it seems fine to switch to write-back.
Thanks, Roger.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v2 4/6] VT-d: restrict iommu_flush_all() to cache writeback
2025-05-14 11:40 ` Roger Pau Monné
@ 2025-05-14 12:49 ` Jan Beulich
0 siblings, 0 replies; 29+ messages in thread
From: Jan Beulich @ 2025-05-14 12:49 UTC (permalink / raw)
To: Roger Pau Monné; +Cc: xen-devel@lists.xenproject.org, Andrew Cooper
On 14.05.2025 13:40, Roger Pau Monné wrote:
> On Wed, May 03, 2023 at 11:46:11AM +0200, Jan Beulich wrote:
>> We don't need to invalidate caches here; all we're after is that earlier
>> writes have made it to main memory (and aiui even that just in case).
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> This, aiui, being an analogue to uses of iommu_sync_cache() (just not
>> range restricted), I wonder whether it shouldn't be conditional upon
>> iommu_non_coherent. Then again I'm vaguely under the impression that
>> we had been here before, possibly even as far as questioning the need
>> for this call altogether.
>
> I think yes, it would better be only done for iommu_non_coherent. Yet
> in that case I wonder why we need this wide flush. In principle all
> accesses should already have their own write-back calls if the IOMMU
> is non-coherent?
>
> There's maybe the call from vtd_crash_shutdown() which I guess could
> trigger in the middle of some interaction with the IOMMU, but at that
> point do we really care to flush anyway if Xen is going to crash?
>
> Otherwise it seems fine to switch to write-back.
In which case why don't we do it in two steps: This patch, and then one
removing the call altogether. Just in case the latter one turns out wrong
for whatever reason.
Jan
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v2 4/6] VT-d: restrict iommu_flush_all() to cache writeback
2023-05-03 9:46 ` [PATCH v2 4/6] VT-d: restrict iommu_flush_all() " Jan Beulich
2025-05-14 11:40 ` Roger Pau Monné
@ 2025-05-14 14:44 ` Roger Pau Monné
2025-05-14 14:57 ` Jan Beulich
1 sibling, 1 reply; 29+ messages in thread
From: Roger Pau Monné @ 2025-05-14 14:44 UTC (permalink / raw)
To: Jan Beulich
Cc: xen-devel@lists.xenproject.org, Andrew Cooper, Wei Liu,
Kevin Tian
On Wed, May 03, 2023 at 11:46:11AM +0200, Jan Beulich wrote:
> We don't need to invalidate caches here; all we're after is that earlier
> writes have made it to main memory (and aiui even that just in case).
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
I wouldn't mind if you gated the flush to iommu_non_coherent, but
that's not very relevant if we plan to remove the call anyway.
Thanks, Roger.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v2 4/6] VT-d: restrict iommu_flush_all() to cache writeback
2025-05-14 14:44 ` Roger Pau Monné
@ 2025-05-14 14:57 ` Jan Beulich
0 siblings, 0 replies; 29+ messages in thread
From: Jan Beulich @ 2025-05-14 14:57 UTC (permalink / raw)
To: Roger Pau Monné
Cc: xen-devel@lists.xenproject.org, Andrew Cooper, Wei Liu,
Kevin Tian
On 14.05.2025 16:44, Roger Pau Monné wrote:
> On Wed, May 03, 2023 at 11:46:11AM +0200, Jan Beulich wrote:
>> We don't need to invalidate caches here; all we're after is that earlier
>> writes have made it to main memory (and aiui even that just in case).
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>
> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Thanks.
> I wouldn't mind if you gated the flush to iommu_non_coherent, but
> that's not very relevant if we plan to remove the call anyway.
Actually I think I'll do that. Then for a fair number of systems it
won't matter anymore whether we remove this call sooner vs later.
Jan
^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH v2 5/6] x86: FLUSH_CACHE -> FLUSH_CACHE_EVICT
2023-05-03 9:43 [PATCH v2 0/6] x86: reduce cache flushing overhead Jan Beulich
` (3 preceding siblings ...)
2023-05-03 9:46 ` [PATCH v2 4/6] VT-d: restrict iommu_flush_all() " Jan Beulich
@ 2023-05-03 9:46 ` Jan Beulich
2025-05-14 11:44 ` Roger Pau Monné
2023-05-03 9:47 ` [PATCH v2 6/6] x86/HVM: limit cache writeback overhead Jan Beulich
5 siblings, 1 reply; 29+ messages in thread
From: Jan Beulich @ 2023-05-03 9:46 UTC (permalink / raw)
To: xen-devel@lists.xenproject.org
Cc: Andrew Cooper, Wei Liu, Roger Pau Monné
This is to make the difference to FLUSH_CACHE_WRITEBACK more explicit.
Requested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
Note that this (of course) collides with "x86/HVM: restrict use of
pinned cache attributes as well as associated flushing".
---
v2: New.
--- a/xen/arch/x86/flushtlb.c
+++ b/xen/arch/x86/flushtlb.c
@@ -232,7 +232,7 @@ unsigned int flush_area_local(const void
if ( flags & FLUSH_HVM_ASID_CORE )
hvm_flush_guest_tlbs();
- if ( flags & (FLUSH_CACHE | FLUSH_CACHE_WRITEBACK) )
+ if ( flags & (FLUSH_CACHE_EVICT | FLUSH_CACHE_WRITEBACK) )
{
const struct cpuinfo_x86 *c = ¤t_cpu_data;
unsigned long sz = 0;
@@ -245,13 +245,13 @@ unsigned int flush_area_local(const void
c->x86_clflush_size && c->x86_cache_size && sz &&
((sz >> 10) < c->x86_cache_size) )
{
- if ( flags & FLUSH_CACHE )
+ if ( flags & FLUSH_CACHE_EVICT )
cache_flush(va, sz);
else
cache_writeback(va, sz);
- flags &= ~(FLUSH_CACHE | FLUSH_CACHE_WRITEBACK);
+ flags &= ~(FLUSH_CACHE_EVICT | FLUSH_CACHE_WRITEBACK);
}
- else if ( flags & FLUSH_CACHE )
+ else if ( flags & FLUSH_CACHE_EVICT )
wbinvd();
else
wbnoinvd();
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -2228,7 +2228,7 @@ void hvm_shadow_handle_cd(struct vcpu *v
domain_pause_nosync(v->domain);
/* Flush physical caches. */
- flush_all(FLUSH_CACHE);
+ flush_all(FLUSH_CACHE_EVICT);
hvm_set_uc_mode(v, 1);
domain_unpause(v->domain);
--- a/xen/arch/x86/hvm/mtrr.c
+++ b/xen/arch/x86/hvm/mtrr.c
@@ -614,7 +614,7 @@ int hvm_set_mem_pinned_cacheattr(struct
break;
/* fall through */
default:
- flush_all(FLUSH_CACHE);
+ flush_all(FLUSH_CACHE_EVICT);
break;
}
return 0;
@@ -680,7 +680,7 @@ int hvm_set_mem_pinned_cacheattr(struct
p2m_memory_type_changed(d);
if ( type != X86_MT_WB )
- flush_all(FLUSH_CACHE);
+ flush_all(FLUSH_CACHE_EVICT);
return rc;
}
@@ -782,7 +782,7 @@ void memory_type_changed(struct domain *
d->vcpu && d->vcpu[0] )
{
p2m_memory_type_changed(d);
- flush_all(FLUSH_CACHE);
+ flush_all(FLUSH_CACHE_EVICT);
}
}
--- a/xen/arch/x86/include/asm/flushtlb.h
+++ b/xen/arch/x86/include/asm/flushtlb.h
@@ -113,7 +113,7 @@ void switch_cr3_cr4(unsigned long cr3, u
/* Flush TLBs (or parts thereof) including global mappings */
#define FLUSH_TLB_GLOBAL 0x200
/* Flush data caches */
-#define FLUSH_CACHE 0x400
+#define FLUSH_CACHE_EVICT 0x400
/* VA for the flush has a valid mapping */
#define FLUSH_VA_VALID 0x800
/* Flush CPU state */
@@ -191,7 +191,7 @@ static inline int clean_and_invalidate_d
{
unsigned int order = get_order_from_bytes(size);
/* sub-page granularity support needs to be added if necessary */
- flush_area_local(p, FLUSH_CACHE|FLUSH_ORDER(order));
+ flush_area_local(p, FLUSH_CACHE_EVICT | FLUSH_ORDER(order));
return 0;
}
static inline int clean_dcache_va_range(const void *p, unsigned long size)
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -5193,7 +5193,7 @@ int map_pages_to_xen(
if ( (flags & _PAGE_PRESENT) && \
(((o_) ^ flags) & PAGE_CACHE_ATTRS) ) \
{ \
- flush_flags |= FLUSH_CACHE; \
+ flush_flags |= FLUSH_CACHE_EVICT; \
if ( virt >= DIRECTMAP_VIRT_START && \
virt < HYPERVISOR_VIRT_END ) \
flush_flags |= FLUSH_VA_VALID; \
^ permalink raw reply [flat|nested] 29+ messages in thread* Re: [PATCH v2 5/6] x86: FLUSH_CACHE -> FLUSH_CACHE_EVICT
2023-05-03 9:46 ` [PATCH v2 5/6] x86: FLUSH_CACHE -> FLUSH_CACHE_EVICT Jan Beulich
@ 2025-05-14 11:44 ` Roger Pau Monné
2025-05-14 12:52 ` Jan Beulich
0 siblings, 1 reply; 29+ messages in thread
From: Roger Pau Monné @ 2025-05-14 11:44 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel@lists.xenproject.org, Andrew Cooper, Wei Liu
On Wed, May 03, 2023 at 11:46:41AM +0200, Jan Beulich wrote:
> This is to make the difference to FLUSH_CACHE_WRITEBACK more explicit.
>
> Requested-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
This is however missing the previous calls from SVM/VMX that at this
point in the series have already been switched to write-back instead
of evict. Maybe this patch should be the 1st of 2nd, so that changes
from evict to write-back are done afterwards?
Thanks, Roger.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v2 5/6] x86: FLUSH_CACHE -> FLUSH_CACHE_EVICT
2025-05-14 11:44 ` Roger Pau Monné
@ 2025-05-14 12:52 ` Jan Beulich
2025-05-14 14:45 ` Roger Pau Monné
0 siblings, 1 reply; 29+ messages in thread
From: Jan Beulich @ 2025-05-14 12:52 UTC (permalink / raw)
To: Roger Pau Monné
Cc: xen-devel@lists.xenproject.org, Andrew Cooper, Wei Liu
On 14.05.2025 13:44, Roger Pau Monné wrote:
> On Wed, May 03, 2023 at 11:46:41AM +0200, Jan Beulich wrote:
>> This is to make the difference to FLUSH_CACHE_WRITEBACK more explicit.
>>
>> Requested-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>
> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Thanks.
> This is however missing the previous calls from SVM/VMX that at this
> point in the series have already been switched to write-back instead
> of evict.
Hence why it's this late in the series, i.e. ...
> Maybe this patch should be the 1st of 2nd, so that changes
> from evict to write-back are done afterwards?
... I wanted to avoid touching those places twice. (IOW re-ordering would
be possible, with the extra changes you mention, but I'd prefer not to.)
Jan
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v2 5/6] x86: FLUSH_CACHE -> FLUSH_CACHE_EVICT
2025-05-14 12:52 ` Jan Beulich
@ 2025-05-14 14:45 ` Roger Pau Monné
2025-05-15 6:46 ` Jan Beulich
0 siblings, 1 reply; 29+ messages in thread
From: Roger Pau Monné @ 2025-05-14 14:45 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel@lists.xenproject.org, Andrew Cooper, Wei Liu
On Wed, May 14, 2025 at 02:52:39PM +0200, Jan Beulich wrote:
> On 14.05.2025 13:44, Roger Pau Monné wrote:
> > On Wed, May 03, 2023 at 11:46:41AM +0200, Jan Beulich wrote:
> >> This is to make the difference to FLUSH_CACHE_WRITEBACK more explicit.
> >>
> >> Requested-by: Andrew Cooper <andrew.cooper3@citrix.com>
> >> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> >
> > Acked-by: Roger Pau Monné <roger.pau@citrix.com>
>
> Thanks.
>
> > This is however missing the previous calls from SVM/VMX that at this
> > point in the series have already been switched to write-back instead
> > of evict.
>
> Hence why it's this late in the series, i.e. ...
>
> > Maybe this patch should be the 1st of 2nd, so that changes
> > from evict to write-back are done afterwards?
>
> ... I wanted to avoid touching those places twice. (IOW re-ordering would
> be possible, with the extra changes you mention, but I'd prefer not to.)
Given the concerns with patch 2 I think some re-ordering will be
needed?
Thanks, Roger.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v2 5/6] x86: FLUSH_CACHE -> FLUSH_CACHE_EVICT
2025-05-14 14:45 ` Roger Pau Monné
@ 2025-05-15 6:46 ` Jan Beulich
0 siblings, 0 replies; 29+ messages in thread
From: Jan Beulich @ 2025-05-15 6:46 UTC (permalink / raw)
To: Roger Pau Monné
Cc: xen-devel@lists.xenproject.org, Andrew Cooper, Wei Liu
On 14.05.2025 16:45, Roger Pau Monné wrote:
> On Wed, May 14, 2025 at 02:52:39PM +0200, Jan Beulich wrote:
>> On 14.05.2025 13:44, Roger Pau Monné wrote:
>>> On Wed, May 03, 2023 at 11:46:41AM +0200, Jan Beulich wrote:
>>>> This is to make the difference to FLUSH_CACHE_WRITEBACK more explicit.
>>>>
>>>> Requested-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>
>>> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
>>
>> Thanks.
>>
>>> This is however missing the previous calls from SVM/VMX that at this
>>> point in the series have already been switched to write-back instead
>>> of evict.
>>
>> Hence why it's this late in the series, i.e. ...
>>
>>> Maybe this patch should be the 1st of 2nd, so that changes
>>> from evict to write-back are done afterwards?
>>
>> ... I wanted to avoid touching those places twice. (IOW re-ordering would
>> be possible, with the extra changes you mention, but I'd prefer not to.)
>
> Given the concerns with patch 2 I think some re-ordering will be
> needed?
Well, if patches 2 and 3 need dropping, this one would naturally move forwards
of course. The more that 1 was already committed, and 4 is soon going to be (as
being independent of 2 and 3).
Jan
^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH v2 6/6] x86/HVM: limit cache writeback overhead
2023-05-03 9:43 [PATCH v2 0/6] x86: reduce cache flushing overhead Jan Beulich
` (4 preceding siblings ...)
2023-05-03 9:46 ` [PATCH v2 5/6] x86: FLUSH_CACHE -> FLUSH_CACHE_EVICT Jan Beulich
@ 2023-05-03 9:47 ` Jan Beulich
2025-05-14 13:00 ` Roger Pau Monné
5 siblings, 1 reply; 29+ messages in thread
From: Jan Beulich @ 2023-05-03 9:47 UTC (permalink / raw)
To: xen-devel@lists.xenproject.org
Cc: Andrew Cooper, Wei Liu, Roger Pau Monné, Kevin Tian,
Jun Nakajima
There's no need to write back caches on all CPUs upon seeing a WBINVD
exit; ones that a vCPU hasn't run on since the last writeback (or since
it was started) can't hold data which may need writing back.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
With us not running AMD IOMMUs in non-coherent ways, I wonder whether
svm_wbinvd_intercept() really needs to do anything (or whether it
couldn't check iommu_snoop just like VMX does, knowing that as of
c609108b2190 ["x86/shadow: make iommu_snoop usage consistent with
HAP's"] that's always set; this would largely serve as grep fodder then,
to make sure this code is updated once / when we do away with this
global variable, and it would be the penultimate step to being able to
fold SVM's and VT-x'es functions).
---
v2: Re-base over changes earlier in the series.
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -537,6 +537,8 @@ void hvm_do_resume(struct vcpu *v)
v->arch.hvm.inject_event.vector = HVM_EVENT_VECTOR_UNSET;
}
+ __cpumask_set_cpu(v->processor, v->arch.hvm.cache_dirty_mask);
+
if ( unlikely(v->arch.vm_event) && v->arch.monitor.next_interrupt_enabled )
{
struct x86_event info;
@@ -1592,6 +1594,10 @@ int hvm_vcpu_initialise(struct vcpu *v)
if ( rc )
goto fail6;
+ rc = -ENOMEM;
+ if ( !zalloc_cpumask_var(&v->arch.hvm.cache_dirty_mask) )
+ goto fail6;
+
rc = ioreq_server_add_vcpu_all(d, v);
if ( rc != 0 )
goto fail6;
@@ -1621,6 +1627,7 @@ int hvm_vcpu_initialise(struct vcpu *v)
hvm_vcpu_cacheattr_destroy(v);
fail1:
viridian_vcpu_deinit(v);
+ FREE_CPUMASK_VAR(v->arch.hvm.cache_dirty_mask);
return rc;
}
@@ -1628,6 +1635,8 @@ void hvm_vcpu_destroy(struct vcpu *v)
{
viridian_vcpu_deinit(v);
+ FREE_CPUMASK_VAR(v->arch.hvm.cache_dirty_mask);
+
ioreq_server_remove_vcpu_all(v->domain, v);
if ( hvm_altp2m_supported() )
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -2363,8 +2363,14 @@ static void svm_vmexit_mce_intercept(
static void cf_check svm_wbinvd_intercept(void)
{
- if ( cache_flush_permitted(current->domain) )
- flush_all(FLUSH_CACHE_WRITEBACK);
+ struct vcpu *curr = current;
+
+ if ( !cache_flush_permitted(curr->domain) )
+ return;
+
+ flush_mask(curr->arch.hvm.cache_dirty_mask, FLUSH_CACHE_WRITEBACK);
+ cpumask_copy(curr->arch.hvm.cache_dirty_mask,
+ cpumask_of(curr->processor));
}
static void svm_vmexit_do_invalidate_cache(struct cpu_user_regs *regs,
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -3710,11 +3710,17 @@ static void vmx_do_extint(struct cpu_use
static void cf_check vmx_wbinvd_intercept(void)
{
- if ( !cache_flush_permitted(current->domain) || iommu_snoop )
+ struct vcpu *curr = current;
+
+ if ( !cache_flush_permitted(curr->domain) || iommu_snoop )
return;
if ( cpu_has_wbinvd_exiting )
- flush_all(FLUSH_CACHE_WRITEBACK);
+ {
+ flush_mask(curr->arch.hvm.cache_dirty_mask, FLUSH_CACHE_WRITEBACK);
+ cpumask_copy(curr->arch.hvm.cache_dirty_mask,
+ cpumask_of(curr->processor));
+ }
else
wbnoinvd();
}
--- a/xen/arch/x86/include/asm/hvm/vcpu.h
+++ b/xen/arch/x86/include/asm/hvm/vcpu.h
@@ -161,6 +161,8 @@ struct hvm_vcpu {
struct svm_vcpu svm;
};
+ cpumask_var_t cache_dirty_mask;
+
struct tasklet assert_evtchn_irq_tasklet;
struct nestedvcpu nvcpu;
^ permalink raw reply [flat|nested] 29+ messages in thread* Re: [PATCH v2 6/6] x86/HVM: limit cache writeback overhead
2023-05-03 9:47 ` [PATCH v2 6/6] x86/HVM: limit cache writeback overhead Jan Beulich
@ 2025-05-14 13:00 ` Roger Pau Monné
2025-05-14 13:20 ` Jan Beulich
0 siblings, 1 reply; 29+ messages in thread
From: Roger Pau Monné @ 2025-05-14 13:00 UTC (permalink / raw)
To: Jan Beulich
Cc: xen-devel@lists.xenproject.org, Andrew Cooper, Wei Liu,
Kevin Tian, Jun Nakajima
On Wed, May 03, 2023 at 11:47:18AM +0200, Jan Beulich wrote:
> There's no need to write back caches on all CPUs upon seeing a WBINVD
> exit; ones that a vCPU hasn't run on since the last writeback (or since
> it was started) can't hold data which may need writing back.
Couldn't you do the same with PV mode, and hence put the cpumask in
arch_vcpu rather than hvm_vcpu?
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> With us not running AMD IOMMUs in non-coherent ways, I wonder whether
> svm_wbinvd_intercept() really needs to do anything (or whether it
> couldn't check iommu_snoop just like VMX does, knowing that as of
> c609108b2190 ["x86/shadow: make iommu_snoop usage consistent with
> HAP's"] that's always set; this would largely serve as grep fodder then,
> to make sure this code is updated once / when we do away with this
> global variable, and it would be the penultimate step to being able to
> fold SVM's and VT-x'es functions).
On my series I expand cache_flush_permitted() to also account for
iommu_snoop, I think it's easier to have a single check that signals
whether cache control is allowed for a domain, rather that having to
check multiple different conditions.
Thanks, Roger.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v2 6/6] x86/HVM: limit cache writeback overhead
2025-05-14 13:00 ` Roger Pau Monné
@ 2025-05-14 13:20 ` Jan Beulich
2025-05-14 15:12 ` Roger Pau Monné
0 siblings, 1 reply; 29+ messages in thread
From: Jan Beulich @ 2025-05-14 13:20 UTC (permalink / raw)
To: Roger Pau Monné; +Cc: xen-devel@lists.xenproject.org, Andrew Cooper
On 14.05.2025 15:00, Roger Pau Monné wrote:
> On Wed, May 03, 2023 at 11:47:18AM +0200, Jan Beulich wrote:
>> There's no need to write back caches on all CPUs upon seeing a WBINVD
>> exit; ones that a vCPU hasn't run on since the last writeback (or since
>> it was started) can't hold data which may need writing back.
>
> Couldn't you do the same with PV mode, and hence put the cpumask in
> arch_vcpu rather than hvm_vcpu?
We could in principle, but there's no use of flush_all() there right now
(i.e. nothing to "win").
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> With us not running AMD IOMMUs in non-coherent ways, I wonder whether
>> svm_wbinvd_intercept() really needs to do anything (or whether it
>> couldn't check iommu_snoop just like VMX does, knowing that as of
>> c609108b2190 ["x86/shadow: make iommu_snoop usage consistent with
>> HAP's"] that's always set; this would largely serve as grep fodder then,
>> to make sure this code is updated once / when we do away with this
>> global variable, and it would be the penultimate step to being able to
>> fold SVM's and VT-x'es functions).
>
> On my series I expand cache_flush_permitted() to also account for
> iommu_snoop, I think it's easier to have a single check that signals
> whether cache control is allowed for a domain, rather that having to
> check multiple different conditions.
Right, adjustments here would want making (in whichever series goes in
later).
For both of the responses: I think with patch 1 of this series having
gone in and with in particular Andrew's concern over patch 2 (which
may extend to patch 3), it may make sense for your series to go next.
I shall then re-base, while considering what to do with patches 2 and 3
(they may need dropping in the end).
Jan
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v2 6/6] x86/HVM: limit cache writeback overhead
2025-05-14 13:20 ` Jan Beulich
@ 2025-05-14 15:12 ` Roger Pau Monné
2025-05-15 6:47 ` Jan Beulich
0 siblings, 1 reply; 29+ messages in thread
From: Roger Pau Monné @ 2025-05-14 15:12 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel@lists.xenproject.org, Andrew Cooper
On Wed, May 14, 2025 at 03:20:56PM +0200, Jan Beulich wrote:
> On 14.05.2025 15:00, Roger Pau Monné wrote:
> > On Wed, May 03, 2023 at 11:47:18AM +0200, Jan Beulich wrote:
> >> There's no need to write back caches on all CPUs upon seeing a WBINVD
> >> exit; ones that a vCPU hasn't run on since the last writeback (or since
> >> it was started) can't hold data which may need writing back.
> >
> > Couldn't you do the same with PV mode, and hence put the cpumask in
> > arch_vcpu rather than hvm_vcpu?
>
> We could in principle, but there's no use of flush_all() there right now
> (i.e. nothing to "win").
Yes, that will get "fixed" if we take patch 2 from my series. That
fixes the lack of broadcasting of wb{,no}invd when emulating it for
PV domains.
I think this patch would be better after my fix to cache_op(), and
then the uncertainty around patch 2 makes it unclear whether we want
this.
> >> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> >> ---
> >> With us not running AMD IOMMUs in non-coherent ways, I wonder whether
> >> svm_wbinvd_intercept() really needs to do anything (or whether it
> >> couldn't check iommu_snoop just like VMX does, knowing that as of
> >> c609108b2190 ["x86/shadow: make iommu_snoop usage consistent with
> >> HAP's"] that's always set; this would largely serve as grep fodder then,
> >> to make sure this code is updated once / when we do away with this
> >> global variable, and it would be the penultimate step to being able to
> >> fold SVM's and VT-x'es functions).
> >
> > On my series I expand cache_flush_permitted() to also account for
> > iommu_snoop, I think it's easier to have a single check that signals
> > whether cache control is allowed for a domain, rather that having to
> > check multiple different conditions.
>
> Right, adjustments here would want making (in whichever series goes in
> later).
>
> For both of the responses: I think with patch 1 of this series having
> gone in and with in particular Andrew's concern over patch 2 (which
> may extend to patch 3), it may make sense for your series to go next.
> I shall then re-base, while considering what to do with patches 2 and 3
> (they may need dropping in the end).
Makes sense, I still need to get over your feedback on my series, I've
been distracted with other stuff.
Thanks, Roger.
^ permalink raw reply [flat|nested] 29+ messages in thread* Re: [PATCH v2 6/6] x86/HVM: limit cache writeback overhead
2025-05-14 15:12 ` Roger Pau Monné
@ 2025-05-15 6:47 ` Jan Beulich
2025-05-15 8:09 ` Roger Pau Monné
0 siblings, 1 reply; 29+ messages in thread
From: Jan Beulich @ 2025-05-15 6:47 UTC (permalink / raw)
To: Roger Pau Monné; +Cc: xen-devel@lists.xenproject.org, Andrew Cooper
On 14.05.2025 17:12, Roger Pau Monné wrote:
> On Wed, May 14, 2025 at 03:20:56PM +0200, Jan Beulich wrote:
>> On 14.05.2025 15:00, Roger Pau Monné wrote:
>>> On Wed, May 03, 2023 at 11:47:18AM +0200, Jan Beulich wrote:
>>>> There's no need to write back caches on all CPUs upon seeing a WBINVD
>>>> exit; ones that a vCPU hasn't run on since the last writeback (or since
>>>> it was started) can't hold data which may need writing back.
>>>
>>> Couldn't you do the same with PV mode, and hence put the cpumask in
>>> arch_vcpu rather than hvm_vcpu?
>>
>> We could in principle, but there's no use of flush_all() there right now
>> (i.e. nothing to "win").
>
> Yes, that will get "fixed" if we take patch 2 from my series. That
> fixes the lack of broadcasting of wb{,no}invd when emulating it for
> PV domains.
>
> I think this patch would be better after my fix to cache_op(),
Right, this matches what I said ...
> and
> then the uncertainty around patch 2 makes it unclear whether we want
> this.
>
>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>> ---
>>>> With us not running AMD IOMMUs in non-coherent ways, I wonder whether
>>>> svm_wbinvd_intercept() really needs to do anything (or whether it
>>>> couldn't check iommu_snoop just like VMX does, knowing that as of
>>>> c609108b2190 ["x86/shadow: make iommu_snoop usage consistent with
>>>> HAP's"] that's always set; this would largely serve as grep fodder then,
>>>> to make sure this code is updated once / when we do away with this
>>>> global variable, and it would be the penultimate step to being able to
>>>> fold SVM's and VT-x'es functions).
>>>
>>> On my series I expand cache_flush_permitted() to also account for
>>> iommu_snoop, I think it's easier to have a single check that signals
>>> whether cache control is allowed for a domain, rather that having to
>>> check multiple different conditions.
>>
>> Right, adjustments here would want making (in whichever series goes in
>> later).
>>
>> For both of the responses: I think with patch 1 of this series having
>> gone in and with in particular Andrew's concern over patch 2 (which
>> may extend to patch 3), it may make sense for your series to go next.
>> I shall then re-base, while considering what to do with patches 2 and 3
>> (they may need dropping in the end).
... here.
Jan
> Makes sense, I still need to get over your feedback on my series, I've
> been distracted with other stuff.
>
> Thanks, Roger.
^ permalink raw reply [flat|nested] 29+ messages in thread* Re: [PATCH v2 6/6] x86/HVM: limit cache writeback overhead
2025-05-15 6:47 ` Jan Beulich
@ 2025-05-15 8:09 ` Roger Pau Monné
0 siblings, 0 replies; 29+ messages in thread
From: Roger Pau Monné @ 2025-05-15 8:09 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel@lists.xenproject.org, Andrew Cooper
On Thu, May 15, 2025 at 08:47:46AM +0200, Jan Beulich wrote:
> On 14.05.2025 17:12, Roger Pau Monné wrote:
> > On Wed, May 14, 2025 at 03:20:56PM +0200, Jan Beulich wrote:
> >> On 14.05.2025 15:00, Roger Pau Monné wrote:
> >>> On Wed, May 03, 2023 at 11:47:18AM +0200, Jan Beulich wrote:
> >>>> There's no need to write back caches on all CPUs upon seeing a WBINVD
> >>>> exit; ones that a vCPU hasn't run on since the last writeback (or since
> >>>> it was started) can't hold data which may need writing back.
> >>>
> >>> Couldn't you do the same with PV mode, and hence put the cpumask in
> >>> arch_vcpu rather than hvm_vcpu?
> >>
> >> We could in principle, but there's no use of flush_all() there right now
> >> (i.e. nothing to "win").
> >
> > Yes, that will get "fixed" if we take patch 2 from my series. That
> > fixes the lack of broadcasting of wb{,no}invd when emulating it for
> > PV domains.
> >
> > I think this patch would be better after my fix to cache_op(),
>
> Right, this matches what I said ...
>
> > and
> > then the uncertainty around patch 2 makes it unclear whether we want
> > this.
> >
> >>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> >>>> ---
> >>>> With us not running AMD IOMMUs in non-coherent ways, I wonder whether
> >>>> svm_wbinvd_intercept() really needs to do anything (or whether it
> >>>> couldn't check iommu_snoop just like VMX does, knowing that as of
> >>>> c609108b2190 ["x86/shadow: make iommu_snoop usage consistent with
> >>>> HAP's"] that's always set; this would largely serve as grep fodder then,
> >>>> to make sure this code is updated once / when we do away with this
> >>>> global variable, and it would be the penultimate step to being able to
> >>>> fold SVM's and VT-x'es functions).
> >>>
> >>> On my series I expand cache_flush_permitted() to also account for
> >>> iommu_snoop, I think it's easier to have a single check that signals
> >>> whether cache control is allowed for a domain, rather that having to
> >>> check multiple different conditions.
> >>
> >> Right, adjustments here would want making (in whichever series goes in
> >> later).
> >>
> >> For both of the responses: I think with patch 1 of this series having
> >> gone in and with in particular Andrew's concern over patch 2 (which
> >> may extend to patch 3), it may make sense for your series to go next.
> >> I shall then re-base, while considering what to do with patches 2 and 3
> >> (they may need dropping in the end).
>
> ... here.
By the time I saw your paragraph, I had already written mine, so I
just left it, we reached the same conclusion which is good IMO :).
Thanks, Roger.
^ permalink raw reply [flat|nested] 29+ messages in thread