* [PATCH v1 0/3] Lockless SMP function call and TLB flushing
@ 2026-04-01 16:35 Ross Lagerwall
From: Ross Lagerwall @ 2026-04-01 16:35 UTC (permalink / raw)
  To: xen-devel
  Cc: Ross Lagerwall, Jan Beulich, Andrew Cooper, Roger Pau Monné,
	Anthony PERARD, Michal Orzel, Julien Grall, Stefano Stabellini

Hi,

This series implements a lockless SMP function call mechanism and then
rewrites x86 TLB flushing to use SMP function calls.

We have observed that the TLB flush lock can be a point of contention for
certain workloads, e.g. migrating 10 VMs off a host during a host evacuation.

Performance numbers:

I wrote a synthetic benchmark to measure the performance. The benchmark has one
or more CPUs in Xen calling on_selected_cpus() with between 1 and 64 CPUs in
the selected mask. The executed function simply delays for 500 microseconds.

The table below shows the % change in execution time of on_selected_cpus()
(negative values mean faster):

                  1 thread   2 threads    4 threads
1 CPU in mask     0.02       -35.23       -51.18
2 CPUs in mask    0.01       -47.20       -69.27
4 CPUs in mask    -0.02      -42.40       -66.55
8 CPUs in mask    -0.03      -47.82       -68.39
16 CPUs in mask   0.12       -41.95       -58.26
32 CPUs in mask   0.02       -25.43       -39.35
64 CPUs in mask   0.00       -24.70       -37.83

With 1 thread (i.e. no contention), there is no regression in execution time.
With multiple threads, as expected there is a significant improvement in
execution time.
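
To read the table entries as speedup factors rather than percentage changes,
the conversion is simple arithmetic. The sketch below is illustrative only:
speedup_from_pct_change() is a hypothetical helper and is not part of this
series.

```c
#include <assert.h>

/*
 * Hypothetical helper, not part of the patch series.
 *
 * A % change of p in execution time means new_time = old_time * (1 + p/100),
 * so the speedup factor old_time / new_time is 1 / (1 + p/100).
 */
static double speedup_from_pct_change(double pct_change)
{
    /* A change of -100% or below would mean zero or negative time. */
    assert(pct_change > -100.0);

    return 1.0 / (1.0 + pct_change / 100.0);
}
```

For example, the -51.18 entry (1 CPU in mask, 4 threads) corresponds to
roughly a 2.05x speedup, and the -69.27 entry (2 CPUs in mask, 4 threads) to
roughly 3.25x.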

As a more practical benchmark to simulate host evacuation, I measured the
memory dirtying rate across 10 VMs after enabling log dirty (on an AMD system,
so without PML). The rate increased by 16% with this patch series, even
after the recent deferred TLB flush changes.

FWIW, my first attempt at this was to port the SMP call functionality from
Linux. I found that it didn't scale well as the number of CPUs in the mask
increased, so I've taken a different approach here.

Thanks,
Ross

Ross Lagerwall (3):
  x86/hap: Wait for remote CPUs during TLB flush
  xen/smp: Rewrite on_selected_cpus() to be lockless
  x86/smp: Rewrite TLB flush using on_selected_cpus()

 tools/xentrace/xenalyze.c              |   2 -
 xen/arch/x86/include/asm/irq-vectors.h |   1 -
 xen/arch/x86/include/asm/irq.h         |   1 -
 xen/arch/x86/mm/hap/hap.c              |   2 +-
 xen/arch/x86/smp.c                     |  30 ++++----
 xen/arch/x86/smpboot.c                 |   1 -
 xen/common/smp.c                       | 101 ++++++++++++++++---------
 7 files changed, 80 insertions(+), 58 deletions(-)

-- 
2.53.0




Thread overview: 17+ messages
2026-04-01 16:35 [PATCH v1 0/3] Lockless SMP function call and TLB flushing Ross Lagerwall
2026-04-01 16:35 ` [PATCH v1 1/3] x86/hap: Wait for remote CPUs during TLB flush Ross Lagerwall
2026-04-08 15:21   ` Jan Beulich
2026-04-08 15:48     ` Ross Lagerwall
2026-04-09  6:55       ` Jan Beulich
2026-04-01 16:35 ` [PATCH v1 2/3] xen/smp: Rewrite on_selected_cpus() to be lockless Ross Lagerwall
2026-04-08 16:11   ` Jan Beulich
2026-04-09  8:09   ` Roger Pau Monné
2026-04-09 11:46   ` Jan Beulich
2026-04-01 16:35 ` [PATCH v1 3/3] x86/smp: Rewrite TLB flush using on_selected_cpus() Ross Lagerwall
2026-04-20 14:04   ` Jan Beulich
2026-04-02  6:09 ` [PATCH v1 0/3] Lockless SMP function call and TLB flushing Jan Beulich
2026-04-02  8:40   ` Ross Lagerwall
2026-04-02  8:49     ` Jan Beulich
2026-04-02 10:57       ` Ross Lagerwall
2026-04-02 11:57         ` Andrew Cooper
2026-04-09  8:13           ` Roger Pau Monné
