Kernel KVM virtualization development
From: Tao Cui <cui.tao@linux.dev>
To: Bibo Mao <maobibo@loongson.cn>,
	zhaotianrui@loongson.cn, chenhuacai@kernel.org,
	loongarch@lists.linux.dev
Cc: cui.tao@linux.dev, kernel@xen0n.name, kvm@vger.kernel.org,
	Tao Cui <cuitao@kylinos.cn>
Subject: Re: [PATCH v4 2/3] LoongArch: KVM: Implement guest-side PV TLB flush
Date: Thu, 25 Jun 2026 20:51:45 +0800
Message-ID: <416dfbf8-f765-442e-b6de-6fc0fe1a4b5f@linux.dev>
In-Reply-To: <835a9c5c-2f66-293b-d093-fc59bac26a01@loongson.cn>



On 2026/6/25 15:36, Bibo Mao wrote:
> 
> 
> On 2026/6/25 3:15 PM, Tao Cui wrote:
>>
>>
>> On 2026/6/25 14:11, Bibo Mao wrote:
>>>
>>>
>>> On 2026/6/25 11:31 AM, Bibo Mao wrote:
>>>>
>>>>
>>>> On 2026/6/25 10:27 AM, Tao Cui wrote:
>>>>>
>>>>> Hi Bibo,
>>>>>
>>>>> On 2026/6/17 09:05, Bibo Mao wrote:
>>>>>>
>>>>>>> Rather than argue from intuition, I'd like to try the hypercall approach
>>>>>>> you suggested and measure the performance improvement against the current
>>>>>>> path. I'll share the results with you once the testing is done, so we
>>>>>>> can decide the direction based on the numbers.
>>>>>> Well, that would be best. It is my pleasure to discuss this with you.
>>>>>>
>>>>>
>>>>> A quick update on the testing. I put both the hypercall and the
>>>>> steal-time variants through two benchmarks on an 8-core host with a
>>>>> 4:1 overcommitted guest (32 vCPUs), and wanted to share where things
>>>>> stand.
>>>>>
>>>>> The two workloads:
>>>>>    - ebizzy (all threads busy, mm-flush heavy)
>>>>>    - tlb_bench in sleep-idle mode (1 flusher + 31 sleeping idle threads,
>>>>>      so the idle vCPUs get preempted)
>>>>>
>>>>> ebizzy (records/s, higher is better), 32 vCPUs:
>>>>>     no-PV       ~103,737
>>>>>     hypercall   ~105,779
>>>>>     steal-time  ~105,872
>>>>>     -> all within noise (±2%); no measurable difference.
>>>> Hi Tao,
>>>>
>>>> What is the ebizzy command: ebizzy -m or ebizzy -M?
>>>>
>>>> Could you first try the command on the host and on one VM without overcommit, and then with two VMs and three VMs?
>>>>
>>>> Here are the results on my 3C5000 dual-way machine with 32 cores and two NUMA nodes:
>>>>                   ./ebizzy -m          ./ebizzy -M
>>>> host             8633                 158898
>>>> VM(32 vCPUs)     6610                 133153
>>>> VM/host          76%                  83%
>>>>
>>> Just ./ebizzy -M is enough; it seems that the CPU count is one key factor.
>>>
>>
>> Sorry for the delay — it turned out my ebizzy command was wrong. I had
>> been running `ebizzy -t <vcpus> -S 10`, which is neither -m nor -M, so
>> neither mmap mode was active and the workload wasn't really stressing
>> the TLB-flush path. Thanks for catching it.
>>
>> I re-ran with -m and -M on host and a single VM (8-core LoongArch
>> KVM host, 8 vCPU guest, 1:1, no overcommit).
>>
>>                   ebizzy -m        ebizzy -M
>> host             ~20,000          ~55,000
>> VM (8 vCPU,1:1)  ~17,000          ~53,000
>> VM/host          ~86%             ~97%
>>
>> The -m ratio (86%) is close to your 76%.
> On my 3C5000 dual-way machine, the VM has the same CPU/memory topology as the physical machine, and the kernel is mainline without any patches.
>                           ./ebizzy -M
> Host (32 pCPUs)           158898
> One VM(32 vCPUs)          133153                 83% of host
> Two VMs(32 vCPUs each)    9083 + 9630 = 18713    11% of host
> 
> It seems that with the ebizzy benchmark there is a big difference when vCPUs are preempted. Even when vCPUs are not preempted, performance is only 83% of the host on my 3C5000 dual-way machine.

After fixing the ebizzy command, I now have multi-VM overcommit results
for both approaches.

Setup: 8-core LoongArch (single-socket, single NUMA node), KVM,
linux-next-20260623, 8 vCPUs per VM. All VMs run ebizzy -M
simultaneously, 3 runs each. The PV-off baseline uses a guest kernel
with CONFIG_PARAVIRT=y but without the PV TLB flush patches, so
PV IPI and steal-time are active in all three columns.

ebizzy -M, total records/s across all VMs:

              PV-off      steal-time   hypercall
  1:1 (1VM)   53,600      53,800       53,900
  2:1 (2VM)    2,600      42,600       45,300
  3:1 (3VM)    2,800      44,700       46,000

At 1:1 there is no difference, since no vCPU gets preempted. Under
overcommit, without PV TLB flush the throughput drops to ~5% of
the single-VM case, because every remote TLB flush sends IPIs to
preempted vCPUs and has to wait for them to be scheduled again.
With either PV TLB flush variant, preempted vCPUs are skipped, and
total throughput stays at ~79-85% of single-VM (bounded by physical
cores).
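
As a rough cross-check, the ratios relative to the single-VM case can be
recomputed directly from the table. A minimal Python sketch: the figures
are copied from the table above, while the dictionary layout and the
function name are illustrative only.

```python
# Total ebizzy -M records/s across all VMs, copied from the table above.
# Keys are the overcommit ratios; each row is (PV-off, steal-time, hypercall).
COLUMNS = ("PV-off", "steal-time", "hypercall")
RESULTS = {
    "1:1": (53_600, 53_800, 53_900),
    "2:1": (2_600, 42_600, 45_300),
    "3:1": (2_800, 44_700, 46_000),
}


def percent_of_single_vm(results):
    """Return {ratio: {column: percent of that column's own 1:1 throughput}}."""
    base = results["1:1"]
    return {
        ratio: {
            name: round(100.0 * value / base[i], 1)
            for i, (name, value) in enumerate(zip(COLUMNS, row))
        }
        for ratio, row in results.items()
        if ratio != "1:1"
    }


if __name__ == "__main__":
    for ratio, cols in percent_of_single_vm(RESULTS).items():
        print(ratio, cols)
```

Each column is normalised against its own 1:1 result, so small
differences between the three single-VM figures do not leak into the
overcommit ratios.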

Hypercall is consistently 3-6% above steal-time in the overcommit
cases. A possible reason is that the hypercall hands the entire
target set to the host in one call, while steal-time still IPIs the
running vCPUs and only defers the preempted ones.
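
The difference between the two variants can be pictured with a toy model
that counts guest-side operations for one remote flush. This is only a
sketch of the behaviour described above, not the patch code: the VCpu
class, the function names and the flush_pending flag are all
hypothetical, what the host does inside the hypercall is left out, and
the race between checking the preempted state and the vCPU resuming,
which a real implementation has to handle, is ignored.

```python
from dataclasses import dataclass


@dataclass
class VCpu:
    """Hypothetical stand-in for one target vCPU of a remote TLB flush."""
    preempted: bool
    flush_pending: bool = False  # deferred flush, done before the vCPU runs again


def flush_no_pv(targets):
    """Baseline: the guest IPIs every target, including preempted ones."""
    to_preempted = sum(1 for v in targets if v.preempted)
    return {"guest_ipis": len(targets), "ipis_to_preempted": to_preempted,
            "hypercalls": 0}


def flush_steal_time(targets):
    """Steal-time variant: defer preempted targets, IPI only the running ones."""
    ipis = 0
    for v in targets:
        if v.preempted:
            v.flush_pending = True
        else:
            ipis += 1
    return {"guest_ipis": ipis, "ipis_to_preempted": 0, "hypercalls": 0}


def flush_hypercall(targets):
    """Hypercall variant: hand the whole target set to the host in one call."""
    for v in targets:
        if v.preempted:
            v.flush_pending = True  # assumption: the host defers these as well
    return {"guest_ipis": 0, "ipis_to_preempted": 0, "hypercalls": 1}


def make_targets(total, preempted):
    """Build `total` targets, of which the first `preempted` are preempted."""
    return [VCpu(preempted=i < preempted) for i in range(total)]
```

For example, with 7 targets of which 4 are preempted, the model gives 7
guest IPIs (4 of them to preempted vCPUs) for the baseline, 3 guest IPIs
for steal-time, and no guest IPIs plus a single hypercall for the
hypercall variant.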

On our 8-core machine the collapse is more severe than on your
3C5000 (~5% vs 11% of host at 2 VMs), likely due to the smaller
core count and single-NUMA topology.

Thanks,
Tao

> 
> Regards
> Bibo Mao
>>
>> I then tried multi-VM overcommit (2 and 3 VMs, all running ebizzy -M
>> simultaneously). The initial result showed a large gap between the
>> PV-TLB-flush kernel and the baseline under overcommit. Both kernels
>> have CONFIG_PARAVIRT enabled (PV IPI and steal-time are active in
>> both), so PV TLB flush should be the main differentiator — but since
>> they are two separate kernel images rather than a clean on/off toggle,
>> there may be some noise from other differences. I'm now re-running
>> with a QEMU CPU property (kvm-pv-tlb-flush on/off) on the same kernel
>> to isolate the effect cleanly.
>>
>> I'll share the verified numbers once I have them.
>>
>> Thanks,
>> Tao
>>
>>>> Regards
>>>> Bibo Mao
>>>>
>>>>>
>>>>> tlb_bench sleep-idle (ns/flush, lower is better), 1 flusher + 31 idle:
>>>>>     no-PV       ~166,536
>>>>>     steal-time  ~149,553
>>>>>     hypercall    ~88,686
>>>>>
>>>>> ebizzy's workload is mostly threads staying busy with alloc/copy/free,
>>>>> which drives remote TLB flushes against running vCPUs — that may not be
>>>>> the path this feature is meant to optimize, so the flat result there
>>>>> probably says more about the workload mismatch than about the feature
>>>>> itself. I need to take another look at whether the benchmark actually
>>>>> exercises the cases PV TLB flush targets before reading too much into
>>>>> the numbers, including the tlb_bench figure above.
>>>>>
>>>>> So I'd hold off on any conclusion for now. Next I'll re-examine the
>>>>> test setup / pick a workload that better matches the feature, and keep
>>>>> you posted once I have something more representative.
>>>>>
>>>>> Best,
>>>>> Tao
>>>>>
>>>
> 

