From: Tao Cui <cui.tao@linux.dev>
To: Bibo Mao <maobibo@loongson.cn>,
    zhaotianrui@loongson.cn, chenhuacai@kernel.org,
    loongarch@lists.linux.dev
Cc: cui.tao@linux.dev, kernel@xen0n.name, kvm@vger.kernel.org,
    Tao Cui <cuitao@kylinos.cn>
Subject: Re: [PATCH v4 2/3] LoongArch: KVM: Implement guest-side PV TLB flush
Date: Fri, 26 Jun 2026 10:53:19 +0800
Message-ID: <d09a90f7-94b1-4a2f-bcea-d42802d2353e@linux.dev>
In-Reply-To: <a7e7a17c-fdaa-b168-2a7b-d4cd4018b662@loongson.cn>
On 2026/6/26 09:37, Bibo Mao wrote:
>
>
> On 2026/6/25 8:51 PM, Tao Cui wrote:
>>
>>
>> On 2026/6/25 15:36, Bibo Mao wrote:
>>>
>>>
>>> On 2026/6/25 3:15 PM, Tao Cui wrote:
>>>>
>>>>
>>>> On 2026/6/25 14:11, Bibo Mao wrote:
>>>>>
>>>>>
>>>>> On 2026/6/25 11:31 AM, Bibo Mao wrote:
>>>>>>
>>>>>>
>>>>>> On 2026/6/25 10:27 AM, Tao Cui wrote:
>>>>>>>
>>>>>>> Hi Bibo,
>>>>>>>
>>>>>>> On 2026/6/17 09:05, Bibo Mao wrote:
>>>>>>>>
>>>>>>>>> Rather than argue from intuition, I'd like to try the hypercall approach
>>>>>>>>> you suggested and measure the performance improvement against the current
>>>>>>>>> path. I'll share the results with you once the testing is done, so we
>>>>>>>>> can decide the direction based on the numbers.
>>>>>>>> well, that is the best. It is my pleasure to discuss this with you.
>>>>>>>>
>>>>>>>
>>>>>>> A quick update on the testing. I put both the hypercall and the
>>>>>>> steal-time variants through two benchmarks on an 8-core host with a
>>>>>>> 4:1 overcommitted guest (32 vCPUs), and wanted to share where things
>>>>>>> stand.
>>>>>>>
>>>>>>> The two workloads:
>>>>>>> - ebizzy (all threads busy, mm-flush heavy)
>>>>>>> - tlb_bench in sleep-idle mode (1 flusher + 31 sleeping idle threads,
>>>>>>>   so the idle vCPUs get preempted)
>>>>>>>
>>>>>>> ebizzy (records/s, higher is better), 32 vCPUs:
>>>>>>>   no-PV        ~103,737
>>>>>>>   hypercall    ~105,779
>>>>>>>   steal-time   ~105,872
>>>>>>> -> all within noise (±2%); no measurable difference.
>>>>>> Hi Tao,
>>>>>>
>>>>>> What is the ebizzy command? ebizzy -m or ebizzy -M?
>>>>>>
>>>>>> Could you first try the command on the host and on one VM without overcommit, and then with two VMs and three VMs?
>>>>>>
>>>>>> Here is the result on my 3C5000 dual-way machine with 32 cores and two NUMA nodes:
>>>>>>                  ./ebizzy -m   ./ebizzy -M
>>>>>> host                    8633        158898
>>>>>> VM (32 vCPUs)           6610        133153
>>>>>> VM/host                  76%           83%
>>>>>>
>>>>> Just ./ebizzy -M is enough; it seems that the number of CPUs is one key factor.
>>>>>
>>>>
>>>> Sorry for the delay — it turned out my ebizzy command was wrong. I had
>>>> been running `ebizzy -t <vcpus> -S 10`, which is neither -m nor -M, so
>>>> neither mmap mode was active and the workload wasn't really stressing
>>>> the TLB-flush path. Thanks for catching it.
>>>>
>>>> I re-ran with -m and -M on host and a single VM (8-core LoongArch
>>>> KVM host, 8 vCPU guest, 1:1, no overcommit).
>>>>
>>>>                    ebizzy -m   ebizzy -M
>>>> host                 ~20,000     ~55,000
>>>> VM (8 vCPU, 1:1)     ~17,000     ~53,000
>>>> VM/host                 ~86%        ~97%
>>>>
>>>> The -m ratio (86%) is close to your 76%.
>>> On my 3C5000 dual-way machine, the VM has the same CPU/memory topology as the physical machine, and the kernel is mainline without any patches.
>>> ./ebizzy -M
>>>   Host (32 pCPUs)           158898
>>>   One VM (32 vCPUs)         133153                 83% of host
>>>   Two VMs (32 vCPUs each)   9083 + 9630 = 18713    11% of host
>>>
>>> It seems that with the ebizzy benchmark there is a big difference when vCPUs are preempted. Even when vCPUs are not preempted, performance is only 83% of host on my 3C5000 dual-way machine.
>>
>> After fixing the ebizzy command, I have multi-VM overcommit results
>> for both approaches.
>>
>> Setup: 8-core LoongArch (single-socket, single NUMA), KVM,
>> linux-next-20260623, 8 vCPU per VM. All VMs run ebizzy -M
>> simultaneously, 3 runs each. The PV-off baseline uses a guest kernel
>> with CONFIG_PARAVIRT=y but without the PV TLB flush patches, so
>> PV IPI and steal-time are active in all three columns.
>>
>> ebizzy -M, total records/s across all VMs:
>>
>>               PV-off   steal-time   hypercall
>> 1:1 (1VM)     53,600       53,800      53,900
>> 2:1 (2VM)      2,600       42,600      45,300
>> 3:1 (3VM)      2,800       44,700      46,000
>>
>> At 1:1 there is no difference — no vCPU gets preempted. Under
> From the previous result, the host score is 55,000, so in 1:1 mode the virtualization efficiency is 97%, which is hard to improve further. On my 3C5000 dual-way machine the virtualization efficiency is 83%, so I think it can be improved even in 1:1 mode.
>
> Also, the paper at https://dl.acm.org/doi/pdf/10.1145/2892242.2892245 shows this in 1:1 mode.
Thanks for the reference. That makes sense — the 1:1 overhead is
larger on multi-socket systems, so there may be room to improve
even without overcommit.
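As a rough check on the 1:1 figures quoted in this thread, here is a minimal sketch in Python. The numbers are copied from the tables above (the 8-core values are the approximate ones I reported); the dictionary keys and the helper name `efficiency` are illustrative only.

```python
# ebizzy -M throughput (records/s) copied from the tables quoted in this thread.
host = {"8-core": 55_000, "3C5000": 158_898}
vm_1to1 = {"8-core": 53_000, "3C5000": 133_153}


def efficiency(vm_score, host_score):
    """Return VM throughput as a percentage of host throughput."""
    return 100.0 * vm_score / host_score


for machine in host:
    pct = efficiency(vm_1to1[machine], host[machine])
    print(f"{machine}: VM reaches {pct:.1f}% of host")
```

The gap between the two machines is the headroom Bibo is pointing at: the dual-way 3C5000 loses noticeably more in 1:1 mode than the single-socket box.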
>> overcommit, without PV TLB flush the throughput drops to ~5% of
>> the single-VM case, because every remote TLB flush sends IPIs to
>> preempted vCPUs. With either PV TLB flush variant, preempted vCPUs
>> are skipped, and total throughput stays at ~79-85% of single-VM
>> (bounded by physical cores).
>>
>> Hypercall is consistently 3-6% above steal-time in the overcommit
>> cases. A possible reason is that the hypercall hands the entire
>> target set to the host in one call, while steal-time still IPIs the
>> running vCPUs and only defers the preempted ones.
>>
>> On our 8-core machine the collapse is more severe than on your
>> 3C5000 (~5% vs 11% of host at 2 VMs), likely due to the smaller
>> core count and single-NUMA topology.
> Yes, with more CPUs the benefit is larger. I think this kind of patch is better tested on a server platform, if such hardware is at hand, since KVM is used on server platforms most of the time. I will test the patch later.
>
I'll try to get access to a server-class LoongArch machine for
further testing. If you're able to run the patches on the 3C5000,
that would be great too.
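For anyone following the thread, the difference between the two variants described in the quoted paragraph above (steal-time still IPIs the running vCPUs and defers only the preempted ones, while the hypercall hands the whole target set to the host) can be sketched as a toy model. This is not the patch code: `VCpu`, `flush_steal_time` and `flush_hypercall` are hypothetical names, and how the host services the hypercall is an assumption.

```python
from dataclasses import dataclass


@dataclass
class VCpu:
    """Toy vCPU; the fields are hypothetical, not kernel structures."""
    cpu_id: int
    preempted: bool
    flush_pending: bool = False


def flush_steal_time(targets):
    """Steal-time style: IPI running vCPUs, defer the flush for preempted ones."""
    guest_ipis = 0
    for vcpu in targets:
        if vcpu.preempted:
            # Deferred: the flush is performed before this vCPU runs again.
            vcpu.flush_pending = True
        else:
            # The guest still pays for an IPI to every running target.
            guest_ipis += 1
    return {"guest_ipis": guest_ipis, "hypercalls": 0}


def flush_hypercall(targets):
    """Hypercall style: pass the entire target set to the host in one call."""
    for vcpu in targets:
        # Assumption: the host decides per vCPU how the flush is delivered.
        vcpu.flush_pending = True
    return {"guest_ipis": 0, "hypercalls": 1}


# Example: 8 target vCPUs, every second one preempted.
vcpus = [VCpu(i, preempted=(i % 2 == 1)) for i in range(8)]
print(flush_steal_time(vcpus))
print(flush_hypercall(vcpus))
```

In this model the guest-side cost of the steal-time variant grows with the number of running targets, while the hypercall variant is one exit regardless of the target count. Whether that accounts for the measured gap is still only a possible reason.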
> Regards
> Bibo Mao
>>
>> Thanks,
>> Tao
>>
>>>
>>> Regards
>>> Bibo Mao
>>>>
>>>> I then tried multi-VM overcommit (2 and 3 VMs, all running ebizzy -M
>>>> simultaneously). The initial result showed a large gap between the
>>>> PV-TLB-flush kernel and the baseline under overcommit. Both kernels
>>>> have CONFIG_PARAVIRT enabled (PV IPI and steal-time are active in
>>>> both), so PV TLB flush should be the main differentiator — but since
>>>> they are two separate kernel images rather than a clean on/off toggle,
>>>> there may be some noise from other differences. I'm now re-running
>>>> with a QEMU CPU property (kvm-pv-tlb-flush on/off) on the same kernel
>>>> to isolate the effect cleanly.
>>>>
>>>> I'll share the verified numbers once I have them.
>>>>
>>>> Thanks,
>>>> Tao
>>>>
>>>>>> Regards
>>>>>> Bibo Mao
>>>>>>
>>>>>>>
>>>>>>> tlb_bench sleep-idle (ns/flush, lower is better), 1 flusher + 31 idle:
>>>>>>>   no-PV        ~166,536
>>>>>>>   steal-time   ~149,553
>>>>>>>   hypercall     ~88,686
>>>>>>>
>>>>>>> ebizzy's workload is mostly threads staying busy with alloc/copy/free,
>>>>>>> which drives remote TLB flushes against running vCPUs — that may not be
>>>>>>> the path this feature is meant to optimize, so the flat result there
>>>>>>> probably says more about the workload mismatch than about the feature
>>>>>>> itself. I need to take another look at whether the benchmark actually
>>>>>>> exercises the cases PV TLB flush targets before reading too much into
>>>>>>> the numbers, including the tlb_bench figure above.
>>>>>>>
>>>>>>> So I'd hold off on any conclusion for now. Next I'll re-examine the
>>>>>>> test setup / pick a workload that better matches the feature, and keep
>>>>>>> you posted once I have something more representative.
>>>>>>>
>>>>>>> Best,
>>>>>>> Tao
>>>>>>>
>>>>>
>>>
>
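For reference, the ratios behind the ebizzy -M overcommit table quoted above can be recomputed with a short Python sketch. The throughput values are copied from that table; the helper names `retained` and `hypercall_gain` are illustrative only.

```python
# Total records/s across all VMs, from the ebizzy -M overcommit table.
results = {
    "PV-off":     {"1:1": 53_600, "2:1": 2_600,  "3:1": 2_800},
    "steal-time": {"1:1": 53_800, "2:1": 42_600, "3:1": 44_700},
    "hypercall":  {"1:1": 53_900, "2:1": 45_300, "3:1": 46_000},
}


def retained(variant, ratio):
    """Overcommitted throughput as a percentage of the same variant's 1:1 run."""
    row = results[variant]
    return 100.0 * row[ratio] / row["1:1"]


def hypercall_gain(ratio):
    """Hypercall throughput relative to steal-time, in percent."""
    return 100.0 * (results["hypercall"][ratio] / results["steal-time"][ratio] - 1)


for ratio in ("2:1", "3:1"):
    for variant in results:
        print(f"{ratio} {variant:<10} {retained(variant, ratio):5.1f}% of 1:1")
    print(f"{ratio} hypercall vs steal-time: {hypercall_gain(ratio):+.1f}%")
```

Each variant is compared against its own 1:1 run, so small differences between the three single-VM baselines do not leak into the overcommit ratios.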