linux-arm-kernel.lists.infradead.org archive mirror
[bug report] GICv4.1: VM performance degradation due to not trapping vCPU WFI
From: sundongxu (A) @ 2024-01-16  3:26 UTC
  To: maz, oliver.upton, yuzenghui, james.morse, suzuki.poulose, will,
	catalin.marinas
  Cc: wanghaibin.wang, kvmarm, linux-kernel, linux-arm-kernel

Hi Guys,

We found a problem with GICv4/4.1. For example:
We use QEMU to start a VM (4 vCPUs and 8 GB of memory). The VM disk is
configured with virtio and the network with vhost-net. The CPU affinity
of the vCPUs and the emulator is set as follows in the VM XML:

  <cputune>
    <vcpupin vcpu='0' cpuset='4'/>
    <vcpupin vcpu='1' cpuset='5'/>
    <vcpupin vcpu='2' cpuset='6'/>
    <vcpupin vcpu='3' cpuset='7'/>
    <emulatorpin cpuset='4,5,6,7'/>
  </cputune>

We run MySQL in the VM and sysbench (a MySQL benchmark) on the host.
The performance metric is TPS (transactions per second); higher is
better. With only GICv3 enabled on the host, TPS is around 1400.
With GICv4.1 enabled and all other configuration unchanged, TPS drops
to around 40.

We found that when the host enables GICv4.1, vSGIs are injected
directly into the VM, and because each vCPU has its pCPU to itself most
of the time, the vCPU does not trap when executing the WFI instruction.
As a result, from the host's point of view, the CPU usage of
vCPU0~vCPU3 is almost 100%. While the MySQL service runs in the VM, the
vhost-net and QEMU processes also need enough CPU time, but they cannot
get it: with only GICv3 enabled, vhost-net CPU usage is about 43%, but
with GICv4.1 enabled it drops to 0~2%. During the test we observed that
vhost-net sleeps and wakes up very frequently. When vhost-net wakes up,
it often cannot get the CPU in time (because of the wake-up preemption
check), and once it does run, it usually runs only briefly before going
back to sleep.

With GICv4.1 enabled on the host and the vCPUs forced to trap on WFI,
TPS is back to around 1400.

On the other hand, when the vCPU executes WFI without trapping, vCPU
wake-up latency is significantly lower. For example, cyclictest results
in the VM:
WFI trap           6us
WFI no trap        2us

For now, I have added a KVM module parameter that controls whether the
vCPU traps on WFI (by setting or clearing HCR_TWI) when the host has
GICv4/4.1 enabled; by default, WFI trapping is enabled.

Or is there a better way?




Thread overview: 5+ messages
2024-01-16  3:26 [bug report] GICv4.1: VM performance degradation due to not trapping vCPU WFI sundongxu (A)
2024-01-16 11:13 ` Marc Zyngier
2024-01-17 14:20   ` sundongxu (A)
2024-01-17 16:50     ` Oliver Upton
2024-01-18  7:56       ` sundongxu (A)
