Latency issues inside KVM.

* Latency issues inside KVM.
@ 2023-04-27 12:38 zhuangel570
  2023-04-29  3:28 ` Robert Hoo
  0 siblings, 1 reply; 9+ messages in thread
From: zhuangel570 @ 2023-04-27 12:38 UTC (permalink / raw)
  To: kvm

Hi

We found latency issues in high-density, high-concurrency scenarios. We use
Cloud Hypervisor as the VMM for lightweight VMs, with VIRTIO net and block
devices. In our tests, creating a VM and registering irqfds took about 50ms to
100ms+. After tracing with funclatency (a tool from bcc-tools,
https://github.com/iovisor/bcc), we found that the latency is introduced by the
following functions:

- irq_bypass_register_consumer introduces more than 60ms per VM.
  This function is called when registering an irqfd. It registers the irqfd as
  a consumer with irqbypass, so that irqbypass producers such as VFIO or VDPA
  can connect to it. In our test, one irqfd registration takes about 4ms, so
  5 devices with 16 irqfds in total introduce more than 60ms of latency.

- kvm_vm_create_worker_thread introduces tail latency of more than 100ms.
  This function is called to create the "kvm-nx-lpage-recovery" kthread when
  a new VM is created. The kthread was introduced to recover large pages and
  relieve the performance loss caused by the software mitigation of
  ITLB_MULTIHIT, see b8e8c8303ff2 ("kvm: mmu: ITLB_MULTIHIT mitigation") and
  1aa9b9572b10 ("kvm: x86: mmu: Recovery of shattered NX large pages").

Here is a simple case that reproduces the latency issue (the real latency is
larger). The case creates 800 idle VMs in the background, then repeatedly
creates 20 VMs and destroys them after 400ms. Each VM does something simple:
it creates an in-kernel irqchip and registers 15 irqfds (emulating 5 devices
with 3 irqfds each). Tracing the latency of the two functions above is enough
to reproduce the issue. Here is a trace log from a Xeon(R) Platinum 8255C
server (96C, 2 sockets) running Linux 6.2.20.

Reproduce Case
https://github.com/zhuangel/misc/blob/main/test/kvm_irqfd_fork/kvm_irqfd_fork.c
Reproduce log
https://github.com/zhuangel/misc/blob/main/test/kvm_irqfd_fork/test.log

I don't have a graceful fix for these latencies. My simple idea is to give
users a way to avoid them, e.g. a module parameter to disable the
"kvm-nx-lpage-recovery" kthread and a new flag to disable irqbypass for each
irqfd.

Any suggestions for fixing the issue are welcome.

Thanks!

-- 
——————————
   zhuangel570
——————————


* Re: Latency issues inside KVM.
  2023-04-27 12:38 Latency issues inside KVM zhuangel570
@ 2023-04-29  3:28 ` Robert Hoo
  2023-05-01 14:51   ` Sean Christopherson
  2023-05-04 13:32   ` zhuangel570
  0 siblings, 2 replies; 9+ messages in thread
From: Robert Hoo @ 2023-04-29  3:28 UTC (permalink / raw)
  To: zhuangel570, kvm

On 4/27/2023 8:38 PM, zhuangel570 wrote:
> Hi
> 
> We found some latency issue in high-density and high-concurrency scenarios, we
> are using cloud hypervisor as vmm for lightweight VM, using VIRTIO net and
> block for VM. In our test, we got about 50ms to 100ms+ latency in creating VM
> and register irqfd, after trace with funclatency (a tool of bcc-tools,
> https://github.com/iovisor/bcc), we found the latency introduced by following
> functions:
> 
> - irq_bypass_register_consumer introduce more than 60ms per VM.
>    This function was called when registering irqfd, the function will register
>    irqfd as consumer to irqbypass, wait for connecting from irqbypass producers,
>    like VFIO or VDPA. In our test, one irqfd register will get about 4ms
>    latency, and 5 devices with total 16 irqfd will introduce more than 60ms
>    latency.
> 
> - kvm_vm_create_worker_thread introduce tail latency more than 100ms.
>    This function was called when create "kvm-nx-lpage-recovery" kthread when
>    create a new VM, this patch was introduced to recovery large page to relief
>    performance loss caused by software mitigation of ITLB_MULTIHIT, see
>    b8e8c8303ff2 ("kvm: mmu: ITLB_MULTIHIT mitigation") and 1aa9b9572b10
>    ("kvm: x86: mmu: Recovery of shattered NX large pages").
> 
Yes, this kthread is for the NX-HugePage feature, which in turn is the
SW mitigation for the itlb-multihit issue.
However, a HW-level mitigation has been available for quite a while; you can
check "/sys/devices/system/cpu/vulnerabilities/itlb_multihit" for your
system's mitigation status.
I believe most recent Intel CPUs have this mitigated in HW (check
MSR_ARCH_CAPABILITIES::IF_PSCHANGE_MC_NO), let alone non-Intel CPUs.
But the kthread is still created via kvm_vm_create_worker_thread anyway, which
makes no sense, I think. I previously had an internal patch that got rid of
it, but didn't get a chance to send it out.
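
As an illustration (not part of the original message), the sysfs check
mentioned above could be scripted roughly as follows. The path is the one
quoted in the message; treating any value other than "Not affected" as
"software mitigation may still matter" is a simplifying assumption, and the
helper names are hypothetical.

```python
from pathlib import Path

# Path quoted in the message above.
ITLB_MULTIHIT = Path("/sys/devices/system/cpu/vulnerabilities/itlb_multihit")


def needs_sw_mitigation(status):
    """Return True unless sysfs reports the CPU as not affected (assumption)."""
    return status.strip().lower() != "not affected"


def read_status(path=ITLB_MULTIHIT):
    """Read the status string, or None if the file is missing or unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


status = read_status()
if status is None:
    print("itlb_multihit status is not exposed on this system")
else:
    print(f"{status!r}; SW mitigation relevant: {needs_sw_mitigation(status)}")
```

This only reports what the kernel exposes; it does not read
MSR_ARCH_CAPABILITIES directly.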

As more and more old CPUs retire, I think the NX-HugePage code will become
more and more of a minority code path, and will eventually be refactored out
one day.


* Re: Latency issues inside KVM.
  2023-04-29  3:28 ` Robert Hoo
@ 2023-05-01 14:51   ` Sean Christopherson
  2023-05-01 20:03     ` Jim Mattson
  2023-05-02  1:16     ` Robert Hoo
  2023-05-04 13:32   ` zhuangel570
  1 sibling, 2 replies; 9+ messages in thread
From: Sean Christopherson @ 2023-05-01 14:51 UTC (permalink / raw)
  To: Robert Hoo; +Cc: zhuangel570, kvm

On Sat, Apr 29, 2023, Robert Hoo wrote:
> On 4/27/2023 8:38 PM, zhuangel570 wrote:
> > - kvm_vm_create_worker_thread introduce tail latency more than 100ms.
> >    This function was called when create "kvm-nx-lpage-recovery" kthread when
> >    create a new VM, this patch was introduced to recovery large page to relief
> >    performance loss caused by software mitigation of ITLB_MULTIHIT, see
> >    b8e8c8303ff2 ("kvm: mmu: ITLB_MULTIHIT mitigation") and 1aa9b9572b10
> >    ("kvm: x86: mmu: Recovery of shattered NX large pages").
> > 
> Yes, this kthread is for NX-HugePage feature and NX-HugePage in turn is to
> SW mitigate itlb-multihit issue.
> However, HW level mitigation has been available for quite a while, you can
> check "/sys/devices/system/cpu/vulnerabilities/itlb_multihit" for your
> system's mitigation status.
> I believe most recent Intel CPUs have this HW mitigated (check
> MSR_ARCH_CAPABILITIES::IF_PSCHANGE_MC_NO), let alone non-Intel CPUs.
> But, the kvm_vm_create_worker_thread is still created anyway, nonsense I
> think. I previously had a internal patch getting rid of it but didn't get a
> chance to send out.

For the NX hugepage mitigation, I think it makes sense to restart the discussion
in the context of this thread: https://lore.kernel.org/all/ZBxf+ewCimtHY2XO@google.com

TL;DR: I am open to providing an option to hard disable the mitigation, but there
needs to be sufficient justification, e.g. that the above 100ms latency is a
problem for real world deployments.

> As more and more old CPUs retires, I think NX-HugePage code will become more
> and more minority code path/situation, and be refactored out eventually one
> day.

Heh, yeah, one day.  But "one day" is likely 10+ years away.  Intel discontinuing
a CPU has practically zero relevance to KVM removing support for a CPU, e.g. KVM
still supports the original Core CPUs, which were launched in 2006 and
discontinued in 2008.


* Re: Latency issues inside KVM.
  2023-05-01 14:51   ` Sean Christopherson
@ 2023-05-01 20:03     ` Jim Mattson
  2023-05-01 21:11       ` Sean Christopherson
  2023-05-02  1:16     ` Robert Hoo
  1 sibling, 1 reply; 9+ messages in thread
From: Jim Mattson @ 2023-05-01 20:03 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Robert Hoo, zhuangel570, kvm

On Mon, May 1, 2023 at 7:51 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Sat, Apr 29, 2023, Robert Hoo wrote:
> > On 4/27/2023 8:38 PM, zhuangel570 wrote:
> > > - kvm_vm_create_worker_thread introduce tail latency more than 100ms.
> > >    This function was called when create "kvm-nx-lpage-recovery" kthread when
> > >    create a new VM, this patch was introduced to recovery large page to relief
> > >    performance loss caused by software mitigation of ITLB_MULTIHIT, see
> > >    b8e8c8303ff2 ("kvm: mmu: ITLB_MULTIHIT mitigation") and 1aa9b9572b10
> > >    ("kvm: x86: mmu: Recovery of shattered NX large pages").
> > >
> > Yes, this kthread is for NX-HugePage feature and NX-HugePage in turn is to
> > SW mitigate itlb-multihit issue.
> > However, HW level mitigation has been available for quite a while, you can
> > check "/sys/devices/system/cpu/vulnerabilities/itlb_multihit" for your
> > system's mitigation status.
> > I believe most recent Intel CPUs have this HW mitigated (check
> > MSR_ARCH_CAPABILITIES::IF_PSCHANGE_MC_NO), let alone non-Intel CPUs.
> > But, the kvm_vm_create_worker_thread is still created anyway, nonsense I
> > think. I previously had a internal patch getting rid of it but didn't get a
> > chance to send out.
>
> For the NX hugepage mitation, I think it makes sense to restart the discussion
> in the context of this thread: https://lore.kernel.org/all/ZBxf+ewCimtHY2XO@google.com
>
> TL;DR: I am open to providng an option to hard disable the mitigation, but there
> needs to be sufficient justification, e.g. that the above 100ms latency is a
> problem for real world deployments.

Whatever became of
https://lore.kernel.org/kvm/20220613212523.3436117-1-bgardon@google.com/?

> > As more and more old CPUs retires, I think NX-HugePage code will become more
> > and more minority code path/situation, and be refactored out eventually one
> > day.
>
> Heh, yeah, one day.  But "one day" is likely 10+ years away.  Intel discontinuing
> a CPU has practically zero relevance to KVM removing support a CPU, e.g. KVM still
> supports the original Core CPUs from ~2006, which were launched in 2006 and
> discontinued in 2008.


* Re: Latency issues inside KVM.
  2023-05-01 20:03     ` Jim Mattson
@ 2023-05-01 21:11       ` Sean Christopherson
  0 siblings, 0 replies; 9+ messages in thread
From: Sean Christopherson @ 2023-05-01 21:11 UTC (permalink / raw)
  To: Jim Mattson; +Cc: Robert Hoo, zhuangel570, kvm

On Mon, May 01, 2023, Jim Mattson wrote:
> On Mon, May 1, 2023 at 7:51 AM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Sat, Apr 29, 2023, Robert Hoo wrote:
> > > On 4/27/2023 8:38 PM, zhuangel570 wrote:
> > > > - kvm_vm_create_worker_thread introduce tail latency more than 100ms.
> > > >    This function was called when create "kvm-nx-lpage-recovery" kthread when
> > > >    create a new VM, this patch was introduced to recovery large page to relief
> > > >    performance loss caused by software mitigation of ITLB_MULTIHIT, see
> > > >    b8e8c8303ff2 ("kvm: mmu: ITLB_MULTIHIT mitigation") and 1aa9b9572b10
> > > >    ("kvm: x86: mmu: Recovery of shattered NX large pages").
> > > >
> > > Yes, this kthread is for NX-HugePage feature and NX-HugePage in turn is to
> > > SW mitigate itlb-multihit issue.
> > > However, HW level mitigation has been available for quite a while, you can
> > > check "/sys/devices/system/cpu/vulnerabilities/itlb_multihit" for your
> > > system's mitigation status.
> > > I believe most recent Intel CPUs have this HW mitigated (check
> > > MSR_ARCH_CAPABILITIES::IF_PSCHANGE_MC_NO), let alone non-Intel CPUs.
> > > But, the kvm_vm_create_worker_thread is still created anyway, nonsense I
> > > think. I previously had a internal patch getting rid of it but didn't get a
> > > chance to send out.
> >
> > For the NX hugepage mitation, I think it makes sense to restart the discussion
> > in the context of this thread: https://lore.kernel.org/all/ZBxf+ewCimtHY2XO@google.com
> >
> > TL;DR: I am open to providng an option to hard disable the mitigation, but there
> > needs to be sufficient justification, e.g. that the above 100ms latency is a
> > problem for real world deployments.
> 
> Whatever became of
> https://lore.kernel.org/kvm/20220613212523.3436117-1-bgardon@google.com/?

That's merged, but disabling the mitigation for a single VM doesn't stop the
worker thread (arguably that's a bug), let alone prevent creation of the worker
in the first place as KVM spawns the worker before the VM is exposed to
userspace.  I.e. there's no way for userspace to say "don't spawn workers, the
NX hugepage mitigation will *never* be enabled".


* Re: Latency issues inside KVM.
  2023-05-01 14:51   ` Sean Christopherson
  2023-05-01 20:03     ` Jim Mattson
@ 2023-05-02  1:16     ` Robert Hoo
  2023-05-02  3:17       ` Jim Mattson
  1 sibling, 1 reply; 9+ messages in thread
From: Robert Hoo @ 2023-05-02  1:16 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: zhuangel570, kvm

On 5/1/2023 10:51 PM, Sean Christopherson wrote:
> 
> For the NX hugepage mitation, I think it makes sense to restart the discussion
> in the context of this thread: https://lore.kernel.org/all/ZBxf+ewCimtHY2XO@google.com
> 
OK, I wasn't aware of that thread. Thanks for pointing it out.
I just took a glance at it; I'll comment there.


> TL;DR: I am open to providng an option to hard disable the mitigation, 

Why hard disable? Isn't the existing "nx_huge_pages" parameter sufficient for this?
My aforementioned unsent patch makes kthread creation depend on nx_huge_pages,
i.e. if nx_huge_pages is enabled, start the kthread; if not, terminate the
kthread; once it is re-enabled, spawn the kthread again...
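
As an illustration (not part of the original message, and not the unsent
patch), the policy described above can be sketched from userspace. The sysfs
path /sys/module/kvm/parameters/nx_huge_pages and the "Y"/"N" output format
are assumptions about how the parameter is exposed; the function names are
hypothetical.

```python
from pathlib import Path

# Assumed location of the KVM module parameter in sysfs.
NX_HUGE_PAGES = Path("/sys/module/kvm/parameters/nx_huge_pages")


def recovery_thread_wanted(param_text):
    """Hypothetical policy: want the kthread only when the mitigation is on.

    Returns True/False for a recognised boolean value, None otherwise.
    """
    value = param_text.strip()
    if value in ("Y", "1"):
        return True
    if value in ("N", "0"):
        return False
    return None  # unrecognised value, make no decision


def read_param(path=NX_HUGE_PAGES):
    """Read the raw parameter text, or None if it cannot be read."""
    try:
        return path.read_text()
    except OSError:
        return None


raw = read_param()
if raw is None:
    print("nx_huge_pages parameter is not readable here")
else:
    print(f"nx_huge_pages={raw.strip()!r}, "
          f"recovery kthread wanted: {recovery_thread_wanted(raw)}")
```

The sketch only models the decision; starting or stopping the kthread itself
would have to happen inside KVM.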

> but there
> needs to be sufficient justification, e.g. that the above 100ms latency is a
> problem for real world deployments.
> 
Ah, my patch met a similar objection: the kthread does nothing if
nx_huge_pages = false, so it does no harm. Therefore I put the patch aside.

As for real-world justification, I guess zhuangel570 can say more.

>> As more and more old CPUs retires, I think NX-HugePage code will become more
>> and more minority code path/situation, and be refactored out eventually one
>> day.
> 
> Heh, yeah, one day.  But "one day" is likely 10+ years away.  Intel discontinuing
> a CPU has practically zero relevance to KVM removing support a CPU, e.g. KVM still
> supports the original Core CPUs from ~2006, which were launched in 2006 and
> discontinued in 2008.

OK, got it.
Why does KVM still FULLY support such old CPUs? Are there any real-world users?
What's the rationale/necessity, even when a CPU is already EOL by the manufacturer?
My thought was that each new generation of CPU lingers in CSPs' data
centers for 3~4 years.



* Re: Latency issues inside KVM.
  2023-05-02  1:16     ` Robert Hoo
@ 2023-05-02  3:17       ` Jim Mattson
  2023-05-02  6:12         ` Robert Hoo
  0 siblings, 1 reply; 9+ messages in thread
From: Jim Mattson @ 2023-05-02  3:17 UTC (permalink / raw)
  To: Robert Hoo; +Cc: Sean Christopherson, zhuangel570, kvm

On Mon, May 1, 2023 at 6:16 PM Robert Hoo <robert.hoo.linux@gmail.com> wrote:
>
> On 5/1/2023 10:51 PM, Sean Christopherson wrote:
> >
> > For the NX hugepage mitation, I think it makes sense to restart the discussion
> > in the context of this thread: https://lore.kernel.org/all/ZBxf+ewCimtHY2XO@google.com
> >
> OK, wasn't aware of that thread. Thanks for pointing out.
> Just took a glance at it, I'll comment there.
>
>
> > TL;DR: I am open to providng an option to hard disable the mitigation,
>
> Why hard disable? Isn't already "nx_huge_pages" parameter sufficient for this?
> My aforementioned not-sent-out patch is to consider nx_huge_pages for
> creating the kthread or not, i.e. if nx_huge is enabled, start the kthread,
> if not, terminate the kthread; once re-enabled, spawn the kthread again...
>
> > but there
> > needs to be sufficient justification, e.g. that the above 100ms latency is a
> > problem for real world deployments.
> >
> Ah, I was objected by similar reason: the kthread does nothing if
> nx_huge_pages = false, it does no harm. Therefore I put the patch aside.
>
> For the justification from real world, I guess Zhuangel570 can say more.
>
> >> As more and more old CPUs retires, I think NX-HugePage code will become more
> >> and more minority code path/situation, and be refactored out eventually one
> >> day.
> >
> > Heh, yeah, one day.  But "one day" is likely 10+ years away.  Intel discontinuing
> > a CPU has practically zero relevance to KVM removing support a CPU, e.g. KVM still
> > supports the original Core CPUs from ~2006, which were launched in 2006 and
> > discontinued in 2008.
>
> OK, got it.
> Why does KVM still FULLY support so old CPUs? Any real world users? What's
> the rational/necessity? even if it's already EOL by manufacture.
> My thought was that each new generation of CPU will linger in CSP's data
> center for 3~4 yrs.

Hobbyists drive the rationale for what kvm supports, not CSPs.


* Re: Latency issues inside KVM.
  2023-05-02  3:17       ` Jim Mattson
@ 2023-05-02  6:12         ` Robert Hoo
  0 siblings, 0 replies; 9+ messages in thread
From: Robert Hoo @ 2023-05-02  6:12 UTC (permalink / raw)
  To: Jim Mattson; +Cc: Sean Christopherson, zhuangel570, kvm

On Tue, May 2, 2023 at 11:17, Jim Mattson <jmattson@google.com> wrote:
>
> Hobbyists drive the rationale for what kvm supports, not CSPs.

Got it, thanks.


* Re: Latency issues inside KVM.
  2023-04-29  3:28 ` Robert Hoo
  2023-05-01 14:51   ` Sean Christopherson
@ 2023-05-04 13:32   ` zhuangel570
  1 sibling, 0 replies; 9+ messages in thread
From: zhuangel570 @ 2023-05-04 13:32 UTC (permalink / raw)
  To: Robert Hoo; +Cc: kvm

Thanks, Robert, for the information about the hardware mitigation. I looked up
more information about the ITLB Multihit issue, and it seems the issue does not
exist after Kaby Lake.

As you said, maybe we should skip starting the kthread where the SW mitigation
is not needed, e.g. when the hardware mitigation is available.


-- 
——————————
   zhuangel570
——————————
