public inbox for kvm@vger.kernel.org
From: bugzilla-daemon@kernel.org
To: kvm@vger.kernel.org
Subject: [Bug 217379] Latency issues in irq_bypass_register_consumer
Date: Mon, 01 May 2023 16:51:30 +0000
Message-ID: <bug-217379-28872-YDqpbtJUh7@https.bugzilla.kernel.org/>
In-Reply-To: <bug-217379-28872@https.bugzilla.kernel.org/>

https://bugzilla.kernel.org/show_bug.cgi?id=217379

--- Comment #1 from Sean Christopherson (seanjc@google.com) ---
On Fri, Apr 28, 2023, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=217379
> 
>             Bug ID: 217379
>            Summary: Latency issues in irq_bypass_register_consumer
>            Product: Virtualization
>            Version: unspecified
>           Hardware: Intel
>                 OS: Linux
>             Status: NEW
>           Severity: normal
>           Priority: P3
>          Component: kvm
>           Assignee: virtualization_kvm@kernel-bugs.osdl.org
>           Reporter: zhuangel570@gmail.com
>         Regression: No
> 
> We found latency issues in high-density, high-concurrency scenarios. We are
> using Cloud Hypervisor as the VMM for lightweight VMs, with VIRTIO net and
> block devices. In our test, we saw about 50ms to 100ms+ of latency when
> creating a VM and registering irqfds. After tracing with funclatency (a tool
> from bcc-tools, https://github.com/iovisor/bcc), we found that the latency is
> introduced by the following functions:
> 
> - irq_bypass_register_consumer introduces more than 60ms per VM.
>   This function is called when registering an irqfd. It registers the irqfd
>   as a consumer with irqbypass and waits for a connection from irqbypass
>   producers, like VFIO or VDPA. In our test, registering one irqfd takes
>   about 4ms, and 5 devices with a total of 16 irqfds introduce more than
>   60ms of latency.
> 
> Here is a simple case that emulates the latency issue (the real latency
> is larger). The case creates 800 VMs that sit idle in the background, then
> repeatedly creates 20 VMs and destroys them after 400ms. Each VM does
> something simple: it creates an in-kernel irqchip and registers 15 irqfds
> (emulating 5 devices with 3 irqfds each). Just trace the
> "irq_bypass_register_consumer" latency and you will reproduce this kind of
> latency issue. Here is a trace log from a Xeon(R) Platinum 8255C server
> (96C, 2 sockets) with linux 6.2.20.
> 
> Reproduce Case
> https://github.com/zhuangel/misc/blob/main/test/kvm_irqfd_fork/kvm_irqfd_fork.c
> Reproduce log
> https://github.com/zhuangel/misc/blob/main/test/kvm_irqfd_fork/test.log
> 
> To fix these latencies, I don't have a graceful method, just a simple idea:
> give the user a chance to avoid these latencies, e.g. a new flag to disable
> irqbypass for each irqfd.
> 
> Any suggestions to fix the issue are welcome.

Looking at the code, it's not surprising that irq_bypass_register_consumer() can
exhibit high latencies.  The producers and consumers are stored in simple linked
lists, and a single mutex is held while traversing the lists *and* connecting
a consumer to a producer (and vice versa).
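
To make the cost concrete, here is a simplified user-space model of that
pattern.  It is not the kernel code: the struct layouts, bypass_lock,
last_walk_len, and the pthread mutex are stand-ins invented for illustration,
and the connect step is reduced to two pointer assignments.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; not the kernel's structs. */
struct consumer;

struct producer {
    void *token;
    struct producer *next;
    struct consumer *peer;
};

struct consumer {
    void *token;
    struct consumer *next;
    struct producer *peer;
};

/* One global lock protects both lists and the connect step. */
static pthread_mutex_t bypass_lock = PTHREAD_MUTEX_INITIALIZER;
static struct producer *producers;
static struct consumer *consumers;

/* Number of list nodes visited by the most recent registration. */
static size_t last_walk_len;

int register_consumer(struct consumer *c)
{
    int ret = 0;
    size_t visited = 0;

    pthread_mutex_lock(&bypass_lock);

    /* Walk 1: reject a duplicate token; linear in the number of consumers. */
    for (struct consumer *t = consumers; t; t = t->next) {
        visited++;
        if (t->token == c->token || t == c) {
            ret = -EBUSY;
            goto out;
        }
    }

    /* Walk 2: find a matching producer; linear in the number of producers. */
    for (struct producer *p = producers; p; p = p->next) {
        visited++;
        if (p->token == c->token) {
            /* The connect step also runs with the global lock held. */
            p->peer = c;
            c->peer = p;
            break;
        }
    }

    /* Publish the consumer at the head of the list. */
    c->next = consumers;
    consumers = c;
out:
    last_walk_len = visited;
    pthread_mutex_unlock(&bypass_lock);
    return ret;
}
```

In this model, with N consumers already registered, every new registration
visits all N nodes before it even looks at the producers, and every other
registration waits on the same lock in the meantime.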

There are two obvious optimizations that can be done to reduce latency in
irq_bypass_register_consumer():

   - Use a different data type to track the producers and consumers so that
     lookups don't require a linear walk.  AIUI, the "tokens" used to match
     producers and consumers are just kernel pointers, so I _think_ XArray
     would perform reasonably well.

   - Connect producers and consumers outside of a global mutex.
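
As a rough illustration of the first idea, the sketch below indexes entries by
their token pointer so that the duplicate check and the match become a keyed
lookup instead of a list walk.  It uses a small user-space hash table purely
as a stand-in, since XArray is a kernel-only API; NBUCKETS, struct entry,
bucket_of(), token_lookup(), and token_insert() are all hypothetical names, and
locking is left out (a caller would still need to serialize access).

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 1024  /* power of two; arbitrary size for the sketch */

/* Hypothetical entry type: one per registered producer or consumer. */
struct entry {
    void *token;
    struct entry *next;  /* chain within one bucket */
};

static struct entry *buckets[NBUCKETS];

/* Map a token pointer to a bucket index. */
static size_t bucket_of(const void *token)
{
    uintptr_t v = (uintptr_t)token;

    /* Discard low alignment bits and fold in some higher bits. */
    v = (v >> 4) ^ (v >> 14);
    return (size_t)v & (NBUCKETS - 1);
}

/* Return the entry registered for this token, or NULL if there is none. */
struct entry *token_lookup(const void *token)
{
    for (struct entry *e = buckets[bucket_of(token)]; e; e = e->next)
        if (e->token == token)
            return e;
    return NULL;
}

/* Register an entry; fails with -EBUSY if the token is already present. */
int token_insert(struct entry *e)
{
    size_t b = bucket_of(e->token);

    if (token_lookup(e->token))
        return -EBUSY;
    e->next = buckets[b];
    buckets[b] = e;
    return 0;
}
```

Each lookup now only scans the entries that share a bucket, so its cost no
longer grows directly with the total number of registered irqfds.  This only
addresses the lookup; the connect step would still run under whatever lock
protects the table.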

Unfortunately, because .add_producer() and .add_consumer() can fail, and because
connections can be established by adding a consumer _or_ a producer, getting the
locking right without a global mutex is quite difficult.  It's certainly doable
to move the (dis)connect logic out of a global lock, but it's going to require a
dedicated effort, i.e. not something that can be sketched out in a few minutes
(I played around with the code for the better part of an hour trying to do just
that and kept running into edge case race conditions).
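
To show where the failure handling comes from, here is a minimal user-space
model of a two-step connect with rollback.  It is a sketch under assumptions,
not the kernel implementation: the callback names mirror the ones mentioned
above, while del_consumer, connect_pair(), undo_calls, and the sample callbacks
are hypothetical and exist only for this example.

```c
#include <errno.h>
#include <stddef.h>

struct cons;

struct prod {
    int (*add_consumer)(struct prod *, struct cons *);   /* may fail */
    void (*del_consumer)(struct prod *, struct cons *);  /* hypothetical undo hook */
    struct cons *peer;
};

struct cons {
    int (*add_producer)(struct cons *, struct prod *);   /* may fail */
    struct prod *peer;
};

/* Connect both halves; if the second half fails, undo the first. */
int connect_pair(struct prod *p, struct cons *c)
{
    int ret = 0;

    if (p->add_consumer)
        ret = p->add_consumer(p, c);
    if (ret)
        return ret;

    if (c->add_producer)
        ret = c->add_producer(c, p);
    if (ret) {
        /* Second half failed: roll back the first half before reporting. */
        if (p->del_consumer)
            p->del_consumer(p, c);
        return ret;
    }

    p->peer = c;
    c->peer = p;
    return 0;
}

/* Sample callbacks used to exercise the failure path. */
int undo_calls;

int ok_add_consumer(struct prod *p, struct cons *c)
{
    (void)p; (void)c;
    return 0;
}

void count_del_consumer(struct prod *p, struct cons *c)
{
    (void)p; (void)c;
    undo_calls++;
}

int failing_add_producer(struct cons *c, struct prod *p)
{
    (void)c; (void)p;
    return -EINVAL;
}
```

Under a single global lock nobody else can see the state between the two
steps, so the rollback is simple.  Without that lock, a concurrent
registration or unregistration could act on the half-connected pair, which is
the kind of edge case described above.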

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.
