From: bugzilla-daemon@kernel.org
To: kvm@vger.kernel.org
Subject: [Bug 217379] Latency issues in irq_bypass_register_consumer
Date: Mon, 01 May 2023 16:51:30 +0000
Message-ID: <bug-217379-28872-YDqpbtJUh7@https.bugzilla.kernel.org/>
In-Reply-To: <bug-217379-28872@https.bugzilla.kernel.org/>
https://bugzilla.kernel.org/show_bug.cgi?id=217379
--- Comment #1 from Sean Christopherson (seanjc@google.com) ---
On Fri, Apr 28, 2023, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=217379
>
> Bug ID: 217379
> Summary: Latency issues in irq_bypass_register_consumer
> Product: Virtualization
> Version: unspecified
> Hardware: Intel
> OS: Linux
> Status: NEW
> Severity: normal
> Priority: P3
> Component: kvm
> Assignee: virtualization_kvm@kernel-bugs.osdl.org
> Reporter: zhuangel570@gmail.com
> Regression: No
>
> We found latency issues in high-density, high-concurrency scenarios. We are
> using Cloud Hypervisor as the VMM for lightweight VMs, with virtio-net and
> virtio-blk devices. In our tests, creating a VM and registering its irqfds
> took about 50ms to 100ms+. After tracing with funclatency (a tool from
> bcc-tools, https://github.com/iovisor/bcc), we found the latency was
> introduced by the following functions:
>
> - irq_bypass_register_consumer() introduces more than 60ms per VM.
>   This function is called when registering an irqfd; it registers the irqfd
>   as a consumer with irqbypass so that irqbypass producers, like VFIO or
>   vDPA, can connect to it. In our test, one irqfd registration takes about
>   4ms, and 5 devices with 16 irqfds in total introduce more than 60ms of
>   latency.
>
> Here is a simple case that emulates the latency issue (the real latency is
> larger). The case creates 800 VMs in the background that do nothing, then
> repeatedly creates 20 VMs and destroys them after 400ms. Every VM does
> something simple: it creates an in-kernel irqchip and registers 15 irqfds
> (emulating 5 devices with 3 irqfds each). Tracing just the
> irq_bypass_register_consumer() latency is enough to reproduce this kind of
> latency issue. Here is a trace log on a Xeon(R) Platinum 8255C server (96C,
> 2 sockets) with Linux 6.2.20.
>
> Reproduce case:
> https://github.com/zhuangel/misc/blob/main/test/kvm_irqfd_fork/kvm_irqfd_fork.c
>
> Reproduce log:
> https://github.com/zhuangel/misc/blob/main/test/kvm_irqfd_fork/test.log
>
> To fix these latencies, I don't have a graceful method. One simple idea is to
> give the user a chance to avoid them, e.g. a new flag to disable irqbypass
> for each irqfd.
>
> Any suggestions to fix the issue are welcome.
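For reference, here is a minimal sketch (not the reporter's kvm_irqfd_fork.c) of what each VM in the reproducer does from userspace: create a VM, create an in-kernel irqchip, then assign irqfds with the KVM_IRQFD ioctl. It assumes an x86 host, where KVM_IRQFD requires an in-kernel irqchip; the helper names create_vm_with_irqchip() and register_irqfds() and the choice of GSIs 0..N-1 are hypothetical. When IRQ bypass is supported, KVM registers each assigned irqfd as an irqbypass consumer, so the loop below reaches irq_bypass_register_consumer() once per irqfd. At roughly 4ms per irqfd, 16 irqfds add up to about 64ms, which matches the "more than 60ms" per VM in the report.

```c
#include <linux/kvm.h>
#include <string.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Create a VM with an in-kernel irqchip, as the reproducer's VMs do.
   Returns the VM fd, or -1 on failure. */
static int create_vm_with_irqchip(int kvm_fd)
{
    int vm_fd = ioctl(kvm_fd, KVM_CREATE_VM, 0);

    if (vm_fd < 0)
        return -1;
    if (ioctl(vm_fd, KVM_CREATE_IRQCHIP, 0) < 0) {
        close(vm_fd);
        return -1;
    }
    return vm_fd;
}

/* Hypothetical helper: assign nr irqfds to vm_fd, one fresh eventfd per
   GSI 0..nr-1 (the GSI choice is arbitrary here). Each successful
   KVM_IRQFD is the point where KVM may call
   irq_bypass_register_consumer(). Returns 0 on success; on failure it
   closes the eventfds it created and returns -1. */
static int register_irqfds(int vm_fd, int *efds, int nr)
{
    int i;

    for (i = 0; i < nr; i++) {
        struct kvm_irqfd irqfd;

        efds[i] = eventfd(0, EFD_CLOEXEC);
        if (efds[i] < 0)
            goto undo;

        memset(&irqfd, 0, sizeof(irqfd));
        irqfd.fd = (__u32)efds[i];
        irqfd.gsi = (__u32)i;
        if (ioctl(vm_fd, KVM_IRQFD, &irqfd) < 0) {
            close(efds[i]);
            goto undo;
        }
    }
    return 0;

undo:
    /* Release the eventfds created before the failing iteration. */
    while (--i >= 0)
        close(efds[i]);
    return -1;
}
```

A caller would open /dev/kvm with O_RDWR, call create_vm_with_irqchip(), and time register_irqfds(vm_fd, efds, 15) with clock_gettime(CLOCK_MONOTONIC, ...) to see the per-VM cost from userspace. funclatency, as used in the report, attributes that cost to irq_bypass_register_consumer() itself.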

Looking at the code, it's not surprising that irq_bypass_register_consumer() can
exhibit high latencies. The producers and consumers are stored in simple linked
lists, and a single mutex is held while traversing the lists *and* connecting a
consumer to a producer (and vice versa).
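To make that cost concrete, here is a hypothetical, simplified userspace model with the same shape as described above: two linked lists and one mutex held across both lookup walks *and* the connect step. It is not the kernel's virt/lib/irqbypass.c; the names are illustrative, and connect_pair() stands in for the arch callbacks (such as .add_producer()) that the real connect path invokes.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical model of one registered producer or consumer. */
struct bypass_node {
    void *token;               /* match key; a kernel pointer in irqbypass */
    struct bypass_node *next;  /* singly linked registry list */
    struct bypass_node *peer;  /* connected counterpart, if any */
};

static struct bypass_node *producers;
static struct bypass_node *consumers;
static pthread_mutex_t registry_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for the real connect step, which can fail and can be slow. */
static int connect_pair(struct bypass_node *prod, struct bypass_node *cons)
{
    prod->peer = cons;
    cons->peer = prod;
    return 0;
}

static int register_consumer(struct bypass_node *cons)
{
    struct bypass_node *n;
    int ret = 0;

    if (!cons->token)
        return -EINVAL;

    pthread_mutex_lock(&registry_lock);

    /* Linear walk #1: reject a duplicate token or double registration. */
    for (n = consumers; n; n = n->next) {
        if (n->token == cons->token || n == cons) {
            ret = -EBUSY;
            goto out;
        }
    }

    /* Linear walk #2: find the producer with the same token and connect
       it, all while still holding the single global lock. */
    for (n = producers; n; n = n->next) {
        if (n->token == cons->token) {
            ret = connect_pair(n, cons);
            if (ret)
                goto out;
            break;
        }
    }

    cons->next = consumers;
    consumers = cons;
out:
    pthread_mutex_unlock(&registry_lock);
    return ret;
}
```

Every registration serializes on registry_lock and pays a walk proportional to the number of registered entries. If, for example, each of the 800 background VMs in the reproducer held 15 irqfds, each new irqfd would walk about 12,000 consumers under the lock.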

There are two obvious optimizations that can be done to reduce latency in
irq_bypass_register_consumer():

- Use a different data type to track the producers and consumers so that
  lookups don't require a linear walk. AIUI, the "tokens" used to match
  producers and consumers are just kernel pointers, so I _think_ XArray would
  perform reasonably well.

- Connect producers and consumers outside of a global mutex.

Unfortunately, because .add_producer() and .add_consumer() can fail, and because
connections can be established by adding a consumer _or_ a producer, getting the
locking right without a global mutex is quite difficult. It's certainly doable
to move the (dis)connect logic out of a global lock, but it's going to require a
dedicated effort, i.e. not something that can be sketched out in a few minutes
(I played around with the code for the better part of an hour trying to do just
that and kept running into edge case race conditions).
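The first optimization replaces the linear walks with a keyed lookup. XArray is a kernel-only API, so this userspace sketch swaps in a small fixed-size open-addressing hash table keyed by the token pointer. It illustrates the idea and is not a proposed patch; every name is hypothetical, and there is no deletion or resizing. It deliberately keeps the global lock and omits the connect step and its failure handling, since moving (dis)connect out of the lock is the hard part noted above.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_BITS 12
#define TABLE_SIZE (1u << TABLE_BITS)   /* power of two for cheap wrap */

struct bypass_entry {
    void *token;   /* NULL marks an empty slot */
    void *obj;     /* the registered producer or consumer */
};

struct token_table {
    struct bypass_entry slots[TABLE_SIZE];
    size_t count;
};

static struct token_table producers, consumers;
static pthread_mutex_t registry_lock = PTHREAD_MUTEX_INITIALIZER;

/* Multiplicative (Fibonacci) hash of the pointer value into TABLE_BITS. */
static size_t token_hash(const void *token)
{
    uint64_t x = (uint64_t)(uintptr_t)token;

    return (size_t)((x * 0x9E3779B97F4A7C15ull) >> (64 - TABLE_BITS));
}

/* Linear probing from the hashed slot; an empty slot ends the search. */
static void *table_lookup(const struct token_table *t, const void *token)
{
    size_t i = token_hash(token);

    for (size_t probes = 0; probes < TABLE_SIZE; probes++) {
        const struct bypass_entry *e = &t->slots[i];

        if (!e->token)
            return NULL;
        if (e->token == token)
            return e->obj;
        i = (i + 1) & (TABLE_SIZE - 1);
    }
    return NULL;
}

static int table_insert(struct token_table *t, void *token, void *obj)
{
    size_t i;

    if (!token)
        return -EINVAL;
    /* Keep at least one slot empty so probing always terminates. */
    if (t->count >= TABLE_SIZE - 1)
        return -ENOSPC;

    for (i = token_hash(token);; i = (i + 1) & (TABLE_SIZE - 1)) {
        struct bypass_entry *e = &t->slots[i];

        if (!e->token) {
            e->token = token;
            e->obj = obj;
            t->count++;
            return 0;
        }
        if (e->token == token)
            return -EBUSY;      /* duplicate token */
    }
}

/* Register a consumer and report the matching producer (or NULL) via
   *match. Both lookups are keyed, not walks over every registered entry. */
static int register_consumer(void *token, void *consumer, void **match)
{
    int ret;

    pthread_mutex_lock(&registry_lock);
    ret = table_insert(&consumers, token, consumer);
    *match = ret ? NULL : table_lookup(&producers, token);
    pthread_mutex_unlock(&registry_lock);
    return ret;
}
```

Lookup cost now depends on probe length rather than on how many producers and consumers exist. Lock hold time still includes whatever the connect step costs, which this sketch does not address.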
Thread overview: 9+ messages
2023-04-28 7:27 [Bug 217379] New: Latency issues in irq_bypass_register_consumer bugzilla-daemon
2023-05-01 16:51 ` Sean Christopherson
2023-07-17 11:58 ` Like Xu
2023-07-17 15:25 ` Paolo Bonzini
2023-07-18 3:43 ` Like Xu
2023-07-18 9:51 ` Paolo Bonzini
2023-05-01 16:51 ` bugzilla-daemon [this message]
2023-05-11 9:59 ` [Bug 217379] " bugzilla-daemon
2025-04-04 21:26 ` bugzilla-daemon