From: bugzilla-daemon@kernel.org
To: kvm@vger.kernel.org
Subject: [Bug 217379] Latency issues in irq_bypass_register_consumer
Date: Mon, 01 May 2023 16:51:30 +0000
Message-ID: <bug-217379-28872-YDqpbtJUh7@https.bugzilla.kernel.org/>
In-Reply-To: <bug-217379-28872@https.bugzilla.kernel.org/>
https://bugzilla.kernel.org/show_bug.cgi?id=217379
--- Comment #1 from Sean Christopherson (seanjc@google.com) ---
On Fri, Apr 28, 2023, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=217379
>
> Bug ID: 217379
> Summary: Latency issues in irq_bypass_register_consumer
> Product: Virtualization
> Version: unspecified
> Hardware: Intel
> OS: Linux
> Status: NEW
> Severity: normal
> Priority: P3
> Component: kvm
> Assignee: virtualization_kvm@kernel-bugs.osdl.org
> Reporter: zhuangel570@gmail.com
> Regression: No
>
> We found latency issues in high-density, high-concurrency scenarios. We are
> using Cloud Hypervisor as the VMM for lightweight VMs, with VIRTIO net and
> block devices. In our tests, creating a VM and registering irqfds took about
> 50ms to 100ms+. After tracing with funclatency (a tool from bcc-tools,
> https://github.com/iovisor/bcc), we found that the latency is introduced by
> the following functions:
>
> - irq_bypass_register_consumer introduces more than 60ms per VM.
>   This function is called when registering an irqfd. It registers the irqfd
>   as a consumer with irqbypass, which then waits for a connection from
>   irqbypass producers such as VFIO or VDPA. In our test, registering one
>   irqfd takes about 4ms, and 5 devices with a total of 16 irqfds introduce
>   more than 60ms of latency.
>
> Here is a simple case that emulates the latency issue (the real latency
> is larger). The case creates 800 VMs that sit idle in the background, then
> repeatedly creates 20 VMs and destroys them after 400ms. Every VM does one
> simple thing: it creates an in-kernel irqchip and registers 15 irqfds
> (emulating 5 devices with 3 irqfds each). Tracing the latency of
> "irq_bypass_register_consumer" while this runs reproduces the issue. Here is
> a trace log from a Xeon(R) Platinum 8255C server (96C, 2 sockets) with
> linux 6.2.20.
>
> Reproduce Case
>
> https://github.com/zhuangel/misc/blob/main/test/kvm_irqfd_fork/kvm_irqfd_fork.c
> Reproduce log
> https://github.com/zhuangel/misc/blob/main/test/kvm_irqfd_fork/test.log
>
> I don't have a graceful way to fix these latencies. My only idea is to
> give userspace a chance to avoid them, e.g. with a new flag to disable
> irqbypass for each irqfd.
>
> Any suggestion to fix the issue is welcome.
Looking at the code, it's not surprising that irq_bypass_register_consumer() can
exhibit high latencies. The producers and consumers are stored in simple linked
lists, and a single mutex is held while traversing the lists *and* connecting
a consumer to a producer (and vice versa).
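
To make the pattern described above concrete, here is a minimal userspace
sketch, assuming a pthread mutex in place of the kernel mutex and hand-rolled
singly linked lists. Every name below (model_*) is hypothetical and is not the
kernel's irqbypass code; the point is only that both list walks and the
connect step happen under one global lock, so each registration costs time
proportional to the number of registered producers and consumers.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace model, not the kernel's irqbypass structures. */
struct model_consumer;

struct model_producer {
    struct model_producer *next;
    void *token;                      /* opaque pointer used for matching */
    struct model_consumer *consumer;  /* set once connected */
};

struct model_consumer {
    struct model_consumer *next;
    void *token;
    struct model_producer *producer;  /* set once connected */
};

static struct model_producer *model_producers;
static struct model_consumer *model_consumers;
static pthread_mutex_t model_lock = PTHREAD_MUTEX_INITIALIZER;

int model_register_producer(struct model_producer *p)
{
    struct model_producer *tp;
    struct model_consumer *c;
    int ret = 0;

    assert(p != NULL);
    pthread_mutex_lock(&model_lock);

    /* Linear walk: reject a duplicate token or a double registration. */
    for (tp = model_producers; tp; tp = tp->next) {
        if (tp->token == p->token || tp == p) {
            ret = -EBUSY;
            goto out;
        }
    }

    /* Linear walk: connect to a waiting consumer, still under the lock. */
    for (c = model_consumers; c; c = c->next) {
        if (c->token == p->token) {
            c->producer = p;
            p->consumer = c;
            break;
        }
    }

    p->next = model_producers;
    model_producers = p;
out:
    pthread_mutex_unlock(&model_lock);
    return ret;
}

int model_register_consumer(struct model_consumer *c)
{
    struct model_consumer *tc;
    struct model_producer *p;
    int ret = 0;

    assert(c != NULL);
    pthread_mutex_lock(&model_lock);

    /* Linear walk #1: reject a duplicate token or a double registration. */
    for (tc = model_consumers; tc; tc = tc->next) {
        if (tc->token == c->token || tc == c) {
            ret = -EBUSY;
            goto out;
        }
    }

    /* Linear walk #2: connect to a matching producer, still under the lock. */
    for (p = model_producers; p; p = p->next) {
        if (p->token == c->token) {
            p->consumer = c;
            c->producer = p;
            break;
        }
    }

    c->next = model_consumers;
    model_consumers = c;
out:
    pthread_mutex_unlock(&model_lock);
    return ret;
}
```

In this model, registering 15 consumers for a new VM while hundreds of other
VMs already have theirs registered means 15 full list walks, all serialized
against every other VM doing the same thing.
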
There are two obvious optimizations that can be done to reduce latency in
irq_bypass_register_consumer():
- Use a different data type to track the producers and consumers so that lookups
  don't require a linear walk. AIUI, the "tokens" used to match producers and
  consumers are just kernel pointers, so I _think_ XArray would perform
  reasonably well.
- Connect producers and consumers outside of a global mutex.
Unfortunately, because .add_producer() and .add_consumer() can fail, and because
connections can be established by adding a consumer _or_ a producer, getting the
locking right without a global mutex is quite difficult. It's certainly doable
to move the (dis)connect logic out of a global lock, but it's going to require a
dedicated effort, i.e. not something that can be sketched out in a few minutes
(I played around with the code for the better part of an hour trying to do just
that and kept running into edge case race conditions).
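
For the first optimization (a lookup that doesn't walk a list), here is a
rough userspace illustration of the idea. Note that this deliberately swaps in
a small chained hash table keyed by the token pointer as a stand-in for
XArray, since the kernel XArray API isn't available outside the kernel. All
names (keyed_*) are hypothetical, and locking is omitted entirely because the
sketch is only about the lookup cost, not about the harder problem of
connecting outside a global mutex.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define KEYED_BUCKETS 1024u  /* must be a power of two */

/* One entry per token, holding whichever sides have registered so far. */
struct keyed_entry {
    struct keyed_entry *next;  /* hash chain */
    void *token;
    void *producer;
    void *consumer;
};

static struct keyed_entry *keyed_table[KEYED_BUCKETS];

static size_t keyed_hash(const void *token)
{
    /* Drop low alignment bits, then mix with a multiplicative constant. */
    uintptr_t v = ((uintptr_t)token >> 4) * 2654435761u;

    return (size_t)((v >> 16) & (KEYED_BUCKETS - 1));
}

/* Find the entry for a token, allocating an empty one if none exists. */
static struct keyed_entry *keyed_get(void *token)
{
    size_t b = keyed_hash(token);
    struct keyed_entry *e;

    for (e = keyed_table[b]; e; e = e->next)
        if (e->token == token)
            return e;

    e = calloc(1, sizeof(*e));
    if (!e)
        return NULL;
    e->token = token;
    e->next = keyed_table[b];
    keyed_table[b] = e;
    return e;
}

/*
 * Returns 1 if a producer was already waiting on this token (connected),
 * 0 if the consumer is registered but unconnected, or a negative errno.
 */
int keyed_add_consumer(void *token, void *consumer)
{
    struct keyed_entry *e = keyed_get(token);

    assert(consumer != NULL);
    if (!e)
        return -ENOMEM;
    if (e->consumer)
        return -EBUSY;
    e->consumer = consumer;
    return e->producer != NULL;
}

/* Mirror image of keyed_add_consumer() for the producer side. */
int keyed_add_producer(void *token, void *producer)
{
    struct keyed_entry *e = keyed_get(token);

    assert(producer != NULL);
    if (!e)
        return -ENOMEM;
    if (e->producer)
        return -EBUSY;
    e->producer = producer;
    return e->consumer != NULL;
}
```

With a keyed structure like this, both the duplicate check and the match
against the other side become a single lookup on the token instead of two
list walks, which is the property XArray would be expected to provide.
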
--
You may reply to this email to add a comment.
You are receiving this mail because:
You are watching the assignee of the bug.