Date: Mon, 1 May 2023 09:51:25 -0700
Subject: Re: [Bug 217379] New: Latency issues in irq_bypass_register_consumer
From: Sean Christopherson
To: bugzilla-daemon@kernel.org
Cc: kvm@vger.kernel.org

On Fri, Apr 28, 2023, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=217379
>
>             Bug ID: 217379
>            Summary: Latency issues in irq_bypass_register_consumer
>            Product: Virtualization
>            Version: unspecified
>           Hardware: Intel
>                 OS: Linux
>             Status: NEW
>           Severity: normal
>           Priority: P3
>          Component: kvm
>           Assignee: virtualization_kvm@kernel-bugs.osdl.org
>           Reporter: zhuangel570@gmail.com
>         Regression: No
>
> We found some latency issue in high-density and high-concurrency scenarios,
> we are using cloud hypervisor as vmm for lightweight VM, using VIRTIO net and
> block for VM.
> In our test, we got about 50ms to 100ms+ latency in creating VM
> and register irqfd, after trace with funclatency (a tool of bcc-tools,
> https://github.com/iovisor/bcc), we found the latency introduced by following
> functions:
>
> - irq_bypass_register_consumer introduce more than 60ms per VM.
>   This function was called when registering irqfd, the function will register
>   irqfd as consumer to irqbypass, wait for connecting from irqbypass producers,
>   like VFIO or VDPA. In our test, one irqfd register will get about 4ms
>   latency, and 5 devices with total 16 irqfd will introduce more than 60ms
>   latency.
>
> Here is a simple case, which can emulate the latency issue (the real latency
> is larger). The case create 800 VM as background do nothing, then repeatedly
> create 20 VM then destroy them after 400ms, every VM will do simple thing,
> create in kernel irq chip, and register 15 irqfd (emulate 5 devices and every
> device has 3 irqfd), just trace the "irq_bypass_register_consumer" latency, you
> will reproduce such kind latency issue. Here is a trace log on Xeon(R) Platinum
> 8255C server (96C, 2 sockets) with linux 6.2.20.
>
> Reproduce Case
> https://github.com/zhuangel/misc/blob/main/test/kvm_irqfd_fork/kvm_irqfd_fork.c
> Reproduce log
> https://github.com/zhuangel/misc/blob/main/test/kvm_irqfd_fork/test.log
>
> To fix these latencies, I didn't have a graceful method, just simple ideas
> is give user a chance to avoid these latencies, like new flag to disable
> irqbypass for each irqfd.
>
> Any suggestion to fix the issue is welcome.

Looking at the code, it's not surprising that irq_bypass_register_consumer()
can exhibit high latencies.  The producers and consumers are stored in simple
linked lists, and a single mutex is held while traversing the lists *and*
connecting a consumer to a producer (and vice versa).
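
As a rough illustration of the pattern described above, here is a userspace
model of list-based registration under one lock.  It is a sketch only, not
the actual irqbypass code: struct bypass_node, register_node() and the other
names are made up for this example, a pthread mutex stands in for the kernel
mutex, and the connect step is reduced to setting two pointers.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names come from the kernel. */
struct bypass_node {
	struct bypass_node *next;
	void *token;              /* opaque pointer used to match the two sides */
	struct bypass_node *peer; /* connected counterpart, NULL if none yet */
};

static struct bypass_node *producers, *consumers;
static pthread_mutex_t bypass_lock = PTHREAD_MUTEX_INITIALIZER;

/* Linear walk: cost grows with the number of registered nodes. */
static struct bypass_node *find(struct bypass_node *head, const void *token)
{
	for (; head; head = head->next)
		if (head->token == token)
			return head;
	return NULL;
}

/* Returns 0 on success, -1 if the token is already on the node's own list. */
static int register_node(struct bypass_node **own, struct bypass_node **other,
			 struct bypass_node *n)
{
	struct bypass_node *match;
	int ret = -1;

	assert(n && n->token);

	/* Every registration in the process serializes on this one lock. */
	pthread_mutex_lock(&bypass_lock);

	if (find(*own, n->token))        /* walk 1: duplicate check */
		goto out;

	match = find(*other, n->token);  /* walk 2: look for a counterpart */
	if (match) {
		/* "Connect" while still holding the lock. */
		n->peer = match;
		match->peer = n;
	}

	n->next = *own;
	*own = n;
	ret = 0;
out:
	pthread_mutex_unlock(&bypass_lock);
	return ret;
}

int register_consumer(struct bypass_node *c)
{
	return register_node(&consumers, &producers, c);
}

int register_producer(struct bypass_node *p)
{
	return register_node(&producers, &consumers, p);
}
```

In this model, each registration walks both lists while holding the lock, so
with hundreds of idle VMs already registered, every new irqfd pays for all of
them and also waits behind every other registration in flight.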

There are two obvious optimizations that can be done to reduce latency in
irq_bypass_register_consumer():

  - Use a different data type to track the producers and consumers so that
    lookups don't require a linear walk.  AIUI, the "tokens" used to match
    producers and consumers are just kernel pointers, so I _think_ XArray
    would perform reasonably well.

  - Connect producers and consumers outside of a global mutex.

Unfortunately, because .add_producer() and .add_consumer() can fail, and
because connections can be established by adding a consumer _or_ a producer,
getting the locking right without a global mutex is quite difficult.  It's
certainly doable to move the (dis)connect logic out of a global lock, but it's
going to require a dedicated effort, i.e. not something that can be sketched
out in a few minutes (I played around with the code for the better part of an
hour trying to do just that and kept running into edge case race conditions).
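
To illustrate the first idea only, here is a userspace sketch of a lookup
keyed on the token pointer instead of a list walk.  XArray is a kernel-only
API, so a small chained hash table stands in for it here.  That is a
substitution for the sake of a runnable example, and struct entry,
token_insert() and token_lookup() are hypothetical names.  Locking is
deliberately left out; the sketch does nothing for the second, harder
problem.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 1024 /* arbitrary power of two chosen for this sketch */

/* Hypothetical entry type; stands in for a producer or a consumer. */
struct entry {
	struct entry *next; /* chain within one bucket */
	void *token;        /* pointer value used as the lookup key */
};

static struct entry *buckets[NBUCKETS];

static size_t bucket_of(const void *token)
{
	/* Shift out low bits, which are often zero because of alignment. */
	return ((uintptr_t)token >> 4) & (NBUCKETS - 1);
}

/* Only entries that hash to the same bucket are examined. */
struct entry *token_lookup(const void *token)
{
	struct entry *e;

	for (e = buckets[bucket_of(token)]; e; e = e->next)
		if (e->token == token)
			return e;
	return NULL;
}

/* Returns 0 on success, -1 if the token is already present. */
int token_insert(struct entry *e)
{
	size_t b;

	assert(e && e->token);

	if (token_lookup(e->token))
		return -1;

	b = bucket_of(e->token);
	e->next = buckets[b];
	buckets[b] = e;
	return 0;
}
```

With a keyed structure like this, the duplicate check and the search for a
counterpart no longer scale with the total number of registered irqfds, which
is the part of the cost that the 800 background VMs in the reproducer
inflate.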