From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 30 Sep 2021 18:05:47 +0000
From: Sean Christopherson
To: Marc Zyngier
Cc: Aaron Lewis, KVM, Venkatesh Srinivas, Peter Shier, Ben Gardon,
	David Matlack, Paolo Bonzini, Will Deacon, KVMARM, Jim Mattson
Subject: Re: [PATCH v1 3/3] KVM: arm64: Add histogram stats for handling time of arch specific exit reasons
References: <20210922010851.2312845-1-jingzhangos@google.com>
 <20210922010851.2312845-3-jingzhangos@google.com>
 <87czp0voqg.wl-maz@kernel.org> <875yusv3vm.wl-maz@kernel.org>
 <87zgrurwgq.wl-maz@kernel.org>
In-Reply-To: <87zgrurwgq.wl-maz@kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"

On Thu, Sep 30, 2021, Marc Zyngier wrote:
> On Thu, 23 Sep 2021 00:22:12 +0100, David Matlack wrote:
> > On Wed, Sep 22, 2021 at 11:53 AM Marc Zyngier wrote:
> > > And I'm all for adding these hooks where it matters as long as they
> > > are not considered ABI and don't appear in /sys/debug/tracing (in
> > > general, no userspace visibility).

On our side (GCP), we very much want these to be ABI so that upgrading the
kernel/KVM doesn't break the userspace collection of stats.

> > > The scheduler is an interesting example of this, as it exposes all
> > > sorts of hooks for people to look under the hood. No user of the
> > > hook?  No overhead, no additional memory used. I may have heard that
> > > Android makes heavy use of this.
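Marc's "no user of the hook? No overhead" property can be approximated in
plain C with an optional hook pointer that is NULL until someone attaches.
(The kernel itself goes further, using tracepoints backed by static branches
that patch the call site out entirely; this is just an illustrative userspace
sketch, and every name in it is made up.)

```c
#include <stddef.h>

/* Optional hook: NULL by default, so an unattached hook costs one
 * well-predicted branch and no memory beyond the pointer itself. */
typedef void (*exit_hook_fn)(unsigned int exit_reason);

static exit_hook_fn exit_hook;

/* A sample consumer: tally exits per reason into a table. */
static unsigned long long exit_counts[64];

void count_exits_hook(unsigned int reason)
{
	if (reason < 64)
		exit_counts[reason]++;
}

unsigned long long get_exit_count(unsigned int reason)
{
	return reason < 64 ? exit_counts[reason] : 0;
}

void register_exit_hook(exit_hook_fn fn)
{
	exit_hook = fn;
}

/* Call site in the exit handler: fires only if a consumer attached. */
void handle_exit(unsigned int exit_reason)
{
	if (exit_hook)
		exit_hook(exit_reason);
	/* ... normal exit handling ... */
}
```

With no consumer registered, nothing is collected and nothing is stored,
which is the flexibility argument: the data gathering lives with whoever
wants the data.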
> > > Because I'm pretty sure that whatever stat we expose, every cloud
> > > vendor will want their own variant, so we may just as well put the
> > > matter in their own hands.
> >
> > I think this can be mitigated by requiring sufficient justification
> > when adding a new stat to KVM. There has to be a real use-case and it
> > has to be explained in the changelog. If a stat has a use-case for one
> > cloud provider, it will likely be useful to other cloud providers as
> > well.
>
> My (limited) personal experience is significantly different. The
> diversity of setups makes the set of relevant stats pretty hard to
> guess (there isn't much in common if you use KVM to strictly partition
> a system vs. oversubscribing it).

To some extent that's true for GCP and x86[*]. I think our position can be
succinctly described as a shotgun approach; we don't know exactly which stats
will be useful when, so grab as many as we can within reason. As mentioned
earlier, burning 8kb+ of stats per vCPU is perfectly ok for our use cases
because it's more or less negligible compared to the amount of memory
assigned to VMs. This is why I think we should explore an approach that
allows for enabling/disabling groups of stats at compile time.

[*] For x86 specifically, I think it's a bit easier to predict which stats
are useful because KVM x86 is responsible for a smaller set of functionality
compared to arm64, e.g. nearly everything at the system/platform level is
handled by userspace, and so there are natural exit points to userspace for
many of the interesting touch points. The places where we need stats are
where userspace doesn't have any visibility into what KVM is doing.

> > > I also wouldn't discount BPF as a possibility. You could perfectly
> > > well have permanent BPF programs running from the moment you boot the
> > > system, and use that to generate your histograms. This isn't
> > > necessarily a one-off, debug-only solution.
> > >
> > > > Logging is a similar story, e.g. using _ratelimited() printk to aid
> > > > debug works well when there are a very limited number of VMs and
> > > > there is a human that can react to arbitrary kernel messages, but
> > > > it's basically useless when there are 10s or 100s of VMs and taking
> > > > action on a kernel message requires prior knowledge of the message.
> > >
> > > I'm not sure logging is remotely the same. For a start, the kernel
> > > should not log anything unless something has oopsed (and yes, I still
> > > have some bits to clean on the arm64 side). I'm not even sure what
> > > you would want to log. I'd like to understand the need here, because
> > > I feel like I'm missing something.

I think we're generally on the same page: kernel logging bad.

x86 has historically used printks to "alert" userspace to notable behavior,
e.g. when KVM knowingly emulates an instruction incorrectly, or when the
guest crashes. The incorrect instruction emulation isn't all that
interesting since we're well aware of KVM's shortcomings, but guest crashes
are an instance where "logging" _to userspace_ is very desirable, e.g. so
that we can identify trends that may be due to bugs in the host, or to
provide the customer with additional data to help them figure out what's
wrong on their end. "logging" in quotes because it doesn't necessarily have
to be traditional logging, e.g. for the crash cases, KVM already exits to
userspace so the "hook" is there; what's lacking is a way for KVM to dump
additional information about the crash.

I brought up logging here purely to highlight that KVM, at least on the x86
side, lacks infrastructure for running at scale, likely because it hasn't
been needed in the past.
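The compile-time stat groups floated above, combined with the log2-bucket
histograms this patch series is about, could look roughly like the sketch
below. This is not KVM's actual implementation; the config symbol, struct,
and function names are all invented for illustration (in real kernel terms
the #define would be a Kconfig option):

```c
#include <stddef.h>

/* Hypothetical compile-time stat group: flip this one define and the
 * histogram fields plus their update code vanish entirely, costing
 * nothing for users who don't want them. */
#define CONFIG_KVM_HISTOGRAM_STATS 1

#define HIST_BUCKETS 16

struct vcpu_stat {
	unsigned long long exits;	/* always-on "core" stat */
#if CONFIG_KVM_HISTOGRAM_STATS
	unsigned long long exit_ns_hist[HIST_BUCKETS]; /* log2 buckets */
#endif
};

/* Map a handling time to a log2 bucket: 1 -> 0, 2-3 -> 1, 4-7 -> 2, ...
 * with everything past the last bucket clamped into it. */
unsigned int log2_bucket(unsigned long long v)
{
	unsigned int idx = 0;

	while (v >>= 1)
		idx++;
	return idx < HIST_BUCKETS ? idx : HIST_BUCKETS - 1;
}

void record_exit(struct vcpu_stat *s, unsigned long long ns)
{
	s->exits++;
#if CONFIG_KVM_HISTOGRAM_STATS
	s->exit_ns_hist[log2_bucket(ns)]++;
#endif
}
```

With the group compiled out, struct vcpu_stat shrinks to a single counter
and record_exit() compiles down to one increment, which is the "pay only for
what you enable" trade-off being discussed.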
> > > > I'm certainly not expecting other people to solve our challenges,
> > > > and I fully appreciate that there are many KVM users that don't
> > > > care at all about scalability, but I'm hoping we can get the
> > > > community at large, and especially maintainers and reviewers, to
> > > > also consider at-scale use cases when designing, implementing,
> > > > reviewing, etc...
> > >
> > > My take is that scalability has to go with flexibility. Anything that
> > > gets hardcoded will quickly become a burden: I definitely regret
> > > adding the current KVM trace points, as they don't show what I need,
> > > and I can't change them as they are ABI.
> >
> > This brings up a good discussion topic: To what extent are the KVM
> > stats themselves an ABI? I don't think this is documented anywhere.
> > The API itself is completely dynamic and does not hardcode a list of
> > stats or metadata about them. Userspace has to read the stats fd to see
> > what's there.
> >
> > Fwiw, we just deleted the lpages stat without any drama.
>
> Maybe the new discoverable interface makes dropping some stats
> easier. But it still remains that what is useless for someone has the
> potential of being crucial for someone else. I wouldn't be surprised
> if someone asked for this stat back once they upgrade to a recent
> host kernel, probably a couple of years from now.

lpages is a bad example; it wasn't deleted so much as it was replaced by
stats for each page size (4kb, 2mb, 1gb). I don't think x86 has any (recent)
examples of a stat being truly dropped (though there are definitely some
that IMO are quite useless).

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
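[Editorial note on the "discoverable interface" David mentions: the binary
stats file descriptor (KVM_GET_STATS_FD) starts with a self-describing
header, so userspace discovers what stats exist instead of hardcoding them.
A hedged sketch of parsing that header from bytes already read off the fd;
the parse helper and field comments are ours, though the struct layout
matches <linux/kvm.h>:]

```c
#include <stdint.h>
#include <string.h>

/* Header at the start of the fd returned by ioctl(vm_fd or vcpu_fd,
 * KVM_GET_STATS_FD, NULL); layout as in <linux/kvm.h>. */
struct kvm_stats_header {
	uint32_t flags;
	uint32_t name_size;	/* size of the name field in each descriptor */
	uint32_t num_desc;	/* number of stat descriptors */
	uint32_t id_offset;	/* offset of the id string */
	uint32_t desc_offset;	/* offset of the descriptor block */
	uint32_t data_offset;	/* offset of the stat values themselves */
};

/* Copy a header out of a buffer read from the stats fd.
 * Returns 0 on success, -1 if the buffer is too short. */
int parse_stats_header(const void *buf, size_t len,
		       struct kvm_stats_header *out)
{
	if (len < sizeof(*out))
		return -1;
	memcpy(out, buf, sizeof(*out));
	return 0;
}
```

Because consumers walk num_desc descriptors at desc_offset rather than
assuming a fixed stat list, stats can be added (or, as with lpages,
replaced) without breaking userspace that actually discovers what's there.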