From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5EE79C4167B for ; Mon, 26 Dec 2022 10:09:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231658AbiLZKJN (ORCPT ); Mon, 26 Dec 2022 05:09:13 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42186 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229643AbiLZKJM (ORCPT ); Mon, 26 Dec 2022 05:09:12 -0500 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EBEA6B91 for ; Mon, 26 Dec 2022 02:09:10 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id AA786B802BE for ; Mon, 26 Dec 2022 10:09:09 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 50BF3C433EF; Mon, 26 Dec 2022 10:09:08 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1672049348; bh=fUH0/nqWbICMXF5To7PRfA6XTb37qIj/+e/XMPfX+JU=; h=Date:From:To:Cc:Subject:In-Reply-To:References:From; b=lAwpFuDNLwa3Au3Fp+0j6FHTcEC+MIDd2WZDNAT20IxY5/H7+yirS7aFX6DR7R5O3 +hkVCYw+mc4DtQD9HApOjvoLSnKJTUXfy/U+ECdXJesnGiA+xJmdB180tsxfMRM7mS o9eit3BgWltHOJ6OhbSrim7Q9BFsbjPeA5PdGeUys37FncDeb6zN7FUtIQB26z1azg OBTznwNmemVeRXsKuzxvpNYX5i3vLzNHCdckFEG3VNxv1jBFQ8eXAYLOyXs488obKx BmPVl3rxL+PYpIQ3Hup4rhaqPRnJzPEx6gXIhaf4KUT8W8lzpRDkAdTJ/icCCPyur+ 6N9Z43vgEKfRA== Received: from host81-132-227-111.range81-132.btcentralplus.com ([81.132.227.111] helo=wait-a-minute.misterjones.org) by disco-boy.misterjones.org with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.95) (envelope-from ) id 1p9kPp-00FCPR-Vq; Mon, 26 Dec 2022 10:09:06 +0000 Date: Mon, 26 Dec 2022 10:07:43 +0000 Message-ID: <874jtifpg0.wl-maz@kernel.org> From: Marc Zyngier To: Shivam Kumar Cc: Sean Christopherson , pbonzini@redhat.com, james.morse@arm.com, borntraeger@linux.ibm.com, david@redhat.com, kvm@vger.kernel.org, Shaju Abraham , Manish Mishra , Anurag Madnawat Subject: Re: [PATCH v7 1/4] KVM: Implement dirty quota-based throttling of vcpus In-Reply-To: References: <20221113170507.208810-1-shivam.kumar1@nutanix.com> <20221113170507.208810-2-shivam.kumar1@nutanix.com> <86zgcpo00m.wl-maz@kernel.org> <18b66b42-0bb4-4b32-e92c-3dce61d8e6a4@nutanix.com> <86mt8iopb7.wl-maz@kernel.org> <86ilinqi3l.wl-maz@kernel.org> User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue) FLIM-LB/1.14.9 (=?UTF-8?B?R29qxY0=?=) APEL-LB/10.8 EasyPG/1.0.0 Emacs/27.1 (x86_64-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO) MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue") Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-SA-Exim-Connect-IP: 81.132.227.111 X-SA-Exim-Rcpt-To: shivam.kumar1@nutanix.com, seanjc@google.com, pbonzini@redhat.com, james.morse@arm.com, borntraeger@linux.ibm.com, david@redhat.com, kvm@vger.kernel.org, shaju.abraham@nutanix.com, manish.mishra@nutanix.com, anurag.madnawat@nutanix.com X-SA-Exim-Mail-From: maz@kernel.org X-SA-Exim-Scanned: No (on disco-boy.misterjones.org); SAEximRunCond expanded to false Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org On Sun, 25 Dec 2022 16:50:04 +0000, Shivam Kumar wrote: >=20 >=20 >=20 > On 08/12/22 1:00 pm, Shivam Kumar wrote: > >=20 > >=20 > > On 08/12/22 1:23 am, Sean Christopherson wrote: > >> On Wed, Dec 07, 2022, Marc Zyngier wrote: > >>> On Tue, 06 Dec 2022 06:22:45 +0000, > >>> Shivam Kumar wrote: > >>> You need to define the granularity of the counter, and account for > >>> each fault according to its mapping size. If an architecture has 16kB > >>> as the base page size, a 32MB fault (the size of the smallest block > >>> mapping) must bump the counter by 2048. That's the only way userspace > >>> can figure out what is going on. > >>=20 > >> I don't think that's true for the dirty logging case.=C2=A0 IIUC, when= a > >> memslot is > >> being dirty logged, KVM forces the memory to be mapped with > >> PAGE_SIZE granularity, > >> and that base PAGE_SIZE is fixed and known to userspace.=C2=A0 > >> I.e. accuracy is naturally > >> provided for this primary use case where accuracy really matters, > >> and so this is > >> effectively a documentation issue and not a functional issue. > >=20 > > So, does defining "count" as "the number of write permission faults" > > help in addressing the documentation issue? My understanding too is > > that for dirty logging, we will have uniform granularity. > >=20 > > Thanks. > >=20 > >>=20 > >>> Without that, you may as well add a random number to the counter, it > >>> won't be any worse. > >>=20 > >> The stat will be wildly inaccurate when dirty logging isn't > >> enabled, but that doesn't > >> necessarily make the stat useless, e.g. it might be useful as a > >> very rough guage > >> of which vCPUs are likely to be writing memory.=C2=A0 I do agree though > >> that the value > >> provided is questionable and/or highly speculative. > >>=20 > >>> [...] > >>>=20 > >>>>>>> If you introduce additional #ifdefery here, why are the additional > >>>>>>> fields in the vcpu structure unconditional? > >>>>>>=20 > >>>>>> pages_dirtied can be a useful information even if dirty quota > >>>>>> throttling is not used. So, I kept it unconditional based on > >>>>>> feedback. > >>>>>=20 > >>>>> Useful for whom? This creates an ABI for all architectures, and this > >>>>> needs buy-in from everyone. Personally, I think it is a pretty usel= ess > >>>>> stat. > >>>>=20 > >>>> When we started this patch series, it was a member of the kvm_run > >>>> struct. I made this a stat based on the feedback I received from the > >>>> reviews. If you think otherwise, I can move it back to where it was. > >>>=20 > >>> I'm certainly totally opposed to stats that don't have a clear use > >>> case. People keep piling random stats that satisfy their pet usage, > >>> and this only bloats the various structures for no overall benefit > >>> other than "hey, it might be useful". This is death by a thousand cut. > >>=20 > >> I don't have a strong opinion on putting the counter into kvm_run > >> as an "out" > >> fields vs. making it a state.=C2=A0 I originally suggested making it a > >> stat because > >> KVM needs to capture the information somewhere, so why not make it > >> a stat?=C2=A0 But > >> I am definitely much more cavalier when it comes to adding stats, > >> so I've no > >> objection to dropping the stat side of things. > >=20 > > I'll be skeptical about making it a stat if we plan to allow the > > userspace to reset it at will. > >=20 > >=20 > > Thank you so much for the comments. > >=20 > > Thanks, > > Shivam >=20 > Hi Marc, > Hi Sean, >=20 > Please let me know if there's any further question or feedback. My earlier comments still stand: the proposed API is not usable as a general purpose memory-tracking API because it counts faults instead of memory, making it inadequate except for the most trivial cases. And I cannot believe you were serious when you mentioned that you were happy to make that the API. This requires some serious work, and this series is not yet near a state where it could be merged. Thanks, M. --=20 Without deviation from the norm, progress is not possible.