From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 43F2DC352A1 for ; Wed, 7 Dec 2022 16:44:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229989AbiLGQoc (ORCPT ); Wed, 7 Dec 2022 11:44:32 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35952 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230028AbiLGQoZ (ORCPT ); Wed, 7 Dec 2022 11:44:25 -0500 Received: from ams.source.kernel.org (ams.source.kernel.org [IPv6:2604:1380:4601:e00::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3FAE661746 for ; Wed, 7 Dec 2022 08:44:24 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id F1CEEB81F30 for ; Wed, 7 Dec 2022 16:44:22 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 61405C433C1; Wed, 7 Dec 2022 16:44:21 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1670431461; bh=zmCkclrRUCBzXngwQwzU/xVJaRBK3F8IRYex269YxTc=; h=Date:From:To:Cc:Subject:In-Reply-To:References:From; b=gXcyf3ScbVgMx63WfboXd6CajSQbmO+YJCT1RIKkFtwFXszbdeKJ/ZI9NqsZsuCNb wakmx0zVWe1APYobEImGoIeoIc3R5WmBf6FmpwTMIywJIBE5tOL83U7+bVO6HwuBdD 3JeTlUgZpuPo9wq/sZNIZ1Or30ur4++ODTj+Fh/bP3z9ZIoGJFEozzhkkXU6nCOiTD VcCNYbsvUHu3l033BwQGwyEDmn6jcyeq+rvg0SW2tLGOO1SkHSTFdY6Glua0nmbX1L Al+fXOKXwW36TkxBJoIUzWhFcu56LOibUwoYZhFLLmko5s0qB/Z9+u6mzVMQNDjr3P I5bVU59tSkOKw== Received: from sofa.misterjones.org ([185.219.108.64] helo=goblin-girl.misterjones.org) by disco-boy.misterjones.org with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.95) (envelope-from ) id 1p2xWt-00BAQE-4L; Wed, 07 Dec 2022 16:44:19 +0000 Date: Wed, 07 Dec 2022 16:44:14 +0000 Message-ID: <86ilinqi3l.wl-maz@kernel.org> From: Marc Zyngier To: Shivam Kumar Cc: pbonzini@redhat.com, seanjc@google.com, james.morse@arm.com, borntraeger@linux.ibm.com, david@redhat.com, kvm@vger.kernel.org, Shaju Abraham , Manish Mishra , Anurag Madnawat Subject: Re: [PATCH v7 1/4] KVM: Implement dirty quota-based throttling of vcpus In-Reply-To: References: <20221113170507.208810-1-shivam.kumar1@nutanix.com> <20221113170507.208810-2-shivam.kumar1@nutanix.com> <86zgcpo00m.wl-maz@kernel.org> <18b66b42-0bb4-4b32-e92c-3dce61d8e6a4@nutanix.com> <86mt8iopb7.wl-maz@kernel.org> User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue) FLIM-LB/1.14.9 (=?UTF-8?B?R29qxY0=?=) APEL-LB/10.8 EasyPG/1.0.0 Emacs/27.1 (aarch64-unknown-linux-gnu) MULE/6.0 (HANACHIRUSATO) MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue") Content-Type: text/plain; charset=US-ASCII X-SA-Exim-Connect-IP: 185.219.108.64 X-SA-Exim-Rcpt-To: shivam.kumar1@nutanix.com, pbonzini@redhat.com, seanjc@google.com, james.morse@arm.com, borntraeger@linux.ibm.com, david@redhat.com, kvm@vger.kernel.org, shaju.abraham@nutanix.com, manish.mishra@nutanix.com, anurag.madnawat@nutanix.com X-SA-Exim-Mail-From: maz@kernel.org X-SA-Exim-Scanned: No (on disco-boy.misterjones.org); SAEximRunCond expanded to false Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org On Tue, 06 Dec 2022 06:22:45 +0000, Shivam Kumar wrote: > > > > On 22/11/22 11:16 pm, Marc Zyngier wrote: > > On Fri, 18 Nov 2022 09:47:50 +0000, > > Shivam Kumar wrote: > >> > >> > >> > >> On 18/11/22 12:56 am, Marc Zyngier wrote: > >>> On Sun, 13 Nov 2022 17:05:06 +0000, > >>> Shivam Kumar wrote: > >>>> > >>>> + count: the current count of pages dirtied by the VCPU, can be > >>>> + skewed based on the size of the pages accessed by each vCPU. > >>> > >>> How can userspace make a decision on the amount of dirtying this > >>> represent if this doesn't represent a number of base pages? Or are you > >>> saying that this only counts the number of permission faults that have > >>> dirtied pages? > >> > >> Yes, this only counts the number of permission faults that have > >> dirtied pages. > > > > So how can userspace consistently set a quota of dirtied memory? This > > has to account for the size that has been faulted, because that's all > > userspace can reason about. Remember that at least on arm64, we're > > dealing with 3 different base page sizes, and many more large page > > sizes. > > I understand that this helps only when the majority of dirtying is > happening for the same page size. In our use case (VM live migration), > even large page is broken into 4k pages at first dirty. If required in > future, we can add individual counters for different page sizes. > > Thanks for pointing this out. Adding counters for different page sizes won't help. It will only make the API more complex and harder to reason about. arm64 has 3 base page sizes, up to 5 levels of translation, the ability to have block mappings at most levels, plus nice features such as contiguous hints that we treat as another block size. If you're lucky, that's about a dozen different sizes. Good luck with that. You really need to move away from your particular, 4kB centric, live migration use case. This is about throttling a vcpu based on how much memory it dirties. Not about the number of page faults it takes. You need to define the granularity of the counter, and account for each fault according to its mapping size. If an architecture has 16kB as the base page size, a 32MB fault (the size of the smallest block mapping) must bump the counter by 2048. That's the only way userspace can figure out what is going on. Without that, you may as well add a random number to the counter, it won't be any worse. [...] > >>> If you introduce additional #ifdefery here, why are the additional > >>> fields in the vcpu structure unconditional? > >> > >> pages_dirtied can be a useful information even if dirty quota > >> throttling is not used. So, I kept it unconditional based on > >> feedback. > > > > Useful for whom? This creates an ABI for all architectures, and this > > needs buy-in from everyone. Personally, I think it is a pretty useless > > stat. > > When we started this patch series, it was a member of the kvm_run > struct. I made this a stat based on the feedback I received from the > reviews. If you think otherwise, I can move it back to where it was. I'm certainly totally opposed to stats that don't have a clear use case. People keep piling random stats that satisfy their pet usage, and this only bloats the various structures for no overall benefit other than "hey, it might be useful". This is death by a thousand cut. > > And while we're talking about pages_dirtied, I really dislike the > > WARN_ON in mark_page_dirty_in_slot(). A counter has rolled over? > > Shock, horror... > > Ack. I'll give it a thought but if you have any specific suggestion on > how I can make it better, kindly let me know. Thanks. What is the effect of counter overflowing? Why is it important to warn? What goes wrong? What could be changed to *avoid* this being an issue? M. -- Without deviation from the norm, progress is not possible.