Date: Thu, 26 May 2022 17:16:08 +0100
From: Marc Zyngier <maz@kernel.org>
To: Sean Christopherson <seanjc@google.com>
Cc: Shivam Kumar <shivam.kumar1@nutanix.com>, pbonzini@redhat.com,
 james.morse@arm.com, borntraeger@linux.ibm.com, david@redhat.com,
 kvm@vger.kernel.org, Shaju Abraham <shaju.abraham@nutanix.com>,
 Manish Mishra <manish.mishra@nutanix.com>,
 Anurag Madnawat <anurag.madnawat@nutanix.com>
Subject: Re: [PATCH v4 1/4] KVM: Implement dirty quota-based throttling of vcpus
Message-ID: <877d68mfqv.wl-maz@kernel.org>
References: <20220521202937.184189-1-shivam.kumar1@nutanix.com>
 <20220521202937.184189-2-shivam.kumar1@nutanix.com>
 <87h75fmmkj.wl-maz@kernel.org>
 <878rqomnfr.wl-maz@kernel.org>
List-ID: <kvm.vger.kernel.org>

On Thu, 26 May 2022 16:44:13 +0100,
Sean Christopherson <seanjc@google.com> wrote:
> 
> On Thu, May 26, 2022, Marc Zyngier wrote:
> > > >> +{
> > > >> +	struct kvm_run *run = vcpu->run;
> > > >> +	u64 dirty_quota = READ_ONCE(run->dirty_quota);
> > > >> +	u64 pages_dirtied = vcpu->stat.generic.pages_dirtied;
> > > >> +
> > > >> +	if (!dirty_quota || (pages_dirtied < dirty_quota))
> > > >> +		return 1;
> > > > What happens when page_dirtied becomes large and dirty_quota has to
> > > > wrap to allow further progress?
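[Editor's note: the wraparound concern raised above can be made concrete with a small userspace-side model. This is a sketch with hypothetical helper names, not the actual patch: `dirty_quota_check` mirrors the quoted condition, and `rearm_quota` models the userspace re-arm described later in the thread.]

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model of the quoted check: the quota is an absolute
 * ceiling on a monotonically growing per-vcpu counter.  Returns 1
 * when the vcpu may keep running, 0 when it must exit to userspace.
 * A quota of 0 disables throttling, as in the quoted snippet. */
static int dirty_quota_check(uint64_t dirty_quota, uint64_t pages_dirtied)
{
	return !dirty_quota || pages_dirtied < dirty_quota;
}

/* Userspace re-arms the ceiling on each exit as "count so far plus
 * fresh allowance".  Near UINT64_MAX this u64 sum wraps to a tiny
 * value, so the new "ceiling" lands *below* pages_dirtied and the
 * check above fails on every entry -- the wrap problem being asked
 * about. */
static uint64_t rearm_quota(uint64_t pages_dirtied, uint64_t allowance)
{
	return pages_dirtied + allowance; /* may wrap modulo 2^64 */
}
```

For example, `rearm_quota(UINT64_MAX, 2)` yields 1, and `dirty_quota_check(1, UINT64_MAX)` then returns 0 forever (short of clearing the quota), which is the progress-blocking corner case.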
> > > Every time the quota is exhausted, userspace is expected to set it to
> > > pages_dirtied + new quota. So, pages_dirtied will always follow dirty
> > > quota. I'll be sending the qemu patches soon. Thanks.
> > 
> > Right, so let's assume that page_dirtied=0xffffffffffffffff (yes, I
> > have dirtied that many pages).
> 
> Really?  Written that many bytes from a guest?  Maybe.  But actually
> marked that many pages dirty in hardware, let alone in KVM?  And on
> a single CPU?
> 
> By my back of the napkin math, a 4096 CPU system running at 16GHz
> with each CPU able to access one page of memory per cycle would take
> ~3 days to access 2^64 pages.
> 
> Assuming a ridiculously optimistic ~20 cycles to walk page tables,
> fetch the cache line from memory, insert into the TLB, and mark the
> PTE dirty, that's still ~60 days to actually dirty that many pages
> in hardware.
> 
> Let's again be comically optimistic and assume KVM can somehow
> propagate a dirty bit from hardware PTEs to the dirty bitmap/ring in
> another ~20 cycles.  That brings us to ~1200 days.
> 
> But the stat is per vCPU, so that actually means it would take
> ~13.8k years for a single vCPU/CPU to dirty 2^64 pages... running at
> a ludicrous 16GHz on a CPU with latencies that are likely an order
> of magnitude faster than anything that exists today.

Congratulations, you can multiply! ;-)

It just shows that the proposed API is pretty bad, because instead of
working as a credit, it works as a ceiling, based on a value that is
dependent on the vcpu's previous state (forcing userspace to recompute
the next quota on each exit), and with undocumented, arbitrary limits
as a bonus.

I don't like it, and probably won't like it in 13.8k years either.

	M.

-- 
Without deviation from the norm, progress is not possible.
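[Editor's note: the credit-vs-ceiling distinction Marc draws can be sketched as follows. These are hypothetical helpers for illustration, not KVM code: a ceiling compares an ever-growing counter against an absolute threshold that userspace must recompute from the vcpu's state, while a credit is a relative budget that KVM consumes and userspace simply refills.]

```c
#include <assert.h>
#include <stdint.h>

/* Ceiling semantics (as proposed): userspace must read pages_dirtied
 * on every exit and write back pages_dirtied + allowance, tying the
 * quota to the vcpu's history (and risking u64 wrap). */
static int may_run_ceiling(uint64_t ceiling, uint64_t pages_dirtied)
{
	return !ceiling || pages_dirtied < ceiling;
}

/* Credit semantics (the alternative): the counter itself is the
 * budget.  KVM decrements it per dirtied page; userspace re-arms by
 * writing a fresh relative amount, independent of any prior state. */
static int may_run_credit(uint64_t *credit)
{
	if (*credit == 0)
		return 0;	/* exit to userspace for a refill */
	(*credit)--;		/* consume one page of budget */
	return 1;
}
```

With a credit, the re-arm value never depends on how much the vcpu has dirtied so far, so there is nothing to recompute and nothing to wrap.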