From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id EE858C433F5 for ; Thu, 18 Nov 2021 17:46:53 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id D008761A56 for ; Thu, 18 Nov 2021 17:46:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232543AbhKRRtx (ORCPT ); Thu, 18 Nov 2021 12:49:53 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33478 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231787AbhKRRtw (ORCPT ); Thu, 18 Nov 2021 12:49:52 -0500 Received: from mail-pf1-x433.google.com (mail-pf1-x433.google.com [IPv6:2607:f8b0:4864:20::433]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9C00AC061574 for ; Thu, 18 Nov 2021 09:46:52 -0800 (PST) Received: by mail-pf1-x433.google.com with SMTP id n85so6752374pfd.10 for ; Thu, 18 Nov 2021 09:46:52 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=dsuS3eNeB8b/qtXnY2xvDYGXR2aF7cv/Utbyr7HwpZA=; b=kmIXIDpZKfMR/NLa0O906BLDK/RjljRK9R6Sev/H8rW+mo9l8vc+XFwam2QfYFDf8d rLqkSbqOnHSQDEVr59nJeNE8cUzWq/HdRNJABphWTVZQu/EaR3ZyH+7cxI4ZwEc/fS3w s3s1qUI2fiLk4gfzk9HYv68VEIz2xeswu91VDOTKDHABkKEvFLUv+pecNx7AWl9vZM2w bD5S/YBIjPSUeKp46q4ImDBWxzkKDAZRRotPTqdCgy2CeSu1XFieeDoqQgzxcVH9Jt5t +Ojltgu8EpXRcoJtBLduaFRD6AHlEBsLm82OIampANd1NI5Y425dQ+4zmCBt1ttIZc/k Bh6g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=dsuS3eNeB8b/qtXnY2xvDYGXR2aF7cv/Utbyr7HwpZA=; b=Kfb1TFL58UYpslT+D7EKdAgX8Wot448AqUV+v76CI6rst3cCfXTBn60ZTFqGwdePxZ BU5N0ijX0LiKLTXNOwCoIcR49HDBczHNEqYag8V4QhcjslOtDge0UIuzvz/RfjduZBKR vyoXcvMwajM/xKDazeQRZb8omECFPzx8IF6ZY2zcmTyfNo/qazW/0d9Xl0wX6WajBc5D kPAqN09D+VxHKjnJTI1m94UKHdlNNriv0df6XV5SR9U33Y5YfmTYGWQqezUFllodQ0Cg hgoMqLAT66TY7llQYo6WTChwpsKxpjw/tDLagfxMivz4JtwxVZkxoXL/HVbjcBk8cWog Q8rA== X-Gm-Message-State: AOAM531bBOH2Vnb85l/btZmFiPpjEnyaiCYVIdN8HeYuhgArhCDWJCzI yHRgEYbewoE7RaRnxnIwgNQcsg== X-Google-Smtp-Source: ABdhPJwn1QMptzOpvmAEQGwbpEHuiPLZqhKPlaPSu9L1Vt2ay/FTo0QEguavjO4yD/S14Xlcay5lyA== X-Received: by 2002:a63:1343:: with SMTP id 3mr12737422pgt.326.1637257611806; Thu, 18 Nov 2021 09:46:51 -0800 (PST) Received: from google.com (157.214.185.35.bc.googleusercontent.com. [35.185.214.157]) by smtp.gmail.com with ESMTPSA id j17sm216834pgh.85.2021.11.18.09.46.51 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 18 Nov 2021 09:46:51 -0800 (PST) Date: Thu, 18 Nov 2021 17:46:47 +0000 From: Sean Christopherson To: Shivam Kumar Cc: pbonzini@redhat.com, kvm@vger.kernel.org Subject: Re: [PATCH 0/6] KVM: Dirty Quota-Based VM Live Migration Auto-Converge Message-ID: References: <20211114145721.209219-1-shivam.kumar1@nutanix.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20211114145721.209219-1-shivam.kumar1@nutanix.com> Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org On Sun, Nov 14, 2021, Shivam Kumar wrote: > One possible approach to distributing the overall scope of dirtying for a > dirty quota interval is to equally distribute it among all the vCPUs. This > approach to the distribution doesn't make sense if the distribution of > workloads among vCPUs is skewed. So, to counter such skewed cases, we > propose that if any vCPU doesn't need its quota for any given dirty > quota interval, we add this quota to a common pool. This common pool (or > "common quota") can be consumed on a first come first serve basis > by all vCPUs in the upcoming dirty quota intervals. Why not simply use a per-VM quota in combination with a percpu_counter to avoid bouncing the dirty counter? > Design > ---------- > ---------- > > Initialization > Feedback that applies to all patches: > vCPUDirtyQuotaContext keeps the dirty quota context for each vCPU. It keeps CamelCase is very frowned upon, please use whatever_case_this_is_called. The SOB chains are wrong. The person physically posting the patches needs to have their SOB last, as they are the person who last handled the patches. Co-developed-by: Anurag Madnawat Signed-off-by: Anurag Madnawat Signed-off-by: Shivam Kumar Signed-off-by: Shaju Abraham Signed-off-by: Manish Mishra These needs a Co-developed-by. The only other scenario is that you and Anurag wrote the patches, then handed them off to Shaju, who sent them to Manish, who sent them back to you for posting. I highly doubt that's the case, and if so, I would hope you've done due diligence to ensure what you handed off is the same as what you posted, i.e. the SOB chains for Shaju and Manish can be omitted. In general, please read through most of the stuff in Documentation/process. > the number of pages the vCPU has dirtied (dirty_counter) in the ongoing > dirty quota interval, and the maximum number of dirties allowed for the > vCPU (dirty_quota) in the ongoing dirty quota interval. > > struct vCPUDirtyQuotaContext { > u64 dirty_counter; > u64 dirty_quota; > }; > > The flag dirty_quota_migration_enabled determines whether dirty quota-based > throttling is enabled for an ongoing migration or not. > > > Handling page dirtying > > When the guest tries to dirty a page, it leads to a vmexit as each page is > write-protected. In the vmexit path, we increment the dirty_counter for the > corresponding vCPU. Then, we check if the vCPU has exceeded its quota. If > yes, we exit to userspace with a new exit reason KVM_EXIT_DIRTY_QUOTA_FULL. > This "quota full" event is further handled on the userspace side. > > > Please find the KVM Forum presentation on dirty quota-based throttling > here: https://www.youtube.com/watch?v=ZBkkJf78zFA > > > Shivam Kumar (6): > Define data structures for dirty quota migration. > Init dirty quota flag and allocate memory for vCPUdqctx. > Add KVM_CAP_DIRTY_QUOTA_MIGRATION and handle vCPU page faults. > Increment dirty counter for vmexit due to page write fault. > Exit to userspace when dirty quota is full. > Free vCPUdqctx memory on vCPU destroy. Freeing memory in a later patch is not an option. The purpose of splitting is to aid bisection and make the patches more reviewable, not to break bisection and confuse reviewers. In general, there are too many patches and things are split in weird ways, making this hard to review. This can probably be smushed to two patches: 1) implement the guts, 2) exposed to userspace and document. > Documentation/virt/kvm/api.rst | 39 +++++++++++++++++++ > arch/x86/include/uapi/asm/kvm.h | 1 + > arch/x86/kvm/Makefile | 3 +- > arch/x86/kvm/x86.c | 9 +++++ > include/linux/dirty_quota_migration.h | 52 +++++++++++++++++++++++++ > include/linux/kvm_host.h | 3 ++ > include/uapi/linux/kvm.h | 11 ++++++ > virt/kvm/dirty_quota_migration.c | 31 +++++++++++++++ I do not see any reason to add two new files for 84 lines, which I'm pretty sure we can trim down significantly in any case. Paolo has suggested creating files for the mm side of generic kvm, the helpers can go wherever that lands. > virt/kvm/kvm_main.c | 56 ++++++++++++++++++++++++++- > 9 files changed, 203 insertions(+), 2 deletions(-) > create mode 100644 include/linux/dirty_quota_migration.h > create mode 100644 virt/kvm/dirty_quota_migration.c As for the design, allocating a separate page for 16 bytes is wasteful and adds complexity that I don't think is strictly necessary. Assuming the quota isn't simply a per-VM thing.... Rather than have both the count and the quote writable by userspace, what about having KVM_CAP_DIRTY_QUOTA_MIGRATION (renamed to just KVM_CAP_DIRTY_QUOTA, because dirty logging can technically be used for things other than migration) define a default, per-VM dirty quota, that is snapshotted by each vCPU on creation. The ioctl() would need to be rejected if vCPUs have been created, but it already needs something along those lines because currently it has a TOCTOU race and can also race with vCPU readers. Anyways, vCPUs snapshot a default quota on creation, and then use struct kvm_run to update the quota upon return from userspace after KVM_EXIT_DIRTY_QUOTA_FULL instead of giving userspace free reign to change it the quota at will. There are a variety of ways to leverage kvm_run, the simplest I can think of would be to define the ABI such that calling KVM_RUN with "exit_reason == KVM_EXIT_DIRTY_QUOTA_FULL" would trigger an update. That would do the right thing even if userspace _doesn't_ update the count/quota, as KVM would simply copy back the original quota/count and exit back to userspace. E.g. diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 78f0719cc2a3..d4a7d1b7019e 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -487,6 +487,11 @@ struct kvm_run { unsigned long args[6]; unsigned long ret[2]; } riscv_sbi; + /* KVM_EXIT_DIRTY_QUOTA_FULL */ + struct { + u64 dirty_count; + u64 dirty_quota; + } /* Fix the size of the union. */ char padding[256]; }; Side topic, it might make sense to have the counter be a stat, the per-vCPU dirty rate could be useful info even if userspace isn't using quotas.