Date: Mon, 07 Nov 2022 09:38:24 +0000
Message-ID: <865yfrqf3j.wl-maz@kernel.org>
From: Marc Zyngier
To: Gavin Shan
Subject: Re: [PATCH v8 3/7] KVM: Support dirty ring in conjunction with bitmap
Cc: kvm@vger.kernel.org, catalin.marinas@arm.com, andrew.jones@linux.dev, dmatlack@google.com, will@kernel.org, shan.gavin@gmail.com, bgardon@google.com, kvmarm@lists.linux.dev, pbonzini@redhat.com, zhenyzha@redhat.com, shuah@kernel.org, kvmarm@lists.cs.columbia.edu, ajones@ventanamicro.com

On Sun, 06 Nov 2022 21:23:13 +0000,
Gavin Shan wrote:
>
> Hi Peter and Marc,
>
> On 11/7/22 5:06 AM, Peter Xu wrote:
> > On Sun, Nov 06, 2022 at 08:12:22PM +0000, Marc Zyngier wrote:
> >> On Sun, 06 Nov 2022 16:22:29 +0000,
> >> Peter Xu wrote:
> >>> On Sun, Nov 06, 2022 at 03:43:17PM +0000, Marc Zyngier wrote:
> >>>>> +Note that the bitmap here is only a backup of the ring structure, and
> >>>>> +normally should only contain a very small amount of dirty pages, which
> >>>>
> >>>> I don't think we can claim this. It is whatever amount of memory is
> >>>> dirtied outside of a vcpu context, and we shouldn't make any claim
> >>>> regarding the number of dirty pages.
> >>>
> >>> The thing is the current with-bitmap design assumes that the two logs are
> >>> collected in different windows of migration, while the dirty log is only
> >>> collected after the VM is stopped. So collecting dirty bitmap and sending
> >>> the dirty pages within the bitmap will be part of the VM downtime.
> >>>
> >>> It will stop to make sense if the dirty bitmap can contain a large portion
> >>> of the guest memory, because then it'll be simpler to just stop the VM,
> >>> transfer pages, and restart on dest node without any tracking mechanism.
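The downtime accounting in the paragraph above can be made concrete with a toy model. This is a hypothetical sketch, not KVM or QEMU code: DirtyTracker, ring_worthwhile and the 10% max_fraction threshold are made up for illustration. It assumes only the protocol described above: ring entries are drained while the VM runs, and the bitmap is collected once, after the VM has stopped, so every page it holds is sent during downtime.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DirtyTracker:
    """Hypothetical model of ring+bitmap dirty tracking (not KVM code)."""

    total_pages: int
    ring: list[int] = field(default_factory=list)  # pages dirtied by vCPUs
    bitmap: set[int] = field(default_factory=set)  # pages dirtied outside vCPU context

    def vcpu_write(self, page: int) -> None:
        self.ring.append(page)

    def non_vcpu_write(self, page: int) -> None:
        self.bitmap.add(page)

    def harvest_rings_live(self) -> set[int]:
        """Live phase: drain the rings; those pages are sent while the VM runs."""
        drained, self.ring = set(self.ring), []
        return drained

    def downtime_pages(self) -> int:
        """Stop phase: residual ring entries plus the bitmap, collected once."""
        return len(set(self.ring) | self.bitmap)


def ring_worthwhile(tracker: DirtyTracker, max_fraction: float = 0.1) -> bool:
    """Hypothetical policy: ring+bitmap only pays off if the stop-phase
    transfer stays a small fraction of guest memory. max_fraction is an
    arbitrary assumption, not a KVM or QEMU value."""
    return tracker.downtime_pages() <= max_fraction * tracker.total_pages


if __name__ == "__main__":
    t = DirtyTracker(total_pages=1024)
    for p in range(512):
        t.vcpu_write(p)               # heavy vCPU dirtying, drained live
    for p in range(8):
        t.non_vcpu_write(1000 + p)    # a few pages dirtied outside vCPU context
    t.harvest_rings_live()
    print(t.downtime_pages(), ring_worthwhile(t))      # prints: 8 True

    dma = DirtyTracker(total_pages=1024)
    for p in range(900):
        dma.non_vcpu_write(p)         # DMA-like traffic lands in the bitmap
    dma.harvest_rings_live()
    print(dma.downtime_pages(), ring_worthwhile(dma))  # prints: 900 False
```

With only a handful of pages dirtied outside vCPU context, the stop phase stays small. Once DMA-like traffic dirties most of guest memory, the stop-phase transfer approaches a full copy, which is the case described above as no better than a non-live migration. Where to put the threshold is a policy choice for the VMM.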
> >>
> >> Oh, I absolutely agree that the whole vcpu dirty ring makes zero sense
> >> in general. It only makes sense if the source of the dirty pages is
> >> limited to the vcpus, which is literally a corner case. Look at any
> >> real machine, and you'll quickly realise that this isn't the case, and
> >> that DMA *is* a huge source of dirty pages.
> >>
> >> Here, we're just lucky enough not to have much DMA tracking yet. Once
> >> that happens (and I have it from people doing the actual work that it
> >> *is* happening), you'll realise that the dirty ring story is of very
> >> limited use. So I'd rather drop anything quantitative here, as this is
> >> likely to be wrong.
> >
> > Is it a must that arm64 needs to track device DMAs using the same dirty
> > tracking interface rather than VFIO or any other interface? It's
> > definitely not the case for x86, but if it's true for arm64, then could the
> > DMA be spread across all the guest pages? If it's also true, I really
> > don't know how this will work..
> >
> > We're only syncing the dirty bitmap once right now with the protocol. If
> > that can cover most of the guest mem, it's same as non-live. If we sync it
> > periodically, then it's the same as enabling dirty-log alone and the rings
> > are useless.
> >
>
> For vgic/its tables, the number of dirty pages can be huge in theory. However,
> they're limited in practice. So I intend to agree with Peter that dirty-ring
> should be avoided and dirty-log needs to be used instead when the DMA case
> is supported in future. As Peter said, the small amount of dirty pages in
> the bitmap is the condition to use it here. I think it makes sense to mention
> it in the document.

And again, I disagree. This API has *nothing* to do with the ITS. It
is completely general purpose and should work with anything because it
is designed for that.

The problem is that you're considering that RING+BITMAP is a different
thing from BITMAP alone when it comes to non-CPU traffic.
It really isn't. We can't say "there will only be a few pages
dirtied", because we simply don't know. If you really want a
quantitative argument, then say something like:

"The use of the ring+bitmap combination is only beneficial if very
little memory is dirtied by non-CPU agents. Consider using the
stand-alone bitmap API if this isn't the case."

which clearly puts the choice in the hands of the user.

[...]

> How about to avoid mentioning KVM_CLEAR_DIRTY_LOG here? I don't expect QEMU
> to clear the dirty bitmap after it's collected in this particular case.

Peter said there is undefined behaviour. I want to understand whether
this is the case or not. QEMU is only one of the users of this stuff,
as all the vendors have their own custom VMM, and they do things in
funny ways.

	M.

-- 
Without deviation from the norm, progress is not possible.
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm