Date: Tue, 23 Aug 2022 20:17:03 +0100
Message-ID: <87bksawz0w.wl-maz@kernel.org>
From: Marc Zyngier
To: Peter Xu
Cc: Gavin Shan, kvmarm@lists.cs.columbia.edu, linux-arm-kernel@lists.infradead.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, pbonzini@redhat.com, corbet@lwn.net, james.morse@arm.com, alexandru.elisei@arm.com, suzuki.poulose@arm.com, oliver.upton@linux.dev, catalin.marinas@arm.com, will@kernel.org, shuah@kernel.org, seanjc@google.com, drjones@redhat.com, dmatlack@google.com, bgardon@google.com, ricarkol@google.com, zhenyzha@redhat.com, shan.gavin@gmail.com
Subject: Re: [PATCH v1 1/5] KVM: arm64: Enable ring-based dirty memory tracking
References: <20220819005601.198436-1-gshan@redhat.com> <20220819005601.198436-2-gshan@redhat.com> <87lerkwtm5.wl-maz@kernel.org> <41fb5a1f-29a9-e6bb-9fab-4c83a2a8fce5@redhat.com> <87fshovtu0.wl-maz@kernel.org> <171d0159-4698-354b-8b2f-49d920d03b1b@redhat.com>

On Tue, 23 Aug 2022 14:58:19 +0100, Peter Xu wrote:
> 
> On Tue, Aug 23, 2022 at 03:22:17PM +1000, Gavin Shan wrote:
> > > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > > index 986cee6fbc7f..0b41feb6fb7d 100644
> > > --- a/arch/arm64/kvm/arm.c
> > > +++ b/arch/arm64/kvm/arm.c
> > > @@ -747,6 +747,12 @@ static int check_vcpu_requests(struct kvm_vcpu *vcpu)
> > >  
> > >  		if (kvm_check_request(KVM_REQ_SUSPEND, vcpu))
> > >  			return kvm_vcpu_suspend(vcpu);
> > > +
> > > +		if (kvm_check_request(KVM_REQ_RING_SOFT_FULL, vcpu)) {
> > > +			vcpu->run->exit_reason = KVM_EXIT_DIRTY_RING_FULL;
> > > +			trace_kvm_dirty_ring_exit(vcpu);
> > > +			return 0;
> > > +		}
> > >  	}
> > >  
> > >  	return 1;
> > > diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
> > > index f4c2a6eb1666..08b2f01164fa 100644
> > > --- a/virt/kvm/dirty_ring.c
> > > +++ b/virt/kvm/dirty_ring.c
> > > @@ -149,6 +149,7 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring)
> > >  
> > >  void kvm_dirty_ring_push(struct kvm_dirty_ring *ring, u32 slot, u64 offset)
> > >  {
> > > +	struct kvm_vcpu *vcpu = container_of(ring, struct kvm_vcpu, dirty_ring);
> > >  	struct kvm_dirty_gfn *entry;
> > >  
> > >  	/* It should never get full */
> > > @@ -166,6 +167,9 @@ void kvm_dirty_ring_push(struct kvm_dirty_ring *ring, u32 slot, u64 offset)
> > >  	kvm_dirty_gfn_set_dirtied(entry);
> > >  	ring->dirty_index++;
> > >  	trace_kvm_dirty_ring_push(ring, slot, offset);
> > > +
> > > +	if (kvm_dirty_ring_soft_full(vcpu))
> > > +		kvm_make_request(KVM_REQ_RING_SOFT_FULL, vcpu);
> > >  }
> > >  
> > >  struct page *kvm_dirty_ring_get_page(struct kvm_dirty_ring *ring, u32 offset)
> > > 
> > 
> > Ok, thanks for the details, Marc. I will adopt your code in next revision :)
> 
> Note that there can be a slight difference with the old/new code, in that
> an (especially malicious) userapp can logically ignore the DIRTY_RING_FULL
> vmexit and keep kicking VCPU_RUN with the new code.
> 
> Unlike the old code, the 2nd/3rd/... KVM_RUN will still run in the new code
> until the next dirty pfn being pushed to the ring, then it'll request ring
> full exit again.
> 
> Each time it exits the ring grows 1.
> 
> At last iiuc it can easily hit the ring full and trigger the warning at the
> entry of kvm_dirty_ring_push():
> 
> 	/* It should never get full */
> 	WARN_ON_ONCE(kvm_dirty_ring_full(ring));

Hmmm, yes. Well spotted.

> We did that because kvm_dirty_ring_push() was previously designed to not be
> able to fail at all (e.g., in the old bitmap world we never will fail too).
> We can't because we can't lose any dirty page or migration could silently
> fail too (consider when we do user exit due to ring full and migration just
> completed; there could be unsynced pages on src/dst).
> 
> So even though the old approach will need to read kvm->dirty_ring_size for
> every entrance which is a pity, it will avoid issue above.

I don't think we really need this check on the hot path. All we need
is to make the request sticky until userspace gets their act together
and consumes elements in the ring. Something like:

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 986cee6fbc7f..e8ed5e1af159 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -747,6 +747,14 @@ static int check_vcpu_requests(struct kvm_vcpu *vcpu)
 
 		if (kvm_check_request(KVM_REQ_SUSPEND, vcpu))
 			return kvm_vcpu_suspend(vcpu);
+
+		if (kvm_check_request(KVM_REQ_RING_SOFT_FULL, vcpu) &&
+		    kvm_dirty_ring_soft_full(vcpu)) {
+			kvm_make_request(KVM_REQ_RING_SOFT_FULL, vcpu);
+			vcpu->run->exit_reason = KVM_EXIT_DIRTY_RING_FULL;
+			trace_kvm_dirty_ring_exit(vcpu);
+			return 0;
+		}
 	}
 
 	return 1;

However, I'm a bit concerned by the reset side of things. It iterates
over the vcpus and expects the view of each ring to be consistent,
even if userspace is hacking at it from another CPU. For example, I
can't see what guarantees that the kernel observes the writes from
userspace in the order they are being performed (the documentation
provides no requirements other than "it must collect the dirty GFNs in
sequence", which doesn't mean much from an ordering perspective).

I can see that working on a strongly ordered architecture, but on
something as relaxed as ARM, the CPUs may^Wwill aggressively reorder
stuff that isn't explicitly ordered. I have the feeling that a CAS
operation on both sides would be enough, but someone who actually
understands how this works should have a look...

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel