Date: Mon, 16 Dec 2019 04:47:36 -0500
From: "Michael S. Tsirkin"
To: Peter Xu
Cc: Paolo Bonzini, linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	Sean Christopherson, "Dr. David Alan Gilbert", Vitaly Kuznetsov
Subject: Re: [PATCH RFC 04/15] KVM: Implement ring-based dirty memory tracking
Message-ID: <20191216044619-mutt-send-email-mst@kernel.org>
References: <20191129213505.18472-1-peterx@redhat.com>
 <20191129213505.18472-5-peterx@redhat.com>
 <20191211063830-mutt-send-email-mst@kernel.org>
 <20191211205952.GA5091@xz-x1>
 <20191211172713-mutt-send-email-mst@kernel.org>
 <46ceb88c-0ddd-0d9a-7128-3aa5a7d9d233@redhat.com>
 <20191215173302.GB83861@xz-x1>
In-Reply-To: <20191215173302.GB83861@xz-x1>

On Sun, Dec 15, 2019 at 12:33:02PM -0500, Peter Xu wrote:
> On Thu, Dec 12, 2019 at 01:08:14AM +0100, Paolo Bonzini wrote:
> > >>> What depends on what here? Looks suspicious ...
> > >>
> > >> Hmm, I think maybe it can be removed because the entry pointer
> > >> reference below should be an ordering constraint already?
> >
> > entry->xxx depends on ring->reset_index.
>
> Yes that's true, but...
>
> entry = &ring->dirty_gfns[ring->reset_index & (ring->size - 1)];
> /* barrier? */
> next_slot = READ_ONCE(entry->slot);
> next_offset = READ_ONCE(entry->offset);
>
> ... I think entry->xxx depends on entry first, then entry depends on
> reset_index.  So it seems fine because all things have a dependency?

Is reset_index changed from another thread then?  If yes, then you want
to read reset_index with READ_ONCE(); that includes a dependency
barrier.
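To be concrete, something like this is what I mean (sketch only,
reusing the names from your snippet above; on current kernels
READ_ONCE() already implies smp_read_barrier_depends()):

	reset_index = READ_ONCE(ring->reset_index);
	entry = &ring->dirty_gfns[reset_index & (ring->size - 1)];
	/*
	 * The loads of entry->slot/entry->offset carry an address
	 * dependency on the load of reset_index via "entry", so no
	 * extra barrier is needed between them.
	 */
	next_slot = READ_ONCE(entry->slot);
	next_offset = READ_ONCE(entry->offset);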
Tsirkin" To: Peter Xu Cc: Paolo Bonzini , linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Sean Christopherson , "Dr . David Alan Gilbert" , Vitaly Kuznetsov Subject: Re: [PATCH RFC 04/15] KVM: Implement ring-based dirty memory tracking Message-ID: <20191216044619-mutt-send-email-mst@kernel.org> References: <20191129213505.18472-1-peterx@redhat.com> <20191129213505.18472-5-peterx@redhat.com> <20191211063830-mutt-send-email-mst@kernel.org> <20191211205952.GA5091@xz-x1> <20191211172713-mutt-send-email-mst@kernel.org> <46ceb88c-0ddd-0d9a-7128-3aa5a7d9d233@redhat.com> <20191215173302.GB83861@xz-x1> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20191215173302.GB83861@xz-x1> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sun, Dec 15, 2019 at 12:33:02PM -0500, Peter Xu wrote: > On Thu, Dec 12, 2019 at 01:08:14AM +0100, Paolo Bonzini wrote: > > >>> What depends on what here? Looks suspicious ... > > >> > > >> Hmm, I think maybe it can be removed because the entry pointer > > >> reference below should be an ordering constraint already? > > > > entry->xxx depends on ring->reset_index. > > Yes that's true, but... > > entry = &ring->dirty_gfns[ring->reset_index & (ring->size - 1)]; > /* barrier? */ > next_slot = READ_ONCE(entry->slot); > next_offset = READ_ONCE(entry->offset); > > ... I think entry->xxx depends on entry first, then entry depends on > reset_index. So it seems fine because all things have a dependency? Is reset_index changed from another thread then? If yes then you want to read reset_index with READ_ONCE. That includes a dependency barrier. > > > > >>> what's the story around locking here? Why is it safe > > >>> not to take the lock sometimes? > > >> > > >> kvm_dirty_ring_push() will be with lock==true only when the per-vm > > >> ring is used. For per-vcpu ring, because that will only happen with > > >> the vcpu context, then we don't need locks (so kvm_dirty_ring_push() > > >> is called with lock==false). > > > > FWIW this will be done much more nicely in v2. > > > > >>>> + page = alloc_page(GFP_KERNEL | __GFP_ZERO); > > >>>> + if (!page) { > > >>>> + r = -ENOMEM; > > >>>> + goto out_err_alloc_page; > > >>>> + } > > >>>> + kvm->vm_run = page_address(page); > > >>> > > >>> So 4K with just 8 bytes used. Not as bad as 1/2Mbyte for the ring but > > >>> still. What is wrong with just a pointer and calling put_user? > > >> > > >> I want to make it the start point for sharing fields between > > >> user/kernel per-vm. Just like kvm_run for per-vcpu. > > > > This page is actually not needed at all. Userspace can just map at > > KVM_DIRTY_LOG_PAGE_OFFSET, the indices reside there. You can drop > > kvm_vm_run completely. > > I changed it because otherwise we use one entry of the padding, and > all the rest of paddings are a waste of memory because we can never > really use the padding as new fields only for the 1st entry which > overlaps with the indices. IMHO that could even waste more than 4k. > > (for now we only "waste" 4K for per-vm, kvm_run is already mapped so > no waste there, not to say potentially I still think we can use the > kvm_vm_run in the future) > > > > > >>>> + } else { > > >>>> + /* > > >>>> + * Put onto per vm ring because no vcpu context. Kick > > >>>> + * vcpu0 if ring is full. > > >>> > > >>> What about tasks on vcpu 0? Do guests realize it's a bad idea to put > > >>> critical tasks there, they will be penalized disproportionally? 
> > >>>> +	} else {
> > >>>> +		/*
> > >>>> +		 * Put onto per vm ring because no vcpu context.  Kick
> > >>>> +		 * vcpu0 if ring is full.
> > >>>
> > >>> What about tasks on vcpu 0? Do guests realize it's a bad idea to
> > >>> put critical tasks there, they will be penalized disproportionally?
> > >>
> > >> Reasonable question.  So far we can't avoid it, because a vcpu exit
> > >> is the event mechanism we use to say "hey, please collect dirty
> > >> bits".  Maybe there is a better way, but I'll need to rethink all
> > >> of this...
> > >
> > > Maybe signal an eventfd, and let userspace worry about deciding what
> > > to do.
> >
> > This has to be done synchronously.  But the vm ring should be used
> > very rarely (it's for things like kvmclock updates that write to
> > guest memory outside a vCPU), possibly a handful of times in the
> > whole run of the VM.
>
> I've summarized a list of callers who might dirty guest memory in the
> other thread; it seems to me that even the kvm clock is using per-vcpu
> contexts.
>
> > >>> KVM_DIRTY_RING_MAX_ENTRIES is not part of UAPI.
> > >>> So how does userspace know what's legal?
> > >>> Do you expect it to just try?
> > >>
> > >> Yep that's what I thought. :)
> >
> > We should return it for KVM_CHECK_EXTENSION.
>
> OK.  I'll drop the versioning.
>
> --
> Peter Xu
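For illustration, such a probe could look like the sketch below from
userspace.  Note the capability name and number are placeholders,
since the RFC does not allocate one:

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

#ifndef KVM_CAP_DIRTY_LOG_RING
#define KVM_CAP_DIRTY_LOG_RING 192	/* placeholder capability number */
#endif

int main(void)
{
	int kvm = open("/dev/kvm", O_RDWR | O_CLOEXEC);

	if (kvm < 0) {
		perror("/dev/kvm");
		return 1;
	}

	/*
	 * KVM_CHECK_EXTENSION returns 0 when a capability is unsupported;
	 * for this capability it would instead return the ring size limit
	 * (i.e. KVM_DIRTY_RING_MAX_ENTRIES) so userspace need not guess.
	 */
	int limit = ioctl(kvm, KVM_CHECK_EXTENSION, KVM_CAP_DIRTY_LOG_RING);

	if (limit <= 0)
		printf("dirty ring not supported\n");
	else
		printf("max dirty ring entries: %d\n", limit);
	return 0;
}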