From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E049EC77B61 for ; Sat, 22 Apr 2023 01:56:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233640AbjDVB4m (ORCPT ); Fri, 21 Apr 2023 21:56:42 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48498 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232989AbjDVB4j (ORCPT ); Fri, 21 Apr 2023 21:56:39 -0400 Received: from mail-pg1-x54a.google.com (mail-pg1-x54a.google.com [IPv6:2607:f8b0:4864:20::54a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 33EEC1719 for ; Fri, 21 Apr 2023 18:56:38 -0700 (PDT) Received: by mail-pg1-x54a.google.com with SMTP id 41be03b00d2f7-51b41410216so1628533a12.0 for ; Fri, 21 Apr 2023 18:56:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1682128597; x=1684720597; h=content-transfer-encoding:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:from:to:cc:subject:date:message-id :reply-to; bh=IQdNhpWZDysF1UahR095KnSwqZZsAIRpm1aTBOeoxU0=; b=WvBNdsrmV3zKyzEzVd5WJbwXsxfW67XDwZajqzJsoe5/qS8aKiZC25DdwamG22qL6P pOEeHuaZK2WXX6LGpiaQNIBtapa+enyHQ3U6So6xs1wWhEwuP6g5Y87o+vFl7Ncl0FqJ aYPy5R3UpftMUJ2Et77SQTLDXmIxV+NGmjW43VWWPCwUWy6VQ0sym7Y44JghDOXDlOqh r0btZ4Wmj5NknxwgPJ8hD/0A5NAUUs1K3/nu4gi0FhzTZfOz9uhzg5kr+7AprgOPq1Ig Qak78Ut7eio8UZFGwrEoZe02RLOwNMhl/Q2sfzgRh2JM3lFX5eEtNS6gMx0wsVKF04EG uhMw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1682128597; x=1684720597; h=content-transfer-encoding:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:x-gm-message-state:from:to:cc:subject :date:message-id:reply-to; bh=IQdNhpWZDysF1UahR095KnSwqZZsAIRpm1aTBOeoxU0=; b=TIZ72+/xIlwZ0ElDoNjMqe4eVweZ3J7rGHzH/gP/2IchmaliCV3Z5g4s35gWGBD7rw BE5qBaJMtinIGtOvbGDplYYjcKk/gSHGCdt5YfBtQqtY0zyOC8s0GCvsCsyVC6rSlIDi Ai26MliMfxMaPjERv3CeVF7p8c6oEVb0JuVhHKv/tKBZF4ZITQHsXdd2vY2jxU9PkHsY IX7bNofIgvbQSkUpd0teewkWvU3BBOSBhSWHTCjSnFGniQb0MkH1mxxQTOCEnGLHpTa4 HGm5NgX5UX6EDR36btUqcCopY+n+/G5jY9X8L0qx3uEAE5IxszpB2FJsr/0jXgeVeJzR h15Q== X-Gm-Message-State: AAQBX9f5DieSCAwsGNvLQtAcQPJUw9OAOXQGwv4seNWlgSF/MhgVQYaN ttpq7PgLiWjiMMRczb62IFIRWwS0M18= X-Google-Smtp-Source: AKy350bA8nYVy6RhXf+a8EbMVXDTb5UqBhqquCV4X8ayUrrycoEk7bjzkdzHz6OFwze4Uue09C5oGDmek/0= X-Received: from zagreus.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:5c37]) (user=seanjc job=sendgmr) by 2002:a63:40c1:0:b0:50b:189d:d1a with SMTP id n184-20020a6340c1000000b0050b189d0d1amr1635019pga.10.1682128597618; Fri, 21 Apr 2023 18:56:37 -0700 (PDT) Date: Fri, 21 Apr 2023 18:56:35 -0700 In-Reply-To: Mime-Version: 1.0 References: <20230421214946.2571580-1-seanjc@google.com> Message-ID: Subject: Re: [PATCH v2] KVM: x86: Preserve TDP MMU roots until they are explicitly invalidated From: Sean Christopherson To: David Matlack Cc: Paolo Bonzini , kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Jeremi Piotrowski , Ben Gardon Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Apr 21, 2023, David Matlack wrote: > On Fri, Apr 21, 2023 at 2:49=E2=80=AFPM Sean Christopherson wrote: > > > > void kvm_tdp_mmu_invalidate_all_roots(struct kvm *kvm) > > { > > struct kvm_mmu_page *root; > > > > - lockdep_assert_held_write(&kvm->mmu_lock); > > - list_for_each_entry(root, &kvm->arch.tdp_mmu_roots, link) { > > - if (!root->role.invalid && > > - !WARN_ON_ONCE(!kvm_tdp_mmu_get_root(root))) { > > + /* > > + * Note! mmu_lock isn't held when destroying the VM! There ca= n't be > > + * other references to @kvm, i.e. nothing else can invalidate r= oots, > > + * but walking the list of roots does need to be guarded agains= t roots > > + * being deleted by the asynchronous zap worker. > > + */ > > + rcu_read_lock(); > > + > > + list_for_each_entry_rcu(root, &kvm->arch.tdp_mmu_roots, link) { >=20 > I see that roots are removed from the list with list_del_rcu(), so I > agree this should be safe. >=20 > KVM could, alternatively, acquire the mmu_lock in > kvm_mmu_uninit_tdp_mmu(), which would let us keep the lockdep > assertion and drop the rcu_read_lock() + comment. That might be worth > it in case someone accidentally adds a call to > kvm_tdp_mmu_invalidate_all_roots() without mmu_lock outside of VM > teardown. kvm_mmu_uninit_tdp_mmu() is not a particularly performance > sensitive path and adding the mmu_lock wouldn't add much overhead > anyway (it would block for at most a few milliseconds waiting for the > async work to reschedule). Heh, I actually started to ping you off-list to specifically discuss this o= ption, but then decided that not waiting those few milliseconds might be worthwhil= e for some use cases. I also couldn't quite convince myself that it would only b= e a few milliseconds, e.g. if the worker is zapping a fully populated 5-level root,= there are no other tasks scheduled on _its_ CPU, and CONFIG_PREEMPTION=3Dn (which= neuters rwlock_needbreak()). The other reason I opted for not taking mmu_lock is that, with the persiste= nt roots approach, I don't think it's actually strictly necessary for kvm_mmu_zap_al= l_fast() to invaliate roots while holding mmu_lock for write. Holding slots_lock en= sures that only a single task can be doing invalidations, versus the old model wh= ere putting the last reference to a root could happen just about anywhere. And allocating a new root and zapping from mmu_noitifiers requires holding mmu_= lock for write, so I _think_ we could getaway with holding mmu_lock for read. Maybe= . It's largely a moot point since kvm_mmu_zap_all_fast() needs to hold mmu_lo= ck for write anyways to play nice with the shadow MMU, i.e. I don't expect us to e= ver want to pursue a change in this area. But at the same time I was strugglin= g to write a comment explaining why the VM destruction path "had" to take mmu_lo= ck.