From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D1A3BC38147 for ; Wed, 18 Jan 2023 17:36:27 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230400AbjARRg0 (ORCPT ); Wed, 18 Jan 2023 12:36:26 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:32990 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230032AbjARRgZ (ORCPT ); Wed, 18 Jan 2023 12:36:25 -0500 Received: from mail-pj1-x102d.google.com (mail-pj1-x102d.google.com [IPv6:2607:f8b0:4864:20::102d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 62285366A5 for ; Wed, 18 Jan 2023 09:36:24 -0800 (PST) Received: by mail-pj1-x102d.google.com with SMTP id z4-20020a17090a170400b00226d331390cso2888799pjd.5 for ; Wed, 18 Jan 2023 09:36:24 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=a4q1xtD/CKeMJmqJVdndBp4lLHmpRzPtOcKZfvWbSM8=; b=IqhVEgZYxeoqosmDrZw0VHcez/IWLXpDhpBV/8crQnWyCyw+YtF6atjKQ0KTCsY/tb kiue/gcjDKIc8xRpCUGofqg+ybpedxX7giMLGocFT4Hp0F9aPwnteOFVvz6GOYUKlLuc gP5kzL9C/n9N8yqQEsMBhzMA+yC16QKARYH0oxlkoKJgVrbrkOz0QSzAUItKBKlXB7wA Lan65hXI7kmWH+xgSIqwqL73ZDcdbX6EyallD/VjWK7saodHhzFCy+I1fPtx1PhtdDQ+ PQmrHWMHtKIkbkRqXNF07+SNQN8BOEY3Wst1+siC7nwXrr2BkVqEuLSK75giYJmjoCof WqVQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=a4q1xtD/CKeMJmqJVdndBp4lLHmpRzPtOcKZfvWbSM8=; b=KQj1BlAipyO1JTNnbqG9/8EgY8LIeXuXjHSDxVd0lXEpxXm3KQgU+kE5ZWaf05EJ1V N8Be+fhlXLhc1i0LfZqXmWihKgYSBFCIYBK3LOajD9XY15b4SWRkqToqS3KunRnu9FgN b0o4pslfIGQGGQ+bS6G8FunvZeEfD/4gIeuuofFvTyw+7S0u+Se1Sr8NjJ9OCO1C636w vb81Bi1MynEtDq/CrrENX2xP8A0HdUyEXehqNd14k67qlnpD9gnQjbMV41tTHn/fooGk AqWtIGc7AnXXXNZhI+MjRigT6H5cx6r0zEVaj3So4uVpld6bCaugjS+7gsCMtKhLRjCx 1Y7w== X-Gm-Message-State: AFqh2koYuDvT3xADS5H2VG0yyZq3cEHS3NrnfbnsRzo7L8rXHiwiIE/c s7oaC/8aOmvMSFahLEYzHC4G7MKzTYfSSQDK X-Google-Smtp-Source: AMrXdXt3mAzgoOCRxsiIt2Jb+Ic9VnnOQcLmwqxkvTu7BgYy3xQwH+s2Ogh3uKQJZ2dbF4Xt1c5+Lw== X-Received: by 2002:a17:90a:fd12:b0:226:5758:a57f with SMTP id cv18-20020a17090afd1200b002265758a57fmr3223912pjb.2.1674063383696; Wed, 18 Jan 2023 09:36:23 -0800 (PST) Received: from google.com (7.104.168.34.bc.googleusercontent.com. [34.168.104.7]) by smtp.gmail.com with ESMTPSA id q93-20020a17090a4fe600b00229094aabd0sm1592729pjh.35.2023.01.18.09.36.22 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 18 Jan 2023 09:36:23 -0800 (PST) Date: Wed, 18 Jan 2023 17:36:19 +0000 From: Sean Christopherson To: Mingwei Zhang Cc: Vipin Sharma , pbonzini@redhat.com, bgardon@google.com, dmatlack@google.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org Subject: Re: [Patch v3 1/9] KVM: x86/mmu: Repurpose KVM MMU shrinker to purge shadow page caches Message-ID: References: <20221222023457.1764-1-vipinsh@google.com> <20221222023457.1764-2-vipinsh@google.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org On Tue, Jan 03, 2023, Mingwei Zhang wrote: > On Tue, Jan 3, 2023 at 5:00 PM Vipin Sharma wrote: > > > I think the mmu_cache allocation and deallocation may cause the usage > > > of GFP_ATOMIC (as observed by other reviewers as well). Adding a new > > > lock would definitely sound like a plan, but I think it might affect > > > the performance. Alternatively, I am wondering if we could use a > > > mmu_cache_sequence similar to mmu_notifier_seq to help avoid the > > > concurrency? > > > > > > > Can you explain more about the performance impact? Each vcpu will have > > its own mutex. So, only contention will be with the mmu_shrinker. This > > shrinker will use mutex_try_lock() which will not block to wait for > > the lock, it will just pass on to the next vcpu. While shrinker is > > holding the lock, vcpu will be blocked in the page fault path but I > > think it should not have a huge impact considering it will execute > > rarely and for a small time. > > > > > Similar to mmu_notifier_seq, mmu_cache_sequence should be protected by > > > mmu write lock. In the page fault path, each vcpu has to collect a > > > snapshot of mmu_cache_sequence before calling into > > > mmu_topup_memory_caches() and check the value again when holding the > > > mmu lock. If the value is different, that means the mmu_shrinker has > > > removed the cache objects and because of that, the vcpu should retry. > > > > > > > Yeah, this can be one approach. I think it will come down to the > > performance impact of using mutex which I don't think should be a > > concern. > > hmm, I think you are right that there is no performance overhead by > adding a mutex and letting the shrinker using mutex_trylock(). The > point of using a sequence counter is to avoid the new lock, since > introducing a new lock will increase management burden. No, more locks doesn't necessarily mean higher maintenance cost. More complexity definitely means more maintenance, but additional locks doesn't necessarily equate to increased complexity. Lockless algorithms are almost always more difficult to reason about, i.e. trying to use a sequence counter for this case would be more complex than using a per-vCPU mutex. The only complexity in adding another mutex is understanding why an additional lock necessary, and IMO that's fairly easy to explain/understand (the shrinker will almost never succeed if it has to wait for vcpu->mutex to be dropped). > So unless it is necessary, we probably should choose a simple solution first. > > In this case, I think we do have such a choice and since a similar > mechanism has already been used by mmu_notifiers. The mmu_notifier case is very different. The invalidations affect the entire VM, notifiers _must_ succeed, may or may not allowing sleeping, the readers (vCPUs) effectively need protection while running in the guest, and practically speaking holding a per-VM (or global) lock of any kind while a vCPU is running in the guest is not viable, e.g. even holding kvm->srcu is disallowed. In other words, using a traditional locking scheme to serialize guest accesses with host-initiated page table (or memslot) updates is simply not an option.