From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-yw1-f202.google.com (mail-yw1-f202.google.com [209.85.128.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4236C2C9E for ; Thu, 23 Feb 2023 17:09:40 +0000 (UTC) Received: by mail-yw1-f202.google.com with SMTP id 00721157ae682-536c02ed619so133301817b3.8 for ; Thu, 23 Feb 2023 09:09:40 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=ESEYfucHSnOWaZatj5hNQPiLl3UqdNvq9+VGnD33XD0=; b=eacT4PM6bk+JAGYX8AtEreTJLTfD+mSJsjox9puZbiJhIzz/wIxHhoNqBFIAXYayu7 ix2AdCa9GlyQPz6wKG3LR8o4NT+280wasX8/AykGzPPWlG6F17mnxqyPEg7H8lCLwAiX eXH6zqI0g24xM5ZEDyA5U2YNv7s9H603+yWuiAeHry6HpCAAUQ5/Ypm3z9qWSoASZAJv GIl+kmJbNVHeS7mmPwXAdOExvjZqQ9ItW1uw3DrabrHDmj3KBFLmoGmyQYc+K7/u+5MN XBySfkomcCwXORKGamuW5yTLWSpURcpZFsI0uJbe36YJ/e/6tvZI8z6lN62S38fr6TXv yALQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=ESEYfucHSnOWaZatj5hNQPiLl3UqdNvq9+VGnD33XD0=; b=qIwjjohr0Wv0EiQMk7yO1Geaoo6Wj5JXYSoalFaxzvrMtG3chbpUhfIULPq0J/dwWH NWfntqv7BxHgMQj9Y8wdFOoW8G+NUUNl2CqZCacAZpPMUfVkuonDP26ZmW8zMKPC6i2/ 2PKpgsmUeeVerL/QrZTixjEmi6Ya675BlXod3YUNoN6yjQ9jzGib1wVaNtLvlYc6slKr 7cvsL3Ynsa/bVChPy5p3ZCTsPvmoVZ3fF8PBvs9BWpgq9yQtRqlmc/t06Y3xxGdWIojU HX5DrPWVsiO9d1sHzwc3eCF0WV9kt2Pt02ooCMOeWAYkXPz2WKmzlDg41+4gY8x9JB2j dtWg== X-Gm-Message-State: AO0yUKW/qZFlAWnkZpiz9gIN+LaZ7+djHqG/7vTN/204r7mJv/tKBEd7 2jwhwyicAr692lfX3VyI1FWHxg+F0Ns= X-Google-Smtp-Source: AK7set+o4nZnorUIfwBQUKqlnrrJeoYQJz7sG9NME5jj+i7H5Ea4MTUvMN+YCmQEaMe31ZnG3Fe5siKK8IQ= X-Received: from zagreus.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:5c37]) (user=seanjc job=sendgmr) by 2002:a81:b60b:0:b0:52f:45a:5b00 with SMTP id u11-20020a81b60b000000b0052f045a5b00mr2611344ywh.2.1677172179211; Thu, 23 Feb 2023 09:09:39 -0800 (PST) Date: Thu, 23 Feb 2023 09:09:37 -0800 In-Reply-To: Precedence: bulk X-Mailing-List: kvmarm@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20230217041230.2417228-1-yuzhao@google.com> <20230217041230.2417228-3-yuzhao@google.com> Message-ID: Subject: Re: [PATCH mm-unstable v1 2/5] kvm/x86: add kvm_arch_test_clear_young() From: Sean Christopherson To: Yu Zhao Cc: Andrew Morton , Paolo Bonzini , Jonathan Corbet , Michael Larabel , kvmarm@lists.linux.dev, kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linuxppc-dev@lists.ozlabs.org, x86@kernel.org, linux-mm@google.com Content-Type: text/plain; charset="us-ascii" On Wed, Feb 22, 2023, Yu Zhao wrote: > On Fri, Feb 17, 2023 at 9:27 AM Sean Christopherson wrote: > > > > On Thu, Feb 16, 2023, Yu Zhao wrote: > > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h > > > index 6aaae18f1854..d2995c9e8f07 100644 > > > --- a/arch/x86/include/asm/kvm_host.h > > > +++ b/arch/x86/include/asm/kvm_host.h > > > @@ -1367,6 +1367,12 @@ struct kvm_arch { > > > * the MMU lock in read mode + the tdp_mmu_pages_lock or > > > * the MMU lock in write mode > > > * > > > + * kvm_arch_test_clear_young() is a special case. It relies on two > > > > No, it's not. The TDP MMU already employs on RCU and CMPXCHG. > > It is -- you read it out of context :) Ah, the special case is that it's fully lockless. That's still not all that special, e.g. see kvm_tdp_mmu_walk_lockless_{begin,end}(). > * For reads, this list is protected by: > * the MMU lock in read mode + RCU or > * the MMU lock in write mode > * > * For writes, this list is protected by: > * the MMU lock in read mode + the tdp_mmu_pages_lock or > * the MMU lock in write mode > * > * kvm_arch_test_clear_young() is a special case. > ... > > struct list_head tdp_mmu_roots; > > > Just drop the > > entire comment. > > Let me move it into kvm_arch_test_clear_young(). No, I do not want kvm_arch_test_clear_young(), or any other one-off function, to be "special". I love the idea of a lockless walk, but I want it to be a formal, documented way to walk TDP MMU roots. I.e. add macro to go with for_each_tdp_mmu_root() and the yield-safe variants. /* blah blah blah */ #define for_each_tdp_mmu_root_lockless(_kvm, _root, _as_id) \ list_for_each_entry_rcu(_root, &kvm->arch.tdp_mmu_roots, link) \ if (refcount_read(&root->tdp_mmu_root_count) && \ kvm_mmu_page_as_id(_root) != _as_id) { \ } else > Also I want to be clear: > 1. We can't just focus on here and now; we need to consider the distant future. I 100% agree, but those words need to be backed up by actions. This series is littered with code that is not maintainable long term, e.g. open coding stuff that belongs in helpers and/or for which KVM already provides helpers, copy-pasting __kvm_handle_hva_range() instead of extending it to have a lockless option, poking directly into KVM from mm/ code, etc. I apologize for being so blunt. My intent isn't to be rude/snarky, it's to set very clear expectations for getting any of these changes merges. I asbolutely do want to land improvments to KVM's test+clear young flows, but it needs to be done in a way that is maintainable and doesn't saddle KVM with more tech debt. > 2. From my POV, "see the comments on ..." is like the index of a book. And my _very_ strong preference is to provide the "index" via code, not comments. > > Clearing a single bit doesn't need a CMPXCHG. Please weigh in on a relevant series > > that is modifying the aging flows[*], I want to have exactly one helper for aging > > TDP MMU SPTEs. > > > > [*] https://lore.kernel.org/all/20230211014626.3659152-5-vipinsh@google.com > > I'll take a look at that series. clear_bit() probably won't cause any > practical damage but is technically wrong because, for example, it can > end up clearing the A-bit in a non-leaf PMD. (cmpxchg will just fail > in this case, obviously.) Eh, not really. By that argument, clearing an A-bit in a huge PTE is also technically wrong because the target gfn may or may not have been accessed. The only way for KVM to clear a A-bit in a non-leaf entry is if the entry _was_ a huge PTE, but was replaced between the "is leaf" and the clear_bit(). From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.ozlabs.org (lists.ozlabs.org [112.213.38.117]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id AEDAAC636D6 for ; Thu, 23 Feb 2023 17:10:41 +0000 (UTC) Received: from boromir.ozlabs.org (localhost [IPv6:::1]) by lists.ozlabs.org (Postfix) with ESMTP id 4PN00H48knz3cg0 for ; Fri, 24 Feb 2023 04:10:39 +1100 (AEDT) Authentication-Results: lists.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=google.com header.i=@google.com header.a=rsa-sha256 header.s=20210112 header.b=eacT4PM6; dkim-atps=neutral Authentication-Results: lists.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=flex--seanjc.bounces.google.com (client-ip=2607:f8b0:4864:20::1149; helo=mail-yw1-x1149.google.com; envelope-from=30533ywykdbqcyu73w08805y.w86527eh99w-xyf52cdc.8j5uvc.8b0@flex--seanjc.bounces.google.com; receiver=) Authentication-Results: lists.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=google.com header.i=@google.com header.a=rsa-sha256 header.s=20210112 header.b=eacT4PM6; dkim-atps=neutral Received: from mail-yw1-x1149.google.com (mail-yw1-x1149.google.com [IPv6:2607:f8b0:4864:20::1149]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 4PMzzD3wYcz2xJ4 for ; Fri, 24 Feb 2023 04:09:42 +1100 (AEDT) Received: by mail-yw1-x1149.google.com with SMTP id 00721157ae682-536a5a0b6e3so153816067b3.10 for ; Thu, 23 Feb 2023 09:09:42 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=ESEYfucHSnOWaZatj5hNQPiLl3UqdNvq9+VGnD33XD0=; b=eacT4PM6bk+JAGYX8AtEreTJLTfD+mSJsjox9puZbiJhIzz/wIxHhoNqBFIAXYayu7 ix2AdCa9GlyQPz6wKG3LR8o4NT+280wasX8/AykGzPPWlG6F17mnxqyPEg7H8lCLwAiX eXH6zqI0g24xM5ZEDyA5U2YNv7s9H603+yWuiAeHry6HpCAAUQ5/Ypm3z9qWSoASZAJv GIl+kmJbNVHeS7mmPwXAdOExvjZqQ9ItW1uw3DrabrHDmj3KBFLmoGmyQYc+K7/u+5MN XBySfkomcCwXORKGamuW5yTLWSpURcpZFsI0uJbe36YJ/e/6tvZI8z6lN62S38fr6TXv yALQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=ESEYfucHSnOWaZatj5hNQPiLl3UqdNvq9+VGnD33XD0=; b=32acJwomesJE7K59hIKbYZE/5BYnQB1Edgs6I7Nyb0PvRXHRBt/zrJVapFKWrAB8HO OpSoJsazyyPyH5OiCcYteGXT72XLK+pqlFnh9TdyuUSaaw8RCFweUq71wwncV5YRWnxA IyC9aZKLX3X95+AFw1hlzs9yz4W1F72qA/4EfDJKA7U5p8euypTkjYYS3vgBVC48tQZK LsCnp9A92v3qW2Gotbp0/fJJa3YrpIwVnJMgOJLFQQPnKz2Im5/MZVEqoieIdXoA8u9d J9G34386meTxF/G/bjGMxIIDJnOe4oTzU5U/pIE/UE5s7kzNuanBn3kfehaWfo3lFM+n uGUA== X-Gm-Message-State: AO0yUKUc0S79df1HLa3qNLhtYI633JzwNLaYxIh2fHACPWX++wwPWq6b 6spwbv8U6uRSASwbLBsAVddMYd0g3qA= X-Google-Smtp-Source: AK7set+o4nZnorUIfwBQUKqlnrrJeoYQJz7sG9NME5jj+i7H5Ea4MTUvMN+YCmQEaMe31ZnG3Fe5siKK8IQ= X-Received: from zagreus.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:5c37]) (user=seanjc job=sendgmr) by 2002:a81:b60b:0:b0:52f:45a:5b00 with SMTP id u11-20020a81b60b000000b0052f045a5b00mr2611344ywh.2.1677172179211; Thu, 23 Feb 2023 09:09:39 -0800 (PST) Date: Thu, 23 Feb 2023 09:09:37 -0800 In-Reply-To: Mime-Version: 1.0 References: <20230217041230.2417228-1-yuzhao@google.com> <20230217041230.2417228-3-yuzhao@google.com> Message-ID: Subject: Re: [PATCH mm-unstable v1 2/5] kvm/x86: add kvm_arch_test_clear_young() From: Sean Christopherson To: Yu Zhao Content-Type: text/plain; charset="us-ascii" X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: linux-mm@google.com, kvm@vger.kernel.org, Jonathan Corbet , Michael Larabel , x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, kvmarm@lists.linux.dev, Paolo Bonzini , Andrew Morton , linuxppc-dev@lists.ozlabs.org, linux-arm-kernel@lists.infradead.org Errors-To: linuxppc-dev-bounces+linuxppc-dev=archiver.kernel.org@lists.ozlabs.org Sender: "Linuxppc-dev" On Wed, Feb 22, 2023, Yu Zhao wrote: > On Fri, Feb 17, 2023 at 9:27 AM Sean Christopherson wrote: > > > > On Thu, Feb 16, 2023, Yu Zhao wrote: > > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h > > > index 6aaae18f1854..d2995c9e8f07 100644 > > > --- a/arch/x86/include/asm/kvm_host.h > > > +++ b/arch/x86/include/asm/kvm_host.h > > > @@ -1367,6 +1367,12 @@ struct kvm_arch { > > > * the MMU lock in read mode + the tdp_mmu_pages_lock or > > > * the MMU lock in write mode > > > * > > > + * kvm_arch_test_clear_young() is a special case. It relies on two > > > > No, it's not. The TDP MMU already employs on RCU and CMPXCHG. > > It is -- you read it out of context :) Ah, the special case is that it's fully lockless. That's still not all that special, e.g. see kvm_tdp_mmu_walk_lockless_{begin,end}(). > * For reads, this list is protected by: > * the MMU lock in read mode + RCU or > * the MMU lock in write mode > * > * For writes, this list is protected by: > * the MMU lock in read mode + the tdp_mmu_pages_lock or > * the MMU lock in write mode > * > * kvm_arch_test_clear_young() is a special case. > ... > > struct list_head tdp_mmu_roots; > > > Just drop the > > entire comment. > > Let me move it into kvm_arch_test_clear_young(). No, I do not want kvm_arch_test_clear_young(), or any other one-off function, to be "special". I love the idea of a lockless walk, but I want it to be a formal, documented way to walk TDP MMU roots. I.e. add macro to go with for_each_tdp_mmu_root() and the yield-safe variants. /* blah blah blah */ #define for_each_tdp_mmu_root_lockless(_kvm, _root, _as_id) \ list_for_each_entry_rcu(_root, &kvm->arch.tdp_mmu_roots, link) \ if (refcount_read(&root->tdp_mmu_root_count) && \ kvm_mmu_page_as_id(_root) != _as_id) { \ } else > Also I want to be clear: > 1. We can't just focus on here and now; we need to consider the distant future. I 100% agree, but those words need to be backed up by actions. This series is littered with code that is not maintainable long term, e.g. open coding stuff that belongs in helpers and/or for which KVM already provides helpers, copy-pasting __kvm_handle_hva_range() instead of extending it to have a lockless option, poking directly into KVM from mm/ code, etc. I apologize for being so blunt. My intent isn't to be rude/snarky, it's to set very clear expectations for getting any of these changes merges. I asbolutely do want to land improvments to KVM's test+clear young flows, but it needs to be done in a way that is maintainable and doesn't saddle KVM with more tech debt. > 2. From my POV, "see the comments on ..." is like the index of a book. And my _very_ strong preference is to provide the "index" via code, not comments. > > Clearing a single bit doesn't need a CMPXCHG. Please weigh in on a relevant series > > that is modifying the aging flows[*], I want to have exactly one helper for aging > > TDP MMU SPTEs. > > > > [*] https://lore.kernel.org/all/20230211014626.3659152-5-vipinsh@google.com > > I'll take a look at that series. clear_bit() probably won't cause any > practical damage but is technically wrong because, for example, it can > end up clearing the A-bit in a non-leaf PMD. (cmpxchg will just fail > in this case, obviously.) Eh, not really. By that argument, clearing an A-bit in a huge PTE is also technically wrong because the target gfn may or may not have been accessed. The only way for KVM to clear a A-bit in a non-leaf entry is if the entry _was_ a huge PTE, but was replaced between the "is leaf" and the clear_bit(). From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 0F25BC636D6 for ; Thu, 23 Feb 2023 17:10:59 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:Cc:To:From:Subject:Message-ID: References:Mime-Version:In-Reply-To:Date:Reply-To:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:List-Owner; bh=DZdkKs0ePl+aSX+7B6nda3v/98a7Nfv74W9Jt9v9B4M=; b=lmNsF9aF8ffVKl0sZoWzdUIT0k bBJnmPbZc6BLLGOW63cYt7/gowSNS+RpSiqQzNYh83itOF+/9OIlQgq38ScEPGK9zefYsYuvI/HTH fvq1Ds1xGvls4Z4x+OaQnBgKg9/w2oHNGMMi1RwTUz4FZY0aPkL3tNw+syamsv+LQ5BNOaAC+44Ge ccA2G7LSrI3SEN70dIsV9xWm9/D1H+XB3D2ndCZ6M01M9tmuq1iLTcviz4qOB/1LHL9AD8EH1BHRD IQGvHqQYljJMmtARdDh3tNLotquIO4LfUGBJL7jh9/Uywr1/OSpo1Y3OfXYUqTwTpZ16Q0FO9tzX1 lfMxVtzQ==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.94.2 #2 (Red Hat Linux)) id 1pVF6I-00HIgy-Fp; Thu, 23 Feb 2023 17:09:46 +0000 Received: from mail-yw1-x1149.google.com ([2607:f8b0:4864:20::1149]) by bombadil.infradead.org with esmtps (Exim 4.94.2 #2 (Red Hat Linux)) id 1pVF6F-00HIgE-FF for linux-arm-kernel@lists.infradead.org; Thu, 23 Feb 2023 17:09:45 +0000 Received: by mail-yw1-x1149.google.com with SMTP id 00721157ae682-536b7eb9117so139912807b3.14 for ; Thu, 23 Feb 2023 09:09:39 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=ESEYfucHSnOWaZatj5hNQPiLl3UqdNvq9+VGnD33XD0=; b=eacT4PM6bk+JAGYX8AtEreTJLTfD+mSJsjox9puZbiJhIzz/wIxHhoNqBFIAXYayu7 ix2AdCa9GlyQPz6wKG3LR8o4NT+280wasX8/AykGzPPWlG6F17mnxqyPEg7H8lCLwAiX eXH6zqI0g24xM5ZEDyA5U2YNv7s9H603+yWuiAeHry6HpCAAUQ5/Ypm3z9qWSoASZAJv GIl+kmJbNVHeS7mmPwXAdOExvjZqQ9ItW1uw3DrabrHDmj3KBFLmoGmyQYc+K7/u+5MN XBySfkomcCwXORKGamuW5yTLWSpURcpZFsI0uJbe36YJ/e/6tvZI8z6lN62S38fr6TXv yALQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=ESEYfucHSnOWaZatj5hNQPiLl3UqdNvq9+VGnD33XD0=; b=QlRTt4MxfPgE7s4bUOZJQYiKbLG9ZDEf6cbFcvqW7G1fzmgwc0S015IdLYi7MyaN0F 99lme8vjB/MCRtbl7l288ir1xW1vsaXOcgu4udpMRsaiA5N0qbHe9od6Qg1XSJxzHBbw XR3oMQsyQtCFCxX2SzoUBpJjxIiT/2cNxmJsAK+TeAEatBcP2OguLhrZQwGNnCEMBNAU ZmzuKGQ/LMRjj5fnfZXPIgdgoIvxPZquPfWHzVU0/KHE7IbJ5gvt5/llZHf12BxsOA6o zUWGwojrEsQjSaKIZ3eaYwf8FaGgpL9XzZIYpF8irNlt7i0QFp3iZSb4BSJoerRB88r8 Ij5Q== X-Gm-Message-State: AO0yUKXrmE6uhODMcOYZKAUESC7y0EE3Wyy4OvJ95fFzEDNwLBxSeQny zWoPxxGQ/34KQOORKZSaAb+8pLYQKco= X-Google-Smtp-Source: AK7set+o4nZnorUIfwBQUKqlnrrJeoYQJz7sG9NME5jj+i7H5Ea4MTUvMN+YCmQEaMe31ZnG3Fe5siKK8IQ= X-Received: from zagreus.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:5c37]) (user=seanjc job=sendgmr) by 2002:a81:b60b:0:b0:52f:45a:5b00 with SMTP id u11-20020a81b60b000000b0052f045a5b00mr2611344ywh.2.1677172179211; Thu, 23 Feb 2023 09:09:39 -0800 (PST) Date: Thu, 23 Feb 2023 09:09:37 -0800 In-Reply-To: Mime-Version: 1.0 References: <20230217041230.2417228-1-yuzhao@google.com> <20230217041230.2417228-3-yuzhao@google.com> Message-ID: Subject: Re: [PATCH mm-unstable v1 2/5] kvm/x86: add kvm_arch_test_clear_young() From: Sean Christopherson To: Yu Zhao Cc: Andrew Morton , Paolo Bonzini , Jonathan Corbet , Michael Larabel , kvmarm@lists.linux.dev, kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linuxppc-dev@lists.ozlabs.org, x86@kernel.org, linux-mm@google.com X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20230223_090943_549838_ABE3EA3D X-CRM114-Status: GOOD ( 29.84 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org On Wed, Feb 22, 2023, Yu Zhao wrote: > On Fri, Feb 17, 2023 at 9:27 AM Sean Christopherson wrote: > > > > On Thu, Feb 16, 2023, Yu Zhao wrote: > > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h > > > index 6aaae18f1854..d2995c9e8f07 100644 > > > --- a/arch/x86/include/asm/kvm_host.h > > > +++ b/arch/x86/include/asm/kvm_host.h > > > @@ -1367,6 +1367,12 @@ struct kvm_arch { > > > * the MMU lock in read mode + the tdp_mmu_pages_lock or > > > * the MMU lock in write mode > > > * > > > + * kvm_arch_test_clear_young() is a special case. It relies on two > > > > No, it's not. The TDP MMU already employs on RCU and CMPXCHG. > > It is -- you read it out of context :) Ah, the special case is that it's fully lockless. That's still not all that special, e.g. see kvm_tdp_mmu_walk_lockless_{begin,end}(). > * For reads, this list is protected by: > * the MMU lock in read mode + RCU or > * the MMU lock in write mode > * > * For writes, this list is protected by: > * the MMU lock in read mode + the tdp_mmu_pages_lock or > * the MMU lock in write mode > * > * kvm_arch_test_clear_young() is a special case. > ... > > struct list_head tdp_mmu_roots; > > > Just drop the > > entire comment. > > Let me move it into kvm_arch_test_clear_young(). No, I do not want kvm_arch_test_clear_young(), or any other one-off function, to be "special". I love the idea of a lockless walk, but I want it to be a formal, documented way to walk TDP MMU roots. I.e. add macro to go with for_each_tdp_mmu_root() and the yield-safe variants. /* blah blah blah */ #define for_each_tdp_mmu_root_lockless(_kvm, _root, _as_id) \ list_for_each_entry_rcu(_root, &kvm->arch.tdp_mmu_roots, link) \ if (refcount_read(&root->tdp_mmu_root_count) && \ kvm_mmu_page_as_id(_root) != _as_id) { \ } else > Also I want to be clear: > 1. We can't just focus on here and now; we need to consider the distant future. I 100% agree, but those words need to be backed up by actions. This series is littered with code that is not maintainable long term, e.g. open coding stuff that belongs in helpers and/or for which KVM already provides helpers, copy-pasting __kvm_handle_hva_range() instead of extending it to have a lockless option, poking directly into KVM from mm/ code, etc. I apologize for being so blunt. My intent isn't to be rude/snarky, it's to set very clear expectations for getting any of these changes merges. I asbolutely do want to land improvments to KVM's test+clear young flows, but it needs to be done in a way that is maintainable and doesn't saddle KVM with more tech debt. > 2. From my POV, "see the comments on ..." is like the index of a book. And my _very_ strong preference is to provide the "index" via code, not comments. > > Clearing a single bit doesn't need a CMPXCHG. Please weigh in on a relevant series > > that is modifying the aging flows[*], I want to have exactly one helper for aging > > TDP MMU SPTEs. > > > > [*] https://lore.kernel.org/all/20230211014626.3659152-5-vipinsh@google.com > > I'll take a look at that series. clear_bit() probably won't cause any > practical damage but is technically wrong because, for example, it can > end up clearing the A-bit in a non-leaf PMD. (cmpxchg will just fail > in this case, obviously.) Eh, not really. By that argument, clearing an A-bit in a huge PTE is also technically wrong because the target gfn may or may not have been accessed. The only way for KVM to clear a A-bit in a non-leaf entry is if the entry _was_ a huge PTE, but was replaced between the "is leaf" and the clear_bit(). _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel