From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9579FC61DA4 for ; Thu, 23 Feb 2023 17:28:11 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 0B86B6B0071; Thu, 23 Feb 2023 12:28:11 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 069206B0072; Thu, 23 Feb 2023 12:28:11 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E4AE16B0073; Thu, 23 Feb 2023 12:28:10 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id D53036B0071 for ; Thu, 23 Feb 2023 12:28:10 -0500 (EST) Received: from smtpin05.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id AC511A0921 for ; Thu, 23 Feb 2023 17:28:10 +0000 (UTC) X-FDA: 80499239940.05.F618DD9 Received: from mail-vs1-f54.google.com (mail-vs1-f54.google.com [209.85.217.54]) by imf07.hostedemail.com (Postfix) with ESMTP id E74D440003 for ; Thu, 23 Feb 2023 17:28:08 +0000 (UTC) Authentication-Results: imf07.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=Sdd7hPKB; spf=pass (imf07.hostedemail.com: domain of yuzhao@google.com designates 209.85.217.54 as permitted sender) smtp.mailfrom=yuzhao@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1677173289; a=rsa-sha256; cv=none; b=skgex8qJsAqNsAapkX1m2ed0S0HkR7tzB4ot77AfWPkmTeQHJWKE1umgZS6Rgve9LARFNy c9onr3DpqLWD61jr44cH8gX7Mlz8GFRLTVjGKsS3m1Sqtmy1v7gF9LBr1H7/OqidXy41nR WkImX9sOwshQarNwCg7qJLpP2wdqL9M= ARC-Authentication-Results: i=1; imf07.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=Sdd7hPKB; spf=pass (imf07.hostedemail.com: domain of yuzhao@google.com designates 209.85.217.54 as permitted sender) smtp.mailfrom=yuzhao@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1677173288; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=31utYDCC/yFLRWwC2RwKHXn6KsflDX7hvn7iHrUw04E=; b=znhmRzTnoSBZRJYS6kG6cTxsb5VIz/qZPRnrA3LuLsVBYAN83dsy75ypIsB3vZB2nz+Rk8 FMtFbkUjCI8mQwIBV1ZDSd+UoypCg3UWyck793DVduUpQpxNc9enVOzXn5XO0XIR0nXjug pavAYwhkYgWt2xXBG9g7q4tbtYsZevE= Received: by mail-vs1-f54.google.com with SMTP id o6so16973425vsq.10 for ; Thu, 23 Feb 2023 09:28:08 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=31utYDCC/yFLRWwC2RwKHXn6KsflDX7hvn7iHrUw04E=; b=Sdd7hPKBaEt1PTDPKGEhKh0cFLRe2Yx/VuUuv6kAhuhnM7to0TDCYi0R2J/otk8Pqf WFH1iwbnoqpP/v6V7RkdAn0TCtMw+bMz5fesA/ftTwiR+QHmhj3jPqvv34rbMtH5k4mx LVXXTy4uUcoL8wRogYukxUCPXiCr52gBF9hpwOqQXVG//efJB+iA/i1h5bRi46cz85UA QDpWpWIYko0qEJvG3g8TiGlELNr7ETuVi9loLgEJ1ij4s7YECfz3SGeC0/wkvXK7pfdA 6sIAcR+uGf57sDBqQ/Wl/1ZePGl/tOoVcoXWej/gPVtiCBhyJJkY59Si4B+jFdVsUklV jhtQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=31utYDCC/yFLRWwC2RwKHXn6KsflDX7hvn7iHrUw04E=; b=WUbh1aLP5AeJbUu1In4f7xpgyJN0TfDg3hOfMqne5UHUWP2cDP3E/6k0+XhReh36/c 1b0bEhvX5UkvDPyKorJ2CejpOixN1C5DS2lLs29fU9ecKn7axOU+UwDq++f+DWxbMy9Y tHcUn+kZlN8e+UP9WpNWvVlVYEnr63TZoIKRhF7eLCtx78duE0Sk/Fc/TtsZ4JU43DAt 3TN3MDvXBxtklzBvQEeubfSq/VHSyG+LnbrTu4N7M0/h+oHB1dizkkB8Eip0ZcHodBpa gcboLiTUwvjlP0F5y6M+ojoLx1RXyNpMzOaDKOI7qIP2DzjrfYrjjRVkqveoO1/r5Poi NdCQ== X-Gm-Message-State: AO0yUKW7vCN/lwk4szCNzgxnopMfAH8PKgL/zPEUaP0VLCr61jyBpRhp /9vhSWBtSkwaxTzDxvdffEkrlms6M6iC1gVIiMMikw== X-Google-Smtp-Source: AK7set9ipf4JKgksfIPCGz7YusAkR8fyQ5wmRY52KwjPqGlUX+CfhrQnWSITtgynuYdihY0faOT9Lz6HbOuMIKjb8/I= X-Received: by 2002:ab0:38d3:0:b0:67a:2833:5ceb with SMTP id l19-20020ab038d3000000b0067a28335cebmr3475904uaw.0.1677173287566; Thu, 23 Feb 2023 09:28:07 -0800 (PST) MIME-Version: 1.0 References: <20230217041230.2417228-1-yuzhao@google.com> <20230217041230.2417228-3-yuzhao@google.com> In-Reply-To: From: Yu Zhao Date: Thu, 23 Feb 2023 10:27:31 -0700 Message-ID: Subject: Re: [PATCH mm-unstable v1 2/5] kvm/x86: add kvm_arch_test_clear_young() To: Sean Christopherson Cc: Andrew Morton , Paolo Bonzini , Jonathan Corbet , Michael Larabel , kvmarm@lists.linux.dev, kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linuxppc-dev@lists.ozlabs.org, x86@kernel.org, linux-mm@google.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspam-User: X-Rspamd-Queue-Id: E74D440003 X-Rspamd-Server: rspam01 X-Stat-Signature: agbqkehntdcss6d5isj7wuo8jnuzwg5w X-HE-Tag: 1677173288-511476 X-HE-Meta: U2FsdGVkX1/CCL0EgkYmivSL31X9xHnWlAZJ913eEN90XCbJDiiyF12qKzGq6ckmsvUiuPFhgTqwEqE/PSHK/XJR98BELFfmoTDHo0FLFwS8H7HggrkizAnJkOrNWUyrv5ut4KYbyzRnkfHVvxCzlHfCrn1LQYEoDpbBFKaPcrIrdMxn/4sXQZyEL54rPldD6juC8Z8KEz7XaJOgc9RCCCui63pEzR3L/AbO3rp5kODKmL5RpQa4obKSsFxS1c+6TMzW6WqJg6zNYQ+pbmWezcDwBgC04bpAjHXUFBHdKcknsZ5T668I2sFEFJ2HHXQB/4gQz5mKBf5rVzeLNqOCkyn4/fFclrb8Ftq/JWfFEbCC35s/+bgCpCmfk41Om3+54VnUIe9ZGG9kAVkwoZdgtDpneIbt0S1duQlQoO0iP4mraV/gyOf2RWpvgFI+b0WZZGzVTzZU2yPksNj2Ww/FYiBLgiT9rwe4QdQyPSxPBivCXAW+mKlwc4QDFvz1BBpIZ3doZStW5I4Hmp6W5mS5LWqt13EDrXVZnQSmA6Q7npItupfTZ6E0qU4tMT7lzw6oBCYakRoJH8CPT84yHmIMHAyUiLrdPMEjaWZhor+YXYoBJ7cujeLl3JDO8aquaCEAVVT8lPG0svjAwjJ1U19RRjSuVb5A048e1mLiBR0+46Y1/5Itv/GXihFB10+fVJYzL1cz4DeROdSwYP2kcI4OeMMsa4J/SGxUX4qkDuQ++PVl7iUxfWlhyfH+MQj+6opVGOu2JTn8qOiTmwRS2vFrjnJEnys8wMa7RJdYwe2zet4IRocGWpHHYoKJHnBDtCKYntnki8FPdc1J3hno4Vib4/O7g1eSWWzgZiJYSy92Z4Mv5tzaZ1eYz+kjgC1069zMm/eeJSFXy50Jbx5nxOypDrN0xU1LBsxd6mVZEJ+yLgVSmhqc4G8ROTY8vO5fTJiVUKWZ3ots/ktkajxQAYM ck4frCX4 FEnRBbehGAhAeR417ajK1nNx5bsLMAhvmSn4dKJk4sFfc3wZQLSjyqdP3pYBEHtkIRD9WoYZUUjZl8ldP2fZrfzMFblm3zU3TlH9m6ljyhSUFA4nBgcoN/tfgNHoAxJyQS5XlxjeVC9Fr6uHSn6Pminj4Ln/QWnnKbqR4/XFnQOHv2sptaQ7sxNlu8yOc6Y21n5sTlGwuJ2pAKy/jVd2kzLnQJE9WfNAUM4WYcAwNeg+gxa4IDvMGBYgrbRAM4AVJL7ry5grRL68ebZzGklNJEKMOZ33pY2VViXczmO+V2FDCF/3QCE2ofnt21C2sKHDv4vMe3Y0t0XTdhqXcAuJVhupCnhTGCOv3BgkSHS8lIBzjGpuz4Hfanpda0Q== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Thu, Feb 23, 2023 at 10:09=E2=80=AFAM Sean Christopherson wrote: > > On Wed, Feb 22, 2023, Yu Zhao wrote: > > On Fri, Feb 17, 2023 at 9:27 AM Sean Christopherson = wrote: > > > > > > On Thu, Feb 16, 2023, Yu Zhao wrote: > > > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm= /kvm_host.h > > > > index 6aaae18f1854..d2995c9e8f07 100644 > > > > --- a/arch/x86/include/asm/kvm_host.h > > > > +++ b/arch/x86/include/asm/kvm_host.h > > > > @@ -1367,6 +1367,12 @@ struct kvm_arch { > > > > * the MMU lock in read mode + the tdp_mmu_pages_lock or > > > > * the MMU lock in write mode > > > > * > > > > + * kvm_arch_test_clear_young() is a special case. It relies o= n two > > > > > > No, it's not. The TDP MMU already employs on RCU and CMPXCHG. > > > > It is -- you read it out of context :) > > Ah, the special case is that it's fully lockless. That's still not all t= hat > special, e.g. see kvm_tdp_mmu_walk_lockless_{begin,end}(). > > > * For reads, this list is protected by: > > * the MMU lock in read mode + RCU or > > * the MMU lock in write mode > > * > > * For writes, this list is protected by: > > * the MMU lock in read mode + the tdp_mmu_pages_lock or > > * the MMU lock in write mode > > * > > * kvm_arch_test_clear_young() is a special case. > > ... > > > > struct list_head tdp_mmu_roots; > > > > > Just drop the > > > entire comment. > > > > Let me move it into kvm_arch_test_clear_young(). > > No, I do not want kvm_arch_test_clear_young(), or any other one-off funct= ion, to > be "special". I love the idea of a lockless walk, but I want it to be a = formal, > documented way to walk TDP MMU roots. I.e. add macro to go with for_each= _tdp_mmu_root() > and the yield-safe variants. I see what you mean now. will do. > /* blah blah blah */ > #define for_each_tdp_mmu_root_lockless(_kvm, _root, _as_id) \ > list_for_each_entry_rcu(_root, &kvm->arch.tdp_mmu_roots, link) \ > if (refcount_read(&root->tdp_mmu_root_count) && \ > kvm_mmu_page_as_id(_root) !=3D _as_id) { = \ > } else > > > Also I want to be clear: > > 1. We can't just focus on here and now; we need to consider the distant= future. > > I 100% agree, but those words need to be backed up by actions. This seri= es is > littered with code that is not maintainable long term, e.g. open coding s= tuff > that belongs in helpers and/or for which KVM already provides helpers, co= py-pasting > __kvm_handle_hva_range() instead of extending it to have a lockless optio= n, poking > directly into KVM from mm/ code, etc. > > I apologize for being so blunt. My intent isn't to be rude/snarky, it's = to set > very clear expectations for getting any of these changes merges. No worries at all. I appreciate you directly telling me how you prefer it to be done, and that makes the job easier for both of us. Please do bear with me though, because I'm very unfamiliar with the KVM side of expectations. > I asbolutely do > want to land improvments to KVM's test+clear young flows, but it needs to= be done > in a way that is maintainable and doesn't saddle KVM with more tech debt. Agreed. > > 2. From my POV, "see the comments on ..." is like the index of a book. > > And my _very_ strong preference is to provide the "index" via code, not c= omments. Will do. > > > Clearing a single bit doesn't need a CMPXCHG. Please weigh in on a r= elevant series > > > that is modifying the aging flows[*], I want to have exactly one help= er for aging > > > TDP MMU SPTEs. > > > > > > [*] https://lore.kernel.org/all/20230211014626.3659152-5-vipinsh@goog= le.com > > > > I'll take a look at that series. clear_bit() probably won't cause any > > practical damage but is technically wrong because, for example, it can > > end up clearing the A-bit in a non-leaf PMD. (cmpxchg will just fail > > in this case, obviously.) > > Eh, not really. By that argument, clearing an A-bit in a huge PTE is als= o technically > wrong because the target gfn may or may not have been accessed. Sorry, I don't understand. You mean clear_bit() on a huge PTE is technically wrong? Yes, that's what I mean. (cmpxchg() on a huge PTE is not.) > The only way for > KVM to clear a A-bit in a non-leaf entry is if the entry _was_ a huge PTE= , but was > replaced between the "is leaf" and the clear_bit(). I think there is a misunderstanding here. Let me be more specific: 1. Clearing the A-bit in a non-leaf entry is technically wrong because that's not our intention. 2. When we try to clear_bit() on a leaf PMD, it can at the same time become a non-leaf PMD, which causes 1) above, and therefore is technically wrong. 3. I don't think 2) could do any real harm, so no practically no problem. 4. cmpxchg() can avoid 2). Does this make sense? Thanks.