From: Sean Christopherson
To: James Houghton
Cc: Yu Zhao, Andrew Morton, Paolo Bonzini, Ankit Agrawal, Axel Rasmussen, Catalin Marinas, David Matlack, David Rientjes, James Morse, Jonathan Corbet, Marc Zyngier, Oliver Upton, Raghavendra Rao Ananta, Ryan Roberts, Shaoqin Huang, Suzuki K Poulose, Wei Xu, Will Deacon, Zenghui Yu, kvmarm@lists.linux.dev, kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Date: Tue, 11 Jun 2024 17:34:24 -0700
Subject: Re: [PATCH v5 4/9] mm: Add test_clear_young_fast_only MMU notifier
References: <20240611002145.2078921-1-jthoughton@google.com> <20240611002145.2078921-5-jthoughton@google.com>

On Tue, Jun 11, 2024, James Houghton wrote:
> On Tue, Jun 11, 2024 at 12:42 PM Sean Christopherson wrote:
> > --
> > diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
> > index 7b77ad6cf833..07872ae00fa6 100644
> > --- a/mm/mmu_notifier.c
> > +++ b/mm/mmu_notifier.c
> > @@ -384,7 +384,8 @@ int __mmu_notifier_clear_flush_young(struct mm_struct *mm,
> >
> >  int __mmu_notifier_clear_young(struct mm_struct *mm,
> >                                unsigned long start,
> > -                              unsigned long end)
> > +                              unsigned long end,
> > +                              bool fast_only)
> >  {
> >         struct mmu_notifier *subscription;
> >         int young = 0, id;
> > @@ -393,9 +394,12 @@ int __mmu_notifier_clear_young(struct mm_struct *mm,
> >         hlist_for_each_entry_rcu(subscription,
> >                                  &mm->notifier_subscriptions->list, hlist,
> >                                  srcu_read_lock_held(&srcu)) {
> > -               if (subscription->ops->clear_young)
> > -                       young |= subscription->ops->clear_young(subscription,
> > -                                                               mm, start, end);
> > +               if (!subscription->ops->clear_young ||
> > +                   fast_only && !subscription->ops->has_fast_aging)
> > +                       continue;
> > +
> > +               young |= subscription->ops->clear_young(subscription,
> > +                                                       mm, start, end);
>
> KVM changing has_fast_aging dynamically would be slow, wouldn't it?

No, it could/would be done quite quickly.  But, I'm not suggesting has_fast_aging
be dynamic, i.e. it's not an "all aging is guaranteed to be fast", it's a "this
MMU _can_ do fast aging".  It's a bit fuzzy/weird mostly because KVM can essentially
have multiple secondary MMUs wired up to the same mmu_notifier.

> I feel like it's simpler to just pass in fast_only into `clear_young` itself
> (and this is how I interpreted what you wrote above anyway).

Eh, maybe?  A "has_fast_aging" flag is more robust in the sense that it requires
secondary MMUs to opt-in, i.e. all secondary MMUs will be considered "slow" by
default.
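
To illustrate the opt-in point, here is a rough userspace sketch, not kernel
code.  The struct and function names (notifier_ops, notifier,
notifier_clear_young, always_young) are hypothetical stand-ins for the
mmu_notifier types, and the list walk replaces the SRCU-protected hlist.  It
only shows that a subscriber whose ops never set has_fast_aging is skipped
when fast_only is true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are not the kernel's types. */
struct notifier_ops {
	/* Returns nonzero if any page in [start, end) was young. */
	int (*clear_young)(unsigned long start, unsigned long end);
	/* Opt-in flag: zero-initialized ops are treated as "slow". */
	bool has_fast_aging;
};

struct notifier {
	const struct notifier_ops *ops;
	struct notifier *next;
};

/* Counts how many times a subscriber callback was actually invoked. */
static int clear_young_calls;

/* Toy callback that always reports the range as young. */
static int always_young(unsigned long start, unsigned long end)
{
	(void)start;
	(void)end;
	clear_young_calls++;
	return 1;
}

/*
 * Walk every subscriber.  When fast_only is set, subscribers that did not
 * opt in via has_fast_aging are skipped without an indirect call.
 */
static int notifier_clear_young(struct notifier *head, unsigned long start,
				unsigned long end, bool fast_only)
{
	int young = 0;

	assert(start < end);
	for (struct notifier *n = head; n; n = n->next) {
		if (!n->ops->clear_young ||
		    (fast_only && !n->ops->has_fast_aging))
			continue;
		young |= n->ops->clear_young(start, end);
	}
	return young;
}
```

With one "fast" and one default ("slow") subscriber on the list, a fast_only
walk invokes only the fast one, while a normal walk invokes both.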

It's somewhat of a moot point because KVM is the only secondary MMU that implements
.clear_young() and .test_young() (which I keep forgetting), and that seems unlikely
to change.

A flag would also avoid an indirect call and thus a RETPOLINE when CONFIG_RETPOLINE=y,
i.e. would be a minor optimization when KVM doesn't support fast aging.  But that's
probably a pretty unlikely combination, so it's probably not a valid argument.

So, I guess I don't have a strong opinion?

> > Double ugh.  Peeking ahead at the "failure" code, NAK to adding
> > kvm_arch_young_notifier_likely_fast for all the same reasons I objected to
> > kvm_arch_has_test_clear_young() in v1.  Please stop trying to do anything like
> > that, I will NAK each and every attempt to have core mm/ code call directly into KVM.
>
> Sorry to make you repeat yourself; I'll leave it out of v6. I don't
> like it either, but I wasn't sure how important it was to avoid
> calling into unnecessary notifiers if the TDP MMU were completely
> disabled.

If it's important, e.g. for performance, then the mmu_notifier should have a flag
so that the behavior doesn't assume a KVM backend.  Hence my has_fast_aging
suggestion.

> > Anyways, back to this code, before we spin another version, we need to agree on
> > exactly what behavior we want out of secondary MMUs.  Because to me, the behavior
> > proposed in this version doesn't make any sense.
> >
> > Signalling failure because KVM _might_ have relevant aging information in SPTEs
> > that require taking kvm->mmu_lock is a terrible tradeoff.  And for the test_young
> > case, it's flat out wrong, e.g. if a page is marked Accessed in the TDP MMU, then
> > KVM should return "young", not "failed".
>
> Sorry for this oversight. What about something like:
>
> 1. test (and maybe clear) A bits on TDP MMU
> 2. If accessed && !should_clear: return (fast)
> 3. if (fast_only): return (fast)
> 4. If !(must check shadow MMU): return (fast)
> 5. test (and maybe clear) A bits in shadow MMU
> 6. return (slow)

I don't understand where the "must check shadow MMU" in #4 comes from.  I also
don't think it's necessary; see below.

> Some of this reordering (and maybe a change from
> kvm_shadow_root_allocated() to checking indirect_shadow_pages or
> something else) can be done in its own patch.
>
> > So rather than failing the fast aging, I think what we want is to know if an
> > mmu_notifier found a young SPTE during a fast lookup.  E.g. something like this
> > in KVM, where using kvm_has_shadow_mmu_sptes() instead of kvm_memslots_have_rmaps()
> > is an optional optimization to avoid taking mmu_lock for write in paths where a
> > (very rare) false negative is acceptable.
> >
> >   static bool kvm_has_shadow_mmu_sptes(struct kvm *kvm)
> >   {
> >         return !tdp_mmu_enabled || READ_ONCE(kvm->arch.indirect_shadow_pages);
> >   }
> >
> >   static int __kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range,
> >                            bool fast_only)
> >   {
> >         int young = 0;
> >
> >         if (!fast_only && kvm_has_shadow_mmu_sptes(kvm)) {
> >                 write_lock(&kvm->mmu_lock);
> >                 young = kvm_handle_gfn_range(kvm, range, kvm_age_rmap);
> >                 write_unlock(&kvm->mmu_lock);
> >         }
> >
> >         if (tdp_mmu_enabled && kvm_tdp_mmu_age_gfn_range(kvm, range))
> >                 young = 1 | MMU_NOTIFY_WAS_FAST;
>
> I don't think this line is quite right. We might set
> MMU_NOTIFY_WAS_FAST even when we took the mmu_lock. I understand what
> you mean though, thanks.

The name sucks, but I believe the logic is correct.  As posted here in v5, the
MGLRU code wants to age both fast _and_ slow MMUs.  AIUI, the intent is to always
get aging information, but only look around at other PTEs if it can be done fast.

        if (should_walk_secondary_mmu()) {
                notifier_result =
                        mmu_notifier_test_clear_young_fast_only(
                                        vma->vm_mm, addr, addr + PAGE_SIZE,
                                        /*clear=*/true);
        }

        if (notifier_result & MMU_NOTIFIER_FAST_FAILED)
                secondary_young = mmu_notifier_clear_young(vma->vm_mm, addr,
                                                           addr + PAGE_SIZE);
        else {
                secondary_young = notifier_result & MMU_NOTIFIER_FAST_YOUNG;
                notifier_was_fast = true;
        }

The change, relative to v5, that I am proposing is that MGLRU looks around if
the page was young in _a_ "fast" secondary MMU, whereas v5 looks around if and
only if _all_ secondary MMUs are fast.

In other words, if a fast MMU had a young SPTE, look around _that_ MMU, via the
fast_only flag.
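
A minimal userspace model of that behavior, purely as a sketch under stated
assumptions: struct fake_kvm, age_gfn() and should_look_around() are made-up
names, the bit value chosen for MMU_NOTIFY_WAS_FAST is arbitrary, and the
booleans stand in for what the real shadow MMU and TDP MMU walks would report.
It mirrors the shape of the __kvm_age_gfn() idea quoted earlier: the slow MMU
is aged only when !fast_only, and the "was fast" bit is set only when the fast
MMU itself saw a young entry.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result bits; the real values are not defined in this thread. */
#define RESULT_YOUNG		0x1
#define MMU_NOTIFY_WAS_FAST	0x2

/* Toy stand-in for a VM with a lockless "fast" MMU and a locked "slow" one. */
struct fake_kvm {
	bool tdp_mmu_enabled;
	unsigned long indirect_shadow_pages;
	bool shadow_young;	/* what the slow (shadow MMU) walk would find */
	bool tdp_young;		/* what the fast (TDP MMU) walk would find */
	int mmu_lock_taken;	/* number of slow-path lock acquisitions */
};

static bool has_shadow_mmu_sptes(const struct fake_kvm *kvm)
{
	return !kvm->tdp_mmu_enabled || kvm->indirect_shadow_pages;
}

/* Age both MMUs, but take the (modelled) lock only when !fast_only. */
static int age_gfn(struct fake_kvm *kvm, bool fast_only)
{
	int young = 0;

	assert(kvm);
	if (!fast_only && has_shadow_mmu_sptes(kvm)) {
		kvm->mmu_lock_taken++;
		young = kvm->shadow_young ? RESULT_YOUNG : 0;
	}

	/* Flag the result only if the fast MMU itself had a young entry. */
	if (kvm->tdp_mmu_enabled && kvm->tdp_young)
		young = RESULT_YOUNG | MMU_NOTIFY_WAS_FAST;

	return young;
}

/* Caller-side policy: look around only if _a_ fast MMU reported young. */
static bool should_look_around(int result)
{
	return (result & MMU_NOTIFY_WAS_FAST) != 0;
}
```

In this model a page that is young only in the slow MMU still comes back as
young, but does not trigger a look-around; a page that is young in the fast
MMU does, regardless of whether the slow path was also walked.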