Date: Tue, 25 Jul 2023 11:05:29 -0700
References: <20230718234512.1690985-1-seanjc@google.com>
 <20230718234512.1690985-2-seanjc@google.com>
Subject: Re: [RFC PATCH v11 01/29] KVM: Wrap kvm_gfn_range.pte in a per-action union
From: Sean Christopherson
To: Xu Yilun
Cc: Yan Zhao, Paolo Bonzini, Marc Zyngier, Oliver Upton, Huacai Chen,
 Michael Ellerman, Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
 "Matthew Wilcox (Oracle)", Andrew Morton, Paul Moore, James Morris,
 "Serge E. Hallyn", kvm@vger.kernel.org,
 linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
 linux-mips@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
 kvm-riscv@lists.infradead.org, linux-riscv@lists.infradead.org,
 linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
 linux-security-module@vger.kernel.org, linux-kernel@vger.kernel.org,
 Chao Peng, Fuad Tabba, Jarkko Sakkinen, Yu Zhang, Vishal Annapurve,
 Ackerley Tng, Maciej Szmigiero, Vlastimil Babka, David Hildenbrand,
 Quentin Perret, Michael Roth, Wang, Liam Merwick, Isaku Yamahata,
 "Kirill A . Shutemov"

On Fri, Jul 21, 2023, Xu Yilun wrote:
> On 2023-07-21 at 14:26:11 +0800, Yan Zhao wrote:
> > On Tue, Jul 18, 2023 at 04:44:44PM -0700, Sean Christopherson wrote:
> >
> > May I know why KVM now needs to register to callback .change_pte()?
>
> I can see the original purpose is to "setting a pte in the shadow page
> table directly, instead of flushing the shadow page table entry and then
> getting vmexit to set it"[1].
>
> IIUC, KVM is expected to directly make the new pte present for new
> pages in this callback, like for COW.

Yes.

> > As also commented in kvm_mmu_notifier_change_pte(), .change_pte() must be
> > surrounded by .invalidate_range_{start,end}().
> >
> > While kvm_mmu_notifier_invalidate_range_start() has called kvm_unmap_gfn_range()
> > to zap all leaf SPTEs, and page fault path will not install new SPTEs
> > successfully before kvm_mmu_notifier_invalidate_range_end(),
> > kvm_set_spte_gfn() should not be able to find any shadow present leaf entries to
> > update PFN.
>
> I also failed to figure out how the kvm_set_spte_gfn() could pass
> several !is_shadow_present_pte(iter.old_spte) check then write the new
> pte.

It can't.  .change_pte() has been dead code on x86 for 10+ years at this point,
and if my assessment from a few years back still holds true, it's dead code on
all architectures.

The only reason I haven't formally proposed dropping the hook is that I don't want
to risk the patch backfiring, i.e. I don't want to prompt someone to care enough
to try and fix it.

commit c13fda237f08a388ba8a0849785045944bf39834
Author: Sean Christopherson
Date:   Fri Apr 2 02:56:49 2021 +0200

    KVM: Assert that notifier count is elevated in .change_pte()

    In KVM's .change_pte() notification callback, replace the notifier
    sequence bump with a WARN_ON assertion that the notifier count is
    elevated.  An elevated count provides stricter protections than bumping
    the sequence, and the sequence is guaranteed to be bumped before the
    count hits zero.

    When .change_pte() was added by commit 828502d30073 ("ksm: add
    mmu_notifier set_pte_at_notify()"), bumping the sequence was necessary
    as .change_pte() would be invoked without any surrounding notifications.

    However, since commit 6bdb913f0a70 ("mm: wrap calls to set_pte_at_notify
    with invalidate_range_start and invalidate_range_end"), all calls to
    .change_pte() are guaranteed to be surrounded by start() and end(), and
    so are guaranteed to run with an elevated notifier count.

    Note, wrapping .change_pte() with .invalidate_range_{start,end}() is a
    bug of sorts, as invalidating the secondary MMU's (KVM's) PTE defeats
    the purpose of .change_pte().  Every arch's kvm_set_spte_hva() assumes
    .change_pte() is called when the relevant SPTE is present in KVM's MMU,
    as the original goal was to accelerate Kernel Samepage Merging (KSM) by
    updating KVM's SPTEs without requiring a VM-Exit (due to invalidating
    the SPTE).  I.e. it means that .change_pte() is effectively dead code
    on _all_ architectures.

    x86 and MIPS are clear-cut nops if the old SPTE is not-present, and that
    is guaranteed due to the prior invalidation.  PPC simply unmaps the
    SPTE, which again should be a nop due to the invalidation.  arm64 is a
    bit murky, but it's also likely a nop because kvm_pgtable_stage2_map()
    is called without a cache pointer, which means it will map an entry if
    and only if an existing PTE was found.

    For now, take advantage of the bug to simplify future consolidation of
    KVM's MMU notifier code.  Doing so will not greatly complicate fixing
    .change_pte(), assuming it's even worth fixing.  .change_pte() has been
    broken for 8+ years and no one has complained.  Even if there are
    KSM+KVM users that care deeply about its performance, the benefits of
    avoiding VM-Exits via .change_pte() need to be reevaluated to justify
    the added complexity and testing burden.  Ripping out .change_pte()
    entirely would be a lot easier.
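
To make the ordering argument above concrete, here is a toy user-space model in
C.  It is only a sketch, not kernel code: every name in it (toy_mmu,
toy_change_pte(), toy_run_sequence(), etc.) is made up for illustration, and it
assumes a single notifier count and a flat array standing in for leaf SPTEs.
It shows that once start() has zapped the SPTE and faults are blocked by the
elevated count, the change_pte() step has nothing left to update.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy user-space model, NOT kernel code.  All names are hypothetical. */
#define TOY_NR_GFNS 4

struct toy_mmu {
	bool spte_present[TOY_NR_GFNS];		/* stand-in for present leaf SPTEs */
	unsigned long spte_pfn[TOY_NR_GFNS];
	int notifier_count;			/* elevated between start() and end() */
	unsigned long notifier_seq;		/* bumped in end(), before count drops */
};

/* Models invalidate_range_start(): elevate the count, zap SPTEs in [start, end). */
static void toy_invalidate_range_start(struct toy_mmu *mmu, int start, int end)
{
	mmu->notifier_count++;
	for (int gfn = start; gfn < end; gfn++)
		mmu->spte_present[gfn] = false;
}

/* Models invalidate_range_end(): bump the sequence, then drop the count. */
static void toy_invalidate_range_end(struct toy_mmu *mmu)
{
	mmu->notifier_seq++;
	mmu->notifier_count--;
}

/* Models the fault path: refuses to install an SPTE while count is elevated. */
static bool toy_page_fault(struct toy_mmu *mmu, int gfn, unsigned long pfn)
{
	if (mmu->notifier_count > 0)
		return false;			/* caller would have to retry */
	mmu->spte_present[gfn] = true;
	mmu->spte_pfn[gfn] = pfn;
	return true;
}

/* Models .change_pte(): returns true only if a present SPTE was updated. */
static bool toy_change_pte(struct toy_mmu *mmu, int gfn, unsigned long new_pfn)
{
	assert(mmu->notifier_count > 0);	/* same idea as the WARN_ON */
	if (!mmu->spte_present[gfn])
		return false;			/* the nop case */
	mmu->spte_pfn[gfn] = new_pfn;
	return true;
}

/* Runs fault -> start() -> change_pte() -> end(); returns SPTEs updated. */
static int toy_run_sequence(void)
{
	struct toy_mmu mmu = {0};
	int updated = 0;

	toy_page_fault(&mmu, 1, 100);		/* SPTE for gfn 1 is present */
	toy_invalidate_range_start(&mmu, 1, 2);	/* ...and is zapped here */
	toy_page_fault(&mmu, 1, 200);		/* blocked: count is elevated */
	updated += toy_change_pte(&mmu, 1, 200);
	toy_invalidate_range_end(&mmu);
	return updated;
}
```

In this model toy_run_sequence() reports that change_pte() updated nothing,
which is the "dead code" behavior described above.  The real code paths differ
per architecture, so treat this purely as an illustration of the ordering.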