Date: Thu, 23 Nov 2023 14:48:49 +0000
From: Sebastian Ene
To: Vincent Donnefort
Cc: will@kernel.org, Oliver Upton, James Morse, Suzuki K Poulose,
    Zenghui Yu, catalin.marinas@arm.com, mark.rutland@arm.com,
    akpm@linux-foundation.org, maz@kernel.org, kvmarm@lists.linux.dev,
    linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
    kernel-team@android.com, qperret@google.com, smostafa@google.com
Subject: Re: [PATCH v3 06/10] arm64: ptdump: Register a debugfs entry for the host stage-2 tables
References: <20231115171639.2852644-2-sebastianene@google.com>
    <20231115171639.2852644-8-sebastianene@google.com>

On Tue, Nov 21, 2023 at 05:13:41PM +0000, Vincent Donnefort wrote:
> On Wed, Nov 15, 2023 at 05:16:36PM +0000, Sebastian Ene wrote:

Hi,

> > Initialize a structures used to keep the state of the host stage-2 ptdump
> > walker when pKVM is enabled. Create a new debugfs entry for the host
> > stage-2 pagetables and hook the callbacks invoked when the entry is
> > accessed. When the debugfs file is opened, allocate memory resources which
> > will be shared with the hypervisor for saving the pagetable snapshot.
> > On close release the associated memory and we unshare it from the
> > hypervisor.
> >
> > Signed-off-by: Sebastian Ene
> > ---
> >  arch/arm64/include/asm/ptdump.h |  12 +++
> >  arch/arm64/kvm/Kconfig          |  13 +++
> >  arch/arm64/kvm/arm.c            |   2 +
> >  arch/arm64/mm/ptdump.c          | 168 ++++++++++++++++++++++++++++++++
> >  arch/arm64/mm/ptdump_debugfs.c  |   8 +-
> >  5 files changed, 202 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/arm64/include/asm/ptdump.h b/arch/arm64/include/asm/ptdump.h
> > index 9b2bebfcefbe..de5a5a0c0ecf 100644
> > --- a/arch/arm64/include/asm/ptdump.h
> > +++ b/arch/arm64/include/asm/ptdump.h
> > @@ -22,6 +22,7 @@ struct ptdump_info {
> >  	void (*ptdump_walk)(struct seq_file *s, struct ptdump_info *info);
> >  	int (*ptdump_prepare_walk)(void *file_priv);
> >  	void (*ptdump_end_walk)(void *file_priv);
> > +	size_t mc_len;
> >  };
> >
> >  void ptdump_walk(struct seq_file *s, struct ptdump_info *info);
> > @@ -33,13 +34,24 @@ struct ptdump_info_file_priv {
> >  #ifdef CONFIG_PTDUMP_DEBUGFS
> >  #define EFI_RUNTIME_MAP_END DEFAULT_MAP_WINDOW_64
> >  void __init ptdump_debugfs_register(struct ptdump_info *info, const char *name);
> > +void ptdump_debugfs_kvm_register(struct ptdump_info *info, const char *name,
> > +				 struct dentry *d_entry);
> >  #else
> >  static inline void ptdump_debugfs_register(struct ptdump_info *info,
> >  					   const char *name) { }
> > +static inline void ptdump_debugfs_kvm_register(struct ptdump_info *info,
> > +					       const char *name,
> > +					       struct dentry *d_entry) { }
> >  #endif
> >  void ptdump_check_wx(void);
> >  #endif /* CONFIG_PTDUMP_CORE */
> >
> > +#ifdef CONFIG_PTDUMP_STAGE2_DEBUGFS
> > +void ptdump_register_host_stage2(void);
> > +#else
> > +static inline void ptdump_register_host_stage2(void) { }
> > +#endif /* CONFIG_PTDUMP_STAGE2_DEBUGFS */
> > +
> >  #ifdef CONFIG_DEBUG_WX
> >  #define debug_checkwx() ptdump_check_wx()
> >  #else
> > diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
> > index 83c1e09be42e..cf5b7f06b152 100644
> > --- a/arch/arm64/kvm/Kconfig
> > +++ b/arch/arm64/kvm/Kconfig
> > @@ -71,4 +71,17 @@ config PROTECTED_NVHE_STACKTRACE
> >
> >  	  If unsure, or not using protected nVHE (pKVM), say N.
> >
> > +config PTDUMP_STAGE2_DEBUGFS
> > +	bool "Present the stage-2 pagetables to debugfs"
> > +	depends on NVHE_EL2_DEBUG && PTDUMP_DEBUGFS && KVM
> > +	default n
> > +	help
> > +	  Say Y here if you want to show the stage-2 kernel pagetables
> > +	  layout in a debugfs file. This information is only useful for kernel developers
> > +	  who are working in architecture specific areas of the kernel.
> > +	  It is probably not a good idea to enable this feature in a production
> > +	  kernel.
> > +
> > +	  If in doubt, say N.
> > +
> >  endif # VIRTUALIZATION
> > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > index e5f75f1f1085..987683650576 100644
> > --- a/arch/arm64/kvm/arm.c
> > +++ b/arch/arm64/kvm/arm.c
> > @@ -28,6 +28,7 @@
> >
> >  #include
> >  #include
> > +#include
> >  #include
> >  #include
> >  #include
> > @@ -2592,6 +2593,7 @@ static __init int kvm_arm_init(void)
> >  	if (err)
> >  		goto out_subs;
> >
> > +	ptdump_register_host_stage2();
> >  	kvm_arm_initialised = true;
> >
> >  	return 0;
> > diff --git a/arch/arm64/mm/ptdump.c b/arch/arm64/mm/ptdump.c
> > index d531e24ea0b2..0b4cb54e43ff 100644
> > --- a/arch/arm64/mm/ptdump.c
> > +++ b/arch/arm64/mm/ptdump.c
> > @@ -24,6 +24,9 @@
> >  #include
> >  #include
> >  #include
> > +#include
> > +#include
> > +#include
> >
> >
> >  enum address_markers_idx {
> > @@ -378,6 +381,170 @@ void ptdump_check_wx(void)
> >  		pr_info("Checked W+X mappings: passed, no W+X pages found\n");
> >  }
> >
> > +#ifdef CONFIG_PTDUMP_STAGE2_DEBUGFS
> > +static struct ptdump_info stage2_kernel_ptdump_info;
> > +
> > +static phys_addr_t ptdump_host_pa(void *addr)
> > +{
> > +	return __pa(addr);
> > +}
> > +
> > +static void *ptdump_host_va(phys_addr_t phys)
> > +{
> > +	return __va(phys);
> > +}
> > +
> > +static size_t stage2_get_pgd_len(void)
> > +{
> > +	u64 mmfr0, mmfr1, vtcr;
> > +	u32 phys_shift = get_kvm_ipa_limit();
> > +
> > +	mmfr0 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
> > +	mmfr1 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
> > +	vtcr = kvm_get_vtcr(mmfr0, mmfr1, phys_shift);
> > +
> > +	return kvm_pgtable_stage2_pgd_size(vtcr);
>
> That's a lot of conversions to go from the kvm_ipa_limit to VTCR and
> VTCR back to ia_bits and the start level, but that would mean rewrite pieces of
> pgtable.c there. :-\
>

Right, I think with Oliver's suggestion we will no longer have to move
these bits around and the code will be self contained under /kvm.

> > +}
> > +
> > +static int stage2_ptdump_prepare_walk(void *file_priv)
> > +{
> > +	struct ptdump_info_file_priv *f_priv = file_priv;
> > +	struct ptdump_info *info = &f_priv->info;
> > +	struct kvm_pgtable_snapshot *snapshot;
> > +	int ret, pgd_index, mc_index, pgd_pages_sz;
> > +	void *page_hva;
> > +	phys_addr_t pgd;
> > +
> > +	snapshot = alloc_pages_exact(PAGE_SIZE, GFP_KERNEL_ACCOUNT);
> > +	if (!snapshot)
> > +		return -ENOMEM;
>
> For a single page, __get_free_page is enough.
>

I can use this, thanks.

> > +
> > +	memset(snapshot, 0, PAGE_SIZE);
> > +	ret = kvm_call_hyp_nvhe(__pkvm_host_share_hyp, virt_to_pfn(snapshot));
> > +	if (ret)
> > +		goto free_snapshot;
>
> It'd probably be better to not share anything here, and let the hypervisor do
> host_donate_hyp() and hyp_donate_host() before returning back from the HVC. This
> way the hypervisor will protect itself.
>

Right, as we took this discussion offline, I will update this and use the
*donate* API.
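To make the donate idea a bit more concrete, here is a rough userspace
model of what I understand the suggestion to be: the hypervisor takes
ownership of the pages for the duration of the hypercall and hands them
back before returning. Everything prefixed with toy_ is hypothetical and
only models the ownership transitions; it is not the pKVM API and it is
not what the next version of the patch will necessarily look like.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy page-ownership states; illustrative names, not the pKVM enums. */
enum toy_owner { TOY_HOST_OWNED, TOY_HOST_SHARED_HYP, TOY_HYP_OWNED };

struct toy_page {
	enum toy_owner owner;
};

/* Host gives up the page entirely; only valid for exclusively owned pages. */
static int toy_host_donate_hyp(struct toy_page *p)
{
	if (p->owner != TOY_HOST_OWNED)
		return -EPERM;
	p->owner = TOY_HYP_OWNED;
	return 0;
}

/* Hypervisor returns a page it owns back to the host. */
static int toy_hyp_donate_host(struct toy_page *p)
{
	if (p->owner != TOY_HYP_OWNED)
		return -EPERM;
	p->owner = TOY_HOST_OWNED;
	return 0;
}

/*
 * Model of a single hypercall: take the pages, do the work, and give
 * every page that was taken back to the host before returning.
 */
static int toy_snapshot_hvc(struct toy_page *pages, size_t nr)
{
	size_t i, taken = 0;
	int ret = 0;

	for (i = 0; i < nr; i++) {
		ret = toy_host_donate_hyp(&pages[i]);
		if (ret)
			break;
		taken++;
	}

	/* ... the hypervisor would copy the stage-2 tables here ... */

	while (taken--)
		(void)toy_hyp_donate_host(&pages[taken]);

	return ret;
}
```

The point of the model is that the host never has to issue separate
share/unshare calls, and on failure no page stays owned by the
hypervisor.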
> > +
> > +	snapshot->pgd_len = stage2_get_pgd_len();
> > +	pgd_pages_sz = snapshot->pgd_len / PAGE_SIZE;
> > +	snapshot->pgd_hva = alloc_pages_exact(snapshot->pgd_len,
> > +					      GFP_KERNEL_ACCOUNT);
> > +	if (!snapshot->pgd_hva) {
> > +		ret = -ENOMEM;
> > +		goto unshare_snapshot;
> > +	}
> > +
> > +	for (pgd_index = 0; pgd_index < pgd_pages_sz; pgd_index++) {
> > +		page_hva = snapshot->pgd_hva + pgd_index * PAGE_SIZE;
> > +		ret = kvm_call_hyp_nvhe(__pkvm_host_share_hyp,
> > +					virt_to_pfn(page_hva));
> > +		if (ret)
> > +			goto unshare_pgd_pages;
> > +	}
> > +
> > +	for (mc_index = 0; mc_index < info->mc_len; mc_index++) {
> > +		page_hva = alloc_pages_exact(PAGE_SIZE, GFP_KERNEL_ACCOUNT);
>
> ditto.
>

Ack.

> > +		if (!page_hva) {
> > +			ret = -ENOMEM;
> > +			goto free_memcache_pages;
> > +		}
> > +
> > +		push_hyp_memcache(&snapshot->mc, page_hva, ptdump_host_pa);
> > +		ret = kvm_call_hyp_nvhe(__pkvm_host_share_hyp,
> > +					virt_to_pfn(page_hva));
> > +		if (ret) {
> > +			pop_hyp_memcache(&snapshot->mc, ptdump_host_va);
> > +			free_pages_exact(page_hva, PAGE_SIZE);
> > +			goto free_memcache_pages;
> > +		}
>
> Maybe for the page-table pages, it'd be better to let the hyp does the
> host_donate_hyp() / hyp_donate_host()? It might be easier than sharing + pin.
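
For anyone following the push/pop pairing in the loop above, this is a
small userspace model of how I understand the memcache to behave: an
intrusive LIFO list where each free page stores the address of the next
one in its first word. The toy_ names and the identity address
translation are assumptions made for the sketch; the real helpers take
the to_pa/to_va callbacks as arguments.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uintptr_t toy_phys_t;

struct toy_memcache {
	toy_phys_t head;        /* "physical" address of the top page */
	unsigned long nr_pages; /* number of pages on the list */
};

/* Identity translation stands in for __pa()/__va() in this model. */
static toy_phys_t toy_to_pa(void *va) { return (toy_phys_t)va; }
static void *toy_to_va(toy_phys_t pa) { return (void *)pa; }

/* Store the old head in the page itself, then make the page the new head. */
static void toy_push(struct toy_memcache *mc, void *page)
{
	toy_phys_t *p = page;

	*p = mc->head;
	mc->head = toy_to_pa(p);
	mc->nr_pages++;
}

/* Return the most recently pushed page, or NULL when the cache is empty. */
static void *toy_pop(struct toy_memcache *mc)
{
	toy_phys_t *p;

	if (!mc->nr_pages)
		return NULL;

	p = toy_to_va(mc->head);
	mc->head = *p;
	mc->nr_pages--;
	return p;
}
```

Because the list is LIFO, popping right after a failed share returns
exactly the page that was just pushed, which is what the error branch in
the loop relies on.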
>
> > +	}
> > +
> > +	ret = kvm_call_hyp_nvhe(__pkvm_copy_host_stage2, snapshot);
> > +	if (ret)
> > +		goto free_memcache_pages;
> > +
> > +	pgd = (phys_addr_t)snapshot->pgtable.pgd;
> > +	snapshot->pgtable.pgd = phys_to_virt(pgd);
> > +	f_priv->file_priv = snapshot;
> > +	return 0;
> > +
> > +free_memcache_pages:
> > +	page_hva = pop_hyp_memcache(&snapshot->mc, ptdump_host_va);
> > +	while (page_hva) {
> > +		ret = kvm_call_hyp_nvhe(__pkvm_host_unshare_hyp,
> > +					virt_to_pfn(page_hva));
> > +		WARN_ON(ret);
> > +		free_pages_exact(page_hva, PAGE_SIZE);
> > +		page_hva = pop_hyp_memcache(&snapshot->mc, ptdump_host_va);
> > +	}
> > +unshare_pgd_pages:
> > +	pgd_index = pgd_index - 1;
> > +	for (; pgd_index >= 0; pgd_index--) {
> > +		page_hva = snapshot->pgd_hva + pgd_index * PAGE_SIZE;
> > +		ret = kvm_call_hyp_nvhe(__pkvm_host_unshare_hyp,
> > +					virt_to_pfn(page_hva));
> > +		WARN_ON(ret);
> > +	}
> > +	free_pages_exact(snapshot->pgd_hva, snapshot->pgd_len);
> > +unshare_snapshot:
> > +	WARN_ON(kvm_call_hyp_nvhe(__pkvm_host_unshare_hyp,
> > +				  virt_to_pfn(snapshot)));
> > +free_snapshot:
> > +	free_pages_exact(snapshot, PAGE_SIZE);
> > +	f_priv->file_priv = NULL;
> > +	return ret;
>
> Couldn't this path be merged with stage2_ptdump_end_walk()?
>

I think it should be doable.
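
One way the merge could look, sketched in userspace: the snapshot records
how far setup got, and a single teardown routine undoes only what was
done, so both the error path and the close path can call it. The toy_
helpers, the share counter and the injected failure index are all made up
for the sketch; this is only the shape I have in mind, not the final
code.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_PAGE_SIZE 4096

struct toy_snapshot {
	void *pgd;         /* NULL until the PGD buffer is allocated */
	size_t pgd_pages;  /* total number of PGD pages */
	size_t pgd_shared; /* how many PGD pages have been shared so far */
};

static int toy_live_shares; /* pages currently "shared" in this model */

static int toy_share(void *page) { (void)page; toy_live_shares++; return 0; }
static void toy_unshare(void *page) { (void)page; toy_live_shares--; }

/* Single teardown, safe on a partially constructed snapshot. */
static void toy_snapshot_teardown(struct toy_snapshot *s)
{
	if (!s)
		return;

	while (s->pgd_shared) {
		s->pgd_shared--;
		toy_unshare((char *)s->pgd + s->pgd_shared * TOY_PAGE_SIZE);
	}
	free(s->pgd);
	free(s);
}

/* fail_at injects a share failure at that page index; pass (size_t)-1 for none. */
static struct toy_snapshot *toy_snapshot_prepare(size_t pgd_pages, size_t fail_at)
{
	struct toy_snapshot *s = calloc(1, sizeof(*s));
	size_t i;

	if (!s)
		return NULL;

	s->pgd_pages = pgd_pages;
	s->pgd = calloc(pgd_pages, TOY_PAGE_SIZE);
	if (!s->pgd)
		goto err;

	for (i = 0; i < pgd_pages; i++) {
		if (i == fail_at)
			goto err;
		if (toy_share((char *)s->pgd + i * TOY_PAGE_SIZE))
			goto err;
		s->pgd_shared++;
	}
	return s;

err:
	toy_snapshot_teardown(s);
	return NULL;
}
```

With the progress kept in the structure, the three error labels collapse
into one and the close callback reuses the same function.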
> > +}
> > +
> > +static void stage2_ptdump_end_walk(void *file_priv)
> > +{
> > +	struct ptdump_info_file_priv *f_priv = file_priv;
> > +	struct kvm_pgtable_snapshot *snapshot = f_priv->file_priv;
> > +	void *page_hva;
> > +	int pgd_index, ret, pgd_pages_sz;
> > +
> > +	if (!snapshot)
> > +		return;
> > +
> > +	page_hva = pop_hyp_memcache(&snapshot->mc, ptdump_host_va);
> > +	while (page_hva) {
> > +		ret = kvm_call_hyp_nvhe(__pkvm_host_unshare_hyp,
> > +					virt_to_pfn(page_hva));
> > +		WARN_ON(ret);
> > +		free_pages_exact(page_hva, PAGE_SIZE);
> > +		page_hva = pop_hyp_memcache(&snapshot->mc, ptdump_host_va);
> > +	}
> > +
> > +	pgd_pages_sz = snapshot->pgd_len / PAGE_SIZE;
> > +	for (pgd_index = 0; pgd_index < pgd_pages_sz; pgd_index++) {
> > +		page_hva = snapshot->pgd_hva + pgd_index * PAGE_SIZE;
> > +		ret = kvm_call_hyp_nvhe(__pkvm_host_unshare_hyp,
> > +					virt_to_pfn(page_hva));
> > +		WARN_ON(ret);
> > +	}
> > +
> > +	free_pages_exact(snapshot->pgd_hva, snapshot->pgd_len);
> > +	WARN_ON(kvm_call_hyp_nvhe(__pkvm_host_unshare_hyp,
> > +				  virt_to_pfn(snapshot)));
> > +	free_pages_exact(snapshot, PAGE_SIZE);
> > +	f_priv->file_priv = NULL;
> > +}
> > +
> > +void ptdump_register_host_stage2(void)
> > +{
> > +	if (!is_protected_kvm_enabled())
> > +		return;
> > +
> > +	stage2_kernel_ptdump_info = (struct ptdump_info) {
> > +		.mc_len = host_s2_pgtable_pages(),
> > +		.ptdump_prepare_walk = stage2_ptdump_prepare_walk,
> > +		.ptdump_end_walk = stage2_ptdump_end_walk,
> > +	};
> > +
> > +	ptdump_debugfs_kvm_register(&stage2_kernel_ptdump_info,
> > +				    "host_stage2_page_tables",
> > +				    kvm_debugfs_dir);
> > +}
> > +#endif /* CONFIG_PTDUMP_STAGE2_DEBUGFS */
> > +
> >  static int __init ptdump_init(void)
> >  {
> >  	address_markers[PAGE_END_NR].start_address = PAGE_END;
> > @@ -386,6 +553,7 @@ static int __init ptdump_init(void)
> >  #endif
> >  	ptdump_initialize();
> >  	ptdump_debugfs_register(&kernel_ptdump_info, "kernel_page_tables");
> > +
>
> Not needed.
>

Will remove this, checkpatch didn't seem to complain about it.

> >  	return 0;
> >  }
> >  device_initcall(ptdump_init);
> > diff --git a/arch/arm64/mm/ptdump_debugfs.c b/arch/arm64/mm/ptdump_debugfs.c
> > index 3bf5de51e8c3..4821dbef784c 100644
> > --- a/arch/arm64/mm/ptdump_debugfs.c
> > +++ b/arch/arm64/mm/ptdump_debugfs.c
> > @@ -68,5 +68,11 @@ static const struct file_operations ptdump_fops = {
> >
> >  void __init ptdump_debugfs_register(struct ptdump_info *info, const char *name)
> >  {
> > -	debugfs_create_file(name, 0400, NULL, info, &ptdump_fops);
> > +	ptdump_debugfs_kvm_register(info, name, NULL);
>
> Not really related to kvm, the only difference is passing or not a dentry.
>
> How about a single (non __init) function?
>

I don't think it works because you have to keep the signature of the
original function. This 'ptdump_debugfs_register' is also called from
the non-arch drivers code.

> > +}
> > +
> > +void ptdump_debugfs_kvm_register(struct ptdump_info *info, const char *name,
> > +				 struct dentry *d_entry)
> > +{
> > +	debugfs_create_file(name, 0400, d_entry, info, &ptdump_fops);
> >  }
> > --
> > 2.43.0.rc0.421.g78406f8d94-goog
> >