From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Fri, 1 Jul 2022 19:51:58 +0300
From: Mike Rapoport <rppt@kernel.org>
To: "guanghui.fgh" <guanghuifeng@linux.alibaba.com>
Cc: baolin.wang@linux.alibaba.com, catalin.marinas@arm.com, will@kernel.org,
	akpm@linux-foundation.org, david@redhat.com, jianyong.wu@arm.com,
	james.morse@arm.com, quic_qiancai@quicinc.com,
	christophe.leroy@csgroup.eu, jonathan@marek.ca,
	mark.rutland@arm.com, thunder.leizhen@huawei.com,
	anshuman.khandual@arm.com, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, geert+renesas@glider.be,
	ardb@kernel.org, linux-mm@kvack.org, yaohongbo@linux.alibaba.com,
	alikernel-developer@linux.alibaba.com
Subject: Re: [PATCH v3] arm64: mm: fix linear mapping mem access performance
 degradation
Message-ID: <Yr8mLtu7hjQeFprD@kernel.org>
References: <1656586222-98555-1-git-send-email-guanghuifeng@linux.alibaba.com>
 <Yr2pT8SLznI6beqS@kernel.org>
 <f8ee2f3f-d0ed-3291-ec04-f7f754ab1931@linux.alibaba.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <f8ee2f3f-d0ed-3291-ec04-f7f754ab1931@linux.alibaba.com>
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>

On Fri, Jul 01, 2022 at 12:36:00PM +0800, guanghui.fgh wrote:
> Thanks.
> 
> On 2022/6/30 21:46, Mike Rapoport wrote:
> > Hi,
> > 
> > On Thu, Jun 30, 2022 at 06:50:22PM +0800, Guanghui Feng wrote:
> > > arm64 can build 2M/1G block/section mappings. When using the DMA/DMA32 zone
> > > (crashkernel enabled, rodata full disabled, kfence disabled), mem_map will
> > > use non-block/section mappings (because crashkernel requires shrinking the
> > > region at page granularity). But this degrades performance for large
> > > contiguous memory accesses in the kernel (memcpy/memmove, etc.).
> > > 
> > > There are many changes and discussions:
> > > commit 031495635b46 ("arm64: Do not defer reserve_crashkernel() for
> > > platforms with no DMA memory zones")
> > > commit 0a30c53573b0 ("arm64: mm: Move reserve_crashkernel() into
> > > mem_init()")
> > > commit 2687275a5843 ("arm64: Force NO_BLOCK_MAPPINGS if crashkernel
> > > reservation is required")
> > > 
> > > This patch changes mem_map to use block/section mappings with crashkernel.
> > > First, build block/section mappings (normally 2M or 1G) for all available
> > > memory in mem_map and reserve the crashkernel memory. Then walk the page
> > > tables and split block/section mappings into non-block/section mappings
> > > (normally 4K) only for the crashkernel memory. The linear mapping thus uses
> > > block/section mappings as much as possible. This noticeably reduces CPU dTLB
> > > misses and speeds up memory access by about 10-20%.
> > 
> > ...
> > > Signed-off-by: Guanghui Feng <guanghuifeng@linux.alibaba.com>
> > > ---
> > >   arch/arm64/include/asm/mmu.h |   1 +
> > >   arch/arm64/mm/init.c         |   8 +-
> > >   arch/arm64/mm/mmu.c          | 231 ++++++++++++++++++++++++++++++-------------
> > >   3 files changed, 168 insertions(+), 72 deletions(-)
> > 
> > ...
> > 
> > > diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> > > index 626ec32..4b779cf 100644
> > > --- a/arch/arm64/mm/mmu.c
> > > +++ b/arch/arm64/mm/mmu.c
> > > @@ -42,6 +42,7 @@
> > >   #define NO_BLOCK_MAPPINGS	BIT(0)
> > >   #define NO_CONT_MAPPINGS	BIT(1)
> > >   #define NO_EXEC_MAPPINGS	BIT(2)	/* assumes FEAT_HPDS is not used */
> > > +#define NO_SEC_REMAPPINGS	BIT(3)	/* rebuild with non block/sec mapping*/
> > >   u64 idmap_t0sz = TCR_T0SZ(VA_BITS_MIN);
> > >   u64 idmap_ptrs_per_pgd = PTRS_PER_PGD;
> > > @@ -156,11 +157,12 @@ static bool pgattr_change_is_safe(u64 old, u64 new)
> > >   }
> > >   static void init_pte(pmd_t *pmdp, unsigned long addr, unsigned long end,
> > > -		     phys_addr_t phys, pgprot_t prot)
> > > +		     phys_addr_t phys, pgprot_t prot, int flags)
> > >   {
> > >   	pte_t *ptep;
> > > -	ptep = pte_set_fixmap_offset(pmdp, addr);
> > > +	ptep = (flags & NO_SEC_REMAPPINGS) ? pte_offset_kernel(pmdp, addr) :
> > > +		pte_set_fixmap_offset(pmdp, addr);
> > >   	do {
> > >   		pte_t old_pte = READ_ONCE(*ptep);
> > > @@ -176,7 +178,8 @@ static void init_pte(pmd_t *pmdp, unsigned long addr, unsigned long end,
> > >   		phys += PAGE_SIZE;
> > >   	} while (ptep++, addr += PAGE_SIZE, addr != end);
> > > -	pte_clear_fixmap();
> > > +	if (!(flags & NO_SEC_REMAPPINGS))
> > > +		pte_clear_fixmap();
> > >   }
> > >   static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
> > > @@ -208,16 +211,59 @@ static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
> > >   		next = pte_cont_addr_end(addr, end);
> > >   		/* use a contiguous mapping if the range is suitably aligned */
> > > -		if ((((addr | next | phys) & ~CONT_PTE_MASK) == 0) &&
> > > +		if (!(flags & NO_SEC_REMAPPINGS) &&
> > > +		   (((addr | next | phys) & ~CONT_PTE_MASK) == 0) &&
> > >   		    (flags & NO_CONT_MAPPINGS) == 0)
> > >   			__prot = __pgprot(pgprot_val(prot) | PTE_CONT);
> > > -		init_pte(pmdp, addr, next, phys, __prot);
> > > +		init_pte(pmdp, addr, next, phys, __prot, flags);
> > >   		phys += next - addr;
> > >   	} while (addr = next, addr != end);
> > >   }
> > > +static void init_pmd_remap(pud_t *pudp, unsigned long addr, unsigned long end,
> > > +			   phys_addr_t phys, pgprot_t prot,
> > > +			   phys_addr_t (*pgtable_alloc)(int), int flags)
> > > +{
> > > +	unsigned long next;
> > > +	pmd_t *pmdp;
> > > +	phys_addr_t map_offset;
> > > +	pmdval_t pmdval;
> > > +
> > > +	pmdp = pmd_offset(pudp, addr);
> > > +	do {
> > > +		next = pmd_addr_end(addr, end);
> > > +
> > > +		if (!pmd_none(*pmdp) && pmd_sect(*pmdp)) {
> > > +			phys_addr_t pte_phys = pgtable_alloc(PAGE_SHIFT);
> > > +			pmd_clear(pmdp);
> > > +			pmdval = PMD_TYPE_TABLE | PMD_TABLE_UXN;
> > > +			if (flags & NO_EXEC_MAPPINGS)
> > > +				pmdval |= PMD_TABLE_PXN;
> > > +			__pmd_populate(pmdp, pte_phys, pmdval);
> > > +			flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
> > > +
> > > +			map_offset = addr - (addr & PMD_MASK);
> > > +			if (map_offset)
> > > +			    alloc_init_cont_pte(pmdp, addr & PMD_MASK, addr,
> > > +						phys - map_offset, prot,
> > > +						pgtable_alloc,
> > > +						flags & (~NO_SEC_REMAPPINGS));
> > > +
> > > +			if (next < (addr & PMD_MASK) + PMD_SIZE)
> > > +			    alloc_init_cont_pte(pmdp, next,
> > > +					       (addr & PUD_MASK) + PUD_SIZE,
> > > +					        next - addr + phys,
> > > +						prot, pgtable_alloc,
> > > +						flags & (~NO_SEC_REMAPPINGS));
> > > +		}
> > > +		alloc_init_cont_pte(pmdp, addr, next, phys, prot,
> > > +				    pgtable_alloc, flags);
> > > +		phys += next - addr;
> > > +	} while (pmdp++, addr = next, addr != end);
> > > +}
> > 
> > > There is still too much duplicated code here and in init_pud_remap().
> > 
> > Did you consider something like this:
> > 
> > void __init map_crashkernel(void)
> > {
> > 	int flags = NO_EXEC_MAPPINGS | NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
> > 	u64 size;
> > 
> > 	/*
> > 	 * check if crash kernel supported, reserved etc
> > 	 */
> > 
> > 
> > 	size = crashk_res.end + 1 - crashk_res.start;
> > 
> > 	__remove_pgd_mapping(swapper_pg_dir,
> > 			     __phys_to_virt(crashk_res.start), size);
> > 	__create_pgd_mapping(swapper_pg_dir, crashk_res.start,
> > 			     __phys_to_virt(crashk_res.start), size,
> > 			     PAGE_KERNEL, early_pgtable_alloc, flags);
> > }
> > 
> I'm trying to do this.
> But I think it is the inverse process of memory mapping and it also generates
> duplicated code (boundary checks, page table modification).
> 
> When removing the pgd mapping, it may split a pud/pmd section, which also
> requires rebuilding and clearing some page tables.

Well, __remove_pgd_mapping() is probably overkill, but
unmap_hotplug_pmd_range() and unmap_hotplug_pud_range() should do, depending
on the size of the crash kernel.
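
To make the "depending on the size" part a bit more concrete, here is a rough
userspace sketch (not kernel code) of the arithmetic involved: given the crash
kernel range and the block size the linear map may have used, it computes the
block-aligned span that would have to be torn down and remapped with pages.
The struct, the plan_split() helper and the SKETCH_* constants are made up for
illustration and assume a 4K granule with 2M PMD and 1G PUD blocks.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sizes for a 4K-granule configuration; illustrative only. */
#define SKETCH_PAGE_SIZE (1ULL << 12)
#define SKETCH_PMD_SIZE  (1ULL << 21)
#define SKETCH_PUD_SIZE  (1ULL << 30)

/* Hypothetical result type, not a kernel structure. */
struct split_plan {
	uint64_t split_start; /* crash kernel start rounded down to a block */
	uint64_t split_end;   /* crash kernel end rounded up to a block */
	uint64_t blocks;      /* block entries overlapping the crash kernel */
	uint64_t pages;       /* page entries needed for the crash kernel */
};

static uint64_t round_down_to(uint64_t x, uint64_t align)
{
	return x & ~(align - 1);
}

static uint64_t round_up_to(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/*
 * Work out which block-aligned span overlaps [start, end) for a given
 * block size (power of two). Every block in that span would lose its
 * block mapping; the parts outside [start, end) could be mapped again
 * with the original attributes.
 */
static struct split_plan plan_split(uint64_t start, uint64_t end,
				    uint64_t block)
{
	struct split_plan p;

	assert(start < end);
	assert((start & (SKETCH_PAGE_SIZE - 1)) == 0);
	assert((end & (SKETCH_PAGE_SIZE - 1)) == 0);

	p.split_start = round_down_to(start, block);
	p.split_end = round_up_to(end, block);
	p.blocks = (p.split_end - p.split_start) / block;
	p.pages = (end - start) / SKETCH_PAGE_SIZE;
	return p;
}
```

For a 256M crash kernel at 0x8a100000, plan_split() with SKETCH_PMD_SIZE
yields the span 0x8a000000-0x9a200000 (129 PMD blocks), while with
SKETCH_PUD_SIZE the whole 1G block at 0x80000000 is affected. Whether the PMD
or the PUD helper is the right tool would follow from which of these block
sizes the linear map actually used around the reservation.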

-- 
Sincerely yours,
Mike.