Subject: Re: [RFC] mm: Enable generic pfn_valid() to handle early sections with memmap holes
To: Anshuman Khandual, linux-mm@kvack.org
Cc: Russell King, Catalin Marinas, Will Deacon, Andrew Morton,
 Mike Rapoport, linux-arm-kernel@lists.infradead.org,
 linux-kernel@vger.kernel.org
References: <1615174073-10520-1-git-send-email-anshuman.khandual@arm.com>
 <745496f5-e099-8780-e42e-f347b55e8476@redhat.com>
 <72902ace-5f00-b484-aa71-e6841fb7d082@arm.com>
From: David Hildenbrand
Organization: Red Hat GmbH
Message-ID: <2541b182-1f1c-c1ed-c15c-9d71160eb6fe@redhat.com>
Date: Wed, 17 Mar 2021 12:20:15 +0100
In-Reply-To: <72902ace-5f00-b484-aa71-e6841fb7d082@arm.com>

On 11.03.21 05:29, Anshuman Khandual wrote:
> 
> On 3/8/21 2:07 PM, David Hildenbrand wrote:
>> On 08.03.21 04:27, Anshuman Khandual wrote:
>>> Platforms like arm and arm64 have redefined pfn_valid() because their early
>>> memory sections might contain memmap holes caused by memblock areas tagged
>>> with MEMBLOCK_NOMAP, which should be skipped while validating a pfn for
>>> struct page backing. This scenario can be captured with a new option,
>>> CONFIG_HAVE_EARLY_SECTION_MEMMAP_HOLES, and the generic pfn_valid() can
>>> then be improved to accommodate such platforms. This reduces the overall
>>> code footprint and also improves maintainability.
>>>
>>> Commit 4f5b0c178996 ("arm, arm64: move free_unused_memmap() to generic mm")
>>> used CONFIG_HAVE_ARCH_PFN_VALID to gate free_unused_memmap(), which in turn
>>> expanded its scope to new platforms like arc and m68k. Let's restrict the
>>> scope of free_unused_memmap() back to the arm and arm64 platforms using
>>> this new config option, i.e. CONFIG_HAVE_EARLY_SECTION_MEMMAP_HOLES.
>>>
>>> While here, export the symbol memblock_is_map_memory() so that drivers
>>> which depend on pfn_valid() but do not have the required visibility can
>>> still be built. After this new config is in place, just drop
>>> CONFIG_HAVE_ARCH_PFN_VALID from both arm and arm64 platforms.
>>>
>>> Cc: Russell King
>>> Cc: Catalin Marinas
>>> Cc: Will Deacon
>>> Cc: Andrew Morton
>>> Cc: Mike Rapoport
>>> Cc: David Hildenbrand
>>> Cc: linux-arm-kernel@lists.infradead.org
>>> Cc: linux-kernel@vger.kernel.org
>>> Cc: linux-mm@kvack.org
>>> Suggested-by: David Hildenbrand
>>> Signed-off-by: Anshuman Khandual
>>> ---
>>> This applies on 5.12-rc2 along with the arm64 pfn_valid() fix patches [1]
>>> and has been lightly tested on the arm64 platform. The idea of representing
>>> this unique situation on the arm and arm64 platforms with a config option
>>> was proposed by David H during an earlier discussion [2]. This still does
>>> not build on the arm platform due to pfn_valid() resolution errors.
>>> Nonetheless, I wanted some early feedback on whether the overall approach
>>> here is acceptable.
>>
>> It might make sense to keep the arm variant for now. The arm64 variant is
>> where the magic happens and where we missed updates when working on the
>> generic variant.
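
[ Editor's note: for context, the arm64 pfn_valid() variant under discussion
  looks roughly like the sketch below (paraphrased from arch/arm64/mm/init.c
  around v5.12, including the upper-bit fix referenced above; exact details
  may differ). The memblock_is_map_memory() call at the end is what skips
  MEMBLOCK_NOMAP holes: ]

	int pfn_valid(unsigned long pfn)
	{
		phys_addr_t addr = pfn << PAGE_SHIFT;

		/* Reject pfns whose upper bits get truncated by the shift. */
		if ((addr >> PAGE_SHIFT) != pfn)
			return 0;

	#ifdef CONFIG_SPARSEMEM
		if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
			return 0;

		if (!valid_section(__pfn_to_section(pfn)))
			return 0;
	#endif
		/* NOMAP regions have no memmap; treat their pfns as invalid. */
		return memblock_is_map_memory(addr);
	}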
> 
> Sure, will drop the changes on arm.
> 
>>
>> The generic variant really only applies to 64bit targets where we have
>> SPARSEMEM. See x86 as an example.
> 
> Okay.
> 
>>
>> [...]
>>
>>>   /*
>>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>>> index 47946cec7584..93532994113f 100644
>>> --- a/include/linux/mmzone.h
>>> +++ b/include/linux/mmzone.h
>>> @@ -1409,8 +1409,23 @@ static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
>>>   }
>>>   #endif
>>>   
>>> +bool memblock_is_map_memory(phys_addr_t addr);
>>> +
>>>   #ifndef CONFIG_HAVE_ARCH_PFN_VALID
>>>   static inline int pfn_valid(unsigned long pfn)
>>> +{
>>> +	phys_addr_t addr = PFN_PHYS(pfn);
>>> +
>>> +	/*
>>> +	 * Ensure the upper PAGE_SHIFT bits are clear in the
>>> +	 * pfn. Else it might lead to false positives when
>>> +	 * some of the upper bits are set, but the lower bits
>>> +	 * match a valid pfn.
>>> +	 */
>>> +	if (PHYS_PFN(addr) != pfn)
>>> +		return 0;
>>
>> I think this should be fine for other archs as well.
>>
>>> +
>>> +#ifdef CONFIG_SPARSEMEM
>>
>> Why do we need the ifdef now? If that's to cover the arm case, then
>> please consider the arm64 case only for now.
> 
> Yes, it is not needed.
> 
>>
>>>   {
>>>   	struct mem_section *ms;
>>>   
>>> @@ -1423,7 +1438,14 @@ static inline int pfn_valid(unsigned long pfn)
>>>   	 * Traditionally early sections always returned pfn_valid() for
>>>   	 * the entire section-sized span.
>>>   	 */
>>> -	return early_section(ms) || pfn_section_valid(ms, pfn);
>>> +	if (early_section(ms))
>>> +		return IS_ENABLED(CONFIG_HAVE_EARLY_SECTION_MEMMAP_HOLES) ?
>>> +			memblock_is_map_memory(pfn << PAGE_SHIFT) : 1;
>>> +
>>> +	return pfn_section_valid(ms, pfn);
>>> +}
>>> +#endif
>>> +	return 1;
>>>   }
>>>   #endif
>>>   
>>> diff --git a/mm/Kconfig b/mm/Kconfig
>>> index 24c045b24b95..0ec20f661b3f 100644
>>> --- a/mm/Kconfig
>>> +++ b/mm/Kconfig
>>> @@ -135,6 +135,16 @@ config HAVE_FAST_GUP
>>>   config ARCH_KEEP_MEMBLOCK
>>>   	bool
>>>   
>>> +config HAVE_EARLY_SECTION_MEMMAP_HOLES
>>> +	depends on ARCH_KEEP_MEMBLOCK && SPARSEMEM_VMEMMAP
>>> +	def_bool n
>>> +	help
>>> +	  Early sections on certain platforms might have portions which are
>>> +	  not backed with struct page mapping as their memblock entries are
>>> +	  marked with MEMBLOCK_NOMAP. When subscribed, this option enables
>>> +	  specific handling for those memory sections in certain situations
>>> +	  such as pfn_valid().
>>> +
>>>   # Keep arch NUMA mapping infrastructure post-init.
>>>   config NUMA_KEEP_MEMINFO
>>>   	bool
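
[ Editor's note: folding in the two review comments above (keep the upper-bit
  check, drop the unneeded #ifdef CONFIG_SPARSEMEM), the generic pfn_valid()
  would end up looking roughly like the sketch below. This is the discussed
  direction, not the final merged code: ]

	static inline int pfn_valid(unsigned long pfn)
	{
		phys_addr_t addr = PFN_PHYS(pfn);
		struct mem_section *ms;

		/* Upper bits set in the pfn would be lost by PFN_PHYS(). */
		if (PHYS_PFN(addr) != pfn)
			return 0;

		if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
			return 0;

		ms = __nr_to_section(pfn_to_section_nr(pfn));
		if (!valid_section(ms))
			return 0;

		/*
		 * Early sections may contain MEMBLOCK_NOMAP holes on
		 * platforms selecting HAVE_EARLY_SECTION_MEMMAP_HOLES,
		 * so defer to memblock for those.
		 */
		if (early_section(ms))
			return IS_ENABLED(CONFIG_HAVE_EARLY_SECTION_MEMMAP_HOLES) ?
				memblock_is_map_memory(addr) : 1;

		return pfn_section_valid(ms, pfn);
	}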
>>> diff --git a/mm/memblock.c b/mm/memblock.c
>>> index afaefa8fc6ab..d9fa2e62ab7a 100644
>>> --- a/mm/memblock.c
>>> +++ b/mm/memblock.c
>>> @@ -1744,6 +1744,7 @@ bool __init_memblock memblock_is_map_memory(phys_addr_t addr)
>>>   		return false;
>>>   	return !memblock_is_nomap(&memblock.memory.regions[i]);
>>>   }
>>> +EXPORT_SYMBOL(memblock_is_map_memory);
>>>   
>>>   int __init_memblock memblock_search_pfn_nid(unsigned long pfn,
>>>   			unsigned long *start_pfn, unsigned long *end_pfn)
>>> @@ -1926,7 +1927,7 @@ static void __init free_unused_memmap(void)
>>>   	unsigned long start, end, prev_end = 0;
>>>   	int i;
>>>   
>>> -	if (!IS_ENABLED(CONFIG_HAVE_ARCH_PFN_VALID) ||
>>> +	if (!IS_ENABLED(CONFIG_HAVE_EARLY_SECTION_MEMMAP_HOLES) ||
>>>   	    IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP))
>>>   		return;
>>
>> With
>>
>> commit 1f90a3477df3ff1a91e064af554cdc887c8f9e5e
>> Author: Dan Williams
>> Date:   Thu Feb 25 17:17:05 2021 -0800
>>
>>     mm: teach pfn_to_online_page() about ZONE_DEVICE section collisions
>>
>> (still in -next I think)
> 
> That has already been merged mainline.
> 
>>
>> You'll also have to take care of pfn_to_online_page().
> 
> Would something like this work?
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 5ba51a8bdaeb..19812deb807f 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -309,6 +309,11 @@ struct page *pfn_to_online_page(unsigned long pfn)
>   	 * Save some code text when online_section() +
>   	 * pfn_section_valid() are sufficient.
>   	 */
> +	if (IS_ENABLED(CONFIG_HAVE_EARLY_SECTION_MEMMAP_HOLES)) {
> +		if (early_section(ms) && !memblock_is_map_memory(PFN_PHYS(pfn)))
> +			return NULL;
> +	}
> +
>   	if (IS_ENABLED(CONFIG_HAVE_ARCH_PFN_VALID) && !pfn_valid(pfn))
>   		return NULL;

Sorry for the late reply, I just stumbled over this again.

I think, yes. I do wonder if we then still need the
CONFIG_HAVE_ARCH_PFN_VALID handling below - are there any custom
pfn_valid() implementations with SPARSEMEM remaining? I don't think so.

-- 
Thanks,

David / dhildenb
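
[ Editor's note: if the CONFIG_HAVE_ARCH_PFN_VALID fallback can indeed be
  dropped, as suggested above, the early-section check becomes the only
  extra gate in pfn_to_online_page(). A condensed sketch of the resulting
  flow, based on the v5.12-era function with the ZONE_DEVICE collision
  handling from commit 1f90a3477df3 elided: ]

	struct page *pfn_to_online_page(unsigned long pfn)
	{
		unsigned long nr = pfn_to_section_nr(pfn);
		struct mem_section *ms;

		if (nr >= NR_MEM_SECTIONS)
			return NULL;

		ms = __nr_to_section(nr);
		if (!online_section(ms))
			return NULL;

		/* Early sections may contain MEMBLOCK_NOMAP holes. */
		if (IS_ENABLED(CONFIG_HAVE_EARLY_SECTION_MEMMAP_HOLES) &&
		    early_section(ms) && !memblock_is_map_memory(PFN_PHYS(pfn)))
			return NULL;

		if (!pfn_section_valid(ms, pfn))
			return NULL;

		/* ... ZONE_DEVICE / pgmap collision handling elided ... */
		return pfn_to_page(pfn);
	}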