Date: Fri, 16 Apr 2021 09:25:56 +0200
From: Oscar Salvador
To: David Hildenbrand
Cc: Andrew Morton, Michal Hocko, Anshuman Khandual, Pavel Tatashin,
	Vlastimil Babka, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v7 4/8] mm,memory_hotplug: Allocate memmap from the added memory range
References: <20210408121804.10440-1-osalvador@suse.de>
	<20210408121804.10440-5-osalvador@suse.de>
	<54bed4d3-631f-7d30-aa2c-f8dd2f2c6804@redhat.com>
In-Reply-To: <54bed4d3-631f-7d30-aa2c-f8dd2f2c6804@redhat.com>

On Thu, Apr 15, 2021 at 01:19:59PM +0200, David Hildenbrand wrote:
> > Implementation-wise, we will reuse the vmem_altmap infrastructure to
> > override the default allocator used by __populate_section_memmap.
> > Part of the implementation also relies on the memory_block structure
> > gaining a new field which specifies the number of vmemmap_pages at
> > the beginning.
> >
> > This patch also introduces the following functions:
> >
> > - vmemmap_init_space: Initializes vmemmap pages by calling
> >   move_pfn_range_to_zone(), calls kasan_add_zero_shadow() for the
> >   vmemmap range and marks as online as many sections as the vmemmap
> >   pages fully span.
> > - vmemmap_adjust_pages: Accounts/subtracts vmemmap_pages to/from the
> >   node's and zone's present_pages.
> > - vmemmap_deinit_space: Undoes what vmemmap_init_space does.
>
> This is a bit asymmetric, and the function names are not really
> expressing what is being done :) I'll try to come up with better names
> below.

Yeah, I was not happy with the names either, but at the time I could not
come up with anything better.

> It is worth mentioning that the real "mess" is that we want
> offline_pages() to properly handle zone->present_pages going to 0.
> Therefore, we want to manually mess with the present page count.

This should be explained by this:

"On offline, memory_block_offline() calls vmemmap_adjust_pages() prior
to calling offline_pages(), because offline_pages() performs the
tearing-down of kthreads and the rebuilding of the zonelists if the
node/zone become empty."

Is that not clear?

> I suggest detecting the zone in here and just passing it down to
> online_pages().

Ok, makes sense.

> My take would be doing the present page adjustment after onlining
> succeeded:
>
> static int memory_block_online(struct memory_block *mem)
> {
> 	unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
> 	unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
> 	unsigned long nr_vmemmap_pages = mem->nr_vmemmap_pages;
> 	struct zone *zone;
> 	int ret;
>
> 	zone = mhp_get_target_zone(mem->nid, mem->online_type);
>
> 	if (nr_vmemmap_pages) {
> 		ret = mhp_init_memmap_on_memory(start_pfn, nr_vmemmap_pages, zone);
> 		if (ret)
> 			return ret;
> 	}
>
> 	ret = online_pages(start_pfn + nr_vmemmap_pages,
> 			   nr_pages - nr_vmemmap_pages, zone);
> 	if (ret) {
> 		if (nr_vmemmap_pages)
> 			mhp_deinit_memmap_on_memory(start_pfn, nr_vmemmap_pages);
> 		return ret;
> 	}
>
> 	/*
> 	 * Account once onlining succeeded. If the zone was unpopulated,
> 	 * it is now already properly populated.
> 	 */
> 	if (nr_vmemmap_pages)
> 		adjust_present_page_count(zone, nr_vmemmap_pages);
> 	return 0;
> }
>
> And the opposite:
>
> static int memory_block_offline(struct memory_block *mem)
> {
> 	unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
> 	unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
> 	unsigned long nr_vmemmap_pages = mem->nr_vmemmap_pages;
> 	struct zone *zone;
> 	int ret;
>
> 	zone = page_zone(pfn_to_page(start_pfn));
>
> 	/*
> 	 * Unaccount before offlining, such that unpopulated zones can
> 	 * properly be torn down in offline_pages().
> 	 */
> 	if (nr_vmemmap_pages)
> 		adjust_present_page_count(zone, -nr_vmemmap_pages);
>
> 	ret = offline_pages(start_pfn + nr_vmemmap_pages,
> 			    nr_pages - nr_vmemmap_pages);
> 	if (ret) {
> 		if (nr_vmemmap_pages)
> 			adjust_present_page_count(zone, +nr_vmemmap_pages);
> 		return ret;
> 	}
>
> 	if (nr_vmemmap_pages)
> 		mhp_deinit_memmap_on_memory(start_pfn, nr_vmemmap_pages);
> 	return 0;
> }
>
> Having to do the present page adjustment manually is not completely
> nice, but it's easier than manually having to mess with zones becoming
> populated/unpopulated outside of online_pages()/offline_pages().

Ok, I like this, and the naming is much better. I will work on this
shortly; a first sketch of the helpers is below.

Thanks, David!
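For reference, this is roughly the shape I have in mind for the two
helpers your version relies on. Take it as a sketch only: the locking
around the present page counters and the exact KASAN handling are
assumptions on my side and still have to survive testing in the next
version.

/*
 * Adjust the present page count of a zone and its node by a signed
 * delta. Assumption: zone->present_pages only changes under the memory
 * hotplug lock, while node_present_pages additionally takes the pgdat
 * resize lock.
 */
void adjust_present_page_count(struct zone *zone, long nr_pages)
{
	unsigned long flags;

	zone->present_pages += nr_pages;

	pgdat_resize_lock(zone->zone_pgdat, &flags);
	zone->zone_pgdat->node_present_pages += nr_pages;
	pgdat_resize_unlock(zone->zone_pgdat, &flags);
}

/*
 * Initialize the vmemmap pages of a memory block: add the KASAN zero
 * shadow for the range, move the pfn range into the target zone and
 * mark as online only those sections the vmemmap pages fully span.
 */
int mhp_init_memmap_on_memory(unsigned long pfn, unsigned long nr_pages,
			      struct zone *zone)
{
	unsigned long end_pfn = pfn + nr_pages;
	int ret;

	ret = kasan_add_zero_shadow(__va(PFN_PHYS(pfn)), PFN_PHYS(nr_pages));
	if (ret)
		return ret;

	move_pfn_range_to_zone(zone, pfn, nr_pages, NULL, MIGRATE_UNMOVABLE);

	if (nr_pages >= PAGES_PER_SECTION)
		online_mem_sections(pfn, ALIGN_DOWN(end_pfn, PAGES_PER_SECTION));

	return 0;
}

mhp_deinit_memmap_on_memory() would then do the reverse: mark the fully
spanned sections offline again, call kasan_remove_zero_shadow() and
remove the pfn range from the zone.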
--
Oscar Salvador
SUSE L3