Date: Mon, 30 Nov 2020 10:12:37 +0100
From: Oscar Salvador
To: Michal Hocko
Cc: david@redhat.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	vbabka@suse.cz, pasha.tatashin@soleen.com
Subject: Re: [RFC PATCH v2 2/4] mm,memory_hotplug: Allocate memmap from the added memory range
Message-ID: <20201130091236.GB3825@linux>
References: <20201125112048.8211-1-osalvador@suse.de>
 <20201125112048.8211-3-osalvador@suse.de>
 <20201127151536.GV31550@dhcp22.suse.cz>
In-Reply-To: <20201127151536.GV31550@dhcp22.suse.cz>

On Fri, Nov 27, 2020 at 04:15:36PM +0100, Michal Hocko wrote:
> > Vmemmap page tables can map arbitrary memory.
> > That means that we can simply use the beginning of each memory section and
> > map struct pages there.
>
> Did you mean each memory block rather than section?

Yes, sorry, I did not update that part.

> > struct pages which back the allocated space then just need to be treated
> > carefully.
> >
> > Implementation wise we will reuse vmem_altmap infrastructure to override
> > the default allocator used by __populate_section_memmap. Once the memmap is
> > allocated, we are going to need a way to mark altmap pfns used for the
> > allocation. If MHP_MEMMAP_ON_MEMORY flag was passed, we will set up the
> > layout of the altmap structure in add_memory_resource(), and then we will
> > call mhp_mark_vmemmap_pages() to properly mark those pages.
> >
> > Online/Offline:
> >
> > In the memory_block structure, a new field is created in order to
> > store the number of vmemmap_pages.
>
> Is this really needed? We know how many pfns are required for a block of
> a specific size, right?
>
> I have only glanced through the patch so I might be missing something
> but I am really wondering why you haven't chosen to use altmap directly
> here.

Well, this is my bad: I did not update the changelog wrt. the previous
version, so it might be confusing. I will make sure to update it for the
next submission, but let me explain it here to shed some light.

We no longer need mhp_mark_vmemmap_pages() to mark the vmemmap pages.
Prior to online_pages(), the whole range is offline, so no one should be
messing with any pages within that range.

The initialization of the pages takes place in online_pages(). We have:

 start_pfn = first_pfn_of_the_range
 buddy_start_pfn = first_pfn_of_the_range + nr_vmemmap_pages

We do have:

+       if (nr_vmemmap_pages)
+               move_pfn_range_to_zone(zone, pfn, nr_vmemmap_pages, NULL,
+                                      MIGRATE_UNMOVABLE);
+       move_pfn_range_to_zone(zone, buddy_start_pfn, buddy_nr_pages, NULL,
+                              MIGRATE_ISOLATE);

Now, the whole range is initialized and marked PageReserved, but we only
hand [buddy_start_pfn, end_pfn) over to the buddy allocator, so
[start_pfn, buddy_start_pfn) remains PageReserved. And we know that pfn
walkers skip Reserved pages.

About the altmap part: the altmap is used in the hot-add phase, in
add_memory_resource(). The thing is, we could avoid adding the
memory_block's nr_vmemmap_pages field, but then we would have to mark the
vmemmap pages as we used to do in previous implementations (see [1]).
I find this way cleaner, it adds much less code (the previous
implementation can be seen in [2]), and as a starter I find it much better.
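
To make the hot-add side a bit more concrete, the idea is roughly the
following (a simplified sketch, not the actual patch; the function name
and the inline nr_vmemmap_pages computation are only illustrative):

#include <linux/memory_hotplug.h>
#include <linux/memremap.h>
#include <linux/ioport.h>
#include <linux/pfn.h>
#include <linux/mm.h>

/*
 * Sketch only: when MHP_MEMMAP_ON_MEMORY is requested, describe the first
 * pfns of the range being hot-added as the allocation pool for its own
 * memmap, so __populate_section_memmap() takes the vmemmap backing pages
 * from there instead of from the page allocator.
 */
static int add_memory_resource_sketch(int nid, struct resource *res,
				      mhp_t mhp_flags)
{
	unsigned long nr_pages = PHYS_PFN(resource_size(res));
	/* pfns needed to hold the struct pages of the whole range */
	unsigned long nr_vmemmap_pages = PFN_UP(nr_pages * sizeof(struct page));
	struct mhp_params params = { .pgprot = PAGE_KERNEL };
	struct vmem_altmap mhp_altmap = {
		.base_pfn = PHYS_PFN(res->start),
		.free = nr_vmemmap_pages,
	};

	if (mhp_flags & MHP_MEMMAP_ON_MEMORY)
		params.altmap = &mhp_altmap;

	/* Create page tables and memmap; with the altmap in place, the
	 * memmap ends up at the beginning of the hot-added range. */
	return arch_add_memory(nid, res->start, resource_size(res), &params);
}

The nr_vmemmap_pages stored in the memory_block is then what the
online/offline paths use to know where the buddy-managed part of the block
starts (buddy_start_pfn above).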
> It would be also good to describe how does a pfn walker recognize such a
> page? Most of them will simply ignore it but e.g. hotplug walker will
> need to skip over those because they are not preventing offlining as
> they will go away with the memory block together.

Wrt. the hotplug walker, same as above: we only care to migrate
[buddy_start_pfn, end_pfn), so the first pfn to isolate and migrate is set
to buddy_start_pfn. Other pfn walkers should merely skip the vmemmap pages
because they are Reserved.

> Some basic description of testing done would be suitable as well.

Well, that is:

- Hot-add memory to a specific NUMA node
- Online the memory
- Check that numactl -H and /proc/zoneinfo reflect the new memory and that
  the nr_vmemmap_pages are accounted where they have to be
- Start a memory stress program and bind it to the NUMA node the memory
  was added to, so we make sure the new memory gets exercised
- Wait for a while and, once the node's free pages have decreased
  considerably, offline the memory
- Check that the memory went offline and check /proc/zoneinfo and
  numactl -H again
- Hot-remove the range

[1] https://patchwork.kernel.org/project/linux-mm/patch/20201022125835.26396-3-osalvador@suse.de/
[2] https://patchwork.kernel.org/project/linux-mm/cover/20201022125835.26396-1-osalvador@suse.de/

-- 
Oscar Salvador
SUSE L3