From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE, SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id BC53BC2D0E4 for ; Thu, 19 Nov 2020 10:48:55 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 31D3B246EF for ; Thu, 19 Nov 2020 10:48:53 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 31D3B246EF Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=suse.de Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 568C56B005C; Thu, 19 Nov 2020 05:48:52 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 5196C6B005D; Thu, 19 Nov 2020 05:48:52 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 3BB456B0068; Thu, 19 Nov 2020 05:48:52 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0117.hostedemail.com [216.40.44.117]) by kanga.kvack.org (Postfix) with ESMTP id 0E2A36B005C for ; Thu, 19 Nov 2020 05:48:52 -0500 (EST) Received: from smtpin12.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id A1394180AD822 for ; Thu, 19 Nov 2020 10:48:51 +0000 (UTC) X-FDA: 77500844862.12.rat78_2207e7e27342 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin12.hostedemail.com (Postfix) with ESMTP id 7E3081802B8CE for ; Thu, 19 Nov 2020 10:48:51 +0000 (UTC) X-HE-Tag: rat78_2207e7e27342 X-Filterd-Recvd-Size: 6089 Received: from mx2.suse.de (mx2.suse.de [195.135.220.15]) by imf13.hostedemail.com (Postfix) with ESMTP for ; Thu, 19 Nov 2020 10:48:50 +0000 (UTC) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 85D91AA4F; Thu, 19 Nov 2020 10:48:49 +0000 (UTC) Date: Thu, 19 Nov 2020 11:48:47 +0100 From: Oscar Salvador To: David Hildenbrand Cc: mhocko@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, vbabka@suse.cz, pasha.tatashin@soleen.com Subject: Re: [RFC PATCH 3/3] mm,memory_hotplug: Allocate memmap from the added memory range Message-ID: <20201119104847.GA5281@localhost.localdomain> References: <20201022125835.26396-1-osalvador@suse.de> <20201022125835.26396-4-osalvador@suse.de> <3cc37927-538e-ae7d-27bc-45aaabe06b3a@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <3cc37927-538e-ae7d-27bc-45aaabe06b3a@redhat.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Tue, Nov 17, 2020 at 04:38:52PM +0100, David Hildenbrand wrote: > Sorry for the late replay, fairly busy with all kinds of things. Heh, no worries, I appreciate the time :-) > > diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c > > index b02fd51e5589..6b57bf90ca72 100644 > > --- a/drivers/acpi/acpi_memhotplug.c > > +++ b/drivers/acpi/acpi_memhotplug.c > > @@ -195,7 +195,7 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device) > > node = memory_add_physaddr_to_nid(info->start_addr); > > result = __add_memory(node, info->start_addr, info->length, > > - MHP_NONE); > > + MEMHP_MEMMAP_ON_MEMORY); > > I'd suggest moving that into a separate patch. Fine by me > > diff --git a/include/linux/memory.h b/include/linux/memory.h > > index 439a89e758d8..7cc93de5856c 100644 > > --- a/include/linux/memory.h > > +++ b/include/linux/memory.h > > @@ -30,6 +30,7 @@ struct memory_block { > > int phys_device; /* to which fru does this belong? */ > > struct device dev; > > int nid; /* NID for this memory block */ > > + unsigned long nr_vmemmap_pages; /* Number for vmemmap pages */ > > Maybe also document that these pages are directly at the beginning of the > memory block. Sure > > -static inline int offline_pages(unsigned long start_pfn, unsigned long nr_pages) > > +static inline int offline_pages(unsigned long start_pfn, unsigned long nr_pages, > > + unsigned long nr_vmemmap_pages) > > { > > return -EINVAL; > > } > > @@ -369,10 +372,12 @@ extern int add_memory_driver_managed(int nid, u64 start, u64 size, > > mhp_t mhp_flags); > > extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn, > > unsigned long nr_pages, > > + unsigned long nr_vmemmap_pages, > > struct vmem_altmap *altmap, int migratetype); > > extern void remove_pfn_range_from_zone(struct zone *zone, > > unsigned long start_pfn, > > - unsigned long nr_pages); > > + unsigned long nr_pages, > > + unsigned long nr_vmemmap_pages); > > I think we should not pass nr_vmemmap_pages down here but instead do two > separate calls to move_pfn_range_to_zone()/remove_pfn_range_from_zone() from > online_pages()/offline_pages() > > 1. for vmemmap pages, migratetype = MIGRATE_UNMOVABLE > 2. for remaining pages, migratetype = MIGRATE_ISOLATE Ok, that was the other option, it might be even cleaner. > > + valid_start_pfn = pfn + nr_vmemmap_pages; > > + valid_nr_pages = nr_pages - nr_vmemmap_pages; > > Hm, valid sounds strange. More like "free_start_pfn" or "buddy_start_pfn". Agreed, I might lean towards buddy_start_pfn. > > - move_pfn_range_to_zone(zone, pfn, nr_pages, NULL, MIGRATE_ISOLATE); > > + move_pfn_range_to_zone(zone, pfn, nr_pages, nr_vmemmap_pages, NULL, > > + MIGRATE_ISOLATE); > > As mentioned, I'd suggest properly initializing the memmap here > > if (nr_vmemmap_pages) { > move_pfn_range_to_zone(zone, pfn, nr_vmemmap_pages, NULL, > MIGRATE_UNMOVABLE); > } > move_pfn_range_to_zone(zone, valid_start_pfn, valid_nr_pages, NULL, Sure, agreed. > > + if (!support_memmap_on_memory(size)) > > + mhp_flags &= ~MEMHP_MEMMAP_ON_MEMORY; > > Callers (e.g., virtio-mem) might rely on this. We should reject this with > -EINVAL and provide a way for callers to test whether this flag is possible. Uhm, we might want to make "support_memmap_on_memory" public, and callers who might want to it use can check its return value? Or do you have something else in mind? Agreed on the -EINVAIL. > > + if (mhp_flags & MEMHP_MEMMAP_ON_MEMORY) > > + mhp_mark_vmemmap_pages(params.altmap); > > Do we really still need that? Pages are offline, so we're messing with an > invalid memmap. online_pages() should handle initializing the memmap of > these pages. Yeah, on a second thought we do not need this. Since the pages are still offline, no one should be messing with that range yet anyway. > > [...] > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > index e74ca22baaa1..043503fb8c6e 100644 > > --- a/mm/page_alloc.c > > +++ b/mm/page_alloc.c > > @@ -8761,6 +8761,13 @@ void __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn) > > spin_lock_irqsave(&zone->lock, flags); > > while (pfn < end_pfn) { > > page = pfn_to_page(pfn); > > + /* > > + * Skip vmemmap pages > > + */ > > + if (PageVmemmap(page)) { > > + pfn += vmemmap_nr_pages(page); > > + continue; > > + } > > I'd assume calling code can handle that and exclude isolating such pages. The thing is that __offline_isolated_pages calls offline_mem_sections(), so we really need the first pfn, and not the "pfn + nr_vmemmap_pages". Instead of skipping it in the loop, I might just skip it before entering the loop. Thanks! -- Oscar Salvador SUSE L3