From: David Hildenbrand <david@redhat.com>
Organization: Red Hat
Date: Tue, 25 Jul 2023 20:06:10 +0200
To: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>, linux-mm@kvack.org,
 akpm@linux-foundation.org, mpe@ellerman.id.au,
 linuxppc-dev@lists.ozlabs.org, npiggin@gmail.com,
 christophe.leroy@csgroup.eu
Cc: Oscar Salvador, Michal Hocko, Vishal Verma
Subject: Re: [PATCH v5 4/7] mm/hotplug: Support memmap_on_memory when memmap
 is not aligned to pageblocks
In-Reply-To: <20230725100212.531277-5-aneesh.kumar@linux.ibm.com>
References: <20230725100212.531277-1-aneesh.kumar@linux.ibm.com>
 <20230725100212.531277-5-aneesh.kumar@linux.ibm.com>
On 25.07.23 12:02, Aneesh Kumar K.V wrote:
> Currently, the memmap_on_memory feature is only supported with memory
> block sizes that result in vmemmap pages covering full pageblocks. This
> is because the memory onlining/offlining code requires applicable ranges
> to be pageblock-aligned, for example, to set the migratetypes properly.
>
> This patch lifts that restriction by reserving more pages than required
> for vmemmap space, so that the usable start address is pageblock-aligned
> for any memory block size. Using this facility implies the kernel will
> reserve some pages for every memory block. This makes the memmap on
> memory feature widely useful across different memory block size values.
>
> For example: with a 64K page size and a 256 MiB memory block size, we
> require 4 pages to map the vmemmap pages. To align things correctly, we
> end up adding a reserve of 28 pages, i.e., for every 4096 pages, 28
> pages get reserved.
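
To spell out the arithmetic behind that example (my numbers, assuming
sizeof(struct page) == 64 and 2 MiB pageblocks, i.e.
pageblock_nr_pages == 32 with 64K pages):

	256 MiB block / 64 KiB page  = 4096 pages per memory block
	4096 pages * 64 bytes        = 256 KiB of memmap
	256 KiB / 64 KiB page        = 4 vmemmap pages
	pageblock_align(4 pages)     = 32 pages
	32 - 4                       = 28 reserved pages per block
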
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>  .../admin-guide/mm/memory-hotplug.rst |  12 ++
>  mm/memory_hotplug.c                   | 121 ++++++++++++++++--
>  2 files changed, 119 insertions(+), 14 deletions(-)
>
> diff --git a/Documentation/admin-guide/mm/memory-hotplug.rst b/Documentation/admin-guide/mm/memory-hotplug.rst
> index bd77841041af..2994958c7ce8 100644
> --- a/Documentation/admin-guide/mm/memory-hotplug.rst
> +++ b/Documentation/admin-guide/mm/memory-hotplug.rst
> @@ -433,6 +433,18 @@ The following module parameters are currently defined:
>  			memory in a way that huge pages in bigger
>  			granularity cannot be formed on hotplugged
>  			memory.
> +
> +			With value "force" it could result in memory
> +			wastage due to memmap size limitations. For
> +			example, if the memmap for a memory block
> +			requires 1 MiB, but the pageblock size is 2
> +			MiB, 1 MiB of hotplugged memory will be wasted.
> +			Note that there are still cases where the
> +			feature cannot be enforced: for example, if the
> +			memmap is smaller than a single page, or if the
> +			architecture does not support the forced mode
> +			in all configurations.
> +
>  ``online_policy``	read-write: Set the basic policy used for
>  			automatic zone selection when onlining memory
>  			blocks without specifying a target zone.
>
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 457824a6ecb8..5b472e137898 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -41,17 +41,89 @@
>  #include "internal.h"
>  #include "shuffle.h"
>
> +enum {
> +	MEMMAP_ON_MEMORY_DISABLE = 0,
> +	MEMMAP_ON_MEMORY_ENABLE,
> +	MEMMAP_ON_MEMORY_FORCE,
> +};
> +
> +static int memmap_mode __read_mostly = MEMMAP_ON_MEMORY_DISABLE;
> +
> +static inline unsigned long memory_block_memmap_pages(void)
> +{
> +	unsigned long memmap_size;
> +
> +	memmap_size = PHYS_PFN(memory_block_size_bytes()) * sizeof(struct page);
> +	return memmap_size >> PAGE_SHIFT;

I'd really move a !page variant (memory_block_memmap_size()) to the
previous patch and use it in mhp_supports_memmap_on_memory() and
arch_supports_memmap_on_memory().

Then, in this patch, reuse that function in
memory_block_memmap_on_memory_pages() and ...

> +}
> +
> +static inline unsigned long memory_block_memmap_on_memory_pages(void)
> +{
> +	unsigned long nr_pages = memory_block_memmap_pages();

... do here a

	nr_pages = PHYS_PFN(memory_block_memmap_size());

Conceptually, it would be even cleaner to have here

	nr_pages = PFN_UP(memory_block_memmap_size());

even though one can argue that mhp_supports_memmap_on_memory() will make
sure that the unaligned value (memory_block_memmap_size()) covers full
pages. At least to me it looks cleaner that way. No strong opinion.

> +
> +	/*
> +	 * In "forced" memmap_on_memory mode, we add extra pages to align the
> +	 * vmemmap size to cover full pageblocks. That way, we can add memory
> +	 * even if the vmemmap size is not properly aligned, however, we
> +	 * might waste memory.
> +	 */
> +	if (memmap_mode == MEMMAP_ON_MEMORY_FORCE)
> +		return pageblock_align(nr_pages);
> +	return nr_pages;
> +}
> +
>  #ifdef CONFIG_MHP_MEMMAP_ON_MEMORY
>  /*
>   * memory_hotplug.memmap_on_memory parameter
>   */
> -static bool memmap_on_memory __ro_after_init;
> -module_param(memmap_on_memory, bool, 0444);
> -MODULE_PARM_DESC(memmap_on_memory, "Enable memmap on memory for memory hotplug");
> +static int set_memmap_mode(const char *val, const struct kernel_param *kp)
> +{
> +	int ret, mode;
> +	bool enabled;
> +
> +	if (sysfs_streq(val, "force") || sysfs_streq(val, "FORCE")) {
> +		mode = MEMMAP_ON_MEMORY_FORCE;
> +		goto matched;
> +	}

Avoid the goto + label:

	} else {
		ret = kstrtobool(val, &enabled);
		...
	}

	*((int *)kp->arg) = mode;

> +
> +	ret = kstrtobool(val, &enabled);
> +	if (ret < 0)
> +		return ret;
> +	if (enabled)
> +		mode = MEMMAP_ON_MEMORY_ENABLE;
> +	else
> +		mode = MEMMAP_ON_MEMORY_DISABLE;
> +
> +matched:
> +	*((int *)kp->arg) = mode;
> +	if (mode == MEMMAP_ON_MEMORY_FORCE) {
> +		unsigned long memmap_pages = memory_block_memmap_on_memory_pages();
> +
> +		pr_info("Memory hotplug will reserve %ld pages in each memory block\n",
> +			memmap_pages - memory_block_memmap_pages());

pr_info_once()?

> +	}
> +	return 0;
> +}
> +

[...]
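
To make the suggestions above concrete, here is a rough sketch of how
the two helpers and the setter could end up looking (untested, and only
one way to slice it; memory_block_memmap_size() is the new helper I'd
put in the previous patch, and PFN_UP() vs. PHYS_PFN() is as discussed):

	static inline unsigned long memory_block_memmap_size(void)
	{
		return PHYS_PFN(memory_block_size_bytes()) * sizeof(struct page);
	}

	static inline unsigned long memory_block_memmap_on_memory_pages(void)
	{
		unsigned long nr_pages = PFN_UP(memory_block_memmap_size());

		/*
		 * In "forced" memmap_on_memory mode, add extra pages to align
		 * the vmemmap size up to full pageblocks, possibly wasting
		 * memory.
		 */
		if (memmap_mode == MEMMAP_ON_MEMORY_FORCE)
			return pageblock_align(nr_pages);
		return nr_pages;
	}

	static int set_memmap_mode(const char *val, const struct kernel_param *kp)
	{
		int ret, mode;
		bool enabled;

		if (sysfs_streq(val, "force") || sysfs_streq(val, "FORCE")) {
			mode = MEMMAP_ON_MEMORY_FORCE;
		} else {
			ret = kstrtobool(val, &enabled);
			if (ret < 0)
				return ret;
			mode = enabled ? MEMMAP_ON_MEMORY_ENABLE :
					 MEMMAP_ON_MEMORY_DISABLE;
		}

		*((int *)kp->arg) = mode;
		if (mode == MEMMAP_ON_MEMORY_FORCE) {
			unsigned long memmap_pages = memory_block_memmap_on_memory_pages();

			/* pr_info_once(), as suggested above */
			pr_info_once("Memory hotplug will reserve %ld pages in each memory block\n",
				     memmap_pages - PFN_UP(memory_block_memmap_size()));
		}
		return 0;
	}
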
>  	/*
>  	 * Besides having arch support and the feature enabled at runtime, we
> @@ -1294,10 +1366,28 @@ static bool mhp_supports_memmap_on_memory(unsigned long size)
>  	 * altmap as an alternative source of memory, and we do not exactly
>  	 * populate a single PMD.
>  	 */
> -	return mhp_memmap_on_memory() &&
> -	       size == memory_block_size_bytes() &&
> -	       IS_ALIGNED(remaining_size, (pageblock_nr_pages << PAGE_SHIFT)) &&
> -	       arch_supports_memmap_on_memory(size);
> +	if (!mhp_memmap_on_memory() || size != memory_block_size_bytes())
> +		return false;
> +
> +	/*
> +	 * Make sure the vmemmap allocation is fully contained
> +	 * so that we always allocate vmemmap memory from the altmap area.
> +	 */
> +	if (!IS_ALIGNED(vmemmap_size, PAGE_SIZE))
> +		return false;
> +
> +	/*
> +	 * The start pfn should be pageblock_nr_pages aligned for correctly
> +	 * setting migrate types.
> +	 */
> +	if (!pageblock_aligned(memmap_pages))
> +		return false;
> +
> +	if (memmap_pages == PHYS_PFN(memory_block_size_bytes()))
> +		/* No effective hotplugged memory doesn't make sense. */
> +		return false;
> +
> +	return arch_supports_memmap_on_memory(size);
>  }
>
>  /*
> @@ -1310,7 +1400,10 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
>  {
>  	struct mhp_params params = { .pgprot = pgprot_mhp(PAGE_KERNEL) };
>  	enum memblock_flags memblock_flags = MEMBLOCK_NONE;
> -	struct vmem_altmap mhp_altmap = {};
> +	struct vmem_altmap mhp_altmap = {
> +		.base_pfn = PHYS_PFN(res->start),
> +		.end_pfn  = PHYS_PFN(res->end),

Is it required to set .end_pfn, and if so, shouldn't we also set it to
base_pfn + memory_block_memmap_on_memory_pages()?

We also don't set it on the try_remove_memory() part.

With these things addressed, feel free to add

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb