Date: Thu, 6 Jun 2024 18:44:34 +0300
From: Mike Rapoport
To: David Hildenbrand, Jonathan Cameron, Dan Williams
Cc: linux-cxl@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	Sudeep Holla, Andrew Morton, Will Deacon, Jia He, Mike Rapoport,
	linuxarm@huawei.com, catalin.marinas@arm.com, Anshuman.Khandual@arm.com,
	Yuquan Wang, Oscar Salvador, Lorenzo Pieralisi, James Morse
Subject: Re: [RFC PATCH 8/8] HACK: mm: memory_hotplug: Drop memblock_phys_free() call in try_remove_memory()
References: <20240529171236.32002-1-Jonathan.Cameron@huawei.com>
	<20240529171236.32002-9-Jonathan.Cameron@huawei.com>
	<20240531104848.00006a95@Huawei.com>
	<4f734ec7-6277-4f5e-b291-6186f1eb5ecd@redhat.com>
In-Reply-To: <4f734ec7-6277-4f5e-b291-6186f1eb5ecd@redhat.com>

On Fri, May 31, 2024 at 11:55:08AM +0200, David Hildenbrand wrote:
> On 31.05.24 11:48, Jonathan Cameron
> wrote:
> > On Fri, 31 May 2024 09:49:32 +0200
> > David Hildenbrand wrote:
> >
> > > On 29.05.24 19:12, Jonathan Cameron wrote:
> > > > I'm not sure what this is balancing, but if it is necessary then the
> > > > reserved memblock approach can't be used to stash NUMA node
> > > > assignments, as after the first add/remove cycle the entry is dropped
> > > > and so is not available if memory is re-added at the same HPA.
> > > >
> > > > This patch is here to hopefully spur comments on what this is there for!
> > > >
> > > > Signed-off-by: Jonathan Cameron
> > > > ---
> > > >   mm/memory_hotplug.c | 2 +-
> > > >   1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> > > > index 431b1f6753c0..3d8dd4749dfc 100644
> > > > --- a/mm/memory_hotplug.c
> > > > +++ b/mm/memory_hotplug.c
> > > > @@ -2284,7 +2284,7 @@ static int __ref try_remove_memory(u64 start, u64 size)
> > > >   	}
> > > >   	if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK)) {
> > > > -		memblock_phys_free(start, size);
> > > > +		// memblock_phys_free(start, size);
> > > >   		memblock_remove(start, size);
> > > >   	}
> > >
> > > memblock_phys_free() works on memblock.reserved, memblock_remove()
> > > works on memblock.memory.
> > >
> > > If you take a look at the doc at the top of memblock.c:
> > >
> > >   memblock.memory: physical memory available to the system
> > >   memblock.reserved: regions that were allocated [during boot]
> > >
> > > memblock.memory is supposed to be a superset of memblock.reserved.
> > > Your "hack" here indicates that you would somehow be relying on the
> > > opposite being true, which indicates that you are doing the wrong
> > > thing.
> > >
> > > memblock_remove() indeed balances against memblock_add_node() for
> > > hotplugged memory [add_memory_resource()]. There seems to be a case
> > > where we would succeed in hotunplugging memory that was part of
> > > "memblock.reserved".
> > >
> > > But how could that happen?
> > > I think the following way:
> > >
> > > Once the buddy is up and running, memory allocated during early boot
> > > is not freed back to memblock; usually we simply go via something like
> > > free_reserved_page(), not memblock_free() [because the buddy took
> > > over]. So one could end up unplugging memory that still resides in the
> > > memblock.reserved set.
> > >
> > > So with memblock_phys_free(), we are enforcing the invariant that
> > > memblock.memory is a superset of memblock.reserved.
> > >
> > > Likely, arm64 should store that node assignment elsewhere, from where
> > > it can be queried. Or it should be using something like
> > > CONFIG_HAVE_MEMBLOCK_PHYS_MAP for these static windows.
> >
> > Hi David,
> >
> > Thanks for the explanation and pointers. I'd rather avoid inventing a
> > parallel infrastructure for this (like x86 has for other reasons, but
> > which is also used for this purpose).
>
> Yes, although memblock feels a bit wrong, because it is targeted at
> managing actual present memory, not properties about memory that could
> become present later.

Right now memblock.reserved can have ranges that are not actually present,
but I still have doubts about using it to contain memory ranges that could
be hotplugged, and I'm leaning against that.

In general memblock as an infrastructure for dealing with memory ranges and
their properties seems to be a good fit, so either memblock.reserved or a
new memblock_type could be used to implement phys_to_target_node().

memblock.reserved may be less suitable than a new data structure because,
unlike x86::numa_reserved_meminfo, it manages already present memory for
the most part, while x86::numa_reserved_meminfo contains regions that do
not overlap with RAM.
Now, before we continue to bikeshed about the name for the new data
structure, how about this crazy idea: merge the parts of arch/x86/numa.c
dealing with 'struct numa_meminfo' into arch_numa.c and then remove some of
the redundancy, e.g. by implementing memory_add_physaddr_to_nid() with
memblock.memory instead of numa_meminfo.

It's an involved project indeed, but it has the advantage of shared
infrastructure for NUMA initialization of ACPI systems. AFAIU, x86 has a
parallel infrastructure because they were first, but I don't see a
fundamental reason why it has to remain that way.

There have also been several attempts to implement fakenuma on arm64 or
riscv (sorry, can't find lore links right now), and with numa_meminfo in
arch_numa.c we get it almost for free.

> > From a quick look CONFIG_HAVE_MEMBLOCK_PHYS_MAP is documented in a
> > fashion that makes me think it's not directly appropriate (this isn't
> > actual physical memory available during boot), but the general approach
> > of just adding another memblock collection seems like it will work.
>
> Yes. As an alternative, modify the description of
> CONFIG_HAVE_MEMBLOCK_PHYS_MAP.

I wouldn't want to touch physmem, except maybe moving it entirely to
arch/s390 and leaving it there.

> > Hardest problem might be naming it. physmem_possible perhaps?
> > Filling that with anything found in SRAT or CEDT should work for ACPI,
> > but I'm not sure whether we can fill it when neither of those is
> > present. Maybe we just don't bother, as for today's use case CEDT needs
> > to be present.
>
> You likely only want something for these special windows, with these
> special semantics. That makes it hard.
>
> "physmem_possible" is a bit too generic for my taste, promising semantics
> that might not hold true (we don't want all hotpluggable areas to show up
> there).
>
> > Maybe physmem_known_possible is the way to go. I'll give this approach
> > a spin and see how it goes.
>
> Goes into a better direction, or really "physmem_fixed_windows".
Because > also "physmem_known_possible" as a wrong smell to it. physmem_potential? ;-) > -- > Cheers, > > David / dhildenb > -- Sincerely yours, Mike. _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel