From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
To: Mike Rapoport <rppt@kernel.org>
Cc: <linux-kernel@vger.kernel.org>,
	Alexander Gordeev <agordeev@linux.ibm.com>,
	Andreas Larsson <andreas@gaisler.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	Arnd Bergmann <arnd@arndb.de>, "Borislav Petkov" <bp@alien8.de>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Christophe Leroy <christophe.leroy@csgroup.eu>,
	Dan Williams <dan.j.williams@intel.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	David Hildenbrand <david@redhat.com>,
	"David S. Miller" <davem@davemloft.net>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Heiko Carstens <hca@linux.ibm.com>,
	Huacai Chen <chenhuacai@kernel.org>,
	Ingo Molnar <mingo@redhat.com>,
	Jiaxun Yang <jiaxun.yang@flygoat.com>,
	"John Paul Adrian Glaubitz" <glaubitz@physik.fu-berlin.de>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Palmer Dabbelt <palmer@dabbelt.com>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	Rob Herring <robh@kernel.org>,
	"Thomas Bogendoerfer" <tsbogend@alpha.franken.de>,
	Thomas Gleixner <tglx@linutronix.de>,
	Vasily Gorbik <gor@linux.ibm.com>, Will Deacon <will@kernel.org>,
	<linux-arm-kernel@lists.infradead.org>,
	<loongarch@lists.linux.dev>, <linux-mips@vger.kernel.org>,
	<linuxppc-dev@lists.ozlabs.org>,
	<linux-riscv@lists.infradead.org>, <linux-s390@vger.kernel.org>,
	<linux-sh@vger.kernel.org>, <sparclinux@vger.kernel.org>,
	<linux-acpi@vger.kernel.org>, <linux-cxl@vger.kernel.org>,
	<nvdimm@lists.linux.dev>, <devicetree@vger.kernel.org>,
	<linux-arch@vger.kernel.org>, <linux-mm@kvack.org>,
	<x86@kernel.org>
Subject: Re: [PATCH 12/17] mm: introduce numa_memblks
Date: Fri, 19 Jul 2024 19:16:47 +0100
Message-ID: <20240719191647.000072f6@Huawei.com>
In-Reply-To: <20240716111346.3676969-13-rppt@kernel.org>

On Tue, 16 Jul 2024 14:13:41 +0300
Mike Rapoport <rppt@kernel.org> wrote:

> From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
> 
> Move code dealing with numa_memblks from arch/x86 to mm/ and add Kconfig
> options to let x86 select it in its Kconfig.
> 
> This code will be later reused by arch_numa.
> 
> No functional changes.
> 
> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>

Hi Mike,

My only real concern here is that, in a few places, the lifted code
makes changes to memblocks that are x86-only today. I need to do some
more digging to work out whether those are safe in all cases.

Jonathan



> +/**
> + * numa_cleanup_meminfo - Cleanup a numa_meminfo
> + * @mi: numa_meminfo to clean up
> + *
> + * Sanitize @mi by merging and removing unnecessary memblks.  Also check for
> + * conflicts and clear unused memblks.
> + *
> + * RETURNS:
> + * 0 on success, -errno on failure.
> + */
> +int __init numa_cleanup_meminfo(struct numa_meminfo *mi)
> +{
> +	const u64 low = 0;

Given it is always zero, why not just use the value inline?
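
Since low is a u64 and no unsigned value is below zero, max(bi->start, 0)
can never change bi->start, so the clamp could arguably go away entirely.
A minimal userspace sketch of the per-block trim written that way, for
illustration only: struct memblk and trim_blk() are hypothetical stand-ins
(not the patch's types), and the memblock_overlaps_region() check is left
out because it needs real memblock state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model of struct numa_memblk: [start, end) on nid. */
struct memblk {
	uint64_t start, end;
	int nid;
};

/*
 * Trim one block against the top of RAM (@high, i.e. PFN_PHYS(max_pfn)).
 * No clamp against 0: a uint64_t start can never be below it.
 * Anything from @high up to the old end is returned through @above so the
 * caller can keep it (the quoted code adds it to numa_reserved_meminfo);
 * @above is left as [0, 0) when nothing is above @high.
 * Returns false if the block ends up empty and should be removed.
 */
static bool trim_blk(struct memblk *bi, uint64_t high, struct memblk *above)
{
	above->start = above->end = 0;
	above->nid = bi->nid;

	if (bi->end > high) {
		above->start = high;
		above->end = bi->end;
		bi->end = high;
	}

	return bi->start < bi->end;
}
```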

> +	const u64 high = PFN_PHYS(max_pfn);
> +	int i, j, k;
> +
> +	/* first, trim all entries */
> +	for (i = 0; i < mi->nr_blks; i++) {
> +		struct numa_memblk *bi = &mi->blk[i];
> +
> +		/* move / save reserved memory ranges */
> +		if (!memblock_overlaps_region(&memblock.memory,
> +					bi->start, bi->end - bi->start)) {
> +			numa_move_tail_memblk(&numa_reserved_meminfo, i--, mi);
> +			continue;
> +		}
> +
> +		/* make sure all non-reserved blocks are inside the limits */
> +		bi->start = max(bi->start, low);
> +
> +		/* preserve info for non-RAM areas above 'max_pfn': */
> +		if (bi->end > high) {
> +			numa_add_memblk_to(bi->nid, high, bi->end,
> +					   &numa_reserved_meminfo);
> +			bi->end = high;
> +		}
> +
> +		/* and there's no empty block */
> +		if (bi->start >= bi->end)
> +			numa_remove_memblk_from(i--, mi);
> +	}
> +
> +	/* merge neighboring / overlapping entries */
> +	for (i = 0; i < mi->nr_blks; i++) {
> +		struct numa_memblk *bi = &mi->blk[i];
> +
> +		for (j = i + 1; j < mi->nr_blks; j++) {
> +			struct numa_memblk *bj = &mi->blk[j];
> +			u64 start, end;
> +
> +			/*
> +			 * See whether there are overlapping blocks.  Whine
> +			 * about but allow overlaps of the same nid.  They
> +			 * will be merged below.
> +			 */
> +			if (bi->end > bj->start && bi->start < bj->end) {
> +				if (bi->nid != bj->nid) {
> +					pr_err("node %d [mem %#010Lx-%#010Lx] overlaps with node %d [mem %#010Lx-%#010Lx]\n",
> +					       bi->nid, bi->start, bi->end - 1,
> +					       bj->nid, bj->start, bj->end - 1);
> +					return -EINVAL;
> +				}
> +				pr_warn("Warning: node %d [mem %#010Lx-%#010Lx] overlaps with itself [mem %#010Lx-%#010Lx]\n",
> +					bi->nid, bi->start, bi->end - 1,
> +					bj->start, bj->end - 1);
> +			}
> +
> +			/*
> +			 * Join together blocks on the same node, holes
> +			 * between which don't overlap with memory on other
> +			 * nodes.
> +			 */
> +			if (bi->nid != bj->nid)
> +				continue;
> +			start = min(bi->start, bj->start);
> +			end = max(bi->end, bj->end);
> +			for (k = 0; k < mi->nr_blks; k++) {
> +				struct numa_memblk *bk = &mi->blk[k];
> +
> +				if (bi->nid == bk->nid)
> +					continue;
> +				if (start < bk->end && end > bk->start)
> +					break;
> +			}
> +			if (k < mi->nr_blks)
> +				continue;
> +			pr_info("NUMA: Node %d [mem %#010Lx-%#010Lx] + [mem %#010Lx-%#010Lx] -> [mem %#010Lx-%#010Lx]\n",
> +			       bi->nid, bi->start, bi->end - 1, bj->start,
> +			       bj->end - 1, start, end - 1);
> +			bi->start = start;
> +			bi->end = end;
> +			numa_remove_memblk_from(j--, mi);
> +		}
> +	}
> +
> +	/* clear unused ones */
> +	for (i = mi->nr_blks; i < ARRAY_SIZE(mi->blk); i++) {
> +		mi->blk[i].start = mi->blk[i].end = 0;
> +		mi->blk[i].nid = NUMA_NO_NODE;
> +	}
> +
> +	return 0;
> +}
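
To check my reading of the merge rule (same-node blocks are joined,
including the hole between them, unless the joined span would cover memory
on another node), here is a simplified userspace sketch of the second loop.
It is a model under assumptions, not the patch code: struct memblk,
struct meminfo, the fixed 16-entry array and merge_blks() returning -1
instead of -EINVAL are all hypothetical, and the pr_*() messages are dropped.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model; field names mirror the quoted struct numa_memblk. */
struct memblk {
	uint64_t start, end;	/* [start, end) */
	int nid;
};

struct meminfo {
	int nr_blks;
	struct memblk blk[16];
};

static uint64_t min_u64(uint64_t a, uint64_t b) { return a < b ? a : b; }
static uint64_t max_u64(uint64_t a, uint64_t b) { return a > b ? a : b; }

/* Drop entry @idx by shifting the tail down, like numa_remove_memblk_from(). */
static void remove_blk(struct meminfo *mi, int idx)
{
	mi->nr_blks--;
	for (int n = idx; n < mi->nr_blks; n++)
		mi->blk[n] = mi->blk[n + 1];
}

/* Returns -1 if blocks of different nodes overlap, otherwise 0. */
static int merge_blks(struct meminfo *mi)
{
	for (int i = 0; i < mi->nr_blks; i++) {
		struct memblk *bi = &mi->blk[i];

		for (int j = i + 1; j < mi->nr_blks; j++) {
			struct memblk *bj = &mi->blk[j];
			uint64_t start, end;
			int k;

			/* An overlap between different nodes is fatal. */
			if (bi->end > bj->start && bi->start < bj->end &&
			    bi->nid != bj->nid)
				return -1;
			if (bi->nid != bj->nid)
				continue;

			/* Candidate merged span, including any hole in between. */
			start = min_u64(bi->start, bj->start);
			end = max_u64(bi->end, bj->end);

			/* Don't merge if the span would cover another node's block. */
			for (k = 0; k < mi->nr_blks; k++) {
				struct memblk *bk = &mi->blk[k];

				if (bk->nid != bi->nid && start < bk->end &&
				    end > bk->start)
					break;
			}
			if (k < mi->nr_blks)
				continue;

			bi->start = start;
			bi->end = end;
			remove_blk(mi, j--);	/* re-examine the entry shifted into j */
		}
	}
	return 0;
}
```

For example, node 0 blocks at [0x0, 0x1000) and [0x2000, 0x3000) are merged
into [0x0, 0x3000), but they stay separate if a node 1 block sits at
[0x1000, 0x2000) between them.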

...


> +/*
> + * Mark all currently memblock-reserved physical memory (which covers the
> + * kernel's own memory ranges) as hot-unswappable.
> + */
> +static void __init numa_clear_kernel_node_hotplug(void)

This will be a change for non-x86 architectures. It 'should' be fine,
but I'm not 100% sure.

> +{
> +	nodemask_t reserved_nodemask = NODE_MASK_NONE;
> +	struct memblock_region *mb_region;
> +	int i;
> +
> +	/*
> +	 * We have to do some preprocessing of memblock regions, to
> +	 * make them suitable for reservation.
> +	 *
> +	 * At this time, all memory regions reserved by memblock are
> +	 * used by the kernel, but those regions are not split up
> +	 * along node boundaries yet, and don't necessarily have their
> +	 * node ID set yet either.
> +	 *
> +	 * So iterate over all memory known to the x86 architecture,

This comment needs an update, at least, given the code is no longer
x86-specific.

> +	 * and use those ranges to set the nid in memblock.reserved.
> +	 * This will split up the memblock regions along node
> +	 * boundaries and will set the node IDs as well.
> +	 */
> +	for (i = 0; i < numa_meminfo.nr_blks; i++) {
> +		struct numa_memblk *mb = numa_meminfo.blk + i;
> +		int ret;
> +
> +		ret = memblock_set_node(mb->start, mb->end - mb->start,
> +					&memblock.reserved, mb->nid);
> +		WARN_ON_ONCE(ret);
> +	}
> +
> +	/*
> +	 * Now go over all reserved memblock regions, to construct a
> +	 * node mask of all kernel reserved memory areas.
> +	 *
> +	 * [ Note, when booting with mem=nn[kMG] or in a kdump kernel,
> +	 *   numa_meminfo might not include all memblock.reserved
> +	 *   memory ranges, because quirks such as trim_snb_memory()
> +	 *   reserve specific pages for Sandy Bridge graphics. ]
> +	 */
> +	for_each_reserved_mem_region(mb_region) {
> +		int nid = memblock_get_region_node(mb_region);
> +
> +		if (nid != MAX_NUMNODES)
> +			node_set(nid, reserved_nodemask);
> +	}
> +
> +	/*
> +	 * Finally, clear the MEMBLOCK_HOTPLUG flag for all memory
> +	 * belonging to the reserved node mask.
> +	 *
> +	 * Note that this will include memory regions that reside
> +	 * on nodes that contain kernel memory - entire nodes
> +	 * become hot-unpluggable:
> +	 */
> +	for (i = 0; i < numa_meminfo.nr_blks; i++) {
> +		struct numa_memblk *mb = numa_meminfo.blk + i;
> +
> +		if (!node_isset(mb->nid, reserved_nodemask))
> +			continue;
> +
> +		memblock_clear_hotplug(mb->start, mb->end - mb->start);
> +	}
> +}
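
For the non-x86 question, this is how I read the net effect of the three
passes, as a toy userspace model. Everything here is hypothetical
(struct memblk with a hotplug bool standing in for MEMBLOCK_HOTPLUG,
struct range standing in for memblock.reserved, TOY_MAX_NODES), and pass 1
is approximated by a plain overlap test rather than by splitting
memblock.reserved with memblock_set_node().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_NODES 8	/* hypothetical; every nid must be < TOY_MAX_NODES */

struct range {
	uint64_t start, end;	/* [start, end), one reserved region */
};

struct memblk {
	uint64_t start, end;	/* [start, end) */
	int nid;
	bool hotplug;		/* stands in for MEMBLOCK_HOTPLUG */
};

static bool overlaps(uint64_t s1, uint64_t e1, uint64_t s2, uint64_t e2)
{
	return s1 < e2 && s2 < e1;
}

/*
 * Passes 1+2: a node is "reserved" if any reserved range overlaps one of
 * its blocks; reserved memory outside every block contributes no node.
 * Pass 3: clear the hotplug flag on every block of every reserved node,
 * including blocks that hold no kernel memory themselves.
 */
static void clear_kernel_node_hotplug(struct memblk *blk, int nr_blks,
				      const struct range *rsv, int nr_rsv)
{
	bool reserved_node[TOY_MAX_NODES] = { false };

	for (int r = 0; r < nr_rsv; r++)
		for (int i = 0; i < nr_blks; i++)
			if (overlaps(rsv[r].start, rsv[r].end,
				     blk[i].start, blk[i].end))
				reserved_node[blk[i].nid] = true;

	for (int i = 0; i < nr_blks; i++)
		if (reserved_node[blk[i].nid])
			blk[i].hotplug = false;
}
```

With node 0 at [0x0, 0x1000) and [0x4000, 0x5000), node 1 at
[0x1000, 0x2000) and a single reserved range at [0x100, 0x200), both node 0
blocks lose the hotplug flag while node 1 keeps it, which is the whole-node
behaviour the quoted comment describes.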
