From: Helge Deller <deller@gmx.de>
To: Helge Deller <deller@gmx.de>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>,
Mel Gorman <mgorman@techsingularity.net>,
Mikulas Patocka <mpatocka@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
John David Anglin <dave.anglin@bell.net>,
linux-parisc@vger.kernel.org, linux-mm@kvack.org,
Vlastimil Babka <vbabka@suse.cz>,
Andrea Arcangeli <aarcange@redhat.com>,
Zi Yan <zi.yan@cs.rutgers.edu>
Subject: Re: Memory management broken by "mm: reclaim small amounts of memory when an external fragmentation event occurs"
Date: Tue, 9 Apr 2019 22:09:46 +0200 [thread overview]
Message-ID: <20190409200946.GA3274@p100.box> (raw)
In-Reply-To: <1aca1299-8713-3d54-7c5e-adf791509987@gmx.de>
* Helge Deller <deller@gmx.de>:
> On 08.04.19 16:29, James Bottomley wrote:
> > On Mon, 2019-04-08 at 10:52 +0100, Mel Gorman wrote:
> >> First, if pa-risc is !NUMA then why are separate local ranges
> >> represented as separate nodes? Is it because of DISCONTIGMEM or
> >> something else? DISCONTIGMEM is before my time so I'm not familiar
> >> with it and I consider it "essentially dead" but the arch init code
> >> seems to setup pgdats for each physical contiguous range so it's a
> >> possibility. The most likely explanation is pa-risc does not have
> >> hardware with addressing limitations smaller than the CPUs physical
> >> address limits and it's possible to have more ranges than available
> >> zones but clarification would be nice.
> >
> > Let me try, since I remember the ancient history. In the early days,
> > there had to be a single mem_map array covering all of physical memory.
> > Some pa-risc systems had huge gaps in the physical memory; I think one
> > gap was somewhere around 1GB, so this lead us to wasting huge amounts
> > of space in mem_map on non-existent memory. What CONFIG_DISCONTIGMEM
> > did was allow you to represent this discontinuity on a non-NUMA system
> > using numa nodes, so we effectively got one node per discontiguous
> > range. It's hacky, but it worked. I thought we finally got converted
> > to sparsemem by the NUMA people, but I can't find the commit.
>
> James, you tried once:
> https://patchwork.kernel.org/patch/729441/
>
> It seems we better should move over to sparsemem now?
Below is an updated patch to convert parisc from DISCONTIGMEM to
SPARSEMEM. It builds and boots for me on 32- and 64-bit machines.
Mikulas, could you try if you still see the the cache limited to 1GiB
with this patch applied ?
Helge
---------------------
From 2c30c3a61bbfb56850862a7f7127416325fe126f Mon Sep 17 00:00:00 2001
From: Helge Deller <deller@gmx.de>
Date: Tue, 9 Apr 2019 21:52:35 +0200
Subject: [PATCH] parisc: Switch from DISCONTIGMEM to SPARSEMEM
Signed-off-by: Helge Deller <deller@gmx.de>
diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig
index c8e6212..4f1397f 100644
--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -36,6 +36,7 @@ config PARISC
select GENERIC_STRNCPY_FROM_USER
select SYSCTL_ARCH_UNALIGN_ALLOW
select SYSCTL_EXCEPTION_TRACE
+ select ARCH_DISCARD_MEMBLOCK
select HAVE_MOD_ARCH_SPECIFIC
select VIRT_TO_BUS
select MODULES_USE_ELF_RELA
@@ -311,21 +312,16 @@ config ARCH_SELECT_MEMORY_MODEL
def_bool y
depends on 64BIT
-config ARCH_DISCONTIGMEM_ENABLE
+config ARCH_SPARSEMEM_ENABLE
def_bool y
depends on 64BIT
config ARCH_FLATMEM_ENABLE
def_bool y
-config ARCH_DISCONTIGMEM_DEFAULT
+config ARCH_SPARSEMEM_DEFAULT
def_bool y
- depends on ARCH_DISCONTIGMEM_ENABLE
-
-config NODES_SHIFT
- int
- default "3"
- depends on NEED_MULTIPLE_NODES
+ depends on ARCH_SPARSEMEM_ENABLE
source "kernel/Kconfig.hz"
diff --git a/arch/parisc/include/asm/mmzone.h b/arch/parisc/include/asm/mmzone.h
index fafa389..8d39040 100644
--- a/arch/parisc/include/asm/mmzone.h
+++ b/arch/parisc/include/asm/mmzone.h
@@ -2,62 +2,6 @@
#ifndef _PARISC_MMZONE_H
#define _PARISC_MMZONE_H
-#define MAX_PHYSMEM_RANGES 8 /* Fix the size for now (current known max is 3) */
+#define MAX_PHYSMEM_RANGES 4 /* Fix the size for now (current known max is 3) */
-#ifdef CONFIG_DISCONTIGMEM
-
-extern int npmem_ranges;
-
-struct node_map_data {
- pg_data_t pg_data;
-};
-
-extern struct node_map_data node_data[];
-
-#define NODE_DATA(nid) (&node_data[nid].pg_data)
-
-/* We have these possible memory map layouts:
- * Astro: 0-3.75, 67.75-68, 4-64
- * zx1: 0-1, 257-260, 4-256
- * Stretch (N-class): 0-2, 4-32, 34-xxx
- */
-
-/* Since each 1GB can only belong to one region (node), we can create
- * an index table for pfn to nid lookup; each entry in pfnnid_map
- * represents 1GB, and contains the node that the memory belongs to. */
-
-#define PFNNID_SHIFT (30 - PAGE_SHIFT)
-#define PFNNID_MAP_MAX 512 /* support 512GB */
-extern signed char pfnnid_map[PFNNID_MAP_MAX];
-
-#ifndef CONFIG_64BIT
-#define pfn_is_io(pfn) ((pfn & (0xf0000000UL >> PAGE_SHIFT)) == (0xf0000000UL >> PAGE_SHIFT))
-#else
-/* io can be 0xf0f0f0f0f0xxxxxx or 0xfffffffff0000000 */
-#define pfn_is_io(pfn) ((pfn & (0xf000000000000000UL >> PAGE_SHIFT)) == (0xf000000000000000UL >> PAGE_SHIFT))
-#endif
-
-static inline int pfn_to_nid(unsigned long pfn)
-{
- unsigned int i;
-
- if (unlikely(pfn_is_io(pfn)))
- return 0;
-
- i = pfn >> PFNNID_SHIFT;
- BUG_ON(i >= ARRAY_SIZE(pfnnid_map));
-
- return pfnnid_map[i];
-}
-
-static inline int pfn_valid(int pfn)
-{
- int nid = pfn_to_nid(pfn);
-
- if (nid >= 0)
- return (pfn < node_end_pfn(nid));
- return 0;
-}
-
-#endif
#endif /* _PARISC_MMZONE_H */
diff --git a/arch/parisc/include/asm/page.h b/arch/parisc/include/asm/page.h
index b77f49c..93caf17 100644
--- a/arch/parisc/include/asm/page.h
+++ b/arch/parisc/include/asm/page.h
@@ -147,9 +147,9 @@ extern int npmem_ranges;
#define __pa(x) ((unsigned long)(x)-PAGE_OFFSET)
#define __va(x) ((void *)((unsigned long)(x)+PAGE_OFFSET))
-#ifndef CONFIG_DISCONTIGMEM
+#ifndef CONFIG_SPARSEMEM
#define pfn_valid(pfn) ((pfn) < max_mapnr)
-#endif /* CONFIG_DISCONTIGMEM */
+#endif
#ifdef CONFIG_HUGETLB_PAGE
#define HPAGE_SHIFT PMD_SHIFT /* fixed for transparent huge pages */
diff --git a/arch/parisc/include/asm/sparsemem.h b/arch/parisc/include/asm/sparsemem.h
new file mode 100644
index 0000000..b7d1dc9
--- /dev/null
+++ b/arch/parisc/include/asm/sparsemem.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef ASM_PARISC_SPARSEMEM_H
+#define ASM_PARISC_SPARSEMEM_H
+
+/* We have these possible memory map layouts:
+ * Astro: 0-3.75, 67.75-68, 4-64
+ * zx1: 0-1, 257-260, 4-256
+ * Stretch (N-class): 0-2, 4-32, 34-xxx
+ */
+
+#define MAX_PHYSMEM_BITS 42
+#define SECTION_SIZE_BITS 37
+
+#endif
diff --git a/arch/parisc/kernel/parisc_ksyms.c b/arch/parisc/kernel/parisc_ksyms.c
index 7baa226..174213b 100644
--- a/arch/parisc/kernel/parisc_ksyms.c
+++ b/arch/parisc/kernel/parisc_ksyms.c
@@ -138,12 +138,6 @@ extern void $$dyncall(void);
EXPORT_SYMBOL($$dyncall);
#endif
-#ifdef CONFIG_DISCONTIGMEM
-#include <asm/mmzone.h>
-EXPORT_SYMBOL(node_data);
-EXPORT_SYMBOL(pfnnid_map);
-#endif
-
#ifdef CONFIG_FUNCTION_TRACER
extern void _mcount(void);
EXPORT_SYMBOL(_mcount);
diff --git a/arch/parisc/mm/init.c b/arch/parisc/mm/init.c
index d0b1662..9523394 100644
--- a/arch/parisc/mm/init.c
+++ b/arch/parisc/mm/init.c
@@ -48,11 +48,6 @@ pmd_t pmd0[PTRS_PER_PMD] __attribute__ ((__section__ (".data..vm0.pmd"), aligned
pgd_t swapper_pg_dir[PTRS_PER_PGD] __attribute__ ((__section__ (".data..vm0.pgd"), aligned(PAGE_SIZE)));
pte_t pg0[PT_INITIAL * PTRS_PER_PTE] __attribute__ ((__section__ (".data..vm0.pte"), aligned(PAGE_SIZE)));
-#ifdef CONFIG_DISCONTIGMEM
-struct node_map_data node_data[MAX_NUMNODES] __read_mostly;
-signed char pfnnid_map[PFNNID_MAP_MAX] __read_mostly;
-#endif
-
static struct resource data_resource = {
.name = "Kernel data",
.flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
@@ -76,11 +71,11 @@ static struct resource sysram_resources[MAX_PHYSMEM_RANGES] __read_mostly;
* information retrieved in kernel/inventory.c.
*/
-physmem_range_t pmem_ranges[MAX_PHYSMEM_RANGES] __read_mostly;
-int npmem_ranges __read_mostly;
+physmem_range_t pmem_ranges[MAX_PHYSMEM_RANGES] __initdata;
+int npmem_ranges __initdata;
#ifdef CONFIG_64BIT
-#define MAX_MEM (~0UL)
+#define MAX_MEM (1UL << MAX_PHYSMEM_BITS)
#else /* !CONFIG_64BIT */
#define MAX_MEM (3584U*1024U*1024U)
#endif /* !CONFIG_64BIT */
@@ -119,7 +114,7 @@ static void __init mem_limit_func(void)
static void __init setup_bootmem(void)
{
unsigned long mem_max;
-#ifndef CONFIG_DISCONTIGMEM
+#ifndef CONFIG_SPARSEMEM
physmem_range_t pmem_holes[MAX_PHYSMEM_RANGES - 1];
int npmem_holes;
#endif
@@ -137,23 +132,20 @@ static void __init setup_bootmem(void)
int j;
for (j = i; j > 0; j--) {
- unsigned long tmp;
+ physmem_range_t tmp;
if (pmem_ranges[j-1].start_pfn <
pmem_ranges[j].start_pfn) {
break;
}
- tmp = pmem_ranges[j-1].start_pfn;
- pmem_ranges[j-1].start_pfn = pmem_ranges[j].start_pfn;
- pmem_ranges[j].start_pfn = tmp;
- tmp = pmem_ranges[j-1].pages;
- pmem_ranges[j-1].pages = pmem_ranges[j].pages;
- pmem_ranges[j].pages = tmp;
+ tmp = pmem_ranges[j-1];
+ pmem_ranges[j-1] = pmem_ranges[j];
+ pmem_ranges[j] = tmp;
}
}
-#ifndef CONFIG_DISCONTIGMEM
+#ifndef CONFIG_SPARSEMEM
/*
* Throw out ranges that are too far apart (controlled by
* MAX_GAP).
@@ -165,7 +157,7 @@ static void __init setup_bootmem(void)
pmem_ranges[i-1].pages) > MAX_GAP) {
npmem_ranges = i;
printk("Large gap in memory detected (%ld pages). "
- "Consider turning on CONFIG_DISCONTIGMEM\n",
+ "Consider turning on CONFIG_SPARSEMEM\n",
pmem_ranges[i].start_pfn -
(pmem_ranges[i-1].start_pfn +
pmem_ranges[i-1].pages));
@@ -230,9 +222,8 @@ static void __init setup_bootmem(void)
printk(KERN_INFO "Total Memory: %ld MB\n",mem_max >> 20);
-#ifndef CONFIG_DISCONTIGMEM
+#ifndef CONFIG_SPARSEMEM
/* Merge the ranges, keeping track of the holes */
-
{
unsigned long end_pfn;
unsigned long hole_pages;
@@ -255,18 +246,6 @@ static void __init setup_bootmem(void)
}
#endif
-#ifdef CONFIG_DISCONTIGMEM
- for (i = 0; i < MAX_PHYSMEM_RANGES; i++) {
- memset(NODE_DATA(i), 0, sizeof(pg_data_t));
- }
- memset(pfnnid_map, 0xff, sizeof(pfnnid_map));
-
- for (i = 0; i < npmem_ranges; i++) {
- node_set_state(i, N_NORMAL_MEMORY);
- node_set_online(i);
- }
-#endif
-
/*
* Initialize and free the full range of memory in each range.
*/
@@ -314,7 +293,7 @@ static void __init setup_bootmem(void)
memblock_reserve(__pa(KERNEL_BINARY_TEXT_START),
(unsigned long)(_end - KERNEL_BINARY_TEXT_START));
-#ifndef CONFIG_DISCONTIGMEM
+#ifndef CONFIG_SPARSEMEM
/* reserve the holes */
@@ -360,6 +339,9 @@ static void __init setup_bootmem(void)
/* Initialize Page Deallocation Table (PDT) and check for bad memory. */
pdc_pdt_init();
+
+ memblock_allow_resize();
+ memblock_dump_all();
}
static int __init parisc_text_address(unsigned long vaddr)
@@ -709,37 +691,46 @@ static void __init gateway_init(void)
PAGE_SIZE, PAGE_GATEWAY, 1);
}
-void __init paging_init(void)
+static void __init parisc_bootmem_free(void)
{
+ unsigned long zones_size[MAX_NR_ZONES] = { 0, };
+ unsigned long holes_size[MAX_NR_ZONES] = { 0, };
+ unsigned long mem_start_pfn = ~0UL, mem_end_pfn = 0, mem_size_pfn = 0;
int i;
+ for (i = 0; i < npmem_ranges; i++) {
+ unsigned long start = pmem_ranges[i].start_pfn;
+ unsigned long size = pmem_ranges[i].pages;
+ unsigned long end = start + size;
+
+ if (mem_start_pfn > start)
+ mem_start_pfn = start;
+ if (mem_end_pfn < end)
+ mem_end_pfn = end;
+ mem_size_pfn += size;
+ }
+
+ zones_size[0] = mem_end_pfn - mem_start_pfn;
+ holes_size[0] = zones_size[0] - mem_size_pfn;
+
+ free_area_init_node(0, zones_size, mem_start_pfn, holes_size);
+}
+
+void __init paging_init(void)
+{
setup_bootmem();
pagetable_init();
gateway_init();
flush_cache_all_local(); /* start with known state */
flush_tlb_all_local(NULL);
- for (i = 0; i < npmem_ranges; i++) {
- unsigned long zones_size[MAX_NR_ZONES] = { 0, };
-
- zones_size[ZONE_NORMAL] = pmem_ranges[i].pages;
-
-#ifdef CONFIG_DISCONTIGMEM
- /* Need to initialize the pfnnid_map before we can initialize
- the zone */
- {
- int j;
- for (j = (pmem_ranges[i].start_pfn >> PFNNID_SHIFT);
- j <= ((pmem_ranges[i].start_pfn + pmem_ranges[i].pages) >> PFNNID_SHIFT);
- j++) {
- pfnnid_map[j] = i;
- }
- }
-#endif
-
- free_area_init_node(i, zones_size,
- pmem_ranges[i].start_pfn, NULL);
- }
+ /*
+ * Mark all memblocks as present for sparsemem using
+ * memory_present() and then initialize sparsemem.
+ */
+ memblocks_present();
+ sparse_init();
+ parisc_bootmem_free();
}
#ifdef CONFIG_PA20
prev parent reply other threads:[~2019-04-09 20:10 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-04-06 15:20 Memory management broken by "mm: reclaim small amounts of memory when an external fragmentation event occurs" Mikulas Patocka
2019-04-06 17:26 ` Mikulas Patocka
2019-04-08 9:52 ` Mel Gorman
2019-04-08 11:10 ` Mikulas Patocka
2019-04-08 12:54 ` Mel Gorman
2019-04-08 14:29 ` James Bottomley
2019-04-08 15:22 ` Helge Deller
2019-04-08 19:44 ` James Bottomley
2019-04-09 20:09 ` Helge Deller [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190409200946.GA3274@p100.box \
--to=deller@gmx.de \
--cc=James.Bottomley@HansenPartnership.com \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=dave.anglin@bell.net \
--cc=linux-mm@kvack.org \
--cc=linux-parisc@vger.kernel.org \
--cc=mgorman@techsingularity.net \
--cc=mpatocka@redhat.com \
--cc=vbabka@suse.cz \
--cc=zi.yan@cs.rutgers.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.