* Re: [RFC] Checking for error code in __offline_pages
2018-05-23 8:16 ` Michal Hocko
@ 2018-05-23 8:19 ` Oscar Salvador
2018-05-23 9:28 ` Oscar Salvador
2018-05-23 10:26 ` Oscar Salvador
2 siblings, 0 replies; 11+ messages in thread
From: Oscar Salvador @ 2018-05-23 8:19 UTC (permalink / raw)
To: Michal Hocko; +Cc: linux-mm, vbabka, pasha.tatashin, akpm
On Wed, May 23, 2018 at 10:16:09AM +0200, Michal Hocko wrote:
> On Wed 23-05-18 09:52:39, Michal Hocko wrote:
> [...]
> > Yeah, the current code is far from optimal. We
> > used to have a retry count but that one was removed exactly because of
> > premature failures. There are three things here
> > 1) zone_movable should not contain any bootmem or otherwise
> > non-migratable pages
> > 2) start_isolate_page_range should fail when seeing such pages - maybe
> > has_unmovable_pages is overly optimistic and it should check all
> > pages even in movable zones.
> > 3) migrate_pages should really tell us whether the failure is temporary
> > or permanent. I am not sure we can do that easily though.
>
> 2) should be the simplest one for now. Could you give it a try? Btw.
> the exact configuration that led to bootmem pages in zone_movable would
> be really appreciated:
I will try it out and I will paste the config.
> ---
> From 6aa144a9b1c01255c89a4592221d706ccc4b4eea Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Wed, 23 May 2018 10:04:20 +0200
> Subject: [PATCH] mm, memory_hotplug: make has_unmovable_pages more robust
>
> Oscar has reported:
> : Due to an unfortunate setting with movablecore, memblocks containing bootmem
> : memory (pages marked by get_page_bootmem()) ended up marked in zone_movable.
> : So while trying to remove that memory, the system failed in do_migrate_range
> : and __offline_pages never returned.
>
> This is because we rely on start_isolate_page_range and has_unmovable_pages
> to do their job. The former isolates the whole range to be offlined
> so that we do not allocate from it anymore, and the latter makes sure we
> do not stumble over non-migratable pages.
>
> has_unmovable_pages is overly optimistic, however. It doesn't check all
> the pages if we are within zone_movable, because we rely on those pages
> always being migratable. As it turns out, we are still not perfect
> there. While bootmem pages in zone_movable sound like a clear bug which
> should be fixed, let's remove the optimization for now and warn if we
> encounter unmovable pages in zone_movable in the meantime. That should
> help for now at least.
>
> Btw. this wasn't a real problem until 72b39cfc4d75 ("mm, memory_hotplug:
> do not fail offlining too early") because we used to have a small number
> of retries and then failed. This turned out to be too fragile though.
>
> Reported-by: Oscar Salvador <osalvador@techadventures.net>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
> mm/page_alloc.c | 16 ++++++++++------
> 1 file changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3c6f4008ea55..b9a45753244d 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7629,11 +7629,12 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
> unsigned long pfn, iter, found;
>
> /*
> - * For avoiding noise data, lru_add_drain_all() should be called
> - * If ZONE_MOVABLE, the zone never contains unmovable pages
> + * TODO: we could make this much more efficient by not checking every
> + * page in the range if we know all of them are in ZONE_MOVABLE and
> + * that the movable zone guarantees that pages are migratable, but
> + * the latter is not the case right now, unfortunately. E.g. movablecore
> + * can still lead to bootmem allocations in zone_movable.
> */
> - if (zone_idx(zone) == ZONE_MOVABLE)
> - return false;
>
> /*
> * CMA allocations (alloc_contig_range) really need to mark isolate
> @@ -7654,7 +7655,7 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
> page = pfn_to_page(check);
>
> if (PageReserved(page))
> - return true;
> + goto unmovable;
>
> /*
> * Hugepages are not in LRU lists, but they're movable.
> @@ -7704,9 +7705,12 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
> * page at boot.
> */
> if (found > count)
> - return true;
> + goto unmovable;
> }
> return false;
> +unmovable:
> + WARN_ON_ONCE(zone_idx(zone) == ZONE_MOVABLE);
> + return true;
> }
>
> #if (defined(CONFIG_MEMORY_ISOLATION) && defined(CONFIG_COMPACTION)) || defined(CONFIG_CMA)
> --
> 2.17.0
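To make the before/after behavior concrete, here is a standalone userspace
model of the check the patch changes. This is only a sketch: page_t, its
reserved flag, and the zone_movable argument are stand-ins for the kernel's
struct page, PageReserved() and zone_idx() machinery, not the real API.

#include <stdbool.h>
#include <stdio.h>

typedef struct { bool reserved; } page_t;

/* Before: ranges in ZONE_MOVABLE were assumed movable and never scanned. */
static bool has_unmovable_pages_old(const page_t *pages, int count,
				    bool zone_movable)
{
	if (zone_movable)
		return false;	/* the optimistic shortcut the patch removes */
	for (int i = 0; i < count; i++)
		if (pages[i].reserved)
			return true;
	return false;
}

/* After: every page is scanned; a hit in a movable zone warns. */
static bool has_unmovable_pages_new(const page_t *pages, int count,
				    bool zone_movable)
{
	for (int i = 0; i < count; i++) {
		if (pages[i].reserved)
			goto unmovable;
	}
	return false;
unmovable:
	/* models WARN_ON_ONCE(zone_idx(zone) == ZONE_MOVABLE) */
	if (zone_movable)
		fprintf(stderr, "WARN: unmovable page in movable zone\n");
	return true;
}

int main(void)
{
	/* one bootmem/reserved page hiding in an allegedly movable range */
	page_t range[4] = { {false}, {true}, {false}, {false} };

	printf("old: %d\n", has_unmovable_pages_old(range, 4, true)); /* 0: missed */
	printf("new: %d\n", has_unmovable_pages_new(range, 4, true)); /* 1: caught */
	return 0;
}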
Thanks
Oscar Salvador
^ permalink raw reply [flat|nested] 11+ messages in thread
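The hang described in the changelog follows from the same ingredients: once
72b39cfc4d75 removed the retry cap, __offline_pages keeps retrying as long as
migration fails, and migrate_pages (point 3 above) cannot say whether a
failure is permanent. A toy model of that loop; do_migrate_range here is a
stand-in that fails forever, and the iteration cap exists only so the demo
terminates:

#include <stdbool.h>
#include <stdio.h>

/* Stand-in: a range with an unmovable page fails migration every time. */
static int do_migrate_range(bool has_unmovable)
{
	return has_unmovable ? -1 : 0;	/* the kernel returns -EBUSY etc. */
}

static int offline_pages_model(bool has_unmovable, int max_iters)
{
	int iters = 0;

	/* No retry count since 72b39cfc4d75: loop until migration succeeds. */
	while (do_migrate_range(has_unmovable)) {
		if (++iters >= max_iters) {
			printf("still failing after %d tries: would spin forever\n",
			       iters);
			return -1;
		}
	}
	return 0;
}

int main(void)
{
	return offline_pages_model(true, 1000) ? 1 : 0;
}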
* Re: [RFC] Checking for error code in __offline_pages
2018-05-23 8:16 ` Michal Hocko
2018-05-23 8:19 ` Oscar Salvador
@ 2018-05-23 9:28 ` Oscar Salvador
2018-05-23 10:26 ` Oscar Salvador
2 siblings, 0 replies; 11+ messages in thread
From: Oscar Salvador @ 2018-05-23 9:28 UTC (permalink / raw)
To: Michal Hocko; +Cc: linux-mm, vbabka, pasha.tatashin, akpm
On Wed, May 23, 2018 at 10:16:09AM +0200, Michal Hocko wrote:
> On Wed 23-05-18 09:52:39, Michal Hocko wrote:
> [...]
> > Yeah, the current code is far from optimal. We
> > used to have a retry count but that one was removed exactly because of
> > premature failures. There are three things here
> > 1) zone_movable should not contain any bootmem or otherwise
> > non-migratable pages
> > 2) start_isolate_page_range should fail when seeing such pages - maybe
> > has_unmovable_pages is overly optimistic and it should check all
> > pages even in movable zones.
> > 3) migrate_pages should really tell us whether the failure is temporary
> > or permanent. I am not sure we can do that easily though.
>
> 2) should be the simplest one for now. Could you give it a try? Btw.
> the exact configuration that led to bootmem pages in zone_movable would
> be really appreciated:
> ---
> From 6aa144a9b1c01255c89a4592221d706ccc4b4eea Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Wed, 23 May 2018 10:04:20 +0200
> Subject: [PATCH] mm, memory_hotplug: make has_unmovable_pages more robust
>
> Oscar has reported:
> : Due to an unfortunate setting with movablecore, memblocks containing bootmem
> : memory (pages marked by get_page_bootmem()) ended up marked in zone_movable.
> : So while trying to remove that memory, the system failed in do_migrate_range
> : and __offline_pages never returned.
>
> This is because we rely on start_isolate_page_range and has_unmovable_pages
> to do their job. The former isolates the whole range to be offlined
> so that we do not allocate from it anymore, and the latter makes sure we
> do not stumble over non-migratable pages.
>
> has_unmovable_pages is overly optimistic, however. It doesn't check all
> the pages if we are within zone_movable, because we rely on those pages
> always being migratable. As it turns out, we are still not perfect
> there. While bootmem pages in zone_movable sound like a clear bug which
> should be fixed, let's remove the optimization for now and warn if we
> encounter unmovable pages in zone_movable in the meantime. That should
> help for now at least.
>
> Btw. this wasn't a real problem until 72b39cfc4d75 ("mm, memory_hotplug:
> do not fail offlining too early") because we used to have a small number
> of retries and then failed. This turned out to be too fragile though.
>
> Reported-by: Oscar Salvador <osalvador@techadventures.net>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
> mm/page_alloc.c | 16 ++++++++++------
> 1 file changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3c6f4008ea55..b9a45753244d 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7629,11 +7629,12 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
> unsigned long pfn, iter, found;
>
> /*
> - * For avoiding noise data, lru_add_drain_all() should be called
> - * If ZONE_MOVABLE, the zone never contains unmovable pages
> + * TODO: we could make this much more efficient by not checking every
> + * page in the range if we know all of them are in ZONE_MOVABLE and
> + * that the movable zone guarantees that pages are migratable, but
> + * the latter is not the case right now, unfortunately. E.g. movablecore
> + * can still lead to bootmem allocations in zone_movable.
> */
> - if (zone_idx(zone) == ZONE_MOVABLE)
> - return false;
>
> /*
> * CMA allocations (alloc_contig_range) really need to mark isolate
> @@ -7654,7 +7655,7 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
> page = pfn_to_page(check);
>
> if (PageReserved(page))
> - return true;
> + goto unmovable;
>
> /*
> * Hugepages are not in LRU lists, but they're movable.
> @@ -7704,9 +7705,12 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
> * page at boot.
> */
> if (found > count)
> - return true;
> + goto unmovable;
> }
> return false;
> +unmovable:
> + WARN_ON_ONCE(zone_idx(zone) == ZONE_MOVABLE);
> + return true;
> }
>
> #if (defined(CONFIG_MEMORY_ISOLATION) && defined(CONFIG_COMPACTION)) || defined(CONFIG_CMA)
> --
> 2.17.0
Tested-by: Oscar Salvador <osalvador@techadventures.net>
thanks!
Oscar Salvador
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC] Checking for error code in __offline_pages
2018-05-23 8:16 ` Michal Hocko
2018-05-23 8:19 ` Oscar Salvador
2018-05-23 9:28 ` Oscar Salvador
@ 2018-05-23 10:26 ` Oscar Salvador
2018-05-23 11:38 ` Michal Hocko
2 siblings, 1 reply; 11+ messages in thread
From: Oscar Salvador @ 2018-05-23 10:26 UTC (permalink / raw)
To: Michal Hocko; +Cc: linux-mm, vbabka, pasha.tatashin, akpm
On Wed, May 23, 2018 at 10:16:09AM +0200, Michal Hocko wrote:
> On Wed 23-05-18 09:52:39, Michal Hocko wrote:
> [...]
> > Yeah, the current code is far from optimal. We
> > used to have a retry count but that one was removed exactly because of
> > premature failures. There are three things here
> > 1) zone_movable should not contain any bootmem or otherwise
> > non-migratable pages
> > 2) start_isolate_page_range should fail when seeing such pages - maybe
> > has_unmovable_pages is overly optimistic and it should check all
> > pages even in movable zones.
> > 3) migrate_pages should really tell us whether the failure is temporary
> > or permanent. I am not sure we can do that easily though.
>
> 2) should be the simplest one for now. Could you give it a try? Btw.
> the exact configuration that led to bootmem pages in zone_movable would
> be really appreciated:
Here is some information:
** Qemu cmdline:
# qemu-system-x86_64 -enable-kvm -smp 2 -monitor pty -m 6G,slots=8,maxmem=8G -numa node,mem=4096M -numa node,mem=2048M ...
# Option movablecore=4G (cmdline)
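# The offline attempt itself would be driven through the standard sysfs
# memory hotplug interface, e.g. (XXX is a placeholder; the affected block
# number depends on the resulting layout):
# echo offline > /sys/devices/system/memory/memoryXXX/state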
** e820 map and some numa information:
linux kernel: BIOS-provided physical RAM map:
linux kernel: BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
linux kernel: BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
linux kernel: BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
linux kernel: BIOS-e820: [mem 0x0000000000100000-0x00000000bffdffff] usable
linux kernel: BIOS-e820: [mem 0x00000000bffe0000-0x00000000bfffffff] reserved
linux kernel: BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
linux kernel: BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
linux kernel: BIOS-e820: [mem 0x0000000100000000-0x00000001bfffffff] usable
linux kernel: NX (Execute Disable) protection: active
linux kernel: SMBIOS 2.8 present.
linux kernel: DMI: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org
linux kernel: Hypervisor detected: KVM
linux kernel: e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
linux kernel: e820: remove [mem 0x000a0000-0x000fffff] usable
linux kernel: last_pfn = 0x1c0000 max_arch_pfn = 0x400000000
linux kernel: SRAT: PXM 0 -> APIC 0x00 -> Node 0
linux kernel: SRAT: PXM 1 -> APIC 0x01 -> Node 1
linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x0009ffff]
linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x00100000-0xbfffffff]
linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x100000000-0x13fffffff]
linux kernel: ACPI: SRAT: Node 1 PXM 1 [mem 0x140000000-0x1bfffffff]
linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x1c0000000-0x43fffffff] hotplug
linux kernel: NUMA: Node 0 [mem 0x00000000-0x0009ffff] + [mem 0x00100000-0xbfffffff] -> [mem 0x0
linux kernel: NUMA: Node 0 [mem 0x00000000-0xbfffffff] + [mem 0x100000000-0x13fffffff] -> [mem 0
linux kernel: NODE_DATA(0) allocated [mem 0x13ffd6000-0x13fffffff]
linux kernel: NODE_DATA(1) allocated [mem 0x1bffd3000-0x1bfffcfff]
** /proc/zoneinfo
Node 0, zone DMA
per-node stats
nr_inactive_anon 2107
nr_active_anon 49560
nr_inactive_file 25375
nr_active_file 19038
nr_unevictable 12
nr_slab_reclaimable 5996
nr_slab_unreclaimable 7236
nr_isolated_anon 0
nr_isolated_file 0
workingset_refault 0
workingset_activate 0
workingset_nodereclaim 0
nr_anon_pages 48910
nr_mapped 13780
nr_file_pages 46676
nr_dirty 13
nr_writeback 0
nr_writeback_temp 0
nr_shmem 2263
nr_shmem_hugepages 0
nr_shmem_pmdmapped 0
nr_anon_transparent_hugepages 50
nr_unstable 0
nr_vmscan_write 0
nr_vmscan_immediate_reclaim 0
nr_dirtied 17749
nr_written 17462
83328
pages free 3961
min 29
low 36
high 43
spanned 4095
present 3998
managed 3977
protection: (0, 2939, 2939, 3898, 3898)
nr_free_pages 3961
nr_zone_inactive_anon 0
nr_zone_active_anon 0
nr_zone_inactive_file 0
nr_zone_active_file 0
nr_zone_unevictable 0
nr_zone_write_pending 0
nr_mlock 0
nr_page_table_pages 0
nr_kernel_stack 0
nr_bounce 0
nr_zspages 0
nr_free_cma 0
numa_hit 2
numa_miss 0
numa_foreign 0
numa_interleave 0
numa_local 1
numa_other 1
pagesets
cpu: 0
count: 0
high: 0
batch: 1
vm stats threshold: 4
cpu: 1
count: 0
high: 0
batch: 1
vm stats threshold: 4
node_unreclaimable: 0
start_pfn: 1
Node 0, zone DMA32
pages free 724414
min 5583
low 6978
high 8373
spanned 1044480
present 782304
managed 758516
protection: (0, 0, 0, 959, 959)
nr_free_pages 724414
nr_zone_inactive_anon 0
nr_zone_active_anon 0
nr_zone_inactive_file 1697
nr_zone_active_file 8915
nr_zone_unevictable 0
nr_zone_write_pending 12
nr_mlock 0
nr_page_table_pages 2976
nr_kernel_stack 4000
nr_bounce 0
nr_zspages 0
nr_free_cma 0
numa_hit 281025
numa_miss 0
numa_foreign 0
numa_interleave 8583
numa_local 135392
numa_other 145633
pagesets
cpu: 0
count: 164
high: 186
batch: 31
vm stats threshold: 24
cpu: 1
count: 32
high: 186
batch: 31
vm stats threshold: 24
node_unreclaimable: 0
start_pfn: 4096
Node 0, zone Normal
pages free 0
min 0
low 0
high 0
spanned 0
present 0
managed 0
protection: (0, 0, 0, 7677, 7677)
Node 0, zone Movable
pages free 160140
min 1823
low 2278
high 2733
spanned 262144
present 262144
managed 245670
protection: (0, 0, 0, 0, 0)
nr_free_pages 160140
nr_zone_inactive_anon 2107
nr_zone_active_anon 49560
nr_zone_inactive_file 23678
nr_zone_active_file 10123
nr_zone_unevictable 12
nr_zone_write_pending 1
nr_mlock 12
nr_page_table_pages 0
nr_kernel_stack 0
nr_bounce 0
nr_zspages 0
nr_free_cma 0
numa_hit 214370
numa_miss 0
numa_foreign 0
numa_interleave 0
numa_local 214344
numa_other 26
pagesets
cpu: 0
count: 32
high: 42
batch: 7
vm stats threshold: 16
cpu: 1
count: 26
high: 42
batch: 7
vm stats threshold: 16
node_unreclaimable: 0
start_pfn: 1048576
Node 0, zone Device
pages free 0
min 0
low 0
high 0
spanned 0
present 0
managed 0
protection: (0, 0, 0, 0, 0)
Node 1, zone DMA
pages free 0
min 0
low 0
high 0
spanned 0
present 0
managed 0
protection: (0, 0, 0, 2014, 2014)
Node 1, zone DMA32
pages free 0
min 0
low 0
high 0
spanned 0
present 0
managed 0
protection: (0, 0, 0, 2014, 2014)
Node 1, zone Normal
pages free 0
min 0
low 0
high 0
spanned 0
present 0
managed 0
protection: (0, 0, 0, 16117, 16117)
Node 1, zone Movable
per-node stats
nr_inactive_anon 524
nr_active_anon 25734
nr_inactive_file 28733
nr_active_file 12316
nr_unevictable 8
nr_slab_reclaimable 0
nr_slab_unreclaimable 0
nr_isolated_anon 0
nr_isolated_file 0
workingset_refault 0
workingset_activate 0
workingset_nodereclaim 0
nr_anon_pages 24656
nr_mapped 16871
nr_file_pages 41647
nr_dirty 1
nr_writeback 0
nr_writeback_temp 0
nr_shmem 598
nr_shmem_hugepages 0
nr_shmem_pmdmapped 0
nr_anon_transparent_hugepages 8
nr_unstable 0
nr_vmscan_write 0
nr_vmscan_immediate_reclaim 0
nr_dirtied 125
nr_written 98
0
pages free 448427
min 3827
low 4783
high 5739
spanned 524288
present 524288
managed 515766
protection: (0, 0, 0, 0, 0)
nr_free_pages 448427
nr_zone_inactive_anon 524
nr_zone_active_anon 25734
nr_zone_inactive_file 28733
nr_zone_active_file 12316
nr_zone_unevictable 8
nr_zone_write_pending 1
nr_mlock 8
nr_page_table_pages 0
nr_kernel_stack 0
nr_bounce 0
nr_zspages 0
nr_free_cma 0
numa_hit 199599
numa_miss 0
numa_foreign 0
numa_interleave 0
numa_local 199599
numa_other 0
pagesets
cpu: 0
count: 9
high: 42
batch: 7
vm stats threshold: 20
cpu: 1
count: 2
high: 42
batch: 7
vm stats threshold: 20
node_unreclaimable: 0
start_pfn: 1310720
Node 1, zone Device
pages free 0
min 0
low 0
high 0
spanned 0
present 0
managed 0
protection: (0, 0, 0, 0, 0)
I hope this is enough.
Thanks
Oscar Salvador
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC] Checking for error code in __offline_pages
2018-05-23 10:26 ` Oscar Salvador
@ 2018-05-23 11:38 ` Michal Hocko
2018-05-23 11:53 ` Oscar Salvador
0 siblings, 1 reply; 11+ messages in thread
From: Michal Hocko @ 2018-05-23 11:38 UTC (permalink / raw)
To: Oscar Salvador; +Cc: linux-mm, vbabka, pasha.tatashin, akpm
On Wed 23-05-18 12:26:43, Oscar Salvador wrote:
> On Wed, May 23, 2018 at 10:16:09AM +0200, Michal Hocko wrote:
> > On Wed 23-05-18 09:52:39, Michal Hocko wrote:
> > [...]
> > > Yeah, the current code is far from optimal. We
> > > used to have a retry count but that one was removed exactly because of
> > > premature failures. There are three things here
> > > 1) zone_movable should not contain any bootmem or otherwise
> > > non-migratable pages
> > > 2) start_isolate_page_range should fail when seeing such pages - maybe
> > > has_unmovable_pages is overly optimistic and it should check all
> > > pages even in movable zones.
> > > 3) migrate_pages should really tell us whether the failure is temporary
> > > or permanent. I am not sure we can do that easily though.
> >
> > 2) should be the simplest one for now. Could you give it a try? Btw.
> > the exact configuration that led to bootmem pages in zone_movable would
> > be really appreciated:
>
> Here is some information:
>
> ** Qemu cmdline:
>
> # qemu-system-x86_64 -enable-kvm -smp 2 -monitor pty -m 6G,slots=8,maxmem=8G -numa node,mem=4096M -numa node,mem=2048M ...
> # Option movablecore=4G (cmdline)
>
> ** e820 map and some numa information:
>
> linux kernel: BIOS-provided physical RAM map:
> linux kernel: BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
> linux kernel: BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
> linux kernel: BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
> linux kernel: BIOS-e820: [mem 0x0000000000100000-0x00000000bffdffff] usable
> linux kernel: BIOS-e820: [mem 0x00000000bffe0000-0x00000000bfffffff] reserved
> linux kernel: BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
> linux kernel: BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
> linux kernel: BIOS-e820: [mem 0x0000000100000000-0x00000001bfffffff] usable
> linux kernel: NX (Execute Disable) protection: active
> linux kernel: SMBIOS 2.8 present.
> linux kernel: DMI: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org
> linux kernel: Hypervisor detected: KVM
> linux kernel: e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
> linux kernel: e820: remove [mem 0x000a0000-0x000fffff] usable
> linux kernel: last_pfn = 0x1c0000 max_arch_pfn = 0x400000000
>
> linux kernel: SRAT: PXM 0 -> APIC 0x00 -> Node 0
> linux kernel: SRAT: PXM 1 -> APIC 0x01 -> Node 1
> linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x0009ffff]
> linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x00100000-0xbfffffff]
> linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x100000000-0x13fffffff]
> linux kernel: ACPI: SRAT: Node 1 PXM 1 [mem 0x140000000-0x1bfffffff]
> linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x1c0000000-0x43fffffff] hotplug
> linux kernel: NUMA: Node 0 [mem 0x00000000-0x0009ffff] + [mem 0x00100000-0xbfffffff] -> [mem 0x0
> linux kernel: NUMA: Node 0 [mem 0x00000000-0xbfffffff] + [mem 0x100000000-0x13fffffff] -> [mem 0
> linux kernel: NODE_DATA(0) allocated [mem 0x13ffd6000-0x13fffffff]
> linux kernel: NODE_DATA(1) allocated [mem 0x1bffd3000-0x1bfffcfff]
Could you also paste
"Zone ranges:"
and the follow-up messages?
From the zoneinfo it seems the movable zone got placed on both nodes.
And only Node0 is marked as hotpluggable, so early allocations can be
placed on Node1.
> ** /proc/zoneinfo
[...]
> Node 0, zone Movable
> pages free 160140
> min 1823
> low 2278
> high 2733
> spanned 262144
> present 262144
> managed 245670
it seems that 1G went to Node0
> Node 1, zone Movable
[...]
> pages free 448427
> min 3827
> low 4783
> high 5739
> spanned 524288
> present 524288
> managed 515766
and the rest to Node1. Guessing from the spanned-managed gap, it seems the
missing memory is used for memmaps (struct page arrays).
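A quick sanity check of that guess, assuming the usual 64-byte struct page
on x86_64, using the Node1 numbers:

  524288 spanned pages * 64 bytes = 32 MiB = 8192 4K pages of memmap
  524288 - 515766 = 8522 pages of spanned-managed gap

i.e. the gap is the memmap plus a small amount of other early reservations.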
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC] Checking for error code in __offline_pages
2018-05-23 11:38 ` Michal Hocko
@ 2018-05-23 11:53 ` Oscar Salvador
0 siblings, 0 replies; 11+ messages in thread
From: Oscar Salvador @ 2018-05-23 11:53 UTC (permalink / raw)
To: Michal Hocko; +Cc: linux-mm, vbabka, pasha.tatashin, akpm
On Wed, May 23, 2018 at 01:38:57PM +0200, Michal Hocko wrote:
> On Wed 23-05-18 12:26:43, Oscar Salvador wrote:
> > On Wed, May 23, 2018 at 10:16:09AM +0200, Michal Hocko wrote:
> > > On Wed 23-05-18 09:52:39, Michal Hocko wrote:
> > > [...]
> > > > Yeah, the current code is far from optimal. We
> > > > used to have a retry count but that one was removed exactly because of
> > > > premature failures. There are three things here
> > > > 1) zone_movable should not contain any bootmem or otherwise
> > > > non-migratable pages
> > > > 2) start_isolate_page_range should fail when seeing such pages - maybe
> > > > has_unmovable_pages is overly optimistic and it should check all
> > > > pages even in movable zones.
> > > > 3) migrate_pages should really tell us whether the failure is temporary
> > > > or permanent. I am not sure we can do that easily though.
> > >
> > > 2) should be the simplest one for now. Could you give it a try? Btw.
> > > the exact configuration that led to bootmem pages in zone_movable would
> > > be really appreciated:
> >
> > Here is some information:
> >
> > ** Qemu cmdline:
> >
> > # qemu-system-x86_64 -enable-kvm -smp 2 -monitor pty -m 6G,slots=8,maxmem=8G -numa node,mem=4096M -numa node,mem=2048M ...
> > # Option movablecore=4G (cmdline)
> >
> > ** e820 map and some numa information:
> >
> > linux kernel: BIOS-provided physical RAM map:
> > linux kernel: BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
> > linux kernel: BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
> > linux kernel: BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
> > linux kernel: BIOS-e820: [mem 0x0000000000100000-0x00000000bffdffff] usable
> > linux kernel: BIOS-e820: [mem 0x00000000bffe0000-0x00000000bfffffff] reserved
> > linux kernel: BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
> > linux kernel: BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
> > linux kernel: BIOS-e820: [mem 0x0000000100000000-0x00000001bfffffff] usable
> > linux kernel: NX (Execute Disable) protection: active
> > linux kernel: SMBIOS 2.8 present.
> > linux kernel: DMI: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org
> > linux kernel: Hypervisor detected: KVM
> > linux kernel: e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
> > linux kernel: e820: remove [mem 0x000a0000-0x000fffff] usable
> > linux kernel: last_pfn = 0x1c0000 max_arch_pfn = 0x400000000
> >
> > linux kernel: SRAT: PXM 0 -> APIC 0x00 -> Node 0
> > linux kernel: SRAT: PXM 1 -> APIC 0x01 -> Node 1
> > linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x0009ffff]
> > linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x00100000-0xbfffffff]
> > linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x100000000-0x13fffffff]
> > linux kernel: ACPI: SRAT: Node 1 PXM 1 [mem 0x140000000-0x1bfffffff]
> > linux kernel: ACPI: SRAT: Node 0 PXM 0 [mem 0x1c0000000-0x43fffffff] hotplug
> > linux kernel: NUMA: Node 0 [mem 0x00000000-0x0009ffff] + [mem 0x00100000-0xbfffffff] -> [mem 0x0
> > linux kernel: NUMA: Node 0 [mem 0x00000000-0xbfffffff] + [mem 0x100000000-0x13fffffff] -> [mem 0
> > linux kernel: NODE_DATA(0) allocated [mem 0x13ffd6000-0x13fffffff]
> > linux kernel: NODE_DATA(1) allocated [mem 0x1bffd3000-0x1bfffcfff]
>
> Could you also paste
> "Zone ranges:"
> and the follow-up messages?
Michal, here is the "Zone ranges:" output:
linux kernel: Zone ranges:
linux kernel: DMA [mem 0x0000000000001000-0x0000000000ffffff]
linux kernel: DMA32 [mem 0x0000000001000000-0x00000000ffffffff]
linux kernel: Normal [mem 0x0000000100000000-0x00000001bfffffff]
linux kernel: Device empty
linux kernel: Movable zone start for each node
linux kernel: Node 0: 0x0000000100000000
linux kernel: Node 1: 0x0000000140000000
linux kernel: Early memory node ranges
linux kernel: node 0: [mem 0x0000000000001000-0x000000000009efff]
linux kernel: node 0: [mem 0x0000000000100000-0x00000000bffdffff]
linux kernel: node 0: [mem 0x0000000100000000-0x000000013fffffff]
linux kernel: node 1: [mem 0x0000000140000000-0x00000001bfffffff]
linux kernel: Initmem setup node 0 [mem 0x0000000000001000-0x000000013fffffff]
linux kernel: On node 0 totalpages: 1048446
linux kernel: DMA zone: 64 pages used for memmap
linux kernel: DMA zone: 21 pages reserved
linux kernel: DMA zone: 3998 pages, LIFO batch:0
linux kernel: DMA32 zone: 12224 pages used for memmap
linux kernel: DMA32 zone: 782304 pages, LIFO batch:31
linux kernel: Movable zone: 4096 pages used for memmap
linux kernel: Movable zone: 262144 pages, LIFO batch:31
linux kernel: Initmem setup node 1 [mem 0x0000000140000000-0x00000001bfffffff]
linux kernel: On node 1 totalpages: 524288
linux kernel: Movable zone: 8192 pages used for memmap
linux kernel: Movable zone: 524288 pages, LIFO batch:31
linux kernel: Reserved but unavailable: 98 pages
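Note how this ties the report together: NODE_DATA(0) was allocated at
[mem 0x13ffd6000-0x13fffffff], which lies inside Node 0's Movable range
starting at 0x0000000100000000, i.e. a bootmem allocation sitting in
zone_movable, exactly the situation the patch now warns about.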
Oscar Salvador
^ permalink raw reply [flat|nested] 11+ messages in thread