* [PATCH 0/0] Disable deferred struct page initialisation on Fadump
@ 2016-08-02 13:19 Srikar Dronamraju
2016-08-02 13:19 ` [PATCH 1/2] mm: Allow disabling deferred struct page initialisation Srikar Dronamraju
2016-08-02 13:19 ` [PATCH 2/2] fadump: Disable deferred page struct initialisation Srikar Dronamraju
0 siblings, 2 replies; 15+ messages in thread
From: Srikar Dronamraju @ 2016-08-02 13:19 UTC (permalink / raw)
To: linux-mm, Mel Gorman, Vlastimil Babka, Michal Hocko,
Andrew Morton, Michael Ellerman, linuxppc-dev
Cc: Srikar Dronamraju
A fadump kernel reserves large chunks of memory even before the struct
pages are initialised. As a result, the memory of several nodes can fall
entirely within memblock reserved regions.
Kernels compiled with CONFIG_DEFERRED_STRUCT_PAGE_INIT initialise only a
limited amount of memory per node during early boot. That amount takes
the dentry and inode cache sizes into account. However, when such a kernel
boots as a secondary kernel, it cannot allocate the memory required for
the dentry and inode caches. This results in crashes like the one below
on large systems, such as those with 32 TB of memory.
Dentry cache hash table entries: 536870912 (order: 16, 4294967296 bytes)
vmalloc: allocation failure, allocated 4097114112 of 17179934720 bytes
swapper/0: page allocation failure: order:0, mode:0x2080020(GFP_ATOMIC)
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.6-master+ #3
Call Trace:
[c00000000108fb10] [c0000000007fac88] dump_stack+0xb0/0xf0 (unreliable)
[c00000000108fb50] [c000000000235264] warn_alloc_failed+0x114/0x160
[c00000000108fbf0] [c000000000281484] __vmalloc_node_range+0x304/0x340
[c00000000108fca0] [c00000000028152c] __vmalloc+0x6c/0x90
[c00000000108fd40] [c000000000aecfb0] alloc_large_system_hash+0x1b8/0x2c0
[c00000000108fe00] [c000000000af7240] inode_init+0x94/0xe4
[c00000000108fe80] [c000000000af6fec] vfs_caches_init+0x8c/0x13c
[c00000000108ff00] [c000000000ac4014] start_kernel+0x50c/0x578
[c00000000108ff90] [c000000000008c6c] start_here_common+0x20/0xa8
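The numbers in the first log line can be cross-checked with a little arithmetic. The following is a minimal userspace sketch, not kernel code; it assumes an 8-byte hash bucket (one pointer on a 64-bit machine) and a 64 KiB page size, both inferred from the log line rather than stated in it. The helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed for a hash table of `entries` buckets, `bucket_size` bytes each. */
uint64_t hash_table_bytes(uint64_t entries, uint64_t bucket_size)
{
	return entries * bucket_size;
}

/* Smallest order such that (page_size << order) covers `bytes`. */
unsigned int order_for_bytes(uint64_t bytes, uint64_t page_size)
{
	unsigned int order = 0;

	assert(page_size > 0);
	while ((page_size << order) < bytes)
		order++;
	return order;
}
```

Under those assumptions, 536870912 entries need 4294967296 bytes, which is an order-16 allocation, matching the "order: 16, 4294967296 bytes" reported above.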
This can be solved by two approaches.
1. Disable deferred struct page initialisation on fadump.
2. Detect reserved nodes and allocate accordingly.
- Detecting nodes whose memblocks are mostly reserved.
- Allocating extra memory in other nodes in lieu of the nodes whose
memory is reserved.
This patchset takes the first approach.
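For comparison, the second approach could look roughly like the userspace sketch below. It is purely illustrative and is not part of this patchset: the 90% threshold, the function names, and the even redistribution policy are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical threshold: a node is "mostly reserved" above 90%. */
#define MOSTLY_RESERVED_PERCENT 90

/* True if more than MOSTLY_RESERVED_PERCENT of the node is reserved. */
bool node_mostly_reserved(uint64_t node_pages, uint64_t reserved_pages)
{
	assert(node_pages > 0 && reserved_pages <= node_pages);
	return reserved_pages * 100 > node_pages * MOSTLY_RESERVED_PERCENT;
}

/*
 * Per-node early-init quota when the quota of mostly reserved nodes is
 * spread evenly over the remaining nodes. Returns 0 if no node is usable.
 */
uint64_t redistributed_quota(const uint64_t *node_pages,
			     const uint64_t *reserved_pages,
			     int nr_nodes, uint64_t base_quota)
{
	int usable = 0;

	for (int i = 0; i < nr_nodes; i++)
		if (!node_mostly_reserved(node_pages[i], reserved_pages[i]))
			usable++;
	if (usable == 0)
		return 0;
	return base_quota * (uint64_t)nr_nodes / (uint64_t)usable;
}
```

With four equal nodes of which three are fully reserved, the one remaining node would be asked to initialise four times the base quota.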
Srikar Dronamraju (2):
mm: Allow disabling deferred struct page initialisation
fadump: Disable deferred page struct initialisation
arch/powerpc/kernel/fadump.c | 1 +
include/linux/mmzone.h | 2 +-
mm/page_alloc.c | 20 ++++++++++++++++++++
3 files changed, 22 insertions(+), 1 deletion(-)
--
1.8.5.6
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* [PATCH 1/2] mm: Allow disabling deferred struct page initialisation
2016-08-02 13:19 [PATCH 0/0] Disable deferred struct page initialisation on Fadump Srikar Dronamraju
@ 2016-08-02 13:19 ` Srikar Dronamraju
2016-08-02 18:09 ` Dave Hansen
2016-08-02 13:19 ` [PATCH 2/2] fadump: Disable deferred page struct initialisation Srikar Dronamraju
1 sibling, 1 reply; 15+ messages in thread
From: Srikar Dronamraju @ 2016-08-02 13:19 UTC (permalink / raw)
To: linux-mm, Mel Gorman, Vlastimil Babka, Michal Hocko,
Andrew Morton, Michael Ellerman, linuxppc-dev
Cc: Srikar Dronamraju
Kernels compiled with CONFIG_DEFERRED_STRUCT_PAGE_INIT initialise only a
limited amount of memory per node during early boot. That amount takes
the dentry and inode cache sizes into account. However, when such a kernel
boots as a secondary kernel, it cannot allocate the memory required for
the dentry and inode caches. This results in crashes like the one below
on large systems, such as those with 32 TB of memory.
Dentry cache hash table entries: 536870912 (order: 16, 4294967296 bytes)
vmalloc: allocation failure, allocated 4097114112 of 17179934720 bytes
swapper/0: page allocation failure: order:0, mode:0x2080020(GFP_ATOMIC)
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.6-master+ #3
Call Trace:
[c00000000108fb10] [c0000000007fac88] dump_stack+0xb0/0xf0 (unreliable)
[c00000000108fb50] [c000000000235264] warn_alloc_failed+0x114/0x160
[c00000000108fbf0] [c000000000281484] __vmalloc_node_range+0x304/0x340
[c00000000108fca0] [c00000000028152c] __vmalloc+0x6c/0x90
[c00000000108fd40] [c000000000aecfb0] alloc_large_system_hash+0x1b8/0x2c0
[c00000000108fe00] [c000000000af7240] inode_init+0x94/0xe4
[c00000000108fe80] [c000000000af6fec] vfs_caches_init+0x8c/0x13c
[c00000000108ff00] [c000000000ac4014] start_kernel+0x50c/0x578
[c00000000108ff90] [c000000000008c6c] start_here_common+0x20/0xa8
Allow such kernels to disable deferred page struct initialisation.
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
include/linux/mmzone.h | 2 +-
mm/page_alloc.c | 20 ++++++++++++++++++++
2 files changed, 21 insertions(+), 1 deletion(-)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index c60df92..1c55200 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1203,7 +1203,7 @@ unsigned long __init node_memmap_size_bytes(int, unsigned long, unsigned long);
#else
#define pfn_valid_within(pfn) (1)
#endif
-
+void disable_deferred_meminit(void);
#ifdef CONFIG_ARCH_HAS_HOLES_MEMORYMODEL
/*
* pfn_valid() is meant to be able to tell if a given PFN has valid memmap
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c1069ef..dc6ebac 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -301,6 +301,19 @@ static inline bool early_page_nid_uninitialised(unsigned long pfn, int nid)
}
/*
+ * Deferred struct page initialisation may not work on a multinode machine,
+ * if a significant amount of memory is reserved at early boot. Allow apis
+ * that reserve significant memory to disable deferred struct page
+ * initialisation.
+ */
+static bool defer_init_disabled;
+
+void disable_deferred_meminit(void)
+{
+ defer_init_disabled = true;
+}
+
+/*
* Returns false when the remaining initialisation should be deferred until
* later in the boot cycle when it can be parallelised.
*/
@@ -313,6 +326,9 @@ static inline bool update_defer_init(pg_data_t *pgdat,
/* Always populate low zones for address-contrained allocations */
if (zone_end < pgdat_end_pfn(pgdat))
return true;
+
+ if (defer_init_disabled)
+ return true;
/*
* Initialise at least 2G of a node but also take into account that
* two large system hashes that can take up 1GB for 0.25TB/node.
@@ -350,6 +366,10 @@ static inline bool update_defer_init(pg_data_t *pgdat,
{
return true;
}
+void disable_deferred_meminit(void)
+{
+}
+
#endif
--
1.8.5.6
* [PATCH 2/2] fadump: Disable deferred page struct initialisation
2016-08-02 13:19 [PATCH 0/0] Disable deferred struct page initialisation on Fadump Srikar Dronamraju
2016-08-02 13:19 ` [PATCH 1/2] mm: Allow disabling deferred struct page initialisation Srikar Dronamraju
@ 2016-08-02 13:19 ` Srikar Dronamraju
2016-08-03 5:20 ` Balbir Singh
1 sibling, 1 reply; 15+ messages in thread
From: Srikar Dronamraju @ 2016-08-02 13:19 UTC (permalink / raw)
To: linux-mm, Mel Gorman, Vlastimil Babka, Michal Hocko,
Andrew Morton, Michael Ellerman, linuxppc-dev
Cc: Srikar Dronamraju
A fadump kernel reserves a significant number of memory blocks. On a
multi-node machine with CONFIG_DEFERRED_STRUCT_PAGE_INIT enabled, the
fadump kernel fails to boot. Fix this by disabling deferred struct page
initialisation.
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
arch/powerpc/kernel/fadump.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 3cb3b02a..117faf2 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -318,6 +318,7 @@ int __init fadump_reserve_mem(void)
be64_to_cpu(fdm_active->rmr_region.source_len);
pr_debug("fadumphdr_addr = %p\n",
(void *) fw_dump.fadumphdr_addr);
+ disable_deferred_meminit();
} else {
/* Reserve the memory at the top of memory. */
size = get_fadump_area_size();
--
1.8.5.6
* Re: [PATCH 1/2] mm: Allow disabling deferred struct page initialisation
2016-08-02 13:19 ` [PATCH 1/2] mm: Allow disabling deferred struct page initialisation Srikar Dronamraju
@ 2016-08-02 18:09 ` Dave Hansen
2016-08-03 6:38 ` Srikar Dronamraju
0 siblings, 1 reply; 15+ messages in thread
From: Dave Hansen @ 2016-08-02 18:09 UTC (permalink / raw)
To: Srikar Dronamraju, linux-mm, Mel Gorman, Vlastimil Babka,
Michal Hocko, Andrew Morton, Michael Ellerman, linuxppc-dev
On 08/02/2016 06:19 AM, Srikar Dronamraju wrote:
> Kernels compiled with CONFIG_DEFERRED_STRUCT_PAGE_INIT will initialise
> only certain size memory per node. The certain size takes into account
> the dentry and inode cache sizes. However such a kernel when booting a
> secondary kernel will not be able to allocate the required amount of
> memory to suffice for the dentry and inode caches. This results in
> crashes like the below on large systems such as 32 TB systems.
What's a "secondary kernel"?
* Re: [PATCH 2/2] fadump: Disable deferred page struct initialisation
2016-08-02 13:19 ` [PATCH 2/2] fadump: Disable deferred page struct initialisation Srikar Dronamraju
@ 2016-08-03 5:20 ` Balbir Singh
2016-08-03 6:07 ` Vlastimil Babka
2016-08-03 6:35 ` Srikar Dronamraju
0 siblings, 2 replies; 15+ messages in thread
From: Balbir Singh @ 2016-08-03 5:20 UTC (permalink / raw)
To: Srikar Dronamraju, linux-mm, Mel Gorman, Vlastimil Babka,
Michal Hocko, Andrew Morton, Michael Ellerman, linuxppc-dev
On Tue, 2016-08-02 at 18:49 +0530, Srikar Dronamraju wrote:
> Fadump kernel reserves significant number of memory blocks. On a multi-node
> machine, with CONFIG_DEFERRED_STRUCT_PAGE_INIT support, fadump kernel fails to
> boot. Fix this by disabling deferred page struct initialisation.
>
How much memory does a fadump kernel need? Can we bump up the limits depending
on the config? I presume when you say fadump kernel you mean a kernel with
FADUMP in the config?
BTW, I would much prefer a config-based solution that does not select
DEFERRED_INIT if FADUMP is enabled.
Balbir
* Re: [PATCH 2/2] fadump: Disable deferred page struct initialisation
2016-08-03 5:20 ` Balbir Singh
@ 2016-08-03 6:07 ` Vlastimil Babka
2016-08-03 11:34 ` Michael Ellerman
2016-08-03 6:35 ` Srikar Dronamraju
1 sibling, 1 reply; 15+ messages in thread
From: Vlastimil Babka @ 2016-08-03 6:07 UTC (permalink / raw)
To: Balbir Singh, Srikar Dronamraju, linux-mm, Mel Gorman,
Michal Hocko, Andrew Morton, Michael Ellerman, linuxppc-dev
On 08/03/2016 07:20 AM, Balbir Singh wrote:
> On Tue, 2016-08-02 at 18:49 +0530, Srikar Dronamraju wrote:
>> Fadump kernel reserves significant number of memory blocks. On a multi-node
>> machine, with CONFIG_DEFERRED_STRUCT_PAGE_INIT support, fadump kernel fails to
>> boot. Fix this by disabling deferred page struct initialisation.
>>
>
> How much memory does a fadump kernel need? Can we bump up the limits depending
> on the config. I presume when you say fadump kernel you mean kernel with
> FADUMP in the config?
>
> BTW, I would much rather prefer a config based solution that does not select
> DEFERRED_INIT if FADUMP is enabled.
IIRC the kdump/fadump kernel is typically the same vmlinux as the main
kernel, just with special initrd and boot params. So if you want
deferred init for the main kernel, this would be impractical.
> Balbir
>
* Re: [PATCH 2/2] fadump: Disable deferred page struct initialisation
2016-08-03 5:20 ` Balbir Singh
2016-08-03 6:07 ` Vlastimil Babka
@ 2016-08-03 6:35 ` Srikar Dronamraju
2016-08-03 19:40 ` Dave Hansen
1 sibling, 1 reply; 15+ messages in thread
From: Srikar Dronamraju @ 2016-08-03 6:35 UTC (permalink / raw)
To: Balbir Singh
Cc: linux-mm, Mel Gorman, Vlastimil Babka, Michal Hocko,
Andrew Morton, Michael Ellerman, mahesh
* Balbir Singh <bsingharora@gmail.com> [2016-08-03 15:20:42]:
> On Tue, 2016-08-02 at 18:49 +0530, Srikar Dronamraju wrote:
> > Fadump kernel reserves significant number of memory blocks. On a multi-node
> > machine, with CONFIG_DEFERRED_STRUCT_PAGE_INIT support, fadump kernel fails to
> > boot. Fix this by disabling deferred page struct initialisation.
> >
>
> How much memory does a fadump kernel need? Can we bump up the limits depending
> on the config. I presume when you say fadump kernel you mean kernel with
> FADUMP in the config?
On a regular kernel with CONFIG_FADUMP and fadump configured, 5% of the
total memory is reserved for booting the kernel on a crash. On a crash,
the fadump kernel reserves the other 95% of memory and boots into the 5%
that was reserved for it. It then parses the reserved 95% of memory to
collect the dump.
The problem is not the amount of memory that's reserved for the fadump
kernel. Even if we increase or decrease it, we still end up with the same
issue.
> BTW, I would much rather prefer a config based solution that does not select
> DEFERRED_INIT if FADUMP is enabled.
As Vlastimil rightly pointed out, for fadump the same kernel is booted
again at a different location when we crash. So we cannot have a
config-based solution.
--
Thanks and Regards
Srikar Dronamraju
* Re: [PATCH 1/2] mm: Allow disabling deferred struct page initialisation
2016-08-02 18:09 ` Dave Hansen
@ 2016-08-03 6:38 ` Srikar Dronamraju
2016-08-03 18:17 ` Dave Hansen
0 siblings, 1 reply; 15+ messages in thread
From: Srikar Dronamraju @ 2016-08-03 6:38 UTC (permalink / raw)
To: Dave Hansen
Cc: linux-mm, Mel Gorman, Vlastimil Babka, Michal Hocko,
Andrew Morton, Michael Ellerman, linuxppc-dev, mahesh, hbathini
* Dave Hansen <dave.hansen@intel.com> [2016-08-02 11:09:21]:
> On 08/02/2016 06:19 AM, Srikar Dronamraju wrote:
> > Kernels compiled with CONFIG_DEFERRED_STRUCT_PAGE_INIT will initialise
> > only certain size memory per node. The certain size takes into account
> > the dentry and inode cache sizes. However such a kernel when booting a
> > secondary kernel will not be able to allocate the required amount of
> > memory to suffice for the dentry and inode caches. This results in
> > crashes like the below on large systems such as 32 TB systems.
>
> What's a "secondary kernel"?
>
I mean the kernel that's booted to collect the crash. On fadump, the
first kernel acts as the secondary kernel, i.e. the same kernel is booted
to collect the crash.
--
Thanks and Regards
Srikar Dronamraju
* Re: [PATCH 2/2] fadump: Disable deferred page struct initialisation
2016-08-03 6:07 ` Vlastimil Babka
@ 2016-08-03 11:34 ` Michael Ellerman
0 siblings, 0 replies; 15+ messages in thread
From: Michael Ellerman @ 2016-08-03 11:34 UTC (permalink / raw)
To: Vlastimil Babka, Balbir Singh, Srikar Dronamraju, linux-mm,
Mel Gorman, Michal Hocko, Andrew Morton, linuxppc-dev
Vlastimil Babka <vbabka@suse.cz> writes:
> On 08/03/2016 07:20 AM, Balbir Singh wrote:
>> On Tue, 2016-08-02 at 18:49 +0530, Srikar Dronamraju wrote:
>>> Fadump kernel reserves significant number of memory blocks. On a multi-node
>>> machine, with CONFIG_DEFERRED_STRUCT_PAGE_INIT support, fadump kernel fails to
>>> boot. Fix this by disabling deferred page struct initialisation.
>>>
>>
>> How much memory does a fadump kernel need? Can we bump up the limits depending
>> on the config. I presume when you say fadump kernel you mean kernel with
>> FADUMP in the config?
>>
>> BTW, I would much rather prefer a config based solution that does not select
>> DEFERRED_INIT if FADUMP is enabled.
>
> IIRC the kdump/fadump kernel is typically the same vmlinux as the main
> kernel, just with special initrd and boot params. So if you want
> deferred init for the main kernel, this would be impractical.
Yes. Distros won't build a separate kernel, so it has to work at runtime.
cheers
* Re: [PATCH 1/2] mm: Allow disabling deferred struct page initialisation
2016-08-03 6:38 ` Srikar Dronamraju
@ 2016-08-03 18:17 ` Dave Hansen
2016-08-04 5:25 ` Srikar Dronamraju
0 siblings, 1 reply; 15+ messages in thread
From: Dave Hansen @ 2016-08-03 18:17 UTC (permalink / raw)
To: Srikar Dronamraju
Cc: linux-mm, Mel Gorman, Vlastimil Babka, Michal Hocko,
Andrew Morton, Michael Ellerman, linuxppc-dev, mahesh, hbathini
On 08/02/2016 11:38 PM, Srikar Dronamraju wrote:
> * Dave Hansen <dave.hansen@intel.com> [2016-08-02 11:09:21]:
>> On 08/02/2016 06:19 AM, Srikar Dronamraju wrote:
>>> Kernels compiled with CONFIG_DEFERRED_STRUCT_PAGE_INIT will initialise
>>> only certain size memory per node. The certain size takes into account
>>> the dentry and inode cache sizes. However such a kernel when booting a
>>> secondary kernel will not be able to allocate the required amount of
>>> memory to suffice for the dentry and inode caches. This results in
>>> crashes like the below on large systems such as 32 TB systems.
>>
>> What's a "secondary kernel"?
>>
> I mean the kernel thats booted to collect the crash, On fadump, the
> first kernel acts as the secondary kernel i.e the same kernel is booted
> to collect the crash.
OK, but I'm still not seeing what the problem is. You've said that it
crashes and that it crashes during inode/dentry cache allocation.
But, *why* does the same kernel image crash when it is used as a
"secondary kernel"?
* Re: [PATCH 2/2] fadump: Disable deferred page struct initialisation
2016-08-03 6:35 ` Srikar Dronamraju
@ 2016-08-03 19:40 ` Dave Hansen
2016-08-04 5:10 ` Srikar Dronamraju
0 siblings, 1 reply; 15+ messages in thread
From: Dave Hansen @ 2016-08-03 19:40 UTC (permalink / raw)
To: Srikar Dronamraju, Balbir Singh
Cc: linux-mm, Mel Gorman, Vlastimil Babka, Michal Hocko,
Andrew Morton, Michael Ellerman, mahesh
On 08/02/2016 11:35 PM, Srikar Dronamraju wrote:
> On a regular kernel with CONFIG_FADUMP and fadump configured, 5% of the
> total memory is reserved for booting the kernel on crash. On crash,
> fadump kernel reserves the 95% memory and boots into the 5% memory that
> was reserved for it. It then parses the reserved 95% memory to collect
> the dump.
>
> The problem is not about the amount of memory thats reserved for fadump
> kernel. Even if we increase/decrease, we will still end up with the same
> issue.
Oh, and the dentry/inode caches are sized based on 100% of memory, not
the 5% that's left after the fadump reservation?
Is the deferred initialization already in progress at the time we do the
dentry/inode allocations? Can waiting a bit let the allocation succeed?
* Re: [PATCH 2/2] fadump: Disable deferred page struct initialisation
2016-08-03 19:40 ` Dave Hansen
@ 2016-08-04 5:10 ` Srikar Dronamraju
2016-08-04 10:28 ` Mel Gorman
0 siblings, 1 reply; 15+ messages in thread
From: Srikar Dronamraju @ 2016-08-04 5:10 UTC (permalink / raw)
To: Dave Hansen
Cc: Balbir Singh, linux-mm, Mel Gorman, Vlastimil Babka, Michal Hocko,
Andrew Morton, Michael Ellerman, mahesh
* Dave Hansen <dave.hansen@intel.com> [2016-08-03 12:40:17]:
> On 08/02/2016 11:35 PM, Srikar Dronamraju wrote:
> > On a regular kernel with CONFIG_FADUMP and fadump configured, 5% of the
> > total memory is reserved for booting the kernel on crash. On crash,
> > fadump kernel reserves the 95% memory and boots into the 5% memory that
> > was reserved for it. It then parses the reserved 95% memory to collect
> > the dump.
> >
> > The problem is not about the amount of memory thats reserved for fadump
> > kernel. Even if we increase/decrease, we will still end up with the same
> > issue.
>
> Oh, and the dentry/inode caches are sized based on 100% of memory, not
> the 5% that's left after the fadump reservation?
Yes, the dentry/inode caches are sized based on the 100% memory.
>
> Is the deferred initialization kicked in progress at the time we do the
> dentry/inode allocations? Can waiting a bit let the allocation succeed?
>
Right now deferred initialisation kicks in after the dentry/inode
allocations.
Can we defer the cache allocations until deferred initialisation has
started? I don't know, but if we can, that could potentially solve the
problem. Maybe Mel or somebody else can say whether we can defer the
dentry/inode cache allocations until deferred initialisation kicks in.
The other idea is to detect nodes whose memory is reserved and allocate
extra memory from the nodes where memory is not yet reserved.
--
Thanks and Regards
Srikar Dronamraju
* Re: [PATCH 1/2] mm: Allow disabling deferred struct page initialisation
2016-08-03 18:17 ` Dave Hansen
@ 2016-08-04 5:25 ` Srikar Dronamraju
0 siblings, 0 replies; 15+ messages in thread
From: Srikar Dronamraju @ 2016-08-04 5:25 UTC (permalink / raw)
To: Dave Hansen
Cc: linux-mm, Mel Gorman, Vlastimil Babka, Michal Hocko,
Andrew Morton, Michael Ellerman, linuxppc-dev, mahesh, hbathini
* Dave Hansen <dave.hansen@intel.com> [2016-08-03 11:17:43]:
> On 08/02/2016 11:38 PM, Srikar Dronamraju wrote:
> > * Dave Hansen <dave.hansen@intel.com> [2016-08-02 11:09:21]:
> >> On 08/02/2016 06:19 AM, Srikar Dronamraju wrote:
> >>> Kernels compiled with CONFIG_DEFERRED_STRUCT_PAGE_INIT will initialise
> >>> only certain size memory per node. The certain size takes into account
> >>> the dentry and inode cache sizes. However such a kernel when booting a
> >>> secondary kernel will not be able to allocate the required amount of
> >>> memory to suffice for the dentry and inode caches. This results in
> >>> crashes like the below on large systems such as 32 TB systems.
> >>
> >> What's a "secondary kernel"?
> >>
> > I mean the kernel thats booted to collect the crash, On fadump, the
> > first kernel acts as the secondary kernel i.e the same kernel is booted
> > to collect the crash.
>
> OK, but I'm still not seeing what the problem is. You've said that it
> crashes and that it crashes during inode/dentry cache allocation.
>
> But, *why* does the same kernel image crash in when it is used as a
> "secondary kernel"?
>
I guess you already got it, but let me try to explain it again.
Let's say we have a 32 TB system with 16 nodes, each node having 2 TB of
memory, and that deferred page initialisation is configured.
When the regular kernel boots:
1. It reserves 5% of the memory for fadump.
2. It initialises 8GB per node, i.e. 128GB in total.
3. It allocates the dentry/inode caches, which need around 16GB.
4. It then kicks off the parallel struct page initialisation.
Now let's say the kernel crashed and fadump was triggered:
1. The same kernel boots in the 5% reserved space, which is about 1600GB.
2. It reserves the remaining 95% of memory.
3. It tries to initialise 8GB per node but can only initialise 8GB in total
(since all nodes except the first are fully reserved).
4. It tries to allocate the 16GB dentry/inode caches but fails
(it tries to reclaim, but reclaim needs a spinlock and the spinlock is
not yet initialised).
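The two boot sequences above can be modelled with a small userspace sketch. This is only an illustration of the arithmetic, not kernel code: the function names are hypothetical, the per-node cap is the 8GB figure from this mail, and the assumption that the usable 1600GB sits entirely on the first node follows the description in step 3.

```c
#include <assert.h>
#include <stdint.h>

#define GiB (1ULL << 30)

/*
 * Memory that early boot can initialise on one node: at most
 * `per_node_cap`, and never more than what the reservation leaves free.
 */
uint64_t early_init_bytes(uint64_t node_bytes, uint64_t reserved_bytes,
			  uint64_t per_node_cap)
{
	assert(reserved_bytes <= node_bytes);
	uint64_t usable = node_bytes - reserved_bytes;

	return usable < per_node_cap ? usable : per_node_cap;
}

/* Sum of early-initialised memory over all nodes. */
uint64_t total_early_init(const uint64_t *node_bytes, const uint64_t *reserved,
			  int nr_nodes, uint64_t per_node_cap)
{
	uint64_t total = 0;

	for (int i = 0; i < nr_nodes; i++)
		total += early_init_bytes(node_bytes[i], reserved[i],
					  per_node_cap);
	return total;
}
```

With 16 nodes of 2048GB each, a regular boot that reserves 1600GB on one node still initialises 128GB early, which covers the 16GB of caches. In the fadump boot, 15 nodes are fully reserved, so only 8GB is available and the allocation cannot be satisfied.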
--
Thanks and Regards
Srikar Dronamraju
* Re: [PATCH 2/2] fadump: Disable deferred page struct initialisation
2016-08-04 5:10 ` Srikar Dronamraju
@ 2016-08-04 10:28 ` Mel Gorman
2016-08-04 13:54 ` Srikar Dronamraju
0 siblings, 1 reply; 15+ messages in thread
From: Mel Gorman @ 2016-08-04 10:28 UTC (permalink / raw)
To: Srikar Dronamraju
Cc: Dave Hansen, Balbir Singh, linux-mm, Vlastimil Babka,
Michal Hocko, Andrew Morton, Michael Ellerman, mahesh
On Thu, Aug 04, 2016 at 10:40:35AM +0530, Srikar Dronamraju wrote:
> * Dave Hansen <dave.hansen@intel.com> [2016-08-03 12:40:17]:
>
> > On 08/02/2016 11:35 PM, Srikar Dronamraju wrote:
> > > On a regular kernel with CONFIG_FADUMP and fadump configured, 5% of the
> > > total memory is reserved for booting the kernel on crash. On crash,
> > > fadump kernel reserves the 95% memory and boots into the 5% memory that
> > > was reserved for it. It then parses the reserved 95% memory to collect
> > > the dump.
> > >
> > > The problem is not about the amount of memory thats reserved for fadump
> > > kernel. Even if we increase/decrease, we will still end up with the same
> > > issue.
> >
> > Oh, and the dentry/inode caches are sized based on 100% of memory, not
> > the 5% that's left after the fadump reservation?
>
> Yes, the dentry/inode caches are sized based on the 100% memory.
>
By and large, I'm not a major fan of introducing an API to disable it for
a single feature that is arch-specific because it's very heavy handed.
There is no guarantee that the existence of fadump will cause a failure.
If fadump is reserving memory and alloc_large_system_hash(HASH_EARLY)
does not know about it, then would an arch-specific callback for
arch_reserved_kernel_pages() be more appropriate? fadump would need to
return how many pages it reserved there. That would shrink the size of
the inode and dentry hash tables when booting with 95% of memory
reserved.
That approach would limit the impact to ppc64 and would be less costly than
doing a memblock walk instead of using nr_kernel_pages for everyone else.
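To illustrate the effect of that suggestion, here is a simplified userspace model of sizing a hash table from the pages that remain after a reservation. It is a sketch under stated assumptions, not the kernel's alloc_large_system_hash(): the function names are hypothetical, `reserved_pages` stands in for whatever an arch_reserved_kernel_pages() callback would return, and the rule "one bucket per 2^scale bytes, rounded up to a power of two" is a simplification.

```c
#include <assert.h>
#include <stdint.h>

/* Round v up to the next power of two; values below 1 yield 1. */
uint64_t roundup_pow2(uint64_t v)
{
	uint64_t p = 1;

	while (p < v)
		p <<= 1;
	return p;
}

/*
 * Simplified model: one bucket per 2^scale bytes of memory that is not
 * reserved, rounded up to a power of two.
 */
uint64_t hash_entries(uint64_t total_pages, uint64_t reserved_pages,
		      unsigned int page_shift, unsigned int scale)
{
	assert(reserved_pages < total_pages);
	uint64_t usable_bytes = (total_pages - reserved_pages) << page_shift;

	return roundup_pow2(usable_bytes >> scale);
}
```

In this model, 32 TB of 64 KiB pages with a scale of 14 gives 2^31 entries when nothing is reserved, and 2^27 entries when 95% of the pages are reported as reserved, so the table shrinks by a factor of 16.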
> > Is the deferred initialization kicked in progress at the time we do the
> > dentry/inode allocations? Can waiting a bit let the allocation succeed?
> >
>
> Right now deferred initialisation kicks in after dentry/inode
> allocations.
>
> Can we defer the cache allocations till deferred
> initialisation? I dont know.
Only by backing it with vmalloc memory.
--
Mel Gorman
SUSE Labs
* Re: [PATCH 2/2] fadump: Disable deferred page struct initialisation
2016-08-04 10:28 ` Mel Gorman
@ 2016-08-04 13:54 ` Srikar Dronamraju
0 siblings, 0 replies; 15+ messages in thread
From: Srikar Dronamraju @ 2016-08-04 13:54 UTC (permalink / raw)
To: Mel Gorman
Cc: Dave Hansen, Balbir Singh, linux-mm, Vlastimil Babka,
Michal Hocko, Andrew Morton, Michael Ellerman, mahesh
* Mel Gorman <mgorman@techsingularity.net> [2016-08-04 11:28:01]:
> > >
> > > Oh, and the dentry/inode caches are sized based on 100% of memory, not
> > > the 5% that's left after the fadump reservation?
> >
> > Yes, the dentry/inode caches are sized based on the 100% memory.
> >
>
> By and large, I'm not a major fan of introducing an API to disable it for
> a single feature that is arch-specific because it's very heavy handed.
> There is no guarantee that the existence of fadump will cause a failure
okay.
>
> If fadump is reserving memory and alloc_large_system_hash(HASH_EARLY)
> does not know about then then would an arch-specific callback for
> arch_reserved_kernel_pages() be more appropriate? fadump would need to
> return how many pages it reserved there. That would shrink the size of
> the inode and dentry hash tables when booting with 95% of memory
> reserved.
>
> That approach would limit the impact to ppc64 and would be less costly than
> doing a memblock walk instead of using nr_kernel_pages for everyone else.
>
I have posted a patch based on Mel and Dave's feedback
http://lkml.kernel.org/r/1470318165-2521-1-git-send-email-srikar@linux.vnet.ibm.com
--
Thanks and Regards
Srikar Dronamraju
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 15+ messages in thread