linux-mm.kvack.org archive mirror
From: Li Zhang <zhlcindy@gmail.com>
To: mpe@ellerman.id.au, khandual@linux.vnet.ibm.com,
	aneesh.kumar@linux.vnet.ibm.com, mgorman@techsingularity.net
Cc: linuxppc-dev@lists.ozlabs.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org,
	Li Zhang <zhlcindy@linux.vnet.ibm.com>
Subject: [PATCH RFC 1/2] mm: meminit: initialise more memory for inode/dentry hash tables in early boot
Date: Wed,  2 Mar 2016 16:49:36 +0800
Message-ID: <1456908577-4702-2-git-send-email-zhlcindy@gmail.com>
In-Reply-To: <1456908577-4702-1-git-send-email-zhlcindy@gmail.com>

From: Li Zhang <zhlcindy@linux.vnet.ibm.com>

This patch is based on an old patch from Mel Gorman that was
discussed on the mailing list (https://lkml.org/lkml/2015/5/5/280).
That problem was instead fixed with a completion that waits for
all memory to be initialised in page_alloc_init_late(). The
completion fixes the OOM problem on x86 with 24TB of memory, where
memory is allocated during late initialisation.
On the Power platform with 32TB of memory, however, there is a call
trace in vfs_caches_init->inode_init() because the inode hash table
needs more memory.
So this patch initialises 1GB per 0.25TB of node memory early on
large systems, as mentioned in https://lkml.org/lkml/2015/5/1/627
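
As a rough illustration of the sizing rule, the sketch below mirrors the
max() expression from the diff in this patch in standalone C. The helper
names (early_init_pages, bytes_to_pages) are hypothetical, PAGE_SHIFT is
assumed to be 16 (64KB pages, which is what the "order: 18" line in the
dmesg log implies), and the 2TB node is an assumption that the 32TB are
spread evenly over 16 nodes.

```c
#include <stdint.h>

/* Assumption: 64KB pages, as implied by the dmesg log in this message. */
#define PAGE_SHIFT 16

/* Hypothetical helper: convert a byte count into a page count. */
static uint64_t bytes_to_pages(uint64_t bytes)
{
	return bytes >> PAGE_SHIFT;
}

/*
 * Hypothetical helper mirroring the patch: initialise at least 2GB
 * worth of pages, or 1/256 of the node's spanned pages if that is
 * larger (1/256 of 0.25TB is 1GB).
 */
static uint64_t early_init_pages(uint64_t node_spanned_pages)
{
	uint64_t floor_pages = 2ULL << (30 - PAGE_SHIFT);
	uint64_t scaled_pages = node_spanned_pages >> 8;

	return scaled_pages > floor_pages ? scaled_pages : floor_pages;
}
```

Under these assumptions a 0.25TB node still gets the 2GB floor (32768
pages), while a 2TB node gets 131072 pages, i.e. 8GB. The scaled value
only overtakes the floor once a node spans more than 0.5TB.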

This call trace was found on a Power system with 32TB of memory,
1024 CPUs and 16 nodes. The dmesg log is as follows:

[    0.091780] Dentry cache hash table entries: 2147483648 (order: 18,
17179869184 bytes)
[    2.891012] vmalloc: allocation failure, allocated 16021913600 of
17179934720 bytes
[    2.891034] swapper/0: page allocation failure: order:0,
mode:0x2080020
[    2.891038] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.0-0-ppc64
[    2.891041] Call Trace:
[    2.891046] [c0000000012bfa00] [c0000000007c4a50]
                .dump_stack+0xb4/0xb664 (unreliable)
[    2.891051] [c0000000012bfa80] [c0000000001f93d4]
                .warn_alloc_failed+0x114/0x160
[    2.891054] [c0000000012bfb30] [c00000000023c204]
                .__vmalloc_area_node+0x1a4/0x2b0
[    2.891058] [c0000000012bfbf0] [c00000000023c3f4]
                .__vmalloc_node_range+0xe4/0x110
[    2.891061] [c0000000012bfc90] [c00000000023c460]
                .__vmalloc_node+0x40/0x50
[    2.891065] [c0000000012bfd10] [c000000000b67d60]
                .alloc_large_system_hash+0x134/0x2a4
[    2.891068] [c0000000012bfdd0] [c000000000b70924]
                .inode_init+0xa4/0xf0
[    2.891071] [c0000000012bfe60] [c000000000b706a0]
                .vfs_caches_init+0x80/0x144
[    2.891074] [c0000000012bfef0] [c000000000b35208]
                .start_kernel+0x40c/0x4e0
[    2.891078] [c0000000012bff90] [c000000000008cfc]
                start_here_common+0x20/0x4a4
[    2.891080] Mem-Info:
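
The numbers in the log above can be cross-checked with a little
arithmetic. The following is only a sketch: the constants are copied
from the log, the function names are hypothetical, and it assumes that
the printed order is relative to the page size.

```c
#include <stdint.h>

/* Values copied from the dmesg excerpt in this message. */
#define HASH_ENTRIES	2147483648ULL	/* dentry hash table entries */
#define HASH_BYTES	17179869184ULL	/* reported table size */
#define HASH_ORDER	18		/* reported order */
#define VMALLOC_WANTED	17179934720ULL	/* size vmalloc tried to allocate */
#define VMALLOC_GOT	16021913600ULL	/* size allocated before failing */

/* Bytes used by each hash table entry. */
static uint64_t bytes_per_entry(void)
{
	return HASH_BYTES / HASH_ENTRIES;
}

/* Page size implied by the table size and its order. */
static uint64_t implied_page_size(void)
{
	return HASH_BYTES >> HASH_ORDER;
}

/* How many bytes the failed vmalloc was short by. */
static uint64_t vmalloc_shortfall(void)
{
	return VMALLOC_WANTED - VMALLOC_GOT;
}
```

This gives 8 bytes per entry, a 64KB page size, and a shortfall of
1158021120 bytes (17670 pages of 64KB) on a request of roughly 16GB.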

Signed-off-by: Li Zhang <zhlcindy@linux.vnet.ibm.com>
---
 mm/page_alloc.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 838ca8bb..4847f25 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -293,13 +293,20 @@ static inline bool update_defer_init(pg_data_t *pgdat,
 				unsigned long pfn, unsigned long zone_end,
 				unsigned long *nr_initialised)
 {
+	unsigned long max_initialise;
+
 	/* Always populate low zones for address-contrained allocations */
 	if (zone_end < pgdat_end_pfn(pgdat))
 		return true;
+	/*
+	 * Initialise at least 2G of a node but also take into account that
+	 * two large system hashes can take up 1GB for 0.25TB/node.
+	 */
+	max_initialise = max(2UL << (30 - PAGE_SHIFT),
+		(pgdat->node_spanned_pages >> 8));
 
-	/* Initialise at least 2G of the highest zone */
 	(*nr_initialised)++;
-	if (*nr_initialised > (2UL << (30 - PAGE_SHIFT)) &&
+	if ((*nr_initialised > max_initialise) &&
 	    (pfn & (PAGES_PER_SECTION - 1)) == 0) {
 		pgdat->first_deferred_pfn = pfn;
 		return false;
-- 
2.1.0
