From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-oi0-f49.google.com (mail-oi0-f49.google.com [209.85.218.49]) by kanga.kvack.org (Postfix) with ESMTP id 1185A6B0038 for ; Mon, 4 May 2015 23:32:37 -0400 (EDT) Received: by oiko83 with SMTP id o83so127738904oik.1 for ; Mon, 04 May 2015 20:32:36 -0700 (PDT) Received: from g9t5009.houston.hp.com (g9t5009.houston.hp.com. [15.240.92.67]) by mx.google.com with ESMTPS id g3si9284446oeu.95.2015.05.04.20.32.36 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 04 May 2015 20:32:36 -0700 (PDT) Message-ID: <554839D0.3080703@hp.com> Date: Mon, 04 May 2015 23:32:32 -0400 From: Waiman Long MIME-Version: 1.0 Subject: Re: [PATCH 0/13] Parallel struct page initialisation v4 References: <1430231830-7702-1-git-send-email-mgorman@suse.de> <554030D1.8080509@hp.com> <5543F802.9090504@hp.com> <554415B1.2050702@hp.com> <20150504143046.9404c572486caf71bdef0676@linux-foundation.org> In-Reply-To: <20150504143046.9404c572486caf71bdef0676@linux-foundation.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Andrew Morton Cc: Mel Gorman , Nathan Zimmer , Dave Hansen , Scott Norton , Daniel J Blueman , Linux-MM , LKML On 05/04/2015 05:30 PM, Andrew Morton wrote: > On Fri, 01 May 2015 20:09:21 -0400 Waiman Long wrote: > >> On 05/01/2015 06:02 PM, Waiman Long wrote: >>> Bad news! >>> >>> I tried your patch on a 24-TB DragonHawk and got an out of memory >>> panic. The kernel log messages were: >> ... >> >>> [ 81.360287] [] dump_stack+0x68/0x77 >>> [ 81.365942] [] panic+0xb9/0x219 >>> [ 81.371213] [] ? >>> __blocking_notifier_call_chain+0x63/0x80 >>> [ 81.378971] [] __out_of_memory+0x34e/0x350 >>> [ 81.385292] [] out_of_memory+0x5e/0x90 >>> [ 81.391230] [] __alloc_pages_slowpath+0x6be/0x740 >>> [ 81.398219] [] __alloc_pages_nodemask+0x23c/0x250 >>> [ 81.405212] [] kmem_getpages+0x56/0x110 >>> [ 81.411246] [] fallback_alloc+0x164/0x200 >>> [ 81.417474] [] ____cache_alloc_node+0x8d/0x170 >>> [ 81.424179] [] kmem_cache_alloc_trace+0x17b/0x240 >>> [ 81.431169] [] init_memory_block+0x3a/0x110 >>> [ 81.437586] [] memory_dev_init+0xd7/0x13d >>> [ 81.443810] [] driver_init+0x2f/0x37 >>> [ 81.449556] [] do_basic_setup+0x29/0xd5 >>> [ 81.455597] [] ? sched_init_smp+0x140/0x147 >>> [ 81.462015] [] kernel_init_freeable+0x20e/0x297 >>> [ 81.468815] [] ? rest_init+0x80/0x80 >>> [ 81.474565] [] kernel_init+0x9/0xf0 >>> [ 81.480216] [] ret_from_fork+0x58/0x90 >>> [ 81.486156] [] ? rest_init+0x80/0x80 >>> [ 81.492350] ---[ end Kernel panic - not syncing: Out of memory and >>> no killable processes... >>> [ 81.492350] >>> >>> -Longman >> I increased the pre-initialized memory per node in update_defer_init() >> of mm/page_alloc.c from 2G to 4G. Now I am able to boot the 24-TB >> machine without error. The 12-TB has 0.75TB/node, while the 24-TB >> machine has 1.5TB/node. I would suggest something like pre-initializing >> 1G per 0.25TB/node. In this way, it will scale properly with the memory >> size. > We're using more than 2G before we've even completed do_basic_setup()? > Where did it all go? I think they may be used in the allocation of the hash tables like: [ 2.367440] Dentry cache hash table entries: 2147483648 (order: 22, 17179869184 bytes) [ 11.522768] Inode-cache hash table entries: 2147483648 (order: 22, 17179869184 bytes) [ 18.598513] Mount-cache hash table entries: 67108864 (order: 17, 536870912 bytes) [ 18.667485] Mountpoint-cache hash table entries: 67108864 (order: 17, 536870912 bytes) The size of those hash tables do scale somewhat linearly with the amount of total memory available. >> Before the patch, the boot time from elilo prompt to ssh login was 694s. >> After the patch, the boot up time was 346s, a saving of 348s (about 50%). > Having to guesstimate the amount of memory which is needed for a > successful boot will be painful. Any number we choose will be wrong > 99% of the time. > > If the kswapd threads have started, all we need to do is to wait: take > a little nap in the allocator's page==NULL slowpath. > > I'm not seeing any reason why we can't start kswapd much earlier - > right at the start of do_basic_setup()? I think we can, we just have to change the hash table allocator to do that. Cheers, Longman -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org