From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-wg0-f44.google.com (mail-wg0-f44.google.com [74.125.82.44])
	by kanga.kvack.org (Postfix) with ESMTP id 9FC9582997
	for ; Fri, 22 May 2015 05:33:19 -0400 (EDT)
Received: by wgfl8 with SMTP id l8so12184207wgf.2
	for ; Fri, 22 May 2015 02:33:19 -0700 (PDT)
Received: from mx2.suse.de (cantor2.suse.de. [195.135.220.15])
	by mx.google.com with ESMTPS id df4si7844930wib.111.2015.05.22.02.33.17
	for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128);
	Fri, 22 May 2015 02:33:18 -0700 (PDT)
Date: Fri, 22 May 2015 10:33:13 +0100
From: Mel Gorman
Subject: Re: [PATCH] mm: meminit: Finish initialisation of struct pages before basic setup
Message-ID: <20150522093313.GZ2462@suse.de>
References: <1431597783.26797.1@cpanel21.proisp.no> <1432276201.11133.1@cpanel21.proisp.no>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-15
Content-Disposition: inline
In-Reply-To: <1432276201.11133.1@cpanel21.proisp.no>
Sender: owner-linux-mm@kvack.org
List-ID:
To: Daniel J Blueman
Cc: Andrew Morton, nzimmer, Waiman Long, Dave Hansen, Scott Norton, Linux-MM, LKML, Steffen Persvold

On Fri, May 22, 2015 at 02:30:01PM +0800, Daniel J Blueman wrote:
> On Thu, May 14, 2015 at 6:03 PM, Daniel J Blueman wrote:
> >On Thu, May 14, 2015 at 12:31 AM, Mel Gorman wrote:
> >>On Wed, May 13, 2015 at 10:53:33AM -0500, nzimmer wrote:
> >>> I just noticed a hang on my largest box.
> >>> I can only reproduce with large core counts, if I turn down the
> >>> number of cpus it doesn't have an issue.
> >>>
> >>
> >>Odd. The number of cores should make little difference as only
> >>one CPU per node should be in use. Does sysrq+t give any
> >>indication how or where it is hanging?
> >
> >I was seeing the same behaviour of 1000ms increasing to 5500ms
> >[1]; this suggests either lock contention or O(n) behaviour.
> >
> >Nathan, can you check with this ordering of patches from Andrew's
> >cache [2]?
> >I was getting hangs until I found them all.
> >
> >I'll follow up with timing data.
>
> 7TB over 216 NUMA nodes, 1728 cores, from kernel 4.0.4 load to login:
>
> 1. 2086s with patches 01-19 [1]
>
> 2. 2026s adding "Take into account that large system caches scale
> linearly with memory", which has:
> min(2UL << (30 - PAGE_SHIFT), (pgdat->node_spanned_pages >> 3));
>
> 3. 2442s fixing to:
> max(2UL << (30 - PAGE_SHIFT), (pgdat->node_spanned_pages >> 3));
>
> 4. 2064s adjusting minimum and shift to:
> max(512UL << (20 - PAGE_SHIFT), (pgdat->node_spanned_pages >> 8));
>
> 5. 1934s adjusting minimum and shift to:
> max(128UL << (20 - PAGE_SHIFT), (pgdat->node_spanned_pages >> 8));
>
> 6. 930s #5 with the non-temporal PMD init patch I had earlier
> proposed (I'll pursue separately)
>
> The scaling patch isn't in -mm.

That patch was superseded by "mm: meminit: finish initialisation of struct
pages before basic setup" and
"mm-meminit-finish-initialisation-of-struct-pages-before-basic-setup-fix"
so that's ok.

FWIW, I think you should still go ahead with the non-temporal patches because
there is potential benefit there other than the initialisation. If there
was an arch-optional implementation of a non-temporal clear then it would
also be worth considering if __GFP_ZERO should use non-temporal stores.
At a greater stretch it would be worth considering if kswapd freeing should
zero pages to avoid a zero on the allocation side in the general case as
it would be more generally useful and a stepping stone towards what the
series "Sanitizing freed pages" attempts.

> #5 tests out nice on a bunch of
> other AMD systems, 64GB and up, so:
> Tested-by: Daniel J Blueman
>

Thanks very much Daniel, much appreciated.

-- 
Mel Gorman
SUSE Labs
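
For reference, the four expressions quoted in items 2 to 5 above can be
compared side by side with a small user-space sketch. This is only
arithmetic on the quoted formulas, not kernel code: PAGE_SHIFT = 12 (4 KiB
pages) is an assumption, the helper names (variant2 .. variant5,
pages_to_mib, print_variants) are made up for illustration, and the node
size is supplied by the caller rather than taken from any real
pgdat->node_spanned_pages.

```c
#include <stdio.h>

/* Assumption: 4 KiB pages, as on x86-64. Not taken from the thread. */
#define PAGE_SHIFT 12

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/* Item 2: min() caps the result at 2 GiB worth of pages. */
static unsigned long variant2(unsigned long spanned_pages)
{
	return min_ul(2UL << (30 - PAGE_SHIFT), spanned_pages >> 3);
}

/* Item 3: max() makes 2 GiB the floor, scaling as 1/8 of the node. */
static unsigned long variant3(unsigned long spanned_pages)
{
	return max_ul(2UL << (30 - PAGE_SHIFT), spanned_pages >> 3);
}

/* Item 4: floor of 512 MiB, scaling as 1/256 of the node. */
static unsigned long variant4(unsigned long spanned_pages)
{
	return max_ul(512UL << (20 - PAGE_SHIFT), spanned_pages >> 8);
}

/* Item 5: floor of 128 MiB, scaling as 1/256 of the node. */
static unsigned long variant5(unsigned long spanned_pages)
{
	return max_ul(128UL << (20 - PAGE_SHIFT), spanned_pages >> 8);
}

/* Convert a page count to MiB for printing. */
static unsigned long pages_to_mib(unsigned long pages)
{
	return pages >> (20 - PAGE_SHIFT);
}

/* Print all four results for one node size given in pages. */
static void print_variants(unsigned long spanned_pages)
{
	printf("item 2: %lu MiB\n", pages_to_mib(variant2(spanned_pages)));
	printf("item 3: %lu MiB\n", pages_to_mib(variant3(spanned_pages)));
	printf("item 4: %lu MiB\n", pages_to_mib(variant4(spanned_pages)));
	printf("item 5: %lu MiB\n", pages_to_mib(variant5(spanned_pages)));
}
```

Calling print_variants(32UL << (30 - PAGE_SHIFT)) evaluates the formulas
for a node spanning 32 GiB, a round figure chosen because 7TB over 216
nodes is roughly that much per node. With that input items 2 to 5 come out
at 2 GiB, 4 GiB, 512 MiB and 128 MiB respectively.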