linux-mm.kvack.org archive mirror
* [RFC PATCH 0/14] Parallel memory initialisation
@ 2015-04-13 10:16 Mel Gorman
From: Mel Gorman @ 2015-04-13 10:16 UTC
  To: Linux-MM
  Cc: Robin Holt, Nathan Zimmer, Daniel Rahn, Davidlohr Bueso,
	Dave Hansen, Tom Vaden, Scott Norton, LKML, Mel Gorman

Memory initialisation had been identified as one of the reasons why large
machines take a long time to boot. Patches were posted a long time ago
that attempted to move deferred initialisation into the page allocator
paths. This was rejected on the grounds it should not be necessary to hurt
the fast paths to parallelise initialisation. This series reuses much of
the work from that time but defers the initialisation of memory to kswapd
so that one thread per node initialises memory local to that node. The
issue is that on the machines I tested with, memory initialisation was not
a major contributor to boot times. I'm posting the RFC to both review the
series and see if it actually helps users of very large machines.

After applying the series and setting the appropriate Kconfig variable I
see this in the boot log on a 64G machine

[    7.383764] kswapd 0 initialised deferred memory in 188ms
[    7.404253] kswapd 1 initialised deferred memory in 208ms
[    7.411044] kswapd 3 initialised deferred memory in 216ms
[    7.411551] kswapd 2 initialised deferred memory in 216ms

On a 1TB machine, I see

[   11.913324] kswapd 0 initialised deferred memory in 1168ms
[   12.220011] kswapd 2 initialised deferred memory in 1476ms
[   12.245369] kswapd 3 initialised deferred memory in 1500ms
[   12.271680] kswapd 1 initialised deferred memory in 1528ms
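A back-of-envelope reading of these logs: with one thread per node, deferred initialisation is finished when the slowest node finishes, whereas the same work on a single thread would take roughly the sum of the per-node times. The sketch assumes the per-node times would simply add up if run serially, which ignores remote-memory costs and the memory initialised before deferral, so treat the results as idealised. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Per-kswapd times in ms, copied from the two boot logs above. */
static const unsigned int log_64g_ms[] = { 188, 208, 216, 216 };
static const unsigned int log_1tb_ms[] = { 1168, 1476, 1500, 1528 };

/* Idealised single-thread cost: assumes the per-node work simply adds up. */
unsigned int serial_estimate_ms(const unsigned int *ms, size_t n)
{
	unsigned int total = 0;

	for (size_t i = 0; i < n; i++)
		total += ms[i];
	return total;
}

/* With one worker per node, deferred init ends when the slowest node ends. */
unsigned int parallel_estimate_ms(const unsigned int *ms, size_t n)
{
	unsigned int worst = 0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++)
		if (ms[i] > worst)
			worst = ms[i];
	return worst;
}
```

For the 1TB log this gives 5672ms serial versus 1528ms parallel; for the 64G log, 828ms versus 216ms. These figures cover only the deferred portion reported by kswapd, not end-to-end boot time.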

Once booted the machine appears to work as normal. Boot times were measured
from the time shutdown was called until ssh was available again.  In the
64G case, the boot time savings are negligible. On the 1TB machine, the
savings were 10 seconds (about 8% improvement on kernel times but 1-2%
overall as POST takes so long).

It would be nice if the people that have access to really large machines
would test this series and report back if the complexity is justified.

Patches are against 4.0-rc7.

 Documentation/kernel-parameters.txt |   8 +
 arch/ia64/mm/numa.c                 |  19 +-
 arch/x86/Kconfig                    |   2 +
 include/linux/memblock.h            |  18 ++
 include/linux/mm.h                  |   8 +-
 include/linux/mmzone.h              |  37 +++-
 init/main.c                         |   1 +
 mm/Kconfig                          |  29 +++
 mm/bootmem.c                        |   6 +-
 mm/internal.h                       |  23 ++-
 mm/memblock.c                       |  34 ++-
 mm/mm_init.c                        |   9 +-
 mm/nobootmem.c                      |   7 +-
 mm/page_alloc.c                     | 398 +++++++++++++++++++++++++++++++-----
 mm/vmscan.c                         |   6 +-
 15 files changed, 507 insertions(+), 98 deletions(-)

-- 
2.1.2

* Re: [RFC PATCH 0/14] Parallel memory initialisation
@ 2015-04-16  7:51 Daniel J Blueman
From: Daniel J Blueman @ 2015-04-16  7:51 UTC
  To: Mel Gorman
  Cc: Steffen Persvold, Linux-MM, Robin Holt, Nathan Zimmer,
	Daniel Rahn, Davidlohr Bueso, Dave Hansen, Tom Vaden,
	Scott Norton, LKML

On Monday, April 13, 2015 at 6:20:05 PM UTC+8, Mel Gorman wrote:
> Memory initialisation had been identified as one of the reasons why large
> machines take a long time to boot. Patches were posted a long time ago
> that attempted to move deferred initialisation into the page allocator
> paths. This was rejected on the grounds it should not be necessary to hurt
> the fast paths to parallelise initialisation. This series reuses much of
> the work from that time but defers the initialisation of memory to kswapd
> so that one thread per node initialises memory local to that node. The
> issue is that on the machines I tested with, memory initialisation was not
> a major contributor to boot times. I'm posting the RFC to both review the
> series and see if it actually helps users of very large machines.
>
> After applying the series and setting the appropriate Kconfig variable I
> see this in the boot log on a 64G machine
>
> [    7.383764] kswapd 0 initialised deferred memory in 188ms
> [    7.404253] kswapd 1 initialised deferred memory in 208ms
> [    7.411044] kswapd 3 initialised deferred memory in 216ms
> [    7.411551] kswapd 2 initialised deferred memory in 216ms
>
> On a 1TB machine, I see
>
> [   11.913324] kswapd 0 initialised deferred memory in 1168ms
> [   12.220011] kswapd 2 initialised deferred memory in 1476ms
> [   12.245369] kswapd 3 initialised deferred memory in 1500ms
> [   12.271680] kswapd 1 initialised deferred memory in 1528ms
>
> Once booted the machine appears to work as normal. Boot times were measured
> from the time shutdown was called until ssh was available again.  In the
> 64G case, the boot time savings are negligible. On the 1TB machine, the
> savings were 10 seconds (about 8% improvement on kernel times but 1-2%
> overall as POST takes so long).
>
> It would be nice if the people that have access to really large machines
> would test this series and report back if the complexity is justified.

Nice work!

On an older Numascale system with 1TB memory and 256 cores/32 NUMA
nodes, platform init takes 52s (cold boot), firmware takes 84s
(includes one warm reboot), and stock Linux 4.0 then takes 732s to boot
[1] (due to the 700ns round trip, the read-modify-write cache-coherent
cycles caused by temporal writes during pagetable init, and per-core
store queue limits), so there is huge potential for improvement.
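Taking those figures at face value, and assuming the three phases run back to back without overlap, the kernel's share of the time from cold power-on can be worked out directly. `struct boot_phases` and the helper names below are hypothetical.

```c
#include <assert.h>

/* Boot phases in seconds; assumed to run back to back with no overlap. */
struct boot_phases {
	unsigned int platform_s;
	unsigned int firmware_s;
	unsigned int kernel_s;
};

/* Time from cold power-on until the kernel has booted. */
unsigned int boot_total_s(const struct boot_phases *b)
{
	return b->platform_s + b->firmware_s + b->kernel_s;
}

/* Kernel share of the total in tenths of a percent (integer, truncated). */
unsigned int kernel_share_permille(const struct boot_phases *b)
{
	unsigned int total = boot_total_s(b);

	assert(total > 0);
	return b->kernel_s * 1000u / total;
}
```

With platform 52s, firmware 84s and kernel 732s this gives 868s in total, with the kernel at about 84.3% of it, in contrast to the 1TB machine in the cover letter, where POST dominated and the overall saving was 1-2%.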

Alas, I ran into a crash during list manipulation [2], which list
debugging detects [3]; I had started adding some debug output [4] but
need to look a bit deeper into it. The console output is annotated with
the time elapsed since cold power-on.
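For context on what list debugging catches: with CONFIG_DEBUG_LIST enabled, list deletion verifies that the entry has not already been deleted (poisoned) and that its neighbours still point back at it, and complains instead of unlinking when they do not. The sketch below is a simplified user-space model of that check, not the kernel code; `struct list_node` and the helper names are hypothetical, and NULL stands in for the kernel's poison values.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Minimal circular doubly linked list, modelled loosely on struct list_head. */
struct list_node {
	struct list_node *next, *prev;
};

void list_node_init(struct list_node *head)
{
	head->next = head;
	head->prev = head;
}

void list_node_add_tail(struct list_node *n, struct list_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/*
 * Unlink @entry only if the list still looks consistent around it.
 * NULL pointers play the role of the kernel's poison values, marking
 * an entry that has already been deleted.
 */
bool list_node_del_checked(struct list_node *entry)
{
	if (entry->next == NULL || entry->prev == NULL) {
		fprintf(stderr, "list_del: %p already deleted\n", (void *)entry);
		return false;
	}
	if (entry->prev->next != entry || entry->next->prev != entry) {
		fprintf(stderr, "list_del corruption at %p\n", (void *)entry);
		return false;
	}
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = NULL;
	entry->prev = NULL;
	return true;
}
```

Refusing the unlink leaves the damaged list as it was, so the report points at the first operation that saw the inconsistency rather than at a later crash.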

Thanks,
  Daniel

[1] https://resources.numascale.com/telemetry/defermem/console-stock.txt
[2] https://resources.numascale.com/telemetry/defermem/console-patched.txt
[3] https://resources.numascale.com/telemetry/defermem/console-patched-debug.txt

[4]

static void free_pcppages_bulk(struct zone *zone, int count,
					struct per_cpu_pages *pcp)
...
		pr_err("migrate_type=%d\n", migratetype);

		/* This is the only non-empty list. Free them all. */
		if (batch_free == MIGRATE_PCPTYPES)
			batch_free = to_free;

		do {
			int mt;	/* migratetype of the to-be-freed page */

			pr_err("list_empty=%d\n", list_empty(list));
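The debug lines in [4] sit inside free_pcppages_bulk(), which drains the per-CPU page lists round-robin across migratetypes. The sketch below is a simplified user-space model based on a reading of the 4.0 loop, not the kernel code: plain counters replace the lists of pages, NR_PCP_LISTS stands in for MIGRATE_PCPTYPES, and `drain_round_robin()` is a hypothetical name. It records which list each freed page came from, similar to what the `migrate_type=` debug line prints once per batch.

```c
#include <assert.h>
#include <stddef.h>

#define NR_PCP_LISTS 3	/* stands in for MIGRATE_PCPTYPES */

/*
 * Free @to_free pages from per-migratetype counters, visiting the lists
 * round-robin. batch_free grows by one for every list examined (empty
 * ones included) and shrinks by one per page freed, so fuller lists are
 * drained faster. The list each page came from is stored in @order.
 */
void drain_round_robin(unsigned int counts[NR_PCP_LISTS],
		       unsigned int to_free, int *order)
{
	int mt = 0;
	unsigned int batch_free = 0, total = 0;
	size_t n = 0;

	for (int i = 0; i < NR_PCP_LISTS; i++)
		total += counts[i];
	/* Otherwise the search for a non-empty list would never end. */
	assert(to_free <= total);

	while (to_free) {
		/* Advance to the next non-empty list, counting lists visited. */
		do {
			batch_free++;
			if (++mt == NR_PCP_LISTS)
				mt = 0;
		} while (counts[mt] == 0);

		/* Mirrors the kernel's "only non-empty list, free them all" case. */
		if (batch_free == NR_PCP_LISTS)
			batch_free = to_free;

		do {
			counts[mt]--;
			order[n++] = mt;
		} while (--to_free && --batch_free && counts[mt] != 0);
	}
}
```

With counts {2, 0, 5} and all 7 pages freed, the pages come from lists 2, 2, 0, 2, 2, 0, 2: the empty list 1 is skipped each time, and skipping it enlarges the next batch taken from list 2.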

Thread overview: 34+ messages
2015-04-13 10:16 [RFC PATCH 0/14] Parallel memory initialisation Mel Gorman
2015-04-13 10:16 ` [PATCH 01/14] memblock: Introduce a for_each_reserved_mem_region iterator Mel Gorman
2015-04-13 10:16 ` [PATCH 02/14] mm: meminit: Move page initialization into a separate function Mel Gorman
2015-04-13 10:16 ` [PATCH 03/14] mm: meminit: Only set page reserved in the memblock region Mel Gorman
2015-04-13 10:16 ` [PATCH 04/14] mm: page_alloc: Pass PFN to __free_pages_bootmem Mel Gorman
2015-04-13 10:16 ` [PATCH 05/14] mm: meminit: Make __early_pfn_to_nid SMP-safe and introduce meminit_pfn_in_nid Mel Gorman
2015-04-13 10:16 ` [PATCH 06/14] mm: meminit: Inline some helper functions Mel Gorman
2015-04-13 10:16 ` [PATCH 07/14] mm: meminit: Partially initialise memory if CONFIG_DEFERRED_MEM_INIT is set Mel Gorman
2015-04-13 10:17 ` [PATCH 08/14] mm: meminit: Initialise remaining memory in parallel with kswapd Mel Gorman
2015-04-13 10:17 ` [PATCH 09/14] mm: meminit: Minimise number of pfn->page lookups during initialisation Mel Gorman
2015-04-13 10:17 ` [PATCH 10/14] x86: mm: Enable deferred memory initialisation on x86-64 Mel Gorman
2015-04-13 18:21   ` Paul Bolle
2015-04-13 10:17 ` [PATCH 11/14] mm: meminit: Control parallel memory initialisation from command line and config Mel Gorman
2015-04-13 10:17 ` [PATCH 12/14] mm: meminit: Free pages in large chunks where possible Mel Gorman
2015-04-13 10:17 ` [PATCH 13/14] mm: meminit: Reduce number of times pageblocks are set during initialisation Mel Gorman
2015-04-13 10:17 ` [PATCH 14/14] mm: meminit: Remove mminit_verify_page_links Mel Gorman
2015-04-13 10:29 ` [RFC PATCH 0/14] Parallel memory initialisation Mel Gorman
2015-04-15 13:15 ` Waiman Long
2015-04-15 13:38   ` Mel Gorman
2015-04-15 14:50     ` Waiman Long
2015-04-15 15:44       ` Mel Gorman
2015-04-15 21:37         ` nzimmer
2015-04-16 18:20     ` Waiman Long
2015-04-15 14:27   ` Peter Zijlstra
2015-04-15 14:34     ` Mel Gorman
2015-04-15 14:48       ` Peter Zijlstra
2015-04-15 16:18         ` Waiman Long
2015-04-15 16:42           ` Norton, Scott J
2015-04-16  7:25 ` Andrew Morton
2015-04-16  8:46   ` Mel Gorman
2015-04-16 17:26     ` Andrew Morton
2015-04-16 17:37       ` Mel Gorman
  -- strict thread matches above, loose matches on Subject: below --
2015-04-16  7:51 Daniel J Blueman
2015-04-20  3:15 ` Daniel J Blueman