public inbox for linux-kernel@vger.kernel.org
* 23 second kernel compile (aka which patches help scalability on NUMA)
From: Martin J. Bligh @ 2002-03-09  5:47 UTC
  To: lse-tech; +Cc: linux-kernel

"time make -j32 bzImage" is now down to 23 seconds.
(16-way NUMA-Q, 700MHz P3s, 4GB RAM).

Below is a description of which patches helped get there.

	Start (2.4.18)
47s	
	{make NUMA local memory allocation work}
	memalloc-15setup                       (Pat Gaughen)
	memalloc-16discont                     (Pat Gaughen)
	pageallocnull fix + force CONFIG_NUMA  (Martin Bligh)
27s
	{O(1) scheduler}
	sched-O1-2.4.18-pre8-K3.patch          (Ingo Molnar)
25s
	{NUMA scheduler}
	numaK3.patch                           (Mike Kravetz)
24s
	{dcache cacheline bouncing fixes}
	dcache/fast_walkA2-2.4.18.patch        (Hanna Linder)
23s

Applying Ingo's patch alone took the time from 47s to 30s.
The benefits after the local mem stuff aren't quite as
stunning, but still good.
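
The progression in the table above can be turned into per-step and cumulative
speedups with a little arithmetic. This is only a sketch using the times quoted
in this mail; the stage labels are shorthand chosen for illustration, not patch
names.

```python
# Wall-clock times (seconds) for "time make -j32 bzImage", as quoted in the
# table above. The labels are illustrative shorthand, not patch names.
STAGES = [
    ("2.4.18 baseline", 47),
    ("NUMA local memory allocation", 27),
    ("O(1) scheduler", 25),
    ("NUMA scheduler", 24),
    ("dcache cacheline bouncing fixes", 23),
]


def speedups(stages):
    """Return (label, seconds, step_gain_pct, cumulative_speedup) per stage."""
    base = stages[0][1]
    prev = base
    rows = []
    for label, secs in stages:
        # Gain relative to the previous stage, in percent of that stage's time.
        step_gain = 100.0 * (prev - secs) / prev
        # Speedup relative to the unpatched baseline.
        rows.append((label, secs, step_gain, base / secs))
        prev = secs
    return rows


if __name__ == "__main__":
    for label, secs, gain, total in speedups(STAGES):
        print(f"{label:32s} {secs:3d}s  step {gain:5.1f}%  total {total:4.2f}x")
```

Read this way, the local-allocation patches account for most of the gain, and
the full stack is roughly a 2x speedup over the baseline (47s to 23s).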

Top 10 profile hitters left:

 21439 total                                      0.0228
  9112 default_idle                             175.2308
  3364 _text_lock_swap                           62.2963
   790 lru_cache_add                              8.5870
   750 _text_lock_namei                           0.7184
   587 do_anonymous_page                          1.7681
   572 lru_cache_del                             26.0000
   569 do_generic_file_read                       0.5117
   510 __free_pages_ok                            0.9733
   421 _text_lock_dec_and_lock                   17.5417
   318 _text_lock_read_write                      2.6949
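
To see how the remaining ticks are distributed, the listing above can be
converted into percentages of the total. The sketch below assumes the three
columns are tick count, symbol name, and normalized load (the usual
readprofile layout); that column interpretation is an assumption, not
something stated in this mail.

```python
# Profile listing copied from the mail above.
# Assumed column layout: ticks, symbol, normalized load.
PROFILE = """\
 21439 total                                      0.0228
  9112 default_idle                             175.2308
  3364 _text_lock_swap                           62.2963
   790 lru_cache_add                              8.5870
   750 _text_lock_namei                           0.7184
   587 do_anonymous_page                          1.7681
   572 lru_cache_del                             26.0000
   569 do_generic_file_read                       0.5117
   510 __free_pages_ok                            0.9733
   421 _text_lock_dec_and_lock                   17.5417
   318 _text_lock_read_write                      2.6949
"""


def parse_profile(text):
    """Parse 'ticks symbol load' lines into (symbol, ticks, load) tuples."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        ticks, symbol, load = parts
        rows.append((symbol, int(ticks), float(load)))
    return rows


def tick_shares(rows):
    """Return each symbol's share of the 'total' tick count, in percent."""
    total = next(ticks for symbol, ticks, _ in rows if symbol == "total")
    return {s: 100.0 * t / total for s, t, _ in rows if s != "total"}


if __name__ == "__main__":
    shares = tick_shares(parse_profile(PROFILE))
    for symbol, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
        print(f"{symbol:28s} {pct:5.1f}%")
```

On these numbers default_idle is about 42.5% of all ticks and _text_lock_swap
about 15.7%.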

Big locks left:

pagemap_lru_lock
20.2% 57.1%  5.4us(  86us)  111us(  16ms)(14.7%)   1014988 42.9% 57.1%    0%  

pagecache_lock
17.5% 31.3%  7.5us(  99us)   52us(4023us)( 2.4%)    631988 68.7% 31.3%    0% 

others:
dcache_lock (much improved, but still work to be done)
BKL (isn't it always ;-)
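
The gap between wait and hold times on the two big locks can be made explicit
as ratios. This sketch assumes the first mean(max) pair in each lock line is
hold time and the second is wait time, and it converts the rounded "16ms"
figure to 16000us; both are assumptions for illustration.

```python
# Times in microseconds, copied from the two lock lines above.
# Assumption: first mean(max) pair is hold time, second is wait time.
LOCKS = {
    "pagemap_lru_lock": {
        "hold_mean": 5.4, "hold_max": 86.0,
        "wait_mean": 111.0, "wait_max": 16000.0,  # "16ms" as printed, rounded
    },
    "pagecache_lock": {
        "hold_mean": 7.5, "hold_max": 99.0,
        "wait_mean": 52.0, "wait_max": 4023.0,
    },
}


def wait_to_hold(stats):
    """Return how many times longer waiters wait than holders hold."""
    return {
        "mean_ratio": stats["wait_mean"] / stats["hold_mean"],
        "max_ratio": stats["wait_max"] / stats["hold_max"],
    }


if __name__ == "__main__":
    for name, stats in LOCKS.items():
        r = wait_to_hold(stats)
        print(f"{name:18s} mean wait/hold {r['mean_ratio']:6.1f}x  "
              f"max wait/hold {r['max_ratio']:6.1f}x")
```

Under those assumptions the worst-case wait on pagemap_lru_lock is roughly 186
times the worst-case hold, and about 41 times on pagecache_lock.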

Planned work next:

1. Try John Stultz's mcslocks 
	(note high max wait vs low max hold currently)
2. Try rmap + pagemap_lru_breakup from Arjan
3. Try radix tree pagecache.
4. Try grafting NUMA-Q page local alloc onto -aa tree
5. Try SGI NUMA zone ordering stuff.
6. [HARD] Break up ZONE_NORMAL between nodes 
   (all currently on node 0).

Any other suggestions are welcome. I'd also be interested
to know if 23s is fast for make bzImage, or if other big
iron machines can kick this around the room.

Thanks,

Martin.

* Re: 23 second kernel compile (aka which patches help scalability on NUMA)
From: Dieter Nützel @ 2002-03-09 19:44 UTC
  To: Martin J. Bligh
  Cc: Linux Kernel List, Andrea Arcangeli, Ingo Molnar, Robert Love,
	Oleg Drokin, ReiserFS List

On Saturday, 9 March 2002 05:47:04, Martin J. Bligh wrote:
> "time make -j32 bzImage" is now down to 23 seconds.
> (16 way NUMA-Q, 700MHz P3's, 4Gb RAM).

I want such a beast, too:-)))

[-]
Planned work next:

1. Try John Stultz's mcslocks 
        (note high max wait vs low max hold currently)
2. Try rmap + pagemap_lru_breakup from Arjan
3. Try radix tree pagecache.
4. Try grafting NUMA-Q page local alloc onto -aa tree
5. Try SGI NUMA zone ordering stuff.
6. [HARD] Break up ZONE_NORMAL between nodes 
   (all currently on node 0).
[-]

No flamewar intended, but shouldn't you start with 4. and 5.?
-aa is the way to go for the 2.4.18+ tree. -rmap later for 2.5.x.

Have you tried the OOM case?
vm_29 and before fixed it for me.
Throughput is much improved with -aa.

Have you checked latency?
I found weird behavior with the latest O(1)-K3 under latencytest0.42-png, and
higher latency than with clean 2.4.18.
Do you have some former O(1) versions around? Ingo removed them from his
archive.

Preemption?

Running 2.4.19-pre2-dn1 :-)
Taken from -jam3:
00-vm-29
01-vm-io-3
10-x86-fast-pte-1
11-spinlock-cacheline-3
12-clone-flags
20-sched-O1-K3
21-sched-balance
22-sched-aa-fixes
23-lowlatency-mini
24-read-latency-2
30-aic7xxx-6.2.5
31-ide-20020215
all latest ReiserFS stuff 2.4.18.pending
preempt-kernel-rml-2.4.18-rc1-ingo-K3-1.patch
lock-break-rml-2.4.18-1

Regards,
	Dieter

BTW, does anyone out there have a copy of the mem "test" prog handy?
I've accidentally removed one of my development folders...

Would be nice to see some "Hammer" systems from IBM next winter;-)

-- 
Dieter Nützel
Graduate Student, Computer Science

University of Hamburg
Department of Computer Science
@home: Dieter.Nuetzel@hamburg.de


Thread overview: 14+ messages
2002-03-09  5:47 23 second kernel compile (aka which patches help scalability on NUMA) Martin J. Bligh
2002-03-09 16:43 ` Erik Andersen
2002-03-09 17:53   ` Martin J. Bligh
2002-03-09 18:37     ` Fabio Massimo Di Nitto
2002-03-11 18:23     ` Timothy D. Witham
2002-03-11  2:14 ` Andrea Arcangeli
2002-03-11  2:23   ` Rik van Riel
2002-03-11  4:12     ` Andrea Arcangeli
2002-03-11 10:45 ` Denis Vlasenko
  -- strict thread matches above, loose matches on Subject: below --
2002-03-09 19:44 Dieter Nützel
2002-03-09 20:19 ` Martin J. Bligh
2002-03-09 21:07   ` Andrea Arcangeli
2002-03-10  9:26   ` Samuel Ortiz
2002-03-10 17:13     ` Martin J. Bligh
