* Re: 2.5.50-BK + 24 CPUs
@ 2002-12-08 21:22 Manfred Spraul
2002-12-08 21:28 ` William Lee Irwin III
0 siblings, 1 reply; 11+ messages in thread
From: Manfred Spraul @ 2002-12-08 21:22 UTC
To: anton; +Cc: linux-kernel
Anton wrote:
>schedules:
> 56283 total
> 41984 pipe_wait
> 9746 do_work
> 1949 do_exit
> 1834 sys_wait4
>
>ie during the compile we scheduled 56283 times, and 41984 of them were
>caused by pipes.
>
The Linux pipe implementation has only a page-sized buffer - with 4 kB
pages, transferring 1 MB through a pipe means at least 512 context switches.
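A quick user-space sketch to make the effect visible (assuming Linux
and 4 kB pages; the chunk size and helper names here are illustrative):
push 1 MB through a pipe in page-sized writes and print the writer's
voluntary context switches from getrusage().

	/* Count the writer's voluntary context switches while
	 * pushing 1 MB through a pipe in 4 kB chunks. */
	#include <stdio.h>
	#include <string.h>
	#include <unistd.h>
	#include <sys/resource.h>
	#include <sys/wait.h>

	#define CHUNK	4096
	#define TOTAL	(1024 * 1024)

	int main(void)
	{
		int fd[2];
		char buf[CHUNK];
		struct rusage ru;

		if (pipe(fd) < 0)
			return 1;
		memset(buf, 'x', sizeof(buf));

		if (fork() == 0) {		/* child: drain the pipe */
			close(fd[1]);
			while (read(fd[0], buf, sizeof(buf)) > 0)
				;
			_exit(0);
		}

		close(fd[0]);			/* parent: the writer */
		for (long done = 0; done < TOTAL; done += CHUNK)
			write(fd[1], buf, sizeof(buf));
		close(fd[1]);
		wait(NULL);

		getrusage(RUSAGE_SELF, &ru);	/* writer side only */
		printf("writer voluntary context switches: %ld\n",
		       ru.ru_nvcsw);
		return 0;
	}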
--
Manfred
* Re: 2.5.50-BK + 24 CPUs
2002-12-08 21:22 2.5.50-BK + 24 CPUs Manfred Spraul
@ 2002-12-08 21:28 ` William Lee Irwin III
2002-12-08 23:22 ` David S. Miller
0 siblings, 1 reply; 11+ messages in thread
From: William Lee Irwin III @ 2002-12-08 21:28 UTC
To: Manfred Spraul; +Cc: anton, linux-kernel
Anton wrote:
>> ie during the compile we scheduled 56283 times, and 41984 of them were
>> caused by pipes.
On Sun, Dec 08, 2002 at 10:22:03PM +0100, Manfred Spraul wrote:
> The Linux pipe implementation has only a page-sized buffer - with 4 kB
> pages, transferring 1 MB through a pipe means at least 512 context switches.
Hmm. What happened to that pipe buffer size increase patch? That sounds
like it might help here, but only if those things are trying to shove
more than 4KB through the pipe at a time.
Bill
* Re: 2.5.50-BK + 24 CPUs
2002-12-08 21:28 ` William Lee Irwin III
@ 2002-12-08 23:22 ` David S. Miller
2002-12-08 23:01 ` William Lee Irwin III
2002-12-09 17:03 ` Manfred Spraul
0 siblings, 2 replies; 11+ messages in thread
From: David S. Miller @ 2002-12-08 23:22 UTC
To: William Lee Irwin III; +Cc: Manfred Spraul, anton, linux-kernel
On Sun, 2002-12-08 at 13:28, William Lee Irwin III wrote:
> Hmm. What happened to that pipe buffer size increase patch? That sounds
> like it might help here, but only if those things are trying to shove
> more than 4KB through the pipe at a time.
You probably mean the zero-copy pipe patches, which I think really
should go in. The most recent version of the diffs I saw didn't
use the zero-copy bits unless the transfers were quite large, so it
should be OK and not pessimize small transfers.
That patch has been gathering cobwebs for more than a year now since I
first did it; let's push it in already :-)
* Re: 2.5.50-BK + 24 CPUs
2002-12-08 23:22 ` David S. Miller
@ 2002-12-08 23:01 ` William Lee Irwin III
2002-12-09 17:03 ` Manfred Spraul
1 sibling, 0 replies; 11+ messages in thread
From: William Lee Irwin III @ 2002-12-08 23:01 UTC
To: David S. Miller; +Cc: Manfred Spraul, anton, linux-kernel
On Sun, 2002-12-08 at 13:28, William Lee Irwin III wrote:
>> Hmm. What happened to that pipe buffer size increase patch? That sounds
>> like it might help here, but only if those things are trying to shove
>> more than 4KB through the pipe at a time.
On Sun, Dec 08, 2002 at 03:22:58PM -0800, David S. Miller wrote:
> You probably mean the zero-copy pipe patches, which I think really
> should go in. The most recent version of the diffs I saw didn't
> use the zero-copy bits unless the transfers were quite large, so it
> should be OK and not pessimize small transfers.
> That patch has been gathering cobwebs for more than a year now since I
> first did it; let's push it in already :-)
I was actually referring to one that explicitly used larger pipe
buffers, but this sounds useful too.
Bill
* Re: 2.5.50-BK + 24 CPUs
2002-12-08 23:22 ` David S. Miller
2002-12-08 23:01 ` William Lee Irwin III
@ 2002-12-09 17:03 ` Manfred Spraul
2002-12-09 20:15 ` David S. Miller
1 sibling, 1 reply; 11+ messages in thread
From: Manfred Spraul @ 2002-12-09 17:03 UTC
To: David S. Miller; +Cc: William Lee Irwin III, anton, linux-kernel
David S. Miller wrote:
>On Sun, 2002-12-08 at 13:28, William Lee Irwin III wrote:
>
>>Hmm. What happened to that pipe buffer size increase patch? That sounds
>>like it might help here, but only if those things are trying to shove
>>more than 4KB through the pipe at a time.
>
>You probably mean the zero-copy pipe patches, which I think really
>should go in. The most recent version of the diffs I saw didn't
>use the zero-copy bits unless the transfers were quite large, so it
>should be OK and not pessimize small transfers.
>
>That patch has been gathering cobwebs for more than a year now since I
>first did it; let's push it in already :-)
Unfortunately zero-copy doesn't help to avoid the schedules:
zero-copy just avoids the copy into the kernel - you still need one
schedule for each page transferred.
The writer calls

	for (;;) {
		prepare_data(buf);
		write(fd, buf, PAGE_SIZE);
	}

and the reader calls

	for (;;) {
		read(fd, buf, PAGE_SIZE);
		use_data(buf);
	}
What's needed is a large kernel buffer - I've seen buffers between 64
and 256 kB in other Unices.
Zero-copy only helps lmbench and other apps where the whole working set
fits into the CPU cache. The difference between

	main-mem -> cache; cache -> main-mem   [non-zerocopy]

and

	main-mem -> main-mem                   [zerocopy: the copy into the kernel is skipped]

is small.
--
Manfred
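For what it's worth, later kernels did grow exactly this knob: since
Linux 2.6.35 a pipe's buffer can be resized from user space with
fcntl(F_SETPIPE_SZ). A minimal sketch, assuming such a kernel (nothing
like this existed in 2.5):

	#define _GNU_SOURCE		/* for F_SETPIPE_SZ */
	#include <fcntl.h>
	#include <stdio.h>
	#include <unistd.h>

	int main(void)
	{
		int fd[2];

		if (pipe(fd) < 0)
			return 1;
		/* Ask for a 256 kB buffer; the kernel rounds the size
		 * up and returns what it actually used. */
		int sz = fcntl(fd[1], F_SETPIPE_SZ, 256 * 1024);
		if (sz < 0)
			perror("F_SETPIPE_SZ");
		else
			printf("pipe buffer is now %d bytes\n", sz);
		return 0;
	}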
* Re: 2.5.50-BK + 24 CPUs
2002-12-09 17:03 ` Manfred Spraul
@ 2002-12-09 20:15 ` David S. Miller
2002-12-09 21:12 ` Manfred Spraul
0 siblings, 1 reply; 11+ messages in thread
From: David S. Miller @ 2002-12-09 20:15 UTC
To: manfred; +Cc: wli, anton, linux-kernel
From: Manfred Spraul <manfred@colorfullife.com>
Date: Mon, 09 Dec 2002 18:03:10 +0100
   Unfortunately zero-copy doesn't help to avoid the schedules:
   zero-copy just avoids the copy into the kernel - you still need one
   schedule for each page transferred.
The zero-copy patches copied up to 64k (or rather, 16 pages, something
like that) at once; that's going to lead to 16 times fewer schedules.
The 64k number was chosen arbitrarily (it's what FreeBSD's pipe code
uses) and it can be experimented with.
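Worked numbers for the 1 MB example above, assuming each side sleeps
once per rendezvous:

	1 MB /  4 kB = 256 page-at-a-time wakeups  (~512 schedules)
	1 MB / 64 kB =  16 batch-at-a-time wakeups (~ 32 schedules)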
* Re: 2.5.50-BK + 24 CPUs
2002-12-09 20:15 ` David S. Miller
@ 2002-12-09 21:12 ` Manfred Spraul
0 siblings, 0 replies; 11+ messages in thread
From: Manfred Spraul @ 2002-12-09 21:12 UTC
To: David S. Miller; +Cc: wli, anton, linux-kernel
David S. Miller wrote:
> From: Manfred Spraul <manfred@colorfullife.com>
> Date: Mon, 09 Dec 2002 18:03:10 +0100
>
>    Unfortunately zero-copy doesn't help to avoid the schedules:
>    zero-copy just avoids the copy into the kernel - you still need one
>    schedule for each page transferred.
>
>The zero-copy patches copied up to 64k (or rather, 16 pages, something
>like that) at once; that's going to lead to 16 times fewer schedules.
>
>The 64k number was chosen arbitrarily (it's what FreeBSD's pipe code
>uses) and it can be experimented with.
Only if user space writes in 64 kB chunks - if user space writes 4 kB
chunks, then zero-copy doesn't help much against schedules [depending on
the implementation, it halves the number of schedules].
And page table (COW) tricks are not acceptable.
--
Manfred
* 2.5.50-BK + 24 CPUs
@ 2002-12-08 13:09 Anton Blanchard
2002-12-08 14:49 ` Rik van Riel
0 siblings, 1 reply; 11+ messages in thread
From: Anton Blanchard @ 2002-12-08 13:09 UTC
To: linux-kernel
Hi,
I found time to run a few benchmarks over a largish machine (24 way
ppc64) running 2.5.50-BK from a few days ago.
1. kernel compile benchmark (ie build an x86 2.4.18 kernel)
I hijacked /proc/profile to log functions where we call schedule from.
It shows:
schedules:
56283 total
41984 pipe_wait
9746 do_work
1949 do_exit
1834 sys_wait4
ie during the compile we scheduled 56283 times, and 41984 of them were
caused by pipes. Simple fix: remove -pipe from the Makefile of the
kernel I was building:
schedules:
8497 total
3665 do_work
1878 do_exit
1824 sys_wait4
306 cpu_idle
260 open_namei
256 pipe_wait
Much nicer. Does it make sense to use -pipe in our kernel Makefile these
days? (gcc's -pipe makes the compiler stages talk over pipes instead of
temporary files.) Note: "do_work" is a ppc64 assembly function which
checks need_resched and calls schedule if the timeslice has been
exceeded. So it's nice to see almost all of the schedules are due to
timeslice expiration, processes exiting, or processes doing a wait().
Now we can look at the profile:
profile:
66260 total
54227 cpu_idle
1000 page_remove_rmap
909 __get_page_state
830 page_add_rmap
753 save_remaining_regs
646 do_anonymous_page
529 do_page_fault
475 release_pages
468 pSeries_flush_hash_range
462 pSeries_hpte_insert
266 __copy_tofrom_user
215 zap_pte_range
214 sys_brk
210 __pagevec_lru_add_active
209 buffered_rmqueue
201 find_get_page
185 vm_enough_memory
183 nr_free_pages
Mostly idle time; there's a limit to how much we can parallelize here.
Note: save_remaining_regs is the syscall/interrupt entry path for ppc64.
2. dbench 24
Let's not pay too much attention here, but there are a few things to
keep in mind:
schedules:
1635314 total
753694 cpu_idle
357910 ext2_new_block
289189 ext2_free_blocks
123788 ext2_new_inode
95025 ext2_free_inode
Whee, look at all the schedules we took inside the ext2 code. Of course
it's due to the superblock lock semaphore.
profile:
370142 total
302615 cpu_idle
8600 __copy_tofrom_user
3119 schedule
2760 current_kernel_time
Lots of idle time, in part due to the superblock lock (oh yeah, and my
slow-to-react finger stopping profiling after the benchmark finished).
current_kernel_time makes a recent appearance in the profile; we are
working on a number of things to address this.
3. "University workload"
A benchmark that does lots of shell scripts, cc, troff, etc.
schedules:
470212 total
126262 do_work
86986 ext2_free_blocks
58039 ext2_new_block
53627 cpu_idle
43140 ext2_new_inode
30934 ext2_free_inode
19849 do_exit
18526 sys_wait4
The superblock lock semaphore makes an appearance in the schedule
summary again (ext2_*). Now for the profile:
profile:
136296 total
41592 cpu_idle
16319 page_remove_rmap
7338 page_add_rmap
3583 save_remaining_regs
3072 pSeries_flush_hash_range
2832 release_pages
2584 do_page_fault
2281 find_get_page
2238 pSeries_hpte_insert
2117 copy_page_range
2085 current_kernel_time
2028 zap_pte_range
1886 __get_page_state
1689 atomic_dec_and_lock
No big surprises in the profile. This benchmark tends to be a worst-case
scenario for rmap; think of 100s of shells all mapping the same text
pages.
Anton
* Re: 2.5.50-BK + 24 CPUs
2002-12-08 13:09 Anton Blanchard
@ 2002-12-08 14:49 ` Rik van Riel
2002-12-08 16:45 ` Rik van Riel
0 siblings, 1 reply; 11+ messages in thread
From: Rik van Riel @ 2002-12-08 14:49 UTC
To: Anton Blanchard; +Cc: linux-kernel
On Mon, 9 Dec 2002, Anton Blanchard wrote:
> profile:
> 66260 total
> 54227 cpu_idle
> 1000 page_remove_rmap
> 909 __get_page_state
> 830 page_add_rmap
Looks like the bitflag locking in rmap is hurting you.
How does it work with a real spinlock in the struct page
instead of using a bit in page->flags?
regards,
Rik
--
Bravely reimplemented by the knights who say "NIH".
http://www.surriel.com/ http://guru.conectiva.com/
Current spamtrap: october@surriel.com
* Re: 2.5.50-BK + 24 CPUs
2002-12-08 14:49 ` Rik van Riel
@ 2002-12-08 16:45 ` Rik van Riel
2002-12-09 14:08 ` Anton Blanchard
0 siblings, 1 reply; 11+ messages in thread
From: Rik van Riel @ 2002-12-08 16:45 UTC
To: Anton Blanchard; +Cc: linux-kernel
On Sun, 8 Dec 2002, Rik van Riel wrote:
> On Mon, 9 Dec 2002, Anton Blanchard wrote:
>
> > profile:
> > 66260 total
> > 54227 cpu_idle
> > 1000 page_remove_rmap
> > 909 __get_page_state
> > 830 page_add_rmap
>
> Looks like the bitflag locking in rmap is hurting you.
> How does it work with a real spinlock in the struct page
> instead of using a bit in page->flags ?
In particular, something like the (completely untested) patch
below. Yes, this patch is on the wrong side of the space/time
tradeoff for machines with highmem, but it might be worth it
for 64-bit machines, especially those with slow bitops.
regards,
Rik
--
Bravely reimplemented by the knights who say "NIH".
http://www.surriel.com/ http://guru.conectiva.com/
Current spamtrap: october@surriel.com
===== include/linux/mm.h 1.97 vs edited =====
--- 1.97/include/linux/mm.h	Thu Nov  7 08:48:53 2002
+++ edited/include/linux/mm.h	Sun Dec  8 14:36:44 2002
@@ -169,6 +169,7 @@
 					 * protected by PG_chainlock */
 		pte_addr_t direct;
 	} pte;
+	spinlock_t ptechain_lock;	/* Lock for pte.chain and pte.direct */
 	unsigned long private;		/* mapping-private opaque data */
 
 	/*
===== include/linux/rmap-locking.h 1.1 vs edited =====
--- 1.1/include/linux/rmap-locking.h	Sun Sep  1 17:56:32 2002
+++ edited/include/linux/rmap-locking.h	Sun Dec  8 14:37:49 2002
@@ -14,20 +14,10 @@
 	 * busywait with less bus contention for a good time to
 	 * attempt to acquire the lock bit.
 	 */
-	preempt_disable();
-#ifdef CONFIG_SMP
-	while (test_and_set_bit(PG_chainlock, &page->flags)) {
-		while (test_bit(PG_chainlock, &page->flags))
-			cpu_relax();
-	}
-#endif
+	spin_lock(&page->ptechain_lock);
 }
 
 static inline void pte_chain_unlock(struct page *page)
 {
-#ifdef CONFIG_SMP
-	smp_mb__before_clear_bit();
-	clear_bit(PG_chainlock, &page->flags);
-#endif
-	preempt_enable();
+	spin_unlock(&page->ptechain_lock);
 }
===== mm/page_alloc.c 1.135 vs edited =====
--- 1.135/mm/page_alloc.c	Mon Dec  2 18:31:01 2002
+++ edited/mm/page_alloc.c	Sun Dec  8 14:39:06 2002
@@ -1129,6 +1129,7 @@
 		struct page *page = lmem_map + local_offset + i;
 		set_page_zone(page, nid * MAX_NR_ZONES + j);
 		set_page_count(page, 0);
+		page->ptechain_lock = SPIN_LOCK_UNLOCKED;
 		SetPageReserved(page);
 		INIT_LIST_HEAD(&page->list);
 #ifdef WANT_PAGE_VIRTUAL
Thread overview: 11+ messages
-- links below jump to the message on this page --
2002-12-08 21:22 2.5.50-BK + 24 CPUs Manfred Spraul
2002-12-08 21:28 ` William Lee Irwin III
2002-12-08 23:22 ` David S. Miller
2002-12-08 23:01 ` William Lee Irwin III
2002-12-09 17:03 ` Manfred Spraul
2002-12-09 20:15 ` David S. Miller
2002-12-09 21:12 ` Manfred Spraul
-- strict thread matches above, loose matches on Subject: below --
2002-12-08 13:09 Anton Blanchard
2002-12-08 14:49 ` Rik van Riel
2002-12-08 16:45 ` Rik van Riel
2002-12-09 14:08 ` Anton Blanchard