* [parisc-linux] 2.6.10-rc1-pa11 profile data
@ 2004-11-11  7:54 Grant Grundler
  2004-11-11  8:11 ` Randolph Chung
  2004-11-12  5:29 ` [parisc-linux] 2.6.10-rc1-pa11 profile data Grant Grundler
  0 siblings, 2 replies; 25+ messages in thread
From: Grant Grundler @ 2004-11-11  7:54 UTC (permalink / raw)
  To: parisc-linux

I was comparing "time" output for various flavors of
kernels and arches. We are at something like 11m/17m/5m
for real/user/sys on a dual j6700 (dual 750MHz) running a
2.6.10-rc1-pa11-32SMP kernel and building a 64-bit kernel
(using gcc 3.0.4). Similar numbers for a J6000 (dual 550MHz)
doing a 32-bit kernel build (gcc 3.3.x): 14m/22m/5m.

While this might look very favorable compared to a similar full kernel
build on a 1.5GHz RX2600, which takes about as long (11m/20m/1m),
the ia64 machine spends less than 1m in the kernel.

I've collected two profiles for -64SMP and will collect
some UP profiles tomorrow. The profiles so far measure
a full kernel build. I expect I'll do the same for -64UP
kernels too.

What I have so far is on:
	http://www.parisc-linux.org/~grundler/prof-j6700/

d- and i-cache flushing routines are still the top consumers.

hth,
grant
_______________________________________________
parisc-linux mailing list
parisc-linux@lists.parisc-linux.org
http://lists.parisc-linux.org/mailman/listinfo/parisc-linux

* Re: [parisc-linux] 2.6.10-rc1-pa11 profile data
  2004-11-11  7:54 [parisc-linux] 2.6.10-rc1-pa11 profile data Grant Grundler
@ 2004-11-11  8:11 ` Randolph Chung
  2004-11-11 17:39   ` Carlos O'Donell
                     ` (5 more replies)
  2004-11-12  5:29 ` [parisc-linux] 2.6.10-rc1-pa11 profile data Grant Grundler
  1 sibling, 6 replies; 25+ messages in thread
From: Randolph Chung @ 2004-11-11  8:11 UTC (permalink / raw)
  To: Grant Grundler; +Cc: parisc-linux

> I've collect two profiles for -64SMP and will collect
> some UP profiles tomorrow. profiles so far are measuring
> a full kernel build. I expect I'll do the same for -64UP
> kernels too.

hmm.. interesting. top consumers are (with idle loop functions removed)

 40646 flush_kernel_icache_page                 406.4600
  7364 fdsync                                   368.2000
 10567 flush_user_dcache_range_asm              293.5278
 10387 flush_user_icache_range_asm              288.5278
 21409 __clear_user_page_asm                    191.1518
  5356 _spin_lock_irqsave                       111.5833
  1768 fisync                                   110.5000
  1928 _spin_lock                                48.2000
  4255 purge_kernel_dcache_page                  42.5500
   339 $lclu_done                                42.3750
  4089 flush_kernel_dcache_page                  40.8900
  5053 copy_user_page_asm                        33.2434
   569 _write_unlock_irq                         17.7812
   422 _spin_unlock                              17.5833
  1567 find_vma_prev                             16.3229
   181 $lslen_loop                               15.0833
    96 $lslen_done                               12.0000
   996 _write_trylock                            11.3182
   137 $lsfu_loop                                 8.5625
   748 flush_user_dcache_page                     7.4800
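
A note on reading the listing above: the third column looks like the "normalized load" that readprofile prints, i.e. ticks divided by the length of the function. A minimal C sketch under that assumption (the struct and the function name are made up for illustration) recovers the implied function length from a row:

```c
#include <assert.h>

/* One row of the listing: tick count, symbol name, normalized load. */
struct prof_row {
    unsigned long ticks;
    const char *symbol;
    double load;
};

/*
 * Assumption: load = ticks / function length, so the length can be
 * recovered as ticks / load, rounded to the nearest integer.
 */
unsigned long implied_length(const struct prof_row *row)
{
    assert(row->load > 0.0);
    return (unsigned long)((double)row->ticks / row->load + 0.5);
}
```

For the first two rows this gives 100 for flush_kernel_icache_page and 20 for fdsync, which would explain why fdsync ranks second with far fewer ticks: the listing is ordered by load, not by raw tick count.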

we really need to do better at cache flushing..... anybody have any
ideas? :)

but looking at the other ones:
- __clear_user_page_asm can be optimized for 64-bit by writing 8 bytes
  at a time instead of 4
- _spin_lock* needs investigation to see if we have some bad locks
  someplace. lockmeter anybody?
- *lclu* can be rewritten to do better than 1-byte at a time
- copy_user_page_asm can be sped up slightly by using pa_memcpy, but not
  much when i tried last time
- *lslen* can also probably be written in a smarter way... 
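
The first bullet can be illustrated with a rough C sketch. It only counts how many stores are needed to clear one page with 4-byte versus 8-byte stores; PAGE_SIZE of 4096 and the function name are assumptions for the example, and memcpy() stands in for a single stw/std instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096u   /* assumed page size for this sketch */

/*
 * Clear one page using stores of `width` bytes (4 models stw, 8 models std).
 * Returns the number of stores that were issued.
 */
size_t clear_page_model(unsigned char *page, size_t width)
{
    static const unsigned char zero[8];
    size_t stores = 0;

    assert(width == 4 || width == 8);
    for (size_t off = 0; off < PAGE_SIZE; off += width) {
        memcpy(page + off, zero, width);   /* one store of `width` bytes */
        stores++;
    }
    return stores;
}
```

With these assumptions the 4-byte variant issues 1024 stores per page and the 8-byte variant 512, so the store count halves. Whether that translates into a similar drop in the profile is something only a measurement can tell.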

i suspect some areas for further investigation are:
- can we do tlb_flush_mm() in a smarter way for SMP?
- can we improve kernel entry time for interrupts (and syscalls) by
  being smarter about what we save on the stack? (i.e. only callee-save
  registers and not all the registers?)

volunteers? :)

randolph
-- 
Randolph Chung
Debian GNU/Linux Developer, hppa/ia64 ports
http://www.tausq.org/

* Re: [parisc-linux] 2.6.10-rc1-pa11 profile data
  2004-11-11  8:11 ` Randolph Chung
@ 2004-11-11 17:39   ` Carlos O'Donell
  2004-11-11 17:42     ` Randolph Chung
  2004-11-11 18:23   ` Joel Soete
                     ` (4 subsequent siblings)
  5 siblings, 1 reply; 25+ messages in thread
From: Carlos O'Donell @ 2004-11-11 17:39 UTC (permalink / raw)
  To: Randolph Chung; +Cc: parisc-linux

> - can we improve kernel entry time for interrupts (and syscalls) by
>   being smarter about what we save on the stack? (i.e. only callee-save
>   registers and not all the registers?)
> 
> volunteers? :)

I have been stewing over the following:

Leave the existing syscall save everything code in place.

Create a branch in front of the syscall save-everything code that
branches on the value of "enable_fast_syscall".

The variable is set via some mechanism. What's the currently accepted
way? /proc twiddle?

The branched-to code path is the fast one that saves only the callee-save registers.

Allow a compile time option to switch kernel syscalls to the 'fast'
function call ABI method for people that know they are installing 
on a recent glibc.

That's much later on my todo list, but because sometimes I get
frustrated with binutils I go to work on other things for a break :)

Cheers,
Carlos.


* Re: [parisc-linux] 2.6.10-rc1-pa11 profile data
  2004-11-11 17:39   ` Carlos O'Donell
@ 2004-11-11 17:42     ` Randolph Chung
  2004-11-11 17:50       ` Matthew Wilcox
  0 siblings, 1 reply; 25+ messages in thread
From: Randolph Chung @ 2004-11-11 17:42 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: parisc-linux

> I have been stewing over the following:
> 
> Leave the existing syscall save everything code in place.
> 
> Create a branch infront of the syscall save everything code that
> branches on the value of "enable_fast_syscall"

eh? nononono. we should *always* be able to only preserve callee-saved
registers. From the application point of view, when they call e.g.
read(), it is a function call. The app should not expect any
caller-saved registers to be preserved across the function/system call.

randolph
-- 
Randolph Chung
Debian GNU/Linux Developer, hppa/ia64 ports
http://www.tausq.org/

* Re: [parisc-linux] 2.6.10-rc1-pa11 profile data
  2004-11-11 17:42     ` Randolph Chung
@ 2004-11-11 17:50       ` Matthew Wilcox
  2004-11-11 17:59         ` Randolph Chung
  0 siblings, 1 reply; 25+ messages in thread
From: Matthew Wilcox @ 2004-11-11 17:50 UTC (permalink / raw)
  To: Randolph Chung; +Cc: parisc-linux

On Thu, Nov 11, 2004 at 09:42:58AM -0800, Randolph Chung wrote:
> eh? nononono. we should *always* be able to only preserve callee-saved
> registers. From the application point of view, when they call e.g.
> read(), it is a function call. The app should not expect any
> caller-saved registers to be preserved across the function/system call.

As I'm sure you already know, we do have to be careful to avoid leaking
kernel-internal or another task's information in the registers that
are call-clobbered.  I know some architectures do this by having a
kernel exit path that deliberately clobbers as many registers as possible.

-- 
"Next the statesmen will invent cheap lies, putting the blame upon 
the nation that is attacked, and every man will be glad of those
conscience-soothing falsities, and will diligently study them, and refuse
to examine any refutations of them; and thus he will by and by convince 
himself that the war is just, and will thank God for the better sleep 
he enjoys after this process of grotesque self-deception." -- Mark Twain

* Re: [parisc-linux] 2.6.10-rc1-pa11 profile data
  2004-11-11 17:50       ` Matthew Wilcox
@ 2004-11-11 17:59         ` Randolph Chung
  2004-11-11 18:36           ` Grant Grundler
  0 siblings, 1 reply; 25+ messages in thread
From: Randolph Chung @ 2004-11-11 17:59 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: parisc-linux

> As I'm sure you already know, we do have to be careful to avoid leaking
> kernel-internal or another task's information in the registers that
> are call-clobbered.  I know some architectures do this by having a
> kernel exit path that deliberately clobbers as many registers as possible.

sure, we can zero all the call-clobbered registers on exit. But not
having to save all of those pesky floating point registers and half a
dozen general registers should still be a huge win.
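
To make the "zero on exit" idea concrete, here is a toy C model, not kernel code. The register file is a plain array, and the split into call-clobbered registers (r1, r19-r26 and r31 here) is an assumption made for the example; the real set has to come from the PA-RISC runtime ABI. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_GR 32
#define GR_BIT(n) (UINT32_C(1) << (n))

/* Illustrative mask only: bit n set means %rn is treated as call-clobbered. */
uint32_t toy_call_clobbered_mask(void)
{
    uint32_t mask = GR_BIT(1) | GR_BIT(31);

    for (int r = 19; r <= 26; r++)
        mask |= GR_BIT(r);
    return mask;
}

/* On the way back to user space, zero every register selected by the mask. */
void toy_scrub_on_exit(uint64_t gr[NUM_GR], uint32_t mask)
{
    assert(gr != NULL);
    for (int r = 1; r < NUM_GR; r++)
        if (mask & GR_BIT(r))
            gr[r] = 0;
}
```

The point of the model is that scrubbing needs one store-free register write per clobbered register, while saving and restoring needs a store plus a load for each, so the exit path stays cheap even with the leak protection in place.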

randolph
-- 
Randolph Chung
Debian GNU/Linux Developer, hppa/ia64 ports
http://www.tausq.org/

* Re: [parisc-linux] 2.6.10-rc1-pa11 profile data
  2004-11-11  8:11 ` Randolph Chung
  2004-11-11 17:39   ` Carlos O'Donell
@ 2004-11-11 18:23   ` Joel Soete
  2004-11-11 18:51     ` Randolph Chung
  2004-11-26 16:59   ` flush_kernel_[di]cache_page question? [WAS: " Joel Soete
                     ` (3 subsequent siblings)
  5 siblings, 1 reply; 25+ messages in thread
From: Joel Soete @ 2004-11-11 18:23 UTC (permalink / raw)
  To: Randolph Chung; +Cc: parisc-linux



Randolph Chung wrote:
>>I've collect two profiles for -64SMP and will collect
>>some UP profiles tomorrow. profiles so far are measuring
>>a full kernel build. I expect I'll do the same for -64UP
>>kernels too.
> 
> 
> hmm.. interesting. top consumers are (with idle loop functions removed)
> 
>  40646 flush_kernel_icache_page                 406.4600
>   7364 fdsync                                   368.2000
>  10567 flush_user_dcache_range_asm              293.5278
>  10387 flush_user_icache_range_asm              288.5278
mmm (maybe another stupid remark, but) I noticed that:
    748 flush_user_dcache_page                     7.4800
    648 flush_user_icache_page                     6.4800
   4255 purge_kernel_dcache_page                  42.5500
  10567 flush_user_dcache_range_asm              293.5278
  10387 flush_user_icache_range_asm              288.5278
  40646 flush_kernel_icache_page                 406.4600
     10 flush_kernel_icache_range_asm              0.0862

i.e. flush_kernel_[di]cache_page is used far more than
flush_kernel_[di]cache_range_asm, while flush_user_[di]cache_range_asm
is used more than flush_user_[di]cache_page.

Isn't it strange?

[...]

mmm also:
  49576 machine_restart                          774.6250

??
(I don't understand, because the stats were reset with "readprofile -r" before the build)

> 
> we really need to do better at cache flushing..... anybody have any
> ideas? :)
> 
> but looking at the other ones:
> - __clear_user_page_asm can be optimized for 64-bit by writing 8 bytes
>   at a time instead of 4
> - _spin_lock* needs investigation to see if we have some bad locks
>   someplace. lockmeter anybody?
> - *lclu* can be rewritten to do better than 1-byte at a time
> - copy_user_page_asm can be sped up slightly by using pa_memcpy, but not
>   much when i tried last time
> - *lslen* can also probably be written in a smarter way... 
> 
> i suspect some areas for further investigation are:
> - can we do tlb_flush_mm() in a smarter way for SMP?
> - can we improve kernel entry time for interrupts (and syscalls) by
>   being smarter about what we save on the stack? (i.e. only callee-save
>   registers and not all the registers?)
> 
> volunteers? :)
> 
I can't really help more, but I will look into this in more detail from time to time :)

Thanks,
	Joel

* Re: [parisc-linux] 2.6.10-rc1-pa11 profile data
  2004-11-11 17:59         ` Randolph Chung
@ 2004-11-11 18:36           ` Grant Grundler
  0 siblings, 0 replies; 25+ messages in thread
From: Grant Grundler @ 2004-11-11 18:36 UTC (permalink / raw)
  To: Randolph Chung; +Cc: parisc-linux

On Thu, Nov 11, 2004 at 09:59:33AM -0800, Randolph Chung wrote:
> sure, we can zero all the call clobbered registers on exit. But not
> having to save all of those pesky floating pointer registers and half a
> dozen general registers should still be a huge win.

Randolph and I talked about this more privately.
In a nutshell, "huge win" is slightly overstating it and we agree
fixing the cache utilization would be a much bigger win.

Randolph thinks we can save 20 loads and stores per interrupt
and potential context switch. The thinking is that we are saving/restoring
some registers twice and should split the save/restore between
interrupt/trap and context switch code. So if no context switch is
performed, we only save/restore a subset of the registers manually
and the rest are preserved according to the ABI.
Did I get that right?

thanks,
grant

* Re: [parisc-linux] 2.6.10-rc1-pa11 profile data
  2004-11-11 18:23   ` Joel Soete
@ 2004-11-11 18:51     ` Randolph Chung
  0 siblings, 0 replies; 25+ messages in thread
From: Randolph Chung @ 2004-11-11 18:51 UTC (permalink / raw)
  To: Joel Soete; +Cc: parisc-linux

> i.e. flush_kernel_[di]cache_page is used far more than
> flush_kernel_[di]cache_range_asm, while flush_user_[di]cache_range_asm is
> used more than flush_user_[di]cache_page.
> 
> Isn't it strange?

could it be that kernel mappings tend to be bigger and user mappings
tend to be smaller? i'm only guessing here...

> mmm also:
>  49576 machine_restart                          774.6250

this is an artifact of the way the measurements are done. these are
actually calls to cpu_idle().

randolph
-- 
Randolph Chung
Debian GNU/Linux Developer, hppa/ia64 ports
http://www.tausq.org/

* Re: [parisc-linux] 2.6.10-rc1-pa11 profile data
  2004-11-11  7:54 [parisc-linux] 2.6.10-rc1-pa11 profile data Grant Grundler
  2004-11-11  8:11 ` Randolph Chung
@ 2004-11-12  5:29 ` Grant Grundler
  1 sibling, 0 replies; 25+ messages in thread
From: Grant Grundler @ 2004-11-12  5:29 UTC (permalink / raw)
  To: parisc-linux

On Thu, Nov 11, 2004 at 12:54:31AM -0700, Grant Grundler wrote:
> I've collect two profiles for -64SMP and will collect
> some UP profiles tomorrow.

"tomorrow" finally arrived. :^)

> 	http://www.parisc-linux.org/~grundler/prof-j6700/

I've added the 64-bit UP profile numbers as promised.
And some of the top consumers look familiar:
root@gggj6k:~# sort -rnk 3 prof-2.6.10-rc1-pa11-64-01.txt
 40150 flush_kernel_icache_page                 401.5000
  6798 fdsync                                   339.9000
 10645 flush_user_dcache_range_asm              295.6944
 10353 flush_user_icache_range_asm              287.5833
 13871 machine_restart                          216.7344
 20839 __clear_user_page_asm                    186.0625
 10478 cpu_idle                                 145.5278
  1380 fisync                                    86.2500
   365 $lclu_done                                45.6250
  3794 purge_kernel_dcache_page                  37.9400
  3535 flush_kernel_dcache_page                  35.3500
  3279 copy_user_page_asm                        21.5724
  1358 find_get_page                             18.8611
   185 $lslen_loop                               15.4167
  1228 find_vma_prev                             12.7917
   101 $lslen_done                               12.6250
   128 $lsfu_loop                                 8.0000
   162 file_ra_state_init                         6.7500
...


enjoy!
grant

* flush_kernel_[di]cache_page question? [WAS: [parisc-linux] 2.6.10-rc1-pa11 profile data
  2004-11-11  8:11 ` Randolph Chung
  2004-11-11 17:39   ` Carlos O'Donell
  2004-11-11 18:23   ` Joel Soete
@ 2004-11-26 16:59   ` Joel Soete
  2004-11-26 17:13     ` Randolph Chung
  2004-11-26 19:02     ` Grant Grundler
  2004-11-28 21:01   ` [id]cache meaning? [Was: [parisc-linux] 2.6.10-rc1-pa11 profile data] Joel Soete
                     ` (2 subsequent siblings)
  5 siblings, 2 replies; 25+ messages in thread
From: Joel Soete @ 2004-11-26 16:59 UTC (permalink / raw)
  To: Randolph Chung, Grant Grundler; +Cc: parisc-linux

Hello all,

> 
> hmm.. interesting. top consumers are (with idle loop functions removed)
> 
>  40646 flush_kernel_icache_page                 406.4600
[...]
>   4089 flush_kernel_dcache_page                  40.8900
[...]
> we really need to do better at cache flushing..... anybody have any
> ideas? :)
> 
Can somebody help me understand these:
[...]
flush_kernel_dcache_page:
	.proc
	.callinfo NO_CALLS
	.entry

	ldil    L%dcache_stride,%r1
	ldw     R%dcache_stride(%r1),%r23

#ifdef __LP64__
	depdi,z 1,63-PAGE_SHIFT,1,%r25
#else
	depwi,z 1,31-PAGE_SHIFT,1,%r25
#endif
	add     %r26,%r25,%r25
	sub     %r25,%r23,%r25


1:      fdc,m   %r23(%r26)
	fdc,m   %r23(%r26)
	fdc,m   %r23(%r26)
	fdc,m   %r23(%r26)
	fdc,m   %r23(%r26)
	fdc,m   %r23(%r26)
	fdc,m   %r23(%r26)
	fdc,m   %r23(%r26)
	fdc,m   %r23(%r26)
	fdc,m   %r23(%r26)
	fdc,m   %r23(%r26)
	fdc,m   %r23(%r26)
	fdc,m   %r23(%r26)
	fdc,m   %r23(%r26)
	fdc,m   %r23(%r26)
	CMPB<<  %r26,%r25,1b
	fdc,m   %r23(%r26)

	sync
	bv      %r0(%r2)
	nop
	.exit

	.procend
	
	.export flush_user_dcache_page

[...]
flush_kernel_icache_page:
	.proc
	.callinfo NO_CALLS
	.entry

	ldil    L%icache_stride,%r1
	ldw     R%icache_stride(%r1),%r23

#ifdef __LP64__
	depdi,z 1,63-PAGE_SHIFT,1,%r25
#else
	depwi,z 1,31-PAGE_SHIFT,1,%r25
#endif
	add     %r26,%r25,%r25
	sub     %r25,%r23,%r25


1:      fic,m   %r23(%r26)
	fic,m   %r23(%r26)
	fic,m   %r23(%r26)
	fic,m   %r23(%r26)
	fic,m   %r23(%r26)
	fic,m   %r23(%r26)
	fic,m   %r23(%r26)
	fic,m   %r23(%r26)
	fic,m   %r23(%r26)
	fic,m   %r23(%r26)
	fic,m   %r23(%r26)
	fic,m   %r23(%r26)
	fic,m   %r23(%r26)
	fic,m   %r23(%r26)
	fic,m   %r23(%r26)
	CMPB<<  %r26,%r25,1b
	fic,m   %r23(%r26)

	sync
	bv      %r0(%r2)
	nop
	.exit

	.procend
[...]

I tried googling the p-l m-l, but I couldn't find out why there is
this chain of 15 f[di]c,m?

Thanks in advance for your attention,
    Joel



* Re: flush_kernel_[di]cache_page question? [WAS: [parisc-linux] 2.6.10-rc1-pa11 profile data
  2004-11-26 16:59   ` flush_kernel_[di]cache_page question? [WAS: " Joel Soete
@ 2004-11-26 17:13     ` Randolph Chung
  2004-11-26 19:02     ` Grant Grundler
  1 sibling, 0 replies; 25+ messages in thread
From: Randolph Chung @ 2004-11-26 17:13 UTC (permalink / raw)
  To: Joel Soete; +Cc: parisc-linux

> I try google on the p-l m-l but I didn't reach to find out why those chain
> of 15 f[di]c,m?

16, actually (including the one in the delay slot of the CMPB).

,m is post increment, so we are just doing an unrolled loop of

fdc offset(address)
offset = offset + cache_stride
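
As a cross-check of the count, the loop from the listing can be modelled in C. This is only a sketch under assumptions: PAGE_SIZE is taken as 4096, each fdc,m is modelled as "flush at the address, then add the stride to the address register", and the CMPB<< is taken to be evaluated before its delay-slot flush executes. The function names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096u   /* assumed; the listing derives it from PAGE_SHIFT */
#define UNROLL    16      /* 15 flushes in the body plus 1 in the delay slot */

/* Stand-in for one "fdc,m stride(addr)": flush at addr, then addr += stride. */
static void flush_line_post_inc(size_t *addr, size_t stride, size_t *count)
{
    (*count)++;
    *addr += stride;
}

/* Returns how many cache lines the unrolled loop flushes for one page. */
size_t flush_page_model(size_t base, size_t stride)
{
    size_t addr = base;
    size_t limit = base + PAGE_SIZE - stride;   /* value left in %r25 */
    size_t count = 0;
    int again;

    /* The unrolling covers the page exactly only for a multiple of 16 lines. */
    assert(stride != 0 && (PAGE_SIZE / stride) % UNROLL == 0);

    do {
        for (int i = 0; i < UNROLL - 1; i++)
            flush_line_post_inc(&addr, stride, &count);
        again = addr < limit;                        /* CMPB<< %r26,%r25,1b */
        flush_line_post_inc(&addr, stride, &count);  /* delay-slot flush */
    } while (again);

    assert(addr == base + PAGE_SIZE);
    return count;
}
```

With a 32-byte stride the model flushes 128 lines in 8 trips around the loop, and with a 64-byte stride 64 lines in 4 trips, so every line of the page is touched exactly once.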

randolph
-- 
Randolph Chung
Debian GNU/Linux Developer, hppa/ia64 ports
http://www.tausq.org/
_______________________________________________
parisc-linux mailing list
parisc-linux@lists.parisc-linux.org
http://lists.parisc-linux.org/mailman/listinfo/parisc-linux

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: flush_kernel_[di]cache_page question? [WAS: [parisc-linux] 2.6.10-rc1-pa11 profile data
  2004-11-26 16:59   ` flush_kernel_[di]cache_page question? [WAS: " Joel Soete
  2004-11-26 17:13     ` Randolph Chung
@ 2004-11-26 19:02     ` Grant Grundler
  1 sibling, 0 replies; 25+ messages in thread
From: Grant Grundler @ 2004-11-26 19:02 UTC (permalink / raw)
  To: Joel Soete; +Cc: parisc-linux

On Fri, Nov 26, 2004 at 05:59:18PM +0100, Joel Soete wrote:
> >  40646 flush_kernel_icache_page                 406.4600
> [...]
> >   4089 flush_kernel_dcache_page                  40.8900
> [...]
> > we really need to do better at cache flushing..... anybody have any
> > ideas? :)
> >
> Is somebody can help me to understand those:
> [...]
> flush_kernel_dcache_page:

Joel,
the problem is not in the flush_kernel_dcache_page() routine.
The problem is that we are calling it too often.

grant

* [id]cache meaning? [Was: [parisc-linux] 2.6.10-rc1-pa11 profile data]
  2004-11-11  8:11 ` Randolph Chung
                     ` (2 preceding siblings ...)
  2004-11-26 16:59   ` flush_kernel_[di]cache_page question? [WAS: " Joel Soete
@ 2004-11-28 21:01   ` Joel Soete
  2004-11-28 21:13     ` Matthew Wilcox
  2004-12-01 17:44   ` More questions " Joel Soete
  2004-12-03 15:00   ` *lcul and memory granularity question[Was: " Joel Soete
  5 siblings, 1 reply; 25+ messages in thread
From: Joel Soete @ 2004-11-28 21:01 UTC (permalink / raw)
  To: Randolph Chung; +Cc: parisc-linux

Hello all,


Randolph Chung wrote:
>>I've collect two profiles for -64SMP and will collect
>>some UP profiles tomorrow. profiles so far are measuring
>>a full kernel build. I expect I'll do the same for -64UP
>>kernels too.
> 
> 
> hmm.. interesting. top consumers are (with idle loop functions removed)
> 
>  40646 flush_kernel_icache_page                 406.4600
>   7364 fdsync                                   368.2000
>  10567 flush_user_dcache_range_asm              293.5278
>  10387 flush_user_icache_range_asm              288.5278

I have an additional question about these functions:

	* on parisc, does ..._dcache_... above really refer to the data cache?
	* and, respectively, does ..._icache_... refer to the instruction cache?

Do they have a different meaning in generic Linux?

My confusion comes from:

include/asm-parisc/cacheflush.h:
[...]
#define flush_icache_page(vma,page)	do { \
	flush_kernel_dcache_page(page_address(page)); \
	flush_kernel_icache_page(page_address(page)); \
} while (0)
[...]

Thanks again,
	Joel

PS: I don't suspect any error; I am just confused :(



* Re: [id]cache meaning? [Was: [parisc-linux] 2.6.10-rc1-pa11 profile data]
  2004-11-28 21:01   ` [id]cache meaning? [Was: [parisc-linux] 2.6.10-rc1-pa11 profile data] Joel Soete
@ 2004-11-28 21:13     ` Matthew Wilcox
  2004-11-29  1:14       ` Michael S. Zick
  0 siblings, 1 reply; 25+ messages in thread
From: Matthew Wilcox @ 2004-11-28 21:13 UTC (permalink / raw)
  To: Joel Soete; +Cc: parisc-linux

On Sun, Nov 28, 2004 at 09:01:42PM +0000, Joel Soete wrote:
> I have additional question about such functions:
> 
> 	* in parisc above ..._dcache_... refer well to data cache?
> 	* and respectively ..._icache_... refer to instruction cache?
> 
> Have they different meaning for generic linux?

No, your understanding is correct (see Documentation/cachetlb.txt)

> The confusion came for me from:
> 
> include/asm-parisc/cacheflush.h:
> [...]
> #define flush_icache_page(vma,page)   do { 
> flush_kernel_dcache_page(page_address(page)); 
> flush_kernel_icache_page(page_address(page)); } while (0)
> [...]

I see why this confuses you.  PA-RISC has writeback data caches that
are non-coherent with the instruction cache.  So it's not enough to
just flush the icache; if the page has been modified, we need to force
the data in the dcache back to ram, then remove any existing cache for
instructions in that page.  Then instruction accesses to that page will
fetch the correct data from memory and everything will work.

Many other architectures have writethrough data caches.  They don't need
to flush the dcache.
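
The ordering argument can be shown with a one-word toy model in C. It is a sketch only: a single memory word, one writeback dcache line and one icache line, with all names invented for the example and no relation to the real kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy machine: one word of memory, one dcache line, one icache line. */
struct toy_cpu {
    int memory;
    int dcache;
    bool dcache_valid;
    bool dcache_dirty;
    int icache;
    bool icache_valid;
};

/* Writeback dcache: a store updates the cache only, memory stays stale. */
void toy_store(struct toy_cpu *c, int value)
{
    c->dcache = value;
    c->dcache_valid = true;
    c->dcache_dirty = true;
}

/* Instruction fetch never looks at the dcache (the caches are non-coherent). */
int toy_fetch(struct toy_cpu *c)
{
    assert(c != NULL);
    if (!c->icache_valid) {
        c->icache = c->memory;
        c->icache_valid = true;
    }
    return c->icache;
}

/* Flushing the dcache writes dirty data back to memory, then invalidates. */
void toy_flush_dcache(struct toy_cpu *c)
{
    if (c->dcache_valid && c->dcache_dirty)
        c->memory = c->dcache;
    c->dcache_valid = false;
    c->dcache_dirty = false;
}

/* The icache is read-only, so flushing it is a plain invalidate. */
void toy_flush_icache(struct toy_cpu *c)
{
    c->icache_valid = false;
}

/* Same order as the macro: dcache first, then icache. */
void toy_flush_icache_page(struct toy_cpu *c)
{
    toy_flush_dcache(c);
    toy_flush_icache(c);
}
```

In this model, calling only toy_flush_icache() after a store makes the next fetch reload the old value from memory, because the new value is still sitting in the dcache. Calling toy_flush_icache_page() makes the fetch see the new value.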

-- 
"Next the statesmen will invent cheap lies, putting the blame upon 
the nation that is attacked, and every man will be glad of those
conscience-soothing falsities, and will diligently study them, and refuse
to examine any refutations of them; and thus he will by and by convince 
himself that the war is just, and will thank God for the better sleep 
he enjoys after this process of grotesque self-deception." -- Mark Twain

* Re: [id]cache meaning? [Was: [parisc-linux] 2.6.10-rc1-pa11 profile data]
  2004-11-28 21:13     ` Matthew Wilcox
@ 2004-11-29  1:14       ` Michael S. Zick
  2004-11-29  2:00         ` Matthew Wilcox
  0 siblings, 1 reply; 25+ messages in thread
From: Michael S. Zick @ 2004-11-29  1:14 UTC (permalink / raw)
  To: parisc-linux

On Sun November 28 2004 15:13, Matthew Wilcox wrote:
> On Sun, Nov 28, 2004 at 09:01:42PM +0000, Joel Soete wrote:
> > 
> > include/asm-parisc/cacheflush.h:
> > [...]
> > #define flush_icache_page(vma,page)   do { 
> > flush_kernel_dcache_page(page_address(page)); 
> > flush_kernel_icache_page(page_address(page)); } while (0)
> > [...]
> 
> I see why this confuses you.  PA-RISC has writeback data caches that
> are non-coherent with the instruction cache.  So it's not enough to
> just flush the icache; if the page has been modified, we need to force
> the data in the dcache back to ram, then remove any existing cache for
> instructions in that page.  Then instruction accesses to that page will
> fetch the correct data from memory and everything will work.
> 
Matt, Joel,
Here is a, perhaps dumb, question from a non-parisc source...

I note Matt's statement: "...then remove any existing cache for
instructions in that page."

Which sounds very reasonable.

Question:
Is the:
> > flush_kernel_icache_page(page_address(page)); 
(or the hardware that receives the command)

smart enough to just mark the page 'invalid' or is it
actually an 'absolute update external storage'?

I ask because both flush commands are written the
same, BUT...
The first should be an 'absolute update external
storage'.
The second should be either just a 'mark
invalid' or 'conditional update external storage'.

Mike


* Re: [id]cache meaning? [Was: [parisc-linux] 2.6.10-rc1-pa11 profile data]
  2004-11-29  1:14       ` Michael S. Zick
@ 2004-11-29  2:00         ` Matthew Wilcox
  0 siblings, 0 replies; 25+ messages in thread
From: Matthew Wilcox @ 2004-11-29  2:00 UTC (permalink / raw)
  To: Michael S. Zick; +Cc: parisc-linux

On Sun, Nov 28, 2004 at 07:14:14PM -0600, Michael S. Zick wrote:
> Question:
> Is the:
> > > flush_kernel_icache_page(page_address(page)); 
> (or the hardware that receives the command)
> 
> smart enough to just mark the page 'invalid' or is it
> actually an 'absolute update external storage'?

The I-cache is, by definition, read-only, so there's nothing to update
main memory with.

-- 
"Next the statesmen will invent cheap lies, putting the blame upon 
the nation that is attacked, and every man will be glad of those
conscience-soothing falsities, and will diligently study them, and refuse
to examine any refutations of them; and thus he will by and by convince 
himself that the war is just, and will thank God for the better sleep 
he enjoys after this process of grotesque self-deception." -- Mark Twain

* More questions [Was: [parisc-linux] 2.6.10-rc1-pa11 profile data]
  2004-11-11  8:11 ` Randolph Chung
                     ` (3 preceding siblings ...)
  2004-11-28 21:01   ` [id]cache meaning? [Was: [parisc-linux] 2.6.10-rc1-pa11 profile data] Joel Soete
@ 2004-12-01 17:44   ` Joel Soete
  2004-12-01 17:56     ` Matthew Wilcox
  2004-12-03 10:24     ` Joel Soete
  2004-12-03 15:00   ` *lcul and memory granularity question[Was: " Joel Soete
  5 siblings, 2 replies; 25+ messages in thread
From: Joel Soete @ 2004-12-01 17:44 UTC (permalink / raw)
  To: Randolph Chung, Grant Grundler; +Cc: parisc-linux

[...]
> 
> but looking at the other ones:
> - __clear_user_page_asm can be optimized for 64-bit by writing 8 bytes
>   at a time instead of 4
Is this the idea (in principle only)?
--- arch/parisc/kernel/pacache.S.Orig	2004-12-01 17:09:53.000000000 +0100
+++ arch/parisc/kernel/pacache.S-t1	2004-12-01 17:58:16.000000000 +0100
@@ -505,6 +505,16 @@
 	ldi 64,%r1
 
 1:
+#ifdef __LP64__
+	std	%r0,0(%r28)
+	std	%r0,8(%r28)
+	std	%r0,16(%r28)
+	std	%r0,24(%r28)
+	std	%r0,32(%r28)
+	std	%r0,40(%r28)
+	std	%r0,48(%r28)
+	std	%r0,56(%r28)
+#else
 	stw %r0,0(%r28)
 	stw %r0,4(%r28)
 	stw %r0,8(%r28)
@@ -521,6 +531,7 @@
 	stw %r0,52(%r28)
 	stw %r0,56(%r28)
 	stw %r0,60(%r28)
+#endif
 	ADDIB>  -1,%r1,1b
 	ldo 64(%r28),%r28

I doubt that's enough, because I haven't yet found where r0 is set.

[...]
Thanks in advance for your attention,
    Joel


* Re: More questions [Was: [parisc-linux] 2.6.10-rc1-pa11 profile data]
  2004-12-01 17:44   ` More questions " Joel Soete
@ 2004-12-01 17:56     ` Matthew Wilcox
  2004-12-01 18:33       ` Joel Soete
  2004-12-03 10:24     ` Joel Soete
  1 sibling, 1 reply; 25+ messages in thread
From: Matthew Wilcox @ 2004-12-01 17:56 UTC (permalink / raw)
  To: Joel Soete; +Cc: parisc-linux

On Wed, Dec 01, 2004 at 06:44:44PM +0100, Joel Soete wrote:
> I doubt that's enough, because I haven't found yet where r0 is set.

r0 is magic on PA.  Writes are discarded, reads return 0.  It's the
hardware equivalent of /dev/zero ;-)

-- 
"Next the statesmen will invent cheap lies, putting the blame upon 
the nation that is attacked, and every man will be glad of those
conscience-soothing falsities, and will diligently study them, and refuse
to examine any refutations of them; and thus he will by and by convince 
himself that the war is just, and will thank God for the better sleep 
he enjoys after this process of grotesque self-deception." -- Mark Twain

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: More questions [Was: [parisc-linux] 2.6.10-rc1-pa11 profile data]
  2004-12-01 17:56     ` Matthew Wilcox
@ 2004-12-01 18:33       ` Joel Soete
  0 siblings, 0 replies; 25+ messages in thread
From: Joel Soete @ 2004-12-01 18:33 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: parisc-linux



> 
> On Wed, Dec 01, 2004 at 06:44:44PM +0100, Joel Soete wrote:
> > I doubt that's enough, because I haven't found yet where r0 is set.
> 
> r0 is magic on PA.  Writes are discarded, reads return 0.  It's the
> hardware equivalent of /dev/zero ;-)
> 
Cool, so that makes things that simple?
And IIRC, registers are 64 bits wide on PA 2.0?

Thanks,
    Joel



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: More questions [Was: [parisc-linux] 2.6.10-rc1-pa11 profile data]
  2004-12-01 17:44   ` More questions " Joel Soete
  2004-12-01 17:56     ` Matthew Wilcox
@ 2004-12-03 10:24     ` Joel Soete
  2004-12-03 15:41       ` Randolph Chung
  1 sibling, 1 reply; 25+ messages in thread
From: Joel Soete @ 2004-12-03 10:24 UTC (permalink / raw)
  To: Joel Soete; +Cc: parisc-linux

Hello all,

I need more advice:

Joel Soete wrote:
> [...]
> 
>>but looking at the other ones:
>>- __clear_user_page_asm can be optimized for 64-bit by writing 8 bytes
>>  at a time instead of 4
> 
> Is this the idea (in principle only)?
> --- arch/parisc/kernel/pacache.S.Orig	2004-12-01 17:09:53.000000000 +0100
> +++ arch/parisc/kernel/pacache.S-t1	2004-12-01 17:58:16.000000000 +0100
> @@ -505,6 +505,16 @@
>  	ldi 64,%r1
>  
>  1:
> +#ifdef __LP64__
> +	std	%r0,0(%r28)
> +	std	%r0,8(%r28)
> +	std	%r0,16(%r28)
> +	std	%r0,24(%r28)
> +	std	%r0,32(%r28)
> +	std	%r0,40(%r28)
> +	std	%r0,48(%r28)
> +	std	%r0,56(%r28)
> +#else
>  	stw %r0,0(%r28)
>  	stw %r0,4(%r28)
>  	stw %r0,8(%r28)
> @@ -521,6 +531,7 @@
>  	stw %r0,52(%r28)
>  	stw %r0,56(%r28)
>  	stw %r0,60(%r28)
> +#endif
>  	ADDIB>  -1,%r1,1b
>  	ldo 64(%r28),%r28
> 
I tested it on my b2k with a 64-bit kernel and it works, but it doesn't seem to bring me any benefit.
As far as I can trust readprofile, the ratio between the first and last columns is always the same (1/80 in this case).
What did I miss?
I was hoping for a gain of at least 1/2, since I reduced the number of store instructions by that much.
My idea now is to write a small test case to see whether using the same number of instructions but with fewer loop iterations helps or not.
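
A small user-space test case along those lines might look like the sketch below. It is only an illustration in C, not the kernel routine: the names clear_words, clear_dwords and time_clear are made up for this example, and the 4096-byte page is an assumption taken from the loop count in the patch (64 iterations of 64 bytes). Timings from user space may not carry over to __clear_user_page_asm, since cache and TLB state will differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

#define PAGE_BYTES 4096u  /* assumed: 64 loop iterations x 64 bytes */
#define CHUNK        64u  /* bytes cleared per loop iteration, as in the diff */

/* Sixteen 4-byte stores per iteration, like the stw path. */
static void clear_words(void *dst, size_t len)
{
    /* volatile stops the compiler from merging or dropping the stores */
    volatile uint32_t *p = dst;

    assert(len % CHUNK == 0 && (uintptr_t)dst % sizeof(uint32_t) == 0);
    for (size_t n = len / CHUNK; n > 0; n--) {
        for (size_t i = 0; i < CHUNK / sizeof(uint32_t); i++)
            p[i] = 0;
        p += CHUNK / sizeof(uint32_t);
    }
}

/* Eight 8-byte stores per iteration, like the std path. */
static void clear_dwords(void *dst, size_t len)
{
    volatile uint64_t *p = dst;

    assert(len % CHUNK == 0 && (uintptr_t)dst % sizeof(uint64_t) == 0);
    for (size_t n = len / CHUNK; n > 0; n--) {
        for (size_t i = 0; i < CHUNK / sizeof(uint64_t); i++)
            p[i] = 0;
        p += CHUNK / sizeof(uint64_t);
    }
}

/*
 * Clear one page `reps` times with the given routine.
 * Returns elapsed CPU time in seconds, or -1.0 if allocation fails.
 */
static double time_clear(void (*clear)(void *, size_t), int reps)
{
    void *page = malloc(PAGE_BYTES);
    clock_t start, end;

    if (page == NULL)
        return -1.0;
    start = clock();
    for (int i = 0; i < reps; i++)
        clear(page, PAGE_BYTES);
    end = clock();
    free(page);
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling time_clear(clear_words, N) and time_clear(clear_dwords, N) with the same large N gives two numbers to compare. If they come out close, the stores are probably not the bottleneck and memory bandwidth is.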


Thanks again for any further advice,
	Joel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* *lclu and memory granularity question [Was: [parisc-linux] 2.6.10-rc1-pa11 profile data]
  2004-11-11  8:11 ` Randolph Chung
                     ` (4 preceding siblings ...)
  2004-12-01 17:44   ` More questions " Joel Soete
@ 2004-12-03 15:00   ` Joel Soete
  2004-12-03 15:13     ` Matthew Wilcox
  5 siblings, 1 reply; 25+ messages in thread
From: Joel Soete @ 2004-12-03 15:00 UTC (permalink / raw)
  Cc: parisc-linux

Hello all,

Randolph Chung wrote:
> - *lclu* can be rewritten to do better than 1-byte at a time

I have an additional question about parisc alignment and this remark:
a char variable is 1-byte aligned ... but what about structs of 3, 5, 7 or more bytes?

My idea is that a 3-byte struct could be aligned as a 32-bit word, so that clearing it could be done by clearing the whole word;
the same for 5 and 7 bytes if aligned as 2 * 32-bit words, and so on, for an unrolled loop up to the max cache size (128 bytes IIRC);
and, by the way, using a case-style define as we do for __put/get__user/kernel_asm?

Or is the memory management more complex than I imagine, so that I really have to treat 3 bytes as 2+1 bytes (5 = 2*2+1, ...)?

Thanks a lot,
	Joel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: *lclu and memory granularity question [Was: [parisc-linux] 2.6.10-rc1-pa11 profile data]
  2004-12-03 15:00   ` *lclu and memory granularity question [Was: " Joel Soete
@ 2004-12-03 15:13     ` Matthew Wilcox
  0 siblings, 0 replies; 25+ messages in thread
From: Matthew Wilcox @ 2004-12-03 15:13 UTC (permalink / raw)
  To: Joel Soete; +Cc: parisc-linux

On Fri, Dec 03, 2004 at 03:00:33PM +0000, Joel Soete wrote:
> >- *lclu* can be rewritten to do better than 1-byte at a time
> 
> I have an additional question about parisc alignment and this remark:
> a char variable is 1-byte aligned ... but what about structs of 3, 5, 7
> or more bytes?

Don't think in terms of structs, think in terms of an arbitrary array of
bytes.  You can't assume anything about the alignment of lclu or the size.
In particular, you can't write more than the requested number of bytes.
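
To make that constraint concrete, here is a rough C sketch of clearing an arbitrary byte range faster than one byte at a time without ever writing outside it. It is not the lclu code; the function name clear_bytes and its return value are invented for the example. The usual shape is: byte stores until the pointer is 8-byte aligned, wide stores through the aligned middle, byte stores for whatever is left.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Zero exactly len bytes starting at dst, for any alignment and size.
 * Never writes outside [dst, dst + len).
 * Returns how many of the bytes were cleared with 8-byte stores.
 * Note: the wide store through a cast pointer assumes a build where that
 * is acceptable (the kernel is built with -fno-strict-aliasing).
 */
static size_t clear_bytes(void *dst, size_t len)
{
    unsigned char *p = dst;
    size_t wide = 0;

    assert(dst != NULL || len == 0);

    /* Head: single bytes until p is 8-byte aligned (or len runs out). */
    while (len > 0 && (uintptr_t)p % sizeof(uint64_t) != 0) {
        *p++ = 0;
        len--;
    }

    /* Middle: aligned 8-byte stores while at least 8 bytes remain. */
    while (len >= sizeof(uint64_t)) {
        *(uint64_t *)p = 0;
        p += sizeof(uint64_t);
        len -= sizeof(uint64_t);
        wide += sizeof(uint64_t);
    }

    /* Tail: fewer than 8 bytes left, so back to single bytes. */
    while (len > 0) {
        *p++ = 0;
        len--;
    }
    return wide;
}
```

With this shape a 3-byte request is simply three byte stores. It is never rounded up to a word, because the fourth byte may belong to something else.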


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: More questions [Was: [parisc-linux] 2.6.10-rc1-pa11 profile data]
  2004-12-03 10:24     ` Joel Soete
@ 2004-12-03 15:41       ` Randolph Chung
  2004-12-07 14:42         ` Joel Soete
  0 siblings, 1 reply; 25+ messages in thread
From: Randolph Chung @ 2004-12-03 15:41 UTC (permalink / raw)
  To: Joel Soete; +Cc: parisc-linux

> I tested it on my b2k with a 64-bit kernel and it works, but it doesn't
> seem to bring me any benefit.
> As far as I can trust readprofile, the ratio between the first and last
> columns is always the same (1/80 in this case).
> What did I miss?

what workload did you test this on? possibly you only see a benefit with
workloads that need to do a lot of page clearings.... so you probably
want to find a workload so that the first column is >>1
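
One possible workload of that kind, sketched in C: map anonymous memory and write to every page once, so the kernel has to supply a freshly zeroed page for each. The function name touch_fresh_pages is made up for the example, and whether each first touch really goes through __clear_user_page_asm on a given kernel is an assumption worth checking against the readprofile output.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS on strict compilers */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map npages of anonymous memory, write one byte into each page so that
 * every page is faulted in, then unmap it again.
 * Returns the number of pages touched, or -1 on failure.
 */
static long touch_fresh_pages(size_t npages)
{
    long page = sysconf(_SC_PAGESIZE);
    unsigned char *mem;
    size_t len;
    long touched = 0;

    if (page <= 0 || npages == 0)
        return -1;
    len = npages * (size_t)page;

    mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    /* A write (not a read) forces a private page to be allocated. */
    for (size_t off = 0; off < len; off += (size_t)page) {
        ((volatile unsigned char *)mem)[off] = 1;
        touched++;
    }
    assert(touched == (long)npages);

    munmap(mem, len);
    return touched;
}
```

Running this in a loop between readprofile -r and readprofile should push the first column for the page-clearing routine well above what a kernel build produces.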

randolph
-- 
Randolph Chung
Debian GNU/Linux Developer, hppa/ia64 ports
http://www.tausq.org/

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: More questions [Was: [parisc-linux] 2.6.10-rc1-pa11 profile data]
  2004-12-03 15:41       ` Randolph Chung
@ 2004-12-07 14:42         ` Joel Soete
  0 siblings, 0 replies; 25+ messages in thread
From: Joel Soete @ 2004-12-07 14:42 UTC (permalink / raw)
  To: Randolph Chung; +Cc: parisc-linux

Hello Randolph,

> > I tested it on my b2k with a 64-bit kernel and it works, but it doesn't
> > seem to bring me any benefit.
> > As far as I can trust readprofile, the ratio between the first and last
> > columns is always the same (1/80 in this case).
> > What did I miss?
> 
> what workload did you test this on? possibly you only see a benefit with
> workloads that need to do a lot of page clearings.... so you probably
> want to find a workload so that the first column is >>1
> 
I redid the following test with 2.6.10-rc3-pa2 on this b2k with a 64-bit kernel:
readprofile -r ; make V=1 vmlinux 2>&1 | tee /var/logs/k-2.6.10-rc3-pa2-b2k64 ; readprofile > /var/logs/prof2b-2.6.10-rc3-pa2-b2k64-3

I first built the kernel from CVS and rebooted into it, which gave the following result:
/var/logs/prof2b-2.6.10-rc3-pa2-b2k64-3: 15500 __clear_user_page_asm 138.3929 (i.e. 1/112)

I then applied the previously mentioned lclu patch, rebuilt, rebooted, and built this kernel a 4th time to obtain:
/var/logs/prof2b-2.6.10-rc3-pa2-b2k64-4: 12609 __clear_user_page_asm 112.5804 (i.e. 1/112)

Interesting: the ratio stays constant between tests, but the number of clock ticks was clearly reduced (so I presume a potential benefit, even though small ;-)
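
For reference, the arithmetic behind those two lines can be checked with the small C sketch below. The helper names implied_length and percent_fewer_ticks are made up for the example. As far as I understand readprofile, its third column is the tick count divided by the length of the function, so the first/last ratio only reflects that length and says nothing about speed; the tick count is the number to compare.

```c
#include <assert.h>

/*
 * readprofile prints: ticks, symbol, normalized load.
 * Assuming load = ticks / function length, dividing ticks by load
 * recovers that length.
 */
static double implied_length(double ticks, double load)
{
    assert(load > 0.0);
    return ticks / load;
}

/* Relative drop in ticks between two runs, in percent. */
static double percent_fewer_ticks(double before, double after)
{
    assert(before > 0.0);
    return 100.0 * (before - after) / before;
}
```

With the figures above, implied_length(15500, 138.3929) and implied_length(12609, 112.5804) both come out at about 112, and percent_fewer_ticks(15500, 12609) is about 18.7. Two single runs are of course only indicative.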

hth,
    Joel


^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2004-12-07 14:42 UTC | newest]

Thread overview: 25+ messages
2004-11-11  7:54 [parisc-linux] 2.6.10-rc1-pa11 profile data Grant Grundler
2004-11-11  8:11 ` Randolph Chung
2004-11-11 17:39   ` Carlos O'Donell
2004-11-11 17:42     ` Randolph Chung
2004-11-11 17:50       ` Matthew Wilcox
2004-11-11 17:59         ` Randolph Chung
2004-11-11 18:36           ` Grant Grundler
2004-11-11 18:23   ` Joel Soete
2004-11-11 18:51     ` Randolph Chung
2004-11-26 16:59   ` flush_kernel_[di]cache_page question? [WAS: " Joel Soete
2004-11-26 17:13     ` Randolph Chung
2004-11-26 19:02     ` Grant Grundler
2004-11-28 21:01   ` [id]cache meaning? [Was: [parisc-linux] 2.6.10-rc1-pa11 profile data] Joel Soete
2004-11-28 21:13     ` Matthew Wilcox
2004-11-29  1:14       ` Michael S. Zick
2004-11-29  2:00         ` Matthew Wilcox
2004-12-01 17:44   ` More questions " Joel Soete
2004-12-01 17:56     ` Matthew Wilcox
2004-12-01 18:33       ` Joel Soete
2004-12-03 10:24     ` Joel Soete
2004-12-03 15:41       ` Randolph Chung
2004-12-07 14:42         ` Joel Soete
2004-12-03 15:00   ` *lclu and memory granularity question [Was: " Joel Soete
2004-12-03 15:13     ` Matthew Wilcox
2004-11-12  5:29 ` [parisc-linux] 2.6.10-rc1-pa11 profile data Grant Grundler
