* copy_user_page_asm suggested 64bit improvment [Was: [parisc-linux] clear user page test]
From: Joel Soete @ 2004-12-19 20:27 UTC
To: parisc-linux, Randolph Chung
Hello pa*,
Here is the last exchange I had with ggg:
Joel Soete wrote:
>
>
> Grant Grundler wrote:
>
>> On Sat, Dec 18, 2004 at 09:38:34PM +0000, Joel Soete wrote:
>>
>>> interesting results:
>>> on b2k 64bit kernel:
>>> user 18m52.169s
>>> # including all those patch
>>> user 18m46.224s
>>> much better than clear_user_page_asm :-)
>>
>>
>>
>> "much better" is slightly overstated for a 0.62 % improvement.
>> That's a 7 second improvement over a 1132 second time frame.
>>
> OK, with the previous changes to clear_user_page_asm() the test case led
> us to expect some overall improvement (fewer ticks consumed :)),
> but the final timing results were absolutely disappointing, ISTR.
> I tested it on my b2k and here are the results:
> time make V=1 vmlinux (under 2.6.10-rc3-pa3-cvs)
> real 23m23.239s
> user 18m50.141s
> sys 4m21.203s
>
> time make V=1 vmlinux (under 2.6.10-rc3-pa4-patch clear_user_...)
> real 23m18.552s
> user 18m50.534s
> sys 4m16.903s
>
> :_(
>
>> Sorry - I can't see what changed to bring about this improvement.
>> Can you point that out to me?
>
> That makes sense, since you rejected the previous patch. I will rewrite it
> with #ifdef __LP64__ then; NP
>
>> I suspect there might be something else involved - perhaps this
>> difference is just within the noise of the test.
>>
> OK, I will do many more runs ;-)
>
>> BTW, I'll be on the road for the rest of the week...I won't be
>> able to pursue this until after I get back. If there are more
>> changes you want to the file, please post them to the list.
>>
> NP ;-) (have a nice trip and be careful)
>
> Thanks again for all,
> Joel
>
regarding this subject:
Joel Soete wrote:
> Grant,
>
> some more thoughts (and, btw, additional questions)
>
> Since we chose to keep the same number of insns in this loop, I was looking
> for something like:
> #ifdef __LP64__
> #define STR std
> #else
> #define STR stw
> #endif
>
> but figured out that this already exists, so why not use additional
> displacement #defines and rewrite the loop like:
> --- arch/parisc/kernel/pacache.S.New 2004-12-18 15:39:11.000000000 +0100
> +++ arch/parisc/kernel/pacache.S 2004-12-18 19:19:53.862854692 +0100
> @@ -288,6 +288,49 @@
>
> .procend
>
> +#ifdef __LP64__
> + /* PREFETCH (Write) has not (yet) been proven to help here */
> +/* # define PREFETCHW_OP ldd 256(%0), %r0 */
> +
> +#define INCR 128 /* Loop's INCRement */
> +#define LN 32 /* Loop's Number i.e. /* PAGE_SIZE/INCR == 32 */
> +#define D0 0 /* 1st insn displacement */
> +#define D1 8 /* 2d insn displacement */
> +#define D2 16
> +#define D3 24
> +#define D4 32
> +#define D5 40
> +#define D6 48
> +#define D7 56
> +#define D8 64
> +#define D9 72
> +#define D10 80
> +#define D11 88
> +#define D12 96
> +#define D13 104
> +#define D14 112
> +#define D15 120 /* last insn displacement */
> +#else /* !__LP64__ */
> +#define INCR 64 /* Loop's INCRement */
> +#define LN 64 /* Loop's Number i.e. /* PAGE_SIZE/INCR == 64 */
> +#define D0 0 /* 1st insn displacement */
> +#define D1 4 /* 2d insn displacement */
> +#define D2 8
> +#define D3 12
> +#define D4 16
> +#define D5 20
> +#define D6 24
> +#define D7 28
> +#define D8 32
> +#define D9 36
> +#define D10 40
> +#define D11 44
> +#define D12 48
> +#define D13 52
> +#define D14 56
> +#define D15 60 /* last insn displacement */
> +#endif /* __LP64__ */
> +
> .export copy_user_page_asm,code
>
> copy_user_page_asm:
> @@ -502,55 +545,26 @@
>
> pdtlb 0(%r28)
>
> -#ifdef __LP64__
> - ldi 32, %r1 /* PAGE_SIZE/128 == 32 */
> -
> - /* PREFETCH (Write) has not (yet) been proven to help here */
> -/* #define PREFETCHW_OP ldd 256(%0), %r0 */
> + ldi LN, %r1 /* PAGE_SIZE/INCR == LN */
>
> -1: std %r0, 0(%r28)
> - std %r0, 8(%r28)
> - std %r0, 16(%r28)
> - std %r0, 24(%r28)
> - std %r0, 32(%r28)
> - std %r0, 40(%r28)
> - std %r0, 48(%r28)
> - std %r0, 56(%r28)
> - std %r0, 64(%r28)
> - std %r0, 72(%r28)
> - std %r0, 80(%r28)
> - std %r0, 88(%r28)
> - std %r0, 96(%r28)
> - std %r0, 104(%r28)
> - std %r0, 112(%r28)
> - std %r0, 120(%r28)
> +1: STREG %r0, D0(%r28)
> + STREG %r0, D1(%r28)
> + STREG %r0, D2(%r28)
> + STREG %r0, D3(%r28)
> + STREG %r0, D4(%r28)
> + STREG %r0, D5(%r28)
> + STREG %r0, D6(%r28)
> + STREG %r0, D7(%r28)
> + STREG %r0, D8(%r28)
> + STREG %r0, D9(%r28)
> + STREG %r0, D10(%r28)
> + STREG %r0, D11(%r28)
> + STREG %r0, D12(%r28)
> + STREG %r0, D13(%r28)
> + STREG %r0, D14(%r28)
> + STREG %r0, D15(%r28)
> ADDIB> -1, %r1, 1b
> - ldo 128(%r28), %r28
> -
> -#else /* ! __LP64 */
> -
> - ldi 64, %r1 /* PAGE_SIZE/64 == 64 */
> -
> -1:
> - stw %r0, 0(%r28)
> - stw %r0, 4(%r28)
> - stw %r0, 8(%r28)
> - stw %r0, 12(%r28)
> - stw %r0, 16(%r28)
> - stw %r0, 20(%r28)
> - stw %r0, 24(%r28)
> - stw %r0, 28(%r28)
> - stw %r0, 32(%r28)
> - stw %r0, 36(%r28)
> - stw %r0, 40(%r28)
> - stw %r0, 44(%r28)
> - stw %r0, 48(%r28)
> - stw %r0, 52(%r28)
> - stw %r0, 56(%r28)
> - stw %r0, 60(%r28)
> - ADDIB> -1, %r1, 1b
> - ldo 64(%r28), %r28
> -#endif /* __LP64 */
> + ldo INCR(%r28), %r28
>
> bv %r0(%r2)
> nop
> =========> arch_parisc_kernel_pacache.S.diff4 <=========
this was definitely rejected:
Grant Grundler wrote:
> On Sat, Dec 18, 2004 at 07:38:21PM +0000, Joel Soete wrote:
>
>>Grant,
>>
>>still some more thought (and btw additional questions)
>>
>>As we choose to keep the same number of insn in this loop, I was looking
>>for something like:
>>#ifdef __LP64__
>>#define STR std
>>#else
>>#define STR stw
>>#endif
>>
>>but figure out that already exist and so why not using addtional
>>displacement #define and re-write loop like:
>>--- arch/parisc/kernel/pacache.S.New 2004-12-18 15:39:11.000000000 +0100
>>+++ arch/parisc/kernel/pacache.S 2004-12-18 19:19:53.862854692 +0100
>>@@ -288,6 +288,49 @@
>>
>> .procend
>>
>>+#ifdef __LP64__
>>+ /* PREFETCH (Write) has not (yet) been proven to help here */
>>+/* # define PREFETCHW_OP ldd 256(%0), %r0 */
>>+
>>+#define INCR 128 /* Loop's INCRement */
>>+#define LN 32 /* Loop's Number i.e. /* PAGE_SIZE/INCR == 32 */
>>+#define D0 0 /* 1st insn displacement */
>>+#define D1 8 /* 2d insn displacement */
>
> ...
>
>
> Sorry - definitely not. That just obscures whats going on.
>
> grant
>
So I will have to rewrite this one too.
>
> Hmm, but copy_user_page_asm has the same structure, so why not use
> the same scheme:
> --- arch/parisc/kernel/pacache.S.New1 2004-12-18 19:21:25.503229430 +0100
> +++ arch/parisc/kernel/pacache.S 2004-12-18 19:43:33.184799992 +0100
> @@ -338,7 +338,7 @@
> .callinfo NO_CALLS
> .entry
>
> - ldi 64, %r1
> + ldi LN, %r1
>
> /*
> * This loop is optimized for PCXL/PCXL2 ldw/ldw and stw/stw
> @@ -349,43 +349,41 @@
> * use ldd/std on a 32 bit kernel.
> */
>
> -
> -1:
> - ldw 0(%r25), %r19
> - ldw 4(%r25), %r20
> - ldw 8(%r25), %r21
> - ldw 12(%r25), %r22
> - stw %r19, 0(%r26)
> - stw %r20, 4(%r26)
> - stw %r21, 8(%r26)
> - stw %r22, 12(%r26)
> - ldw 16(%r25), %r19
> - ldw 20(%r25), %r20
> - ldw 24(%r25), %r21
> - ldw 28(%r25), %r22
> - stw %r19, 16(%r26)
> - stw %r20, 20(%r26)
> - stw %r21, 24(%r26)
> - stw %r22, 28(%r26)
> - ldw 32(%r25), %r19
> - ldw 36(%r25), %r20
> - ldw 40(%r25), %r21
> - ldw 44(%r25), %r22
> - stw %r19, 32(%r26)
> - stw %r20, 36(%r26)
> - stw %r21, 40(%r26)
> - stw %r22, 44(%r26)
> - ldw 48(%r25), %r19
> - ldw 52(%r25), %r20
> - ldw 56(%r25), %r21
> - ldw 60(%r25), %r22
> - stw %r19, 48(%r26)
> - stw %r20, 52(%r26)
> - stw %r21, 56(%r26)
> - stw %r22, 60(%r26)
> - ldo 64(%r26), %r26
> +1: LDREG D0(%r25), %r19
> + LDREG D1(%r25), %r20
> + LDREG D2(%r25), %r21
> + LDREG D3(%r25), %r22
> + STREG %r19, D0(%r26)
> + STREG %r20, D1(%r26)
> + STREG %r21, D2(%r26)
> + STREG %r22, D3(%r26)
> + LDREG D4(%r25), %r19
> + LDREG D5(%r25), %r20
> + LDREG D6(%r25), %r21
> + LDREG D7(%r25), %r22
> + STREG %r19, D4(%r26)
> + STREG %r20, D5(%r26)
> + STREG %r21, D6(%r26)
> + STREG %r22, D7(%r26)
> + LDREG D8(%r25), %r19
> + LDREG D9(%r25), %r20
> + LDREG D10(%r25), %r21
> + LDREG D11(%r25), %r22
> + STREG %r19, D8(%r26)
> + STREG %r20, D9(%r26)
> + STREG %r21, D10(%r26)
> + STREG %r22, D11(%r26)
> + LDREG D12(%r25), %r19
> + LDREG D13(%r25), %r20
> + LDREG D14(%r25), %r21
> + LDREG D15(%r25), %r22
> + STREG %r19, D12(%r26)
> + STREG %r20, D13(%r26)
> + STREG %r21, D14(%r26)
> + STREG %r22, D15(%r26)
> + ldo INCR(%r26), %r26
> ADDIB> -1, %r1, 1b
> - ldo 64(%r25), %r25
> + ldo INCR(%r25), %r25
>
> bv %r0(%r2)
> nop
> =========> arch_parisc_kernel_pacache.S.diff5 <=========
>
> And finally, why don't we need to purge the related DTLB entries, as we
> used to (see the #if 0 code below) and still do in clear_user_page_asm:
> --- arch/parisc/kernel/pacache.S.New2 2004-12-18 19:44:46.684937345 +0100
> +++ arch/parisc/kernel/pacache.S 2004-12-18 20:19:51.113544409 +0100
> @@ -338,6 +338,11 @@
> .callinfo NO_CALLS
> .entry
>
> + /* Purge any old translations */
> +
> + pdtlb 0(%r25)
> + pdtlb 0(%r26)
> +
> ldi LN, %r1
>
> /*
> =========> arch_parisc_kernel_pacache.S.diff6 <=========
>
But what about this one?
What is your opinion?
Thanks,
Joel
_______________________________________________
parisc-linux mailing list
parisc-linux@lists.parisc-linux.org
http://lists.parisc-linux.org/mailman/listinfo/parisc-linux
* Re: copy_user_page_asm suggested 64bit improvment [Was: [parisc-linux] clear user page test]
From: Grant Grundler @ 2004-12-27 7:36 UTC
To: Joel Soete; +Cc: parisc-linux
On Tue, Dec 21, 2004 at 02:37:47PM +0100, Joel Soete wrote:
> Hello all,
Joel,
I trim your postings to only include the parts I need to respond to.
Could you please do the same?
I hate having to scroll down pages of stuff to get to your comment.
That's probably why no one else responded.
> As promised, here is a cleaner (?) patch:
> --- arch/parisc/kernel/pacache.S.Orig 2004-12-20 08:28:23.000000000 +0100
> +++ arch/parisc/kernel/pacache.S 2004-12-20 14:49:35.000000000 +0100
> @@ -295,7 +295,52 @@
> .callinfo NO_CALLS
> .entry
>
> - ldi 64, %r1
> + pdtlb 0(%r25)
> + pdtlb 0(%r26)
Sorry - I missed why the pdtlb needs to be added.
Could you explain?
Won't the pdtlb guarantee at least one trap per page copied?
I would hope we guarantee the D-TLB is "clean" when calling this function.
> +#ifdef __LP64__
> +
> + ldi 32, %r1 /* PAGE_SIZE/128 == 32 */
> +
> +1: ldd 0(%r25), %r19
> + ldd 8(%r25), %r20
> + ldd 16(%r25), %r21
> + ldd 24(%r25), %r22
> + std %r19, 0(%r26)
> + std %r20, 8(%r26)
> + std %r21, 16(%r26)
> + std %r22, 24(%r26)
This looks good.
PA2.0 can retire 2 loads and 2 stores per cycle IFF there are no dependencies,
i.e. such a group can be executed in one cycle.
That means we want something like this:
+1: ldd 0(%r25), %r19
+ ldd 8(%r25), %r20
+ ldd 16(%r25), %r21
+ ldd 24(%r25), %r22
+ std %r19, 0(%r26)
+ std %r20, 8(%r26)
+ ldd 32(%r25), %r19
+ ldd 40(%r25), %r20
+ std %r21, 16(%r26)
+ std %r22, 24(%r26)
+ ldd 48(%r25), %r21
+ ldd 56(%r25), %r22
+ std %r19, 32(%r26)
+ std %r20, 40(%r26)
...
+ ldd 112(%r25), %r21
+ ldd 120(%r25), %r22
+ std %r19, 96(%r26)
+ std %r20, 104(%r26)
+ ldo 128(%r25), %r25
+ std %r21, 112(%r26)
+ std %r22, 120(%r26)
+ ADDIB> -1, %r1, 1b
+ ldo 128(%r26), %r26
...
[ Note that I've moved the "ldo" around as well!]
More distance between the "ldd %rX" and the corresponding
"std %rX" is generally a good thing.
This routine could use more registers in the loop to get more "distance".
It costs us 1 cycle to save two registers on the stack.
Once the data is in L1-Cache, IFF the CPU needs more than one cycle
to retire successive loads, we gain several cycles assuming additional
register pairs are used multiple times per loop.
Anyone know how many cycles ldd from L1 takes?
I expect gcc encodes those times so it can schedule stuff optimally.
But I've forgotten where to find the PA2.0 scheduling magic.
It might be worth just letting gcc unroll the loop for us since
SR0 (kernel) is implied in all the ldd/std instructions.
> - extrd,u %r26,56,32, %r26 /* convert phys addr to tlb insert format */
> - extrd,u %r23,56,32, %r23 /* convert phys addr to tlb insert format */
> - depd %r24,63,22, %r28 /* Form aliased virtual address 'to' */
> + extrd,u %r26,56,32, %r26 /* convert phys addr to tlb insert format */
> + extrd,u %r23,56,32, %r23 /* convert phys addr to tlb insert format */
> + depd %r24,63,22, %r28 /* Form aliased virtual address 'to' */
Please post white space changes as separate patches.
> the loop used:
> export i=0 ; while [ $i -le 10 ] ; do make clean ; make oldconfig ; readprofile
3 to 5 iterations are sufficient for me (since they take so long).
> -r ; time make vmlinux ; readprofile >> /var/logs/prof.doc; i=$((i+1)) ;
> done 2>&1 | tee /var/logs/k-loop1
>
> * with original 2.6.10-rc3-pa8 running kernel
> # grep "^user" k-loop1
Please use "^sys" or "^real".
"user" time is the only number that should NOT change with this patch.
> # grep copy_user_page_asm prof.doc
> 3254 copy_user_page_asm 20.3375
> 3273 copy_user_page_asm 20.4563
...
> * with 2.6.10-rc3-pa8 + patch and without "pdtlb 0(%r2[56])"
...
> # grep copy_user_page_asm prof.doc
> 1818 copy_user_page_asm 11.3625
> 1763 copy_user_page_asm 11.0188
> 1785 copy_user_page_asm 11.1562
...
This is clearly goodness.
> * with 2.6.10-rc3-pa8 + full patch
...
> # grep copy_user_page_asm prof.doc
> 1894 copy_user_page_asm 11.8375
> 1972 copy_user_page_asm 12.3250
> 1975 copy_user_page_asm 12.3438
> 1880 copy_user_page_asm 11.7500
> 1923 copy_user_page_asm 12.0188
I expect extra traps and/or time spent ordering the TLB operations.
pdtlb is costing about 8% performance in this routine.
I definitely want a clear explanation before adding this.
> So the main interest is to reduce the number of clock ticks :-)
Yes. :^)
thanks,
grant
* Re: copy_user_page_asm suggested 64bit improvment [Was: [parisc-linux] clear user page test]
From: Joel Soete @ 2004-12-27 10:40 UTC
To: Grant Grundler; +Cc: parisc-linux
Grant Grundler wrote:
> On Tue, Dec 21, 2004 at 02:37:47PM +0100, Joel Soete wrote:
>
>>Hello all,
>
>
> Joel,
> I trim your postings to only include the parts I need to respond to.
> Could you please do the same?
>
Apologies, I just wanted to be as detailed as possible for the others who hadn't followed our previous mail exchange :-(
> I hate having to scroll down pages of stuff to get to your comment.
> That's probably why no one else responded.
>
I understand that it makes things too noisy.
>
>
>>As promised, here is a cleaner (?) patch:
>>--- arch/parisc/kernel/pacache.S.Orig 2004-12-20 08:28:23.000000000 +0100
>>+++ arch/parisc/kernel/pacache.S 2004-12-20 14:49:35.000000000 +0100
>>@@ -295,7 +295,52 @@
>> .callinfo NO_CALLS
>> .entry
>>
>>- ldi 64, %r1
>>+ pdtlb 0(%r25)
>>+ pdtlb 0(%r26)
>
>
> Sorry - I missed why the pdtlb needs to be added.
> Could you explain?
Sorry, no, that was a question of mine:
the previous implementation of copy_user_page_asm() (between #if 0 ... #endif further down in the code) started with:
[...]
/* Purge any old translations */
pdtlb 0(%r28)
pdtlb 0(%r29)
ldi 64, %r1
[...]
and we do the same in __clear_user_page_asm()
[...]
/* Purge any old translation */
pdtlb 0(%r28)
[...]
>
> Won't the pdtlb guarantee at least one trap per page copied?
> I would hope we guarantee the D-TLB is "clean" when calling this function.
>
That must be why it was removed, but I didn't find any explanation (which is understandable: it's nearly impossible to explain every
implementation detail ;-)
>
>>+#ifdef __LP64__
>>+
>>+ ldi 32, %r1 /* PAGE_SIZE/128 == 32 */
>>+
>>+1: ldd 0(%r25), %r19
>>+ ldd 8(%r25), %r20
>>+ ldd 16(%r25), %r21
>>+ ldd 24(%r25), %r22
>>+ std %r19, 0(%r26)
>>+ std %r20, 8(%r26)
[...]
>
> This looks good.
>
> PA2.0 can retire 2 loads and 2 stores per cycle IFF there are no dependencies,
> i.e. such a group can be executed in one cycle.
>
> That means we want something like this:
>
> +1: ldd 0(%r25), %r19
> + ldd 8(%r25), %r20
> + ldd 16(%r25), %r21
> + ldd 24(%r25), %r22
> + std %r19, 0(%r26)
> + std %r20, 8(%r26)
> + ldd 32(%r25), %r19
> + ldd 40(%r25), %r20
[...]
> + ldo 128(%r25), %r25
> + std %r21, 112(%r26)
> + std %r22, 120(%r26)
> + ADDIB> -1, %r1, 1b
> + ldo 128(%r26), %r26
> ...
>
> [ Note that I've moved the "ldo" around as well!]
>
> More distance between the "ldd %rX" and the corresponding
> "std %rX" is generally a good thing.
> This routine could use more registers in the loop to get more "distance".
OK, that was another possibility: I believe we can use r23 and r24, given that:
r23-r26: these are arg3-arg0, i.e. you can use them if you
don't care about the values that were passed in anymore.
but none of r3-r18, because:
r3-r18,r27,r30 need to be saved and restored. r3-r18 are just
general purpose registers. [...]
>
> It costs us 1 cycle to save two registers on the stack.
> Once the data is in L1-Cache, IFF the CPU needs more than one cycle
> to retire successive loads, we gain several cycles assuming additional
> register pairs are used multiple times per loop.
Well, that (cache management) is still far beyond my skills :-(
[...]
>>- extrd,u %r26,56,32, %r26 /* convert phys addr to tlb insert format */
...
>>+ extrd,u %r26,56,32, %r26 /* convert phys addr to tlb insert format */
>
> Please post white space changes as separate patches.
>
Oops, my bad (apologies).
>
[...]
>>* with original 2.6.10-rc3-pa8 running kernel
>># grep "^user" k-loop1
>
> Please use "^sys" or "^real".
> "user" time is the only number that should NOT change with this patch.
>
I will try to recover that information.
>
[...]
>
>>So the main interest is to reduce the number of clock ticks :-)
>
>
> Yes. :^)
>
Thanks for your patience and relevant remarks; I will come back with more material soon ;-)
Joel
* Re: copy_user_page_asm suggested 64bit improvment [Was: [parisc-linux] clear user page test]
From: James Bottomley @ 2004-12-27 15:08 UTC
To: Joel Soete; +Cc: PARISC list
On Mon, 2004-12-27 at 10:40 +0000, Joel Soete wrote:
> That must be why it was removed, but I didn't find any explanation (which is understandable: it's nearly impossible to explain every
> implementation detail ;-)
I haven't time to look through the patch, but I can explain what the
pdtlb's are about in pacache.S.
Both copy_user_page_asm and __clear_user_page_asm use something called
the tmpalias mapping. This is an 8MB reserved area that's used to prime
the user space cache. What you do is to set up a temporary mapping for
the target of the copy which is congruent to the user space address
somewhere in the tmpalias region. Then when you do the copy, the user
alias is automatically up to date as well (because the cache sees the
collision by virtue of its congruence properties).
It's a nice idea, but we've never been able to make it work in practice,
because the user page we're copying can be an executable page, and this
scheme only makes the d-cache correct. If we had a way of telling
whether it's a data page or an instruction page, we could make it work.
That's why the mechanism is #if 0'd out.
On the other hand, we can use it for clear_user_page, because no-one
ever wants to clear an executable page before returning it to the user.
James
* Re: copy_user_page_asm suggested 64bit improvment [Was: [parisc-linux] clear user page test]
From: Joel Soete @ 2004-12-27 17:34 UTC
To: Joel Soete; +Cc: parisc-linux
Joel Soete wrote:
>
>
> Grant Grundler wrote:
>
>> On Tue, Dec 21, 2004 at 02:37:47PM +0100, Joel Soete wrote:
>>
[...]
>>> * with original 2.6.10-rc3-pa8 running kernel
>>> # grep "^user" k-loop1
>>
>>
>> Please use "^sys" or "^real".
>> "user" time is the only number that should NOT change with this patch.
>>
> I will try to recover that information.
>
Those results were:
k-loop1 (i.e. cvs 2.6.10-rc3-pa8)
real 23m7.594s
user 18m47.768s
sys 4m2.585s
real 22m53.506s
user 18m47.400s
sys 4m0.321s
real 22m54.599s
user 18m47.492s
sys 4m0.226s
real 22m53.410s
user 18m48.205s
sys 3m59.351s
k-loop2 (i.e. cvs 2.6.10-rc3-pa8 + patch without pdtlb)
real 23m4.170s
user 18m47.511s
sys 4m0.654s
real 22m59.651s
user 18m51.133s
sys 3m58.969s
real 23m0.391s
user 18m50.908s
sys 3m59.588s
real 22m59.401s
user 18m51.090s
sys 3m59.673s
k-loop3 (i.e. cvs 2.6.10-rc3-pa8 + full patch)
real 23m28.521s
user 18m53.815s
sys 3m57.967s
real 23m32.696s
user 18m54.045s
sys 3m58.598s
real 23m28.981s
user 18m54.774s
sys 3m58.128s
real 23m30.631s
user 18m54.405s
sys 3m58.974s
hth,
Joel
* Re: copy_user_page_asm suggested 64bit improvment [Was: [parisc-linux] clear user page test]
From: Joel Soete @ 2004-12-27 18:32 UTC
To: Joel Soete; +Cc: parisc-linux
Joel Soete wrote:
>
>
> Grant Grundler wrote:
>
>> On Tue, Dec 21, 2004 at 02:37:47PM +0100, Joel Soete wrote:
>>
>>> Hello all,
>>
[...]
>> This routine could use more registers in the loop to get more "distance".
>
> OK, that was another possibility: I believe we can use r23 and r24, given
> that:
> r23-r26: these are arg3-arg0, i.e. you can use them if you
> don't care about the values that were passed in anymore.
>
Here is a first draft, just to make sure I understand correctly:
#ifdef __LP64__
ldi 32, %r1 /* PAGE_SIZE/128 == 32 */
1: ldd 0(%r25), %r19
ldd 8(%r25), %r20
ldd 16(%r25), %r21
ldd 24(%r25), %r22
ldd 32(%r25), %r23
ldd 40(%r25), %r24
std %r19, 0(%r26)
std %r20, 8(%r26)
std %r21, 16(%r26)
std %r22, 24(%r26)
std %r23, 32(%r26)
std %r24, 40(%r26)
ldd 48(%r25), %r19
ldd 56(%r25), %r20
ldd 64(%r25), %r21
ldd 72(%r25), %r22
ldd 80(%r25), %r23
ldd 88(%r25), %r24
std %r19, 48(%r26)
std %r20, 56(%r26)
std %r21, 64(%r26)
std %r22, 72(%r26)
std %r23, 80(%r26)
std %r24, 88(%r26)
ldd 96(%r25), %r19
ldd 104(%r25), %r20
ldd 112(%r25), %r21
ldd 120(%r25), %r22
std %r19, 96(%r26)
std %r20, 104(%r26)
std %r21, 112(%r26)
std %r22, 120(%r26)
ldo 128(%r26), %r26
ADDIB> -1, %r1, 1b
ldo 128(%r25), %r25
#else /* !__LP64__ */
Do I just have to rearrange it to put more distance between each ldd/std pair?
What do you think?
Joel
* [parisc-linux] Re: copy_user_page_asm suggested 64bit improvment (Test case)
From: Joel Soete @ 2004-12-28 16:25 UTC
To: Grant Grundler; +Cc: parisc-linux
A test case may help to show the improvement better:
gcc -O2 -o cpup0 cpup0.c
gcc -march=2.0 -O2 -DLP64 -o cpup1 cpup0.c
gcc -march=2.0 -O2 -DLP64 -DV1 -o cpup2 cpup0.c
gcc -march=2.0 -O2 -DLP64 -DV2 -o cpup3 cpup0.c
Linux patst006 2.6.10-rc3-pa4-n4kmp #3 SMP Fri Dec 10 13:45:46 CET 2004 parisc64 GNU/Linux
# time ./cpup0 ; time ./cpup1; time ./cpup2 ; time ./cpup3
real 0m2.294s
user 0m0.226s
sys 0m2.068s
real 0m2.213s
user 0m0.140s
sys 0m2.074s
real 0m2.217s
user 0m0.108s
sys 0m2.110s
real 0m2.208s
user 0m0.108s
sys 0m2.100s
# time ./cpup0 ; time ./cpup1; time ./cpup2 ; time ./cpup3
real 0m2.316s
user 0m0.197s
sys 0m2.119s
real 0m2.217s
user 0m0.117s
sys 0m2.101s
real 0m2.203s
user 0m0.119s
sys 0m2.084s
real 0m2.205s
user 0m0.126s
sys 0m2.079s
# time ./cpup0 ; time ./cpup1; time ./cpup2 ; time ./cpup3
real 0m2.316s
user 0m0.194s
sys 0m2.122s
real 0m2.211s
user 0m0.126s
sys 0m2.086s
real 0m2.208s
user 0m0.106s
sys 0m2.102s
real 0m2.217s
user 0m0.113s
sys 0m2.105s
# time ./cpup0 ; time ./cpup1; time ./cpup2 ; time ./cpup3
real 0m2.311s
user 0m0.219s
sys 0m2.093s
real 0m2.222s
user 0m0.141s
sys 0m2.082s
real 0m2.207s
user 0m0.115s
sys 0m2.093s
real 0m2.208s
user 0m0.117s
sys 0m2.091s
# time ./cpup0 ; time ./cpup1; time ./cpup2 ; time ./cpup3
real 0m2.310s
user 0m0.205s
sys 0m2.105s
real 0m2.213s
user 0m0.104s
sys 0m2.109s
real 0m2.207s
user 0m0.115s
sys 0m2.092s
real 0m2.205s
user 0m0.108s
sys 0m2.096s
I would also like to know whether the order matters:
# time ./cpup0 ; time ./cpup1; time ./cpup3 ; time ./cpup2
real 0m2.294s
user 0m0.196s
sys 0m2.100s
real 0m2.221s
user 0m0.111s
sys 0m2.111s
real 0m2.226s
user 0m0.097s
sys 0m2.130s
real 0m2.208s
user 0m0.107s
sys 0m2.101s
# time ./cpup0 ; time ./cpup3; time ./cpup2 ; time ./cpup1
real 0m2.302s
user 0m0.200s
sys 0m2.102s
real 0m2.206s
user 0m0.110s
sys 0m2.097s
real 0m2.213s
user 0m0.108s
sys 0m2.106s
real 0m2.214s
user 0m0.123s
sys 0m2.092s
# time ./cpup3 ; time ./cpup2; time ./cpup1 ; time ./cpup0
real 0m2.209s
user 0m0.104s
sys 0m2.105s
real 0m2.221s
user 0m0.115s
sys 0m2.106s
real 0m2.227s
user 0m0.111s
sys 0m2.116s
real 0m2.296s
user 0m0.212s
sys 0m2.085s
Maybe there is more improvement from using more registers (i.e. V2, cpup3)?
Joel
[-- Attachment #2: cpup0.c --]
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <asm/page.h>
void __copy_user_page_asm(void *to, void *from)
{
register unsigned long __to __asm__ ("r26") = (unsigned long)to;
register unsigned long __from __asm__ ("r25") = (unsigned long)from;
#ifdef LP64
asm volatile ("ldi 32, %%r1\n" /* PAGE_SIZE/128 == 32 */
#if V2
"1: ldd 0(%0), %%r19\n"
" ldd 8(%0), %%r20\n"
" ldd 16(%0), %%r21\n"
" ldd 24(%0), %%r22\n"
" std %%r19, 0(%1)\n"
" std %%r20, 8(%1)\n"
" ldd 32(%0), %%r23\n"
" ldd 40(%0), %%r24\n"
" std %%r21, 16(%1)\n"
" std %%r22, 24(%1)\n"
" ldd 48(%0), %%r19\n"
" ldd 56(%0), %%r20\n"
" std %%r23, 32(%1)\n"
" std %%r24, 40(%1)\n"
" ldd 64(%0), %%r21\n"
" ldd 72(%0), %%r22\n"
" std %%r19, 48(%1)\n"
" std %%r20, 56(%1)\n"
" ldd 80(%0), %%r23\n"
" ldd 88(%0), %%r24\n"
" std %%r21, 64(%1)\n"
" std %%r22, 72(%1)\n"
" ldd 96(%0), %%r19\n"
" ldd 104(%0), %%r20\n"
" std %%r23, 80(%1)\n"
" std %%r24, 88(%1)\n"
" ldd 112(%0), %%r21\n"
" ldd 120(%0), %%r22\n"
" std %%r19, 96(%1)\n"
" std %%r20, 104(%1)\n"
" ldo 128(%0), %0\n"
" std %%r21, 112(%1)\n"
" std %%r22, 120(%1)\n"
" addib,> -1, %%r1, 1b\n"
" ldo 128(%1), %1"
#else /* !V2 */
"1: ldd 0(%0), %%r19\n"
" ldd 8(%0), %%r20\n"
" ldd 16(%0), %%r21\n"
" ldd 24(%0), %%r22\n"
" std %%r19, 0(%1)\n"
" std %%r20, 8(%1)\n"
#ifndef V1
" std %%r21, 16(%1)\n"
" std %%r22, 24(%1)\n"
" ldd 32(%0), %%r19\n"
" ldd 40(%0), %%r20\n"
" ldd 48(%0), %%r21\n"
" ldd 56(%0), %%r22\n"
" std %%r19, 32(%1)\n"
" std %%r20, 40(%1)\n"
" std %%r21, 48(%1)\n"
" std %%r22, 56(%1)\n"
" ldd 64(%0), %%r19\n"
" ldd 72(%0), %%r20\n"
" ldd 80(%0), %%r21\n"
" ldd 88(%0), %%r22\n"
" std %%r19, 64(%1)\n"
" std %%r20, 72(%1)\n"
" std %%r21, 80(%1)\n"
" std %%r22, 88(%1)\n"
" ldd 96(%0), %%r19\n"
" ldd 104(%0), %%r20\n"
" ldd 112(%0), %%r21\n"
" ldd 120(%0), %%r22\n"
" std %%r19, 96(%1)\n"
" std %%r20, 104(%1)\n"
" std %%r21, 112(%1)\n"
" std %%r22, 120(%1)\n"
" ldo 128(%1), %1\n"
" addib,> -1, %%r1, 1b\n"
" ldo 128(%0), %0"
#else /* V1 */
" ldd 32(%0), %%r19\n"
" ldd 40(%0), %%r20\n"
" std %%r21, 16(%1)\n"
" std %%r22, 24(%1)\n"
" ldd 48(%0), %%r21\n"
" ldd 56(%0), %%r22\n"
" std %%r19, 32(%1)\n"
" std %%r20, 40(%1)\n"
" ldd 64(%0), %%r19\n"
" ldd 72(%0), %%r20\n"
" std %%r21, 48(%1)\n"
" std %%r22, 56(%1)\n"
" ldd 80(%0), %%r21\n"
" ldd 88(%0), %%r22\n"
" std %%r19, 64(%1)\n"
" std %%r20, 72(%1)\n"
" ldd 96(%0), %%r19\n"
" ldd 104(%0), %%r20\n"
" std %%r21, 80(%1)\n"
" std %%r22, 88(%1)\n"
" ldd 112(%0), %%r21\n"
" ldd 120(%0), %%r22\n"
" std %%r19, 96(%1)\n"
" std %%r20, 104(%1)\n"
" ldo 128(%0), %0\n"
" std %%r21, 112(%1)\n"
" std %%r22, 120(%1)\n"
" addib,> -1, %%r1, 1b\n"
" ldo 128(%1), %1"
#endif /* V1 */
#endif /* V2 */
#else /* !LP64 */
asm volatile ("ldi 64, %%r1\n"
"1: ldw 0(%0), %%r19\n"
" ldw 4(%0), %%r20\n"
" ldw 8(%0), %%r21\n"
" ldw 12(%0), %%r22\n"
" stw %%r19, 0(%1)\n"
" stw %%r20, 4(%1)\n"
" stw %%r21, 8(%1)\n"
" stw %%r22, 12(%1)\n"
" ldw 16(%0), %%r19\n"
" ldw 20(%0), %%r20\n"
" ldw 24(%0), %%r21\n"
" ldw 28(%0), %%r22\n"
" stw %%r19, 16(%1)\n"
" stw %%r20, 20(%1)\n"
" stw %%r21, 24(%1)\n"
" stw %%r22, 28(%1)\n"
" ldw 32(%0), %%r19\n"
" ldw 36(%0), %%r20\n"
" ldw 40(%0), %%r21\n"
" ldw 44(%0), %%r22\n"
" stw %%r19, 32(%1)\n"
" stw %%r20, 36(%1)\n"
" stw %%r21, 40(%1)\n"
" stw %%r22, 44(%1)\n"
" ldw 48(%0), %%r19\n"
" ldw 52(%0), %%r20\n"
" ldw 56(%0), %%r21\n"
" ldw 60(%0), %%r22\n"
" stw %%r19, 48(%1)\n"
" stw %%r20, 52(%1)\n"
" stw %%r21, 56(%1)\n"
" stw %%r22, 60(%1)\n"
" ldo 64(%1), %1\n"
" addib,> -1, %%r1, 1b\n"
" ldo 64(%0), %0"
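#endif /* LP64 */
/* both pointers are advanced by the loop, so they are read-write operands;
   the scratch registers and memory are clobbered */
: "+r"(__from), "+r"(__to)
:
: "r1", "r19", "r20", "r21", "r22", "r23", "r24", "memory");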
}
/*
#define INIT 1
#define DEBUG 1
*/
#define BUFFSIZE (1024*1024*256)
#define PPB (BUFFSIZE/PAGE_SIZE) /* Pages Per Buff */
int main(int argc, char * * argv, char * * env)
{
char MemSrc[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFG
HIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcde
fghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmn" ;
	char *MemDst;
	int i, j;

	/* One extra byte so the terminating NUL stays inside the buffer. */
	MemDst = malloc(BUFFSIZE + 1);
	if (MemDst == NULL)
		return 1;
	for (j = 0; j < PPB; j++) {
		__copy_user_page_asm(MemDst + (j * PAGE_SIZE), MemSrc);
	}
	MemDst[BUFFSIZE] = '\0';
#if DEBUG
	/*
	printf("MemDst = %s\n", MemDst);
	*/
	for (i = 0; i < BUFFSIZE; i++) {
		printf("MemDst[%d] = %c\n", i, MemDst[i]);
	}
#endif
	free(MemDst);
	return 0;
}
[-- Attachment #3: Type: text/plain, Size: 169 bytes --]
_______________________________________________
parisc-linux mailing list
parisc-linux@lists.parisc-linux.org
http://lists.parisc-linux.org/mailman/listinfo/parisc-linux
* [parisc-linux] Re: copy_user_page_asm suggested 64bit improvment (Test case)
2004-12-28 16:25 ` [parisc-linux] Re: copy_user_page_asm suggested 64bit improvment (Test case) Joel Soete
@ 2004-12-29 5:46 ` Grant Grundler
2004-12-29 11:36 ` Joel Soete
0 siblings, 1 reply; 18+ messages in thread
From: Grant Grundler @ 2004-12-29 5:46 UTC (permalink / raw)
To: Joel Soete; +Cc: parisc-linux
On Tue, Dec 28, 2004 at 04:25:45PM +0000, Joel Soete wrote:
> A test case may help to better show the improvement:
>
> gcc -O2 -o cpup0 cpup0.c
> gcc -march=2.0 -O2 -DLP64 -o cpup1 cpup0.c
> gcc -march=2.0 -O2 -DLP64 -DV1 -o cpup2 cpup0.c
> gcc -march=2.0 -O2 -DLP64 -DV2 -o cpup3 cpup0.c
As usual, I've hacked the cpup.c test.
Don't compare my results below with the previous ones Joel posted.
I've committed my version of cpup.c to "build-tools" repository.
grundler <549>for j in 1 2 3 4 5 ; do echo -n $j " " ; for i in 0 1 2 3; do time ./cpup$i ; done 2>&1 | fgrep user | cut -f2 ; done
and massaged the output a bit so it reads as a table:
# cpup0 cpup1 cpup2 cpup3
1 0m1.033s 0m0.616s 0m0.607s 0m0.616s
2 0m1.039s 0m0.651s 0m0.587s 0m0.589s
3 0m1.004s 0m0.605s 0m0.631s 0m0.613s
4 0m1.015s 0m0.615s 0m0.572s 0m0.592s
5 0m1.014s 0m0.619s 0m0.564s 0m0.607s
The differences between the 64-bit variants are not statistically significant.
Results are from 2.6.10-rc3-pa6 SMP 64-bit kernel on a500-65 w/8G RAM
that was running a compile in the background.
cpupX columns above are defined by the following:
/*
** gcc -O2 -o cpup0 cpup.c vanilla 32-bit loop
** -march=2.0 -DLP64 -o cpup1 64-bit, 4ld + 4st sequences
** -march=2.0 -DLP64 -DV1 -o cpup2 64-bit, 4regs, 2ld/2st bundles
** -march=2.0 -DLP64 -DUSE6REGS -o cpup3 64-bit, 6 regs, 2ld/2st bundles
*/
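To put "not statistically significant" into numbers, the five runs per binary can be summarized with a mean, a sample variance, and Welch's t statistic between two columns. This is a sketch; the function names are illustrative, the data is copied from the table above, and the statistic is returned squared (t²) only so the example needs no libm sqrt().

```c
#include <assert.h>
#include <stddef.h>

/* "user" times in seconds, copied from the five runs in the table above. */
static const double cpup_user_s[4][5] = {
    {1.033, 1.039, 1.004, 1.015, 1.014},  /* cpup0: 32-bit loop           */
    {0.616, 0.651, 0.605, 0.615, 0.619},  /* cpup1: 4ld + 4st             */
    {0.607, 0.587, 0.631, 0.572, 0.564},  /* cpup2: 4 regs, 2ld/2st       */
    {0.616, 0.589, 0.613, 0.592, 0.607},  /* cpup3: 6 regs, 2ld/2st       */
};

/* Arithmetic mean of n samples. */
double mean(const double *x, size_t n)
{
    double s = 0.0;

    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s / (double)n;
}

/* Unbiased sample variance (divides by n - 1). */
double sample_var(const double *x, size_t n)
{
    double m, s = 0.0;

    assert(n > 1);
    m = mean(x, n);
    for (size_t i = 0; i < n; i++)
        s += (x[i] - m) * (x[i] - m);
    return s / (double)(n - 1);
}

/* Square of Welch's t statistic for two independent samples. */
double welch_t2(const double *a, size_t na, const double *b, size_t nb)
{
    double d = mean(a, na) - mean(b, nb);
    double se2 = sample_var(a, na) / (double)na + sample_var(b, nb) / (double)nb;

    return d * d / se2;
}
```

With the table's data, t² is roughly 4 (t ≈ 2) for cpup1 vs cpup2, below 1 for cpup2 vs cpup3, and above 1000 for cpup0 vs cpup1. The 32-bit vs 64-bit gap is clear; the gaps among the 64-bit variants are small relative to run-to-run noise from only five runs on a machine that was also compiling in the background.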
And I'm wondering how/if the 64-bit user-space test ever worked, since we don't
officially support 64-bit user space.
Likely I'm copying trash around, even though the pointers are probably intact.
cpup2 is what I'd like to commit for the kernel version.
I've appended the patch.
I was expecting cpup3 would be slightly faster but don't have data
to prove it. And I'm still worried that GR23/GR24 won't be saved
by the caller since the __copy_user_page_asm function prototype
only specifies two arguments.
thanks,
grant
Index: arch/parisc/kernel/pacache.S
===================================================================
RCS file: /var/cvs/linux-2.6/arch/parisc/kernel/pacache.S,v
retrieving revision 1.13
diff -u -p -r1.13 pacache.S
--- arch/parisc/kernel/pacache.S 19 Dec 2004 04:50:35 -0000 1.13
+++ arch/parisc/kernel/pacache.S 29 Dec 2004 05:37:46 -0000
@@ -295,17 +295,72 @@ copy_user_page_asm:
.callinfo NO_CALLS
.entry
- ldi 64, %r1
+#ifdef __LP64__
+ /* PA8x00 CPUs can consume 2 loads and 2 stores per cycle.
+ * Unroll the loop by hand and arrange insn appropriately.
+ * GCC probably can do this just as well.
+ *
+ * Prefetching and using more regs to increase the "distance"
+ * between ldd and corresponding std are possible optimizations.
+ */
+
+ ldi 32, %r1 /* PAGE_SIZE/128 == 32 */
+
+1: ldd 0(%r25), %r19 /* prolog == 1 bundle */
+ ldd 8(%r25), %r20
+
+ ldd 16(%r25), %r21 /* bundle 2 */
+ ldd 24(%r25), %r22
+ std %r19, 0(%r26)
+ std %r20, 8(%r26)
+
+ ldd 32(%r25), %r19 /* bundle 3 */
+ ldd 40(%r25), %r20
+ std %r21, 16(%r26)
+ std %r22, 24(%r26)
+
+ ldd 48(%r25), %r21 /* bundle 4 */
+ ldd 56(%r25), %r22
+ std %r19, 32(%r26)
+ std %r20, 40(%r26)
+
+ ldd 64(%r25), %r19 /* bundle 5 */
+ ldd 72(%r25), %r20
+ std %r21, 48(%r26)
+ std %r22, 56(%r26)
+
+ ldd 80(%r25), %r21 /* bundle 6 */
+ ldd 88(%r25), %r22
+ std %r19, 64(%r26)
+ std %r20, 72(%r26)
+
+ ldd 96(%r25), %r19 /* bundle 7 */
+ ldd 104(%r25), %r20
+ std %r21, 80(%r26)
+ std %r22, 88(%r26)
+
+ ldd 112(%r25), %r21 /* bundle 8 */
+ ldd 120(%r25), %r22
+ std %r19, 96(%r26)
+ std %r20, 104(%r26)
+
+ ldo 128(%r25), %r25 /* epilog == 2 bundles */
+ std %r21, 112(%r26)
+ std %r22, 120(%r26)
+
+ ADDIB> -1, %r1, 1b
+ ldo 128(%r26), %r26
+
+#else
/*
* This loop is optimized for PCXL/PCXL2 ldw/ldw and stw/stw
- * bundles (very restricted rules for bundling). It probably
- * does OK on PCXU and better, but we could do better with
- * ldd/std instructions. Note that until (if) we start saving
+ * bundles (very restricted rules for bundling).
+ * Note that until (if) we start saving
* the full 64 bit register values on interrupt, we can't
* use ldd/std on a 32 bit kernel.
*/
-
+ ldi 64, %r1 /* PAGE_SIZE/64 == 64 */
1:
ldw 0(%r25), %r19
@@ -343,7 +398,7 @@ copy_user_page_asm:
ldo 64(%r26), %r26
ADDIB> -1, %r1, 1b
ldo 64(%r25), %r25
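A portable C model of the 64-bit loop in the patch can make its software pipelining easier to check: one register pair is loaded while the pair loaded in the previous bundle is stored, 16 doublewords (128 bytes) per iteration and 32 iterations per page. This is an illustrative sketch assuming a 4096-byte page; `copy_page_pipelined` and the `SKETCH_*` macros are hypothetical names, and a C compiler is free to reschedule these loads and stores, so it says nothing about PA8x00 timing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size (32 * 128 bytes) */
#define SKETCH_CHUNK      128u  /* bytes per loop iteration, as in the patch */

/* Hypothetical C model of the unrolled ldd/std loop: p0/p1 are the pair
 * being stored, q0/q1 the pair being loaded in the same "bundle". */
void copy_page_pipelined(uint64_t *to, const uint64_t *from)
{
    const size_t words = SKETCH_CHUNK / sizeof(uint64_t);  /* 16 doublewords */

    assert(to != NULL && from != NULL);
    for (size_t n = SKETCH_PAGE_SIZE / SKETCH_CHUNK; n > 0; n--) {
        /* Prolog (bundle 1): load the first pair. */
        uint64_t p0 = from[0], p1 = from[1];

        /* Bundles 2..8: load the next pair while storing the previous one. */
        for (size_t k = 2; k < words; k += 2) {
            uint64_t q0 = from[k], q1 = from[k + 1];
            to[k - 2] = p0;
            to[k - 1] = p1;
            p0 = q0;
            p1 = q1;
        }

        /* Epilog: store the last pair, then advance both pointers by 128 bytes. */
        to[words - 2] = p0;
        to[words - 1] = p1;
        from += words;
        to += words;
    }
}
```

Unlike the assembly, the model does not pin values to r19-r22; it only checks that the load/store ordering copies every doubleword of the page exactly once.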
* [parisc-linux] Re: copy_user_page_asm suggested 64bit improvment (Test case)
2004-12-29 5:46 ` Grant Grundler
@ 2004-12-29 11:36 ` Joel Soete
0 siblings, 0 replies; 18+ messages in thread
From: Joel Soete @ 2004-12-29 11:36 UTC (permalink / raw)
To: Grant Grundler; +Cc: parisc-linux
Grant Grundler wrote:
> On Tue, Dec 28, 2004 at 04:25:45PM +0000, Joel Soete wrote:
[...]
>
> As usual, I've hacked the cpup.c test.
Cool ;-)
> Don't compare my results below with the previous ones Joel posted.
> I've committed my version of cpup.c to "build-tools" repository.
>
Thanks
[...]
>
> cpup2 is what I'd like to commit for the kernel version.
> I've appended the patch.
>
Nice
> I was expecting cpup3 would be slightly faster but don't have data
> to prove it.
Does anyone know where we can find an L1 cache state diagram that would help here?
> And I'm still worried that GR23/GR24 won't be saved
> by the caller since the __copy_user_page_asm function prototype
> only specifies two arguments.
>
> thanks,
> grant
>
Thanks for your attention,
Joel
* Re: copy_user_page_asm suggested 64bit improvment [Was: [parisc-linux] clear user page test]
2004-12-27 7:36 ` copy_user_page_asm suggested 64bit improvment [Was: [parisc-linux] clear user page test] Grant Grundler
2004-12-27 10:40 ` Joel Soete
2004-12-28 16:25 ` [parisc-linux] Re: copy_user_page_asm suggested 64bit improvment (Test case) Joel Soete
@ 2004-12-30 8:10 ` Grant Grundler
2004-12-30 17:04 ` [parisc-linux] Re: copy_user_page_asm suggested 64bit improvment [Was: [parisc-l John David Anglin
2 siblings, 1 reply; 18+ messages in thread
From: Grant Grundler @ 2004-12-30 8:10 UTC (permalink / raw)
To: Joel Soete; +Cc: parisc-linux
On Mon, Dec 27, 2004 at 12:36:54AM -0700, Grant Grundler wrote:
> Anyone know how many cycles ldd from L1 takes?
I found the answer for the PCX-W CPU:
The PCXW Data cache is a 4-way set associative 1 MB cache, split
into two banks and interleaved on double word boundaries to allow
two simultaneous uses of the cache. Each bank is further divided
into independent tag and data ports, primarily to allow effective
single cycle stores. The two tags hold identical information.
Each port returns data in two cycles, but can start a new access
every cycle.
I'll assume PA8[567]00 CPUs have similar if not identical behavior.
PA8800 may not and I'd be curious if anyone knows.
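As a back-of-the-envelope check, combining the patch comment's "2 loads and 2 stores per cycle" with a 4096-byte page (the patch's PAGE_SIZE/128 == 32) gives an idealized lower bound for one page copy, assuming every access hits L1 and nothing else stalls:

```latex
\frac{4096~\text{bytes}}{8~\text{bytes per ldd}} = 512~\text{ldd} + 512~\text{std},
\qquad
\frac{512}{2~\text{per cycle}} = 256~\text{cycles per page}
```

Real copies will be slower, since the source page is not guaranteed to be in L1; the two-cycle load latency quoted above mainly matters for how far each std must trail its ldd.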
I've just committed a "simple" version that uses r19/20/21/22.
I've got another version that also uses r23/r24 but it didn't boot
and I didn't chase down why. It's possibly a HW bug with this particular
A500. I'll try it again.
Lamont tells me r23/24/28/29 are *caller*-saves registers.
I.e. I could use r28/29 as well (or instead of r23/24).
thanks,
grant
* [parisc-linux] Re: copy_user_page_asm suggested 64bit improvment [Was: [parisc-l
2004-12-30 8:10 ` copy_user_page_asm suggested 64bit improvment [Was: [parisc-linux] clear user page test] Grant Grundler
@ 2004-12-30 17:04 ` John David Anglin
0 siblings, 0 replies; 18+ messages in thread
From: John David Anglin @ 2004-12-30 17:04 UTC (permalink / raw)
To: Grant Grundler; +Cc: parisc-linux
> Lamont tells me r23/24/28/29 are *caller* saves registers.
> I.e. I could use r28/29 as well (or instead of r23/24).
Here is a summary of how the general call-used registers are used. These are
r1, r2, and r19 to r31.

Use                        32-bit    64-bit
Arguments                  r23-r26   r19-r26
Argument Pointer           NA        r29
Static Chain               r29       r31
PIC Offset Table Pointer   r19       r27
Stack Pointer              r30       r30
Return Pointer             r2        r2
Millicode Return Pointer   r31       r2 (r31 for local millicode)
Pointer for $$dyncall      r21       NA
You can never use r30 and you can't use r2 without saving it. Watch out
for the PIC register conventions. Depending on circumstances, the rest
should be usable.
Dave
--
J. David Anglin dave.anglin@nrc-cnrc.gc.ca
National Research Council of Canada (613) 990-0752 (FAX: 952-6602)
* Re: copy_user_page_asm suggested 64bit improvment [Was: [parisc-linux] clear user page test]
2004-12-27 15:08 ` James Bottomley
@ 2004-12-31 20:26 ` Michael S. Zick
2004-12-31 20:56 ` Grant Grundler
2004-12-31 21:21 ` James Bottomley
0 siblings, 2 replies; 18+ messages in thread
From: Michael S. Zick @ 2004-12-31 20:26 UTC (permalink / raw)
To: parisc-linux
On Mon December 27 2004 09:08, James Bottomley wrote:
> On Mon, 2004-12-27 at 10:40 +0000, Joel Soete wrote:
> > Should be why it was removed but as far as I didn't find any explanation (that's obvious: that's nearly impossible to explain all
> > details of implementation ;-)
>
> I haven't time to look through the patch, but I can explain what the
> pdtlb's are about in pacache.S.
>
> Both copy_user_page_asm and __clear_user_page_asm use something called
> the tmpalias mapping. This is a 8MB reserved area that's used to prime
> the user space cache. What you do is to set up a temporary mapping for
> the target of the copy which is congruent to the user space address
> somewhere in the tmpalias region. Then when you do the copy, the user
> alias is automatically up to date as well (because the cache sees the
> collision by virtue of its congruence properties).
>
> It's a nice idea, but we've never been able to make it work in practise,
> because the user page we're copying can be an executable page, and this
> scheme only makes the d-cache correct. If we had a way of telling
> whether it's a data page or and instruction page, we could make it work.
> That's why the mechanism is #if 0'd out.
>
Group,
I have been following this thread with interest. Let me share my observations.
Changes in the instruction sequence of this kernel code path make a
user-observable difference in execution timings.
<bold-statement attribute="General-OS-Design">
This path should not be within the set of user observable execution times.
</bold-statement>
Conditions, general:
The copy of a "user page" :: presumed to mean "copy of a page assigned
to user space". Possible refinement: "copy of a page assigned to a specific
user's space".
Page must contain zeros on return.
Contents of system caches must correspond to contents of page (zeros).
On entry, it is unknown if page is currently Data, Executable (Instruction),
Both, or Neither.
Having a means to determine the exact, prior, usages of a page on entry
to this path would be nice; but logic and design can overcome this lack.
HP, PA-RISC has only i-cache and d-cache hardware. It does not have
s-cache hardware.
A page assigned to user space may be assigned to more than one specific
user's space.
A page assigned to user space may also be assigned to kernel space.
For a 'dual assigned' page (assigned to both user space and kernel space)
the following must hold:
A) (Kernel Instruction) and (User Instruction)::
MUST NOT also be assigned: (User Data)
MAY OPTIONALLY be assigned: (Kernel Data)
B) (Kernel Data) and (User Data)::
MUST NOT also be assigned: (Kernel Instruction)
MAY OPTIONALLY be assigned: (User Instruction)
The above requirements are independent of the implementation of
such assignments.
Memory management hardware that allows 'dual assignment' is rare.
Memory management software that allows 'dual assignment' by
constructing a 'page alias' is common.
Condition (A :: 'MUST NOT') protects kernel provided, common code,
from user modification.
Condition (A :: 'MAY OPTIONALLY') allows the kernel to:
1) Dynamically alter the code provided to user space in general.
2) Dynamically alter the code provided to a specific user's space.
NOTE: Such operation would trigger a 'copy on write' code path.
NOTE: The (shared) source page of 'copy on write' is not modified.
NOTE: The destination page of 'copy on write' comes from the free pool.
Condition (B :: 'MUST NOT') protects the kernel from user insertion or
modification of kernel code.
Condition (B :: 'MAY OPTIONALLY') supports the provision of 'executable
stack' in user space in the absence of s-cache hardware.
For a system that supports the provision of 'user, executable stack' the
following must hold:
C) (User Instruction) and (User Data) and (User Stack)::
MUST meet condition (B)
MUST NOT be shared among users: thou shall not share your stack.
D) (User Data) and (User Heap)::
MUST NOT also be assigned: (Kernel Instruction)
MUST NOT also be assigned: (User Instruction)
MAY OPTIONALLY share disjoint address sub-ranges of the overall
address range '((User Instruction) and (User Data) and (User Stack))'
ON EITHER CONDITION OF:
1) Attributes of the disjoint address sub-ranges are also disjoint.
2) Software design can guarantee behavior the same as sub-condition(1).
Condition (C :: 'MUST NOT') 'copy on write' code path is never used.
Condition (D :: 'MUST NOT') Differs from (Condition C) by non-compliance with
(Condition B).
Condition (D :: 'MAY OPTIONALLY') Guarantees the distinction between (Condition
C) and (Condition D) when (Condition D) address area is shared among users in
the absence of separate (Condition C) and (Condition D) address spaces.
NOTE: A (Condition D) area may trigger a 'copy on write' code path; a (Condition
C) area MUST NOT trigger a 'copy on write' code path.
<All-Other-Combinations>
1) A page received from (any) free pool is guaranteed to contain only zeros.
2) A page received from (any) free pool is guaranteed to not have any 'user
space' cache representations.
</All-Other-Combinations>
NOTE: Zeroing a page received from (any) free pool is not 'user observable'
for the simple reason that it never happens.
<Page-Return-To-Free-Pool>
Pages which are intended to be added to the free pool, are not directly returned
to the free pool.
Instead they are returned to a kernel space, free pool management, daemon. It
is this daemon that makes the <All-Other-Combinations> guarantee.
NOTE: Zeroing a page on return to (any) free pool is not 'user observable'; only
the 'add to free pool incoming queue' step is in the 'user observable' code path.
NOTE: Pages handled by this daemon may have both d-cache and i-cache
representations. But the code which deals with this situation is not 'user
observable' because the entire 'return to free pool' operation is not 'user
observable'.
</Page-Return-To-Free-Pool>
<Non-Free-Pool-Pages>
<Non-rhetorical Question="What user pages can be both Instruction and Data?" />
(Condition B - 'MAY OPTIONALLY') pages:
Dual Assigned : (I.E: Transition from 'shared' to 'private')
In-Use portion is copied ('user observable') - Not-Used portion is not copied.
It can be guaranteed to already be zero since it hasn't been used.
The 'write' side of the copy instructions does any 'cache priming'.
(Condition C) pages:
NOTE: Never shared, therefore never copied.
NOTE: Extending the pages present for an executable stack does not
have 'user observable' zeroing since the new page source is the free pool.
NOTE: Trimming 'zombie' stack extensions under general memory pressure
(I.E: Free pool exhausted @ new page request pending) would generate 'user
observable' execution time while a page on the 'add to free pool incoming
queue' was cleared.
This corner case can be postponed by using 'preemptive trimming' implemented
in the free pool management daemon.
</Non-Free-Pool-Pages>
Q.E.D: Zeroing a page with the destination of user space assignment need not
be a 'user observable' execution time.
There should be additional gains made in 'copy-[to|from]-user' when these four
conditions are enforced.
Mike
* Re: copy_user_page_asm suggested 64bit improvment [Was: [parisc-linux] clear user page test]
2004-12-31 20:26 ` Michael S. Zick
@ 2004-12-31 20:56 ` Grant Grundler
2004-12-31 21:35 ` Michael S. Zick
2004-12-31 21:21 ` James Bottomley
1 sibling, 1 reply; 18+ messages in thread
From: Grant Grundler @ 2004-12-31 20:56 UTC (permalink / raw)
To: Michael S. Zick; +Cc: parisc-linux
On Fri, Dec 31, 2004 at 02:26:13PM -0600, Michael S. Zick wrote:
> This path should not be within the set of user observable execution times.
...
> NOTE: Zeroing a page on return to (any) free pool is not 'user observable' only
> the 'add to free pool incoming queue' is in the 'user observable' code path.
>
> NOTE: Pages handled by this daemon may have both d-cache and i-cache
> representations. But the code which deals with this situation is not 'user
> observable' because the entire 'return to free pool' operation is not 'user
> observable'.
...
> Q.E.D: Zeroing a page with the destination of user space assignment need not
> be a 'user observable' execution time.
Mike,
The copy_user_page and zero_page functions *are* observable since they
affect metrics reported by "time" and readprofile. I don't care if they
are invoked in the application context or some other context.
Certainly, it would reduce startup latency to pre-zero the pages in
the kernel (daemon) and have them ready when apps want them.
But on a loaded system, I expect this will be slightly less efficient
and more complex since one doesn't know how many need to be pre-zero'd
or when to steal pre-zero'd pages for other uses (e.g. load in an
executable).
> There should be additional gains made in 'copy-[to|from]-user' when these four
> conditions are enforced.
I read the conditions and thought "neat".
I don't pretend to understand all of them or what they mean.
But instead of trying to explain them, could you send me a patch that works?
Maybe something that has a chance of going back upstream to linus?
thanks,
grant
>
> Mike
* Re: copy_user_page_asm suggested 64bit improvment [Was: [parisc-linux] clear user page test]
2004-12-31 20:26 ` Michael S. Zick
2004-12-31 20:56 ` Grant Grundler
@ 2004-12-31 21:21 ` James Bottomley
1 sibling, 0 replies; 18+ messages in thread
From: James Bottomley @ 2004-12-31 21:21 UTC (permalink / raw)
To: Michael S. Zick; +Cc: PARISC list
On Fri, 2004-12-31 at 14:26 -0600, Michael S. Zick wrote:
> Page must contain zeros on return.
>
> Contents of system caches must correspond to contents of page (zeros).
Actually, no, this is precisely what we don't do for performance
reasons. If we just wanted to keep the caches and main memory in sync, we
wouldn't need to muck with the tmpalias space.
What clear_user_page_asm does is to prime the cache covering the page
with zeros, but return the page to user space with a dirty cache (i.e.
with the real memory not necessarily zero'd but with the cache in a
state to zero it on a flush). The reason for using the tmpalias space
is so that the user's VIPT cache lines covering the page are congruent
and thus the same ones the kernel wrote the zeros to.
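The congruence idea can be illustrated with a little address arithmetic: pick a kernel address inside the reserved window whose offset within the cache's aliasing span matches the user address, so both virtual addresses index the same VIPT cache lines. This is a sketch only; `ALIAS_SPAN` (4 MiB), `TMPALIAS_BASE`, `tmpalias_for` and `congruent` are hypothetical placeholders chosen for the example, not the parisc kernel's actual constants or helpers, and the real code must also set up the temporary mapping itself, which this sketch does not model.

```c
#include <assert.h>
#include <stdint.h>

/* HYPOTHETICAL values for illustration only. */
#define ALIAS_SPAN     (4ul << 20)   /* assumed cache aliasing span: 4 MiB   */
#define TMPALIAS_BASE  0xf0000000ul  /* assumed window base, span-aligned    */

/* Kernel alias whose offset within ALIAS_SPAN equals the user address's,
 * so writes through it land in the cache lines the user will later hit. */
uintptr_t tmpalias_for(uintptr_t user_vaddr)
{
    assert(TMPALIAS_BASE % ALIAS_SPAN == 0);
    return TMPALIAS_BASE + (user_vaddr & (ALIAS_SPAN - 1));
}

/* Two virtual addresses are congruent if they agree modulo ALIAS_SPAN. */
int congruent(uintptr_t a, uintptr_t b)
{
    return (a & (ALIAS_SPAN - 1)) == (b & (ALIAS_SPAN - 1));
}
```

Because the kernel alias and the user address are congruent, zeros written through the alias can sit dirty in the very lines the user will touch next, which is what lets the write-back to memory be skipped if the user overwrites the page first.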
This means that if the user is simply going to fill the page again, we
stand a good chance of *not* having to write the zeros to main memory in
the first place (this saves us quite a bit of execution time because
writing to main memory is an expensive operation).
James
* Re: copy_user_page_asm suggested 64bit improvment [Was: [parisc-linux] clear user page test]
2004-12-31 20:56 ` Grant Grundler
@ 2004-12-31 21:35 ` Michael S. Zick
[not found] ` <20041231225447.GC23592@colo.lackof.org>
0 siblings, 1 reply; 18+ messages in thread
From: Michael S. Zick @ 2004-12-31 21:35 UTC (permalink / raw)
To: parisc-linux
On Fri December 31 2004 14:56, Grant Grundler wrote:
>
> I read the conditions and thought "neat".
> I don't pretend to understand all of them or what they mean.
> But instead of trying to explain them, could you send me a patch that works?
> Maybe something that has a chance of going back upstream to linus?
>
I tried the 'patch that works' route with a similar suggestion for sched.c
Based on that experience...
I suspect that perhaps pictures (diagrams? flow charts? dependency
graphs?) might stand a better chance of conveying what I can't explain
in English. I'll put that (drawing pictures) on my todo list.
Let me attempt an abstract in words:
The *nix philosophy is two-part drivers.
The 'top part' can be viewed as a 'client' that makes requests on
behalf of the hardware.
The 'bottom part' can be viewed as a 'host' that services 'client'
requests.
Nothing new there.
What I proposed was:
The memory page free pool be defined as a 'virtual device' with
a two part driver.
The 'top part' is executed by the 'client' (kernel).
The 'bottom part' is executed by the 'host' (kernel daemon).
The only thing different than usual here is that a real hardware
device is (in most cases) the 'client' and the kernel is the 'host'.
In this virtual free pool device, the kernel is the 'client' and the
daemon is the 'host' (which only happens to be part of the kernel).
Only the 'client' code is in the user's execution path.
Should be interesting to consider.
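The split between a cheap "client" path and a zeroing "host" daemon, together with Grant's concern about stealing pre-zeroed pages, can be sketched as a toy, single-threaded pool. Everything here (`struct zpool`, `zpool_free`, `zpool_daemon_step`, `zpool_alloc_zeroed`, `zpool_alloc_any`, the fixed `POOL_MAX` capacity) is hypothetical and ignores locking and the real Linux page allocator; it only shows where the clearing cost lands.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define POOL_PAGE_SIZE 4096
#define POOL_MAX       8

/* Hypothetical two-part free pool: dirty pages queued by the fast path,
 * zeroed pages produced by a background "daemon" step. */
struct zpool {
    unsigned char *dirty[POOL_MAX];
    size_t ndirty;
    unsigned char *zeroed[POOL_MAX];
    size_t nzeroed;
};

/* Client (fast path): give a page back without clearing it. */
void zpool_free(struct zpool *p, unsigned char *page)
{
    assert(p->ndirty < POOL_MAX);
    p->dirty[p->ndirty++] = page;
}

/* Host (daemon): clear one queued page off the user's path; 1 if work done. */
int zpool_daemon_step(struct zpool *p)
{
    if (p->ndirty == 0 || p->nzeroed == POOL_MAX)
        return 0;
    unsigned char *page = p->dirty[--p->ndirty];
    memset(page, 0, POOL_PAGE_SIZE);
    p->zeroed[p->nzeroed++] = page;
    return 1;
}

/* Client: zero-filled page; clears inline (user-visible cost) if the
 * daemon has not kept up. */
unsigned char *zpool_alloc_zeroed(struct zpool *p)
{
    if (p->nzeroed > 0)
        return p->zeroed[--p->nzeroed];
    if (p->ndirty > 0) {
        unsigned char *page = p->dirty[--p->ndirty];
        memset(page, 0, POOL_PAGE_SIZE);
        return page;
    }
    return NULL;
}

/* Client: page that will be overwritten anyway (e.g. loading an executable);
 * prefer dirty pages so pre-zeroed ones are not "stolen" needlessly. */
unsigned char *zpool_alloc_any(struct zpool *p)
{
    if (p->ndirty > 0)
        return p->dirty[--p->ndirty];
    if (p->nzeroed > 0)
        return p->zeroed[--p->nzeroed];
    return NULL;
}
```

The inline memset in `zpool_alloc_zeroed` is the fallback cost Grant expects on a loaded system, where the daemon cannot keep up, and `zpool_alloc_any` is one possible policy for the "steal" case he raises.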
I wouldn't expect the idea to be adopted any quicker than my
description (and patch that works) that the scheduler should be
a virtual device with a two part driver.
Mike
* Re: copy_user_page_asm suggested 64bit improvment [Was: [parisc-linux] clear user page test]
[not found] ` <20041231225447.GC23592@colo.lackof.org>
@ 2004-12-31 23:56 ` Michael S. Zick
2005-01-12 13:52 ` Michael S. Zick
1 sibling, 0 replies; 18+ messages in thread
From: Michael S. Zick @ 2004-12-31 23:56 UTC (permalink / raw)
To: parisc-linux
On Fri December 31 2004 16:54, Grant Grundler wrote:
> On Fri, Dec 31, 2004 at 03:35:28PM -0600, Michael S. Zick wrote:
> > I tried the 'patch that works' route with a similar suggestion for sched.c
> > Based on that experience...
>
> ah good. You learned something. :^)
>
Sometimes.
>
> > What I proposed was:
> > The memory page free pool be defined as a 'virtual device' with
> > a two part driver.
> ...
> > In this virtual free pool device, the kernel is the 'client' and the
> > daemon is the 'host' (which only happens to be part of the kernel).
> > Only the 'client' code is in the user's execution path.
>
> This sounds neat and "clean". But things could get very ugly
> when one needs to "steal" zero'd pages for other uses.
>
> > Should be interesting to consider.
>
> Yes, Agreed.
>
Better the discussion first - code optionally later.
>
> > I wouldn't expect the idea to be adopted any quicker than my
> > description (and patch that works) that the scheduler should be
> > a virtual device with a two part driver.
>
> I don't know what happened to your scheduler idea specifically (or
> how it was presented), but making something a driver means
> giving up something else. Been there done that.
>
Overly radical at the time of presentation compared with
other 'work in progress'.
Managing the free page pool (only) as a virtual device would lead
to much-oh (scientific term ;) glue code. Not much of an improvement
over current practice.
Managing over-all memory resources as a virtual device is the answer;
but that is hardly a 'patch'.
Even so, glue code would be required unless the resource of
cpu-cycles was also managed as a virtual device.
Now the topic is definitely out of the 'patch' scope: providing both
virtual devices would need a kernel branch devoted to the
project.
That in turn would require that a whole lot of people 'get on board'
with the ideas behind the design change.
The only practical means to accomplish that brings us full circle
back to the observation above: "Discussion First".
Mike
(PS: None of this is academic, just a clean re-write of code
written in the past for proprietary operating systems.)
* Re: copy_user_page_asm suggested 64bit improvment [Was: [parisc-linux] clear user page test]
[not found] ` <20041231225447.GC23592@colo.lackof.org>
2004-12-31 23:56 ` Michael S. Zick
@ 2005-01-12 13:52 ` Michael S. Zick
2005-01-12 15:32 ` Joel Soete
1 sibling, 1 reply; 18+ messages in thread
From: Michael S. Zick @ 2005-01-12 13:52 UTC (permalink / raw)
To: parisc-linux
On Fri December 31 2004 16:54, Grant Grundler wrote:
> On Fri, Dec 31, 2004 at 03:35:28PM -0600, Michael S. Zick wrote:
> > I tried the 'patch that works' route with a similar suggestion for sched.c
> > Based on that experience...
>
> ah good. You learned something. :^)
>
> > What I proposed was:
> > The memory page free pool be defined as a 'virtual device' with
> > a two part driver.
> ...
> > In this virtual free pool device, the kernel is the 'client' and the
> > daemon is the 'host' (which only happens to be part of the kernel).
> > Only the 'client' code is in the user's execution path.
>
> This sounds neat and "clean". But things could get very ugly
> when one needs to "steal" zero'd pages for other uses.
>
> > Should be interesting to consider.
>
> Yes, Agreed.
>
Design seems to be drifting in that general direction.
See: change log on 2.6.11-rc1
More details at:
<http://seclists.org/lists/linux-kernel/2005/Jan/0888.html>
Mike (with Joel's help).
* Re: copy_user_page_asm suggested 64bit improvment [Was: [parisc-linux] clear user page test]
2005-01-12 13:52 ` Michael S. Zick
@ 2005-01-12 15:32 ` Joel Soete
0 siblings, 0 replies; 18+ messages in thread
From: Joel Soete @ 2005-01-12 15:32 UTC (permalink / raw)
To: Michael S. Zick, parisc-linux
[...]
> >
> Design seems to be drifting in that general direction.
>
> See: change log on 2.6.11-rc1
>
> More details at:
> <http://seclists.org/lists/linux-kernel/2005/Jan/0888.html>
>
The last v4 release thread starts here:
<http://seclists.org/lists/linux-kernel/2005/Jan/2931.html>
and also
<http://www.gelato.unsw.edu.au/linux-ia64/0501/12468.html>
I tried applying those patches, but I must have missed which kernel this
patch was built against: a big hunk of patch [2/4] failed :-(
From a quick look, it is supposed to rename several functions in mm/page_alloc.c,
such as page_order() into page_zorder(), but I couldn't find page_order(),
not even in vanilla 2.6.10.
Joel
Thread overview: 18+ messages
[not found] <418A80E8000124B5@mail-6-bnl.tiscali.it>
2004-12-27 7:36 ` copy_user_page_asm suggested 64bit improvment [Was: [parisc-linux] clear user page test] Grant Grundler
2004-12-27 10:40 ` Joel Soete
2004-12-27 15:08 ` James Bottomley
2004-12-31 20:26 ` Michael S. Zick
2004-12-31 20:56 ` Grant Grundler
2004-12-31 21:35 ` Michael S. Zick
[not found] ` <20041231225447.GC23592@colo.lackof.org>
2004-12-31 23:56 ` Michael S. Zick
2005-01-12 13:52 ` Michael S. Zick
2005-01-12 15:32 ` Joel Soete
2004-12-31 21:21 ` James Bottomley
2004-12-27 17:34 ` Joel Soete
2004-12-27 18:32 ` Joel Soete
2004-12-28 16:25 ` [parisc-linux] Re: copy_user_page_asm suggested 64bit improvment (Test case) Joel Soete
2004-12-29 5:46 ` Grant Grundler
2004-12-29 11:36 ` Joel Soete
2004-12-30 8:10 ` copy_user_page_asm suggested 64bit improvment [Was: [parisc-linux] clear user page test] Grant Grundler
2004-12-30 17:04 ` [parisc-linux] Re: copy_user_page_asm suggested 64bit improvment [Was: [parisc-l John David Anglin
[not found] <20041210190333.GC6653@colo.lackof.org>
[not found] ` <418A811700010466@mail-8-bnl.mail.tiscali.sys>
[not found] ` <20041213180758.GA8705@colo.lackof.org>
[not found] ` <41C34C56.4080508@tiscali.be>
[not found] ` <20041218073036.GA29003@colo.lackof.org>
[not found] ` <41C440A3.6060708@tiscali.be>
[not found] ` <41C4872D.6010705@tiscali.be>
[not found] ` <41C4A35A.7010003@tiscali.be>
[not found] ` <20041219042528.GB15282@colo.lackof.org>
[not found] ` <41C5D761.4030004@tiscali.be>
2004-12-19 20:27 ` copy_user_page_asm suggested 64bit improvment [Was: [parisc-linux] clear user page test] Joel Soete