From: Joel Soete <soete.joel@tiscali.be>
To: parisc-linux <parisc-linux@lists.parisc-linux.org>,
Randolph Chung <randolph@tausq.org>
Subject: copy_user_page_asm suggested 64bit improvment [Was: [parisc-linux] clear user page test]
Date: Sun, 19 Dec 2004 20:27:26 +0000 [thread overview]
Message-ID: <41C5E42E.9020602@tiscali.be> (raw)
In-Reply-To: <41C5D761.4030004@tiscali.be>
Hello pa*,
Here is the last exchange I had with ggg:
Joel Soete wrote:
>
>
> Grant Grundler wrote:
>
>> On Sat, Dec 18, 2004 at 09:38:34PM +0000, Joel Soete wrote:
>>
>>> interesting results:
>>> on b2k 64bit kernel:
>>> user 18m52.169s
>>> # including all those patch
>>> user 18m46.224s
>>> much better then clear_user_page_asm :-)
>>
>>
>>
>> "much better" is slightly overstated for a 0.62 % improvement.
>> That's a 7 second improvement over a 1132 second time frame.
>>
> Ok with the previous changes of clear_user_page_asm() the test case let
> us expected some overall improvement (less ticks consumption :))
> but the final time results were absolutly desapointing istr:
> I test it on my b2k and here are the results:
> time make V=1 vmlinux (under 2.6.10-rc3-pa3-cvs)
> real 23m23.239s
> user 18m50.141s
> sys 4m21.203s
>
> time make V=1 vmlinux (under 2.6.10-rc3-pa4-patch clear_user_...)
> real 23m18.552s
> user 18m50.534s
> sys 4m16.903s
>
> :_(
>
>> Sorry - I can't see what changed to bring about this improvement.
>> Can you point that out to me?
>
> that make sense as you rejected previous patche, I will re-write it as
> #ifdef __LP64__ so; NP
>
>> I suspect there might be something else involved - perhaps this
>> difference is just within the noise of the test.
>>
> Ok I will make much more run ;-)
>
>> BTW, I'll be on the road for the rest of the week...I won't be
>> able to pursue this until after I get back. If there are more
>> changes you want to the file, please post them to the list.
>>
> NP ;-) (have a nice travel and be carefull)
>
> Thanks again for all,
> Joel
>
regarding this subject:
Joel Soete wrote:
> Grant,
>
> still some more thought (and btw additional questions)
>
> As we choose to keep the same number of insn in this loop, I was looking
> for something like:
> #ifdef __LP64__
> #define STR std
> #else
> #define STR stw
> #endif
>
> but figure out that already exist and so why not using addtional
> displacement #define and re-write loop like:
> --- arch/parisc/kernel/pacache.S.New 2004-12-18 15:39:11.000000000 +0100
> +++ arch/parisc/kernel/pacache.S 2004-12-18 19:19:53.862854692 +0100
> @@ -288,6 +288,49 @@
>
> .procend
>
> +#ifdef __LP64__
> + /* PREFETCH (Write) has not (yet) been proven to help here */
> +/* # define PREFETCHW_OP ldd 256(%0), %r0 */
> +
> +#define INCR 128 /* Loop's INCRement */
> +#define LN 32 /* Loop's Number i.e. /* PAGE_SIZE/INCR == 32 */
> +#define D0 0 /* 1st insn displacement */
> +#define D1 8 /* 2d insn displacement */
> +#define D2 16
> +#define D3 24
> +#define D4 32
> +#define D5 40
> +#define D6 48
> +#define D7 56
> +#define D8 64
> +#define D9 72
> +#define D10 80
> +#define D11 88
> +#define D12 96
> +#define D13 104
> +#define D14 112
> +#define D15 120 /* last insn displacement */
> +#else /* !__LP64__ */
> +#define INCR 64 /* Loop's INCRement */
> +#define LN 64 /* Loop's Number i.e. /* PAGE_SIZE/INCR == 64 */
> +#define D0 0 /* 1st insn displacement */
> +#define D1 4 /* 2d insn displacement */
> +#define D2 8
> +#define D3 12
> +#define D4 16
> +#define D5 20
> +#define D6 24
> +#define D7 28
> +#define D8 32
> +#define D9 36
> +#define D10 40
> +#define D11 44
> +#define D12 48
> +#define D13 52
> +#define D14 56
> +#define D15 60 /* last insn displacement */
> +#endif /* __LP64__ */
> +
> .export copy_user_page_asm,code
>
> copy_user_page_asm:
> @@ -502,55 +545,26 @@
>
> pdtlb 0(%r28)
>
> -#ifdef __LP64__
> - ldi 32, %r1 /* PAGE_SIZE/128 == 32 */
> -
> - /* PREFETCH (Write) has not (yet) been proven to help here */
> -/* #define PREFETCHW_OP ldd 256(%0), %r0 */
> + ldi LN, %r1 /* PAGE_SIZE/INCR == LN */
>
> -1: std %r0, 0(%r28)
> - std %r0, 8(%r28)
> - std %r0, 16(%r28)
> - std %r0, 24(%r28)
> - std %r0, 32(%r28)
> - std %r0, 40(%r28)
> - std %r0, 48(%r28)
> - std %r0, 56(%r28)
> - std %r0, 64(%r28)
> - std %r0, 72(%r28)
> - std %r0, 80(%r28)
> - std %r0, 88(%r28)
> - std %r0, 96(%r28)
> - std %r0, 104(%r28)
> - std %r0, 112(%r28)
> - std %r0, 120(%r28)
> +1: STREG %r0, D0(%r28)
> + STREG %r0, D1(%r28)
> + STREG %r0, D2(%r28)
> + STREG %r0, D3(%r28)
> + STREG %r0, D4(%r28)
> + STREG %r0, D5(%r28)
> + STREG %r0, D6(%r28)
> + STREG %r0, D7(%r28)
> + STREG %r0, D8(%r28)
> + STREG %r0, D9(%r28)
> + STREG %r0, D10(%r28)
> + STREG %r0, D11(%r28)
> + STREG %r0, D12(%r28)
> + STREG %r0, D13(%r28)
> + STREG %r0, D14(%r28)
> + STREG %r0, D15(%r28)
> ADDIB> -1, %r1, 1b
> - ldo 128(%r28), %r28
> -
> -#else /* ! __LP64 */
> -
> - ldi 64, %r1 /* PAGE_SIZE/64 == 64 */
> -
> -1:
> - stw %r0, 0(%r28)
> - stw %r0, 4(%r28)
> - stw %r0, 8(%r28)
> - stw %r0, 12(%r28)
> - stw %r0, 16(%r28)
> - stw %r0, 20(%r28)
> - stw %r0, 24(%r28)
> - stw %r0, 28(%r28)
> - stw %r0, 32(%r28)
> - stw %r0, 36(%r28)
> - stw %r0, 40(%r28)
> - stw %r0, 44(%r28)
> - stw %r0, 48(%r28)
> - stw %r0, 52(%r28)
> - stw %r0, 56(%r28)
> - stw %r0, 60(%r28)
> - ADDIB> -1, %r1, 1b
> - ldo 64(%r28), %r28
> -#endif /* __LP64 */
> + ldo INCR(%r28), %r28
>
> bv %r0(%r2)
> nop
> =========> arch_parisc_kernel_pacache.S.diff4 <=========
this was definitely rejected:
Grant Grundler wrote:
> On Sat, Dec 18, 2004 at 07:38:21PM +0000, Joel Soete wrote:
>
>>Grant,
>>
>>still some more thought (and btw additional questions)
>>
>>As we choose to keep the same number of insn in this loop, I was looking
>>for something like:
>>#ifdef __LP64__
>>#define STR std
>>#else
>>#define STR stw
>>#endif
>>
>>but figure out that already exist and so why not using addtional
>>displacement #define and re-write loop like:
>>--- arch/parisc/kernel/pacache.S.New 2004-12-18 15:39:11.000000000 +0100
>>+++ arch/parisc/kernel/pacache.S 2004-12-18 19:19:53.862854692 +0100
>>@@ -288,6 +288,49 @@
>>
>> .procend
>>
>>+#ifdef __LP64__
>>+ /* PREFETCH (Write) has not (yet) been proven to help here */
>>+/* # define PREFETCHW_OP ldd 256(%0), %r0 */
>>+
>>+#define INCR 128 /* Loop's INCRement */
>>+#define LN 32 /* Loop's Number i.e. /* PAGE_SIZE/INCR == 32 */
>>+#define D0 0 /* 1st insn displacement */
>>+#define D1 8 /* 2d insn displacement */
>
> ...
>
>
> Sorry - definitely not. That just obscures whats going on.
>
> grant
>
So will have to rewrite this one too
>
> mmm but copy_user_page_asm has the same structure and so why not using
> the same schema:
> --- arch/parisc/kernel/pacache.S.New1 2004-12-18 19:21:25.503229430
> +0100
> +++ arch/parisc/kernel/pacache.S 2004-12-18 19:43:33.184799992 +0100
> @@ -338,7 +338,7 @@
> .callinfo NO_CALLS
> .entry
>
> - ldi 64, %r1
> + ldi LN, %r1
>
> /*
> * This loop is optimized for PCXL/PCXL2 ldw/ldw and stw/stw
> @@ -349,43 +349,41 @@
> * use ldd/std on a 32 bit kernel.
> */
>
> -
> -1:
> - ldw 0(%r25), %r19
> - ldw 4(%r25), %r20
> - ldw 8(%r25), %r21
> - ldw 12(%r25), %r22
> - stw %r19, 0(%r26)
> - stw %r20, 4(%r26)
> - stw %r21, 8(%r26)
> - stw %r22, 12(%r26)
> - ldw 16(%r25), %r19
> - ldw 20(%r25), %r20
> - ldw 24(%r25), %r21
> - ldw 28(%r25), %r22
> - stw %r19, 16(%r26)
> - stw %r20, 20(%r26)
> - stw %r21, 24(%r26)
> - stw %r22, 28(%r26)
> - ldw 32(%r25), %r19
> - ldw 36(%r25), %r20
> - ldw 40(%r25), %r21
> - ldw 44(%r25), %r22
> - stw %r19, 32(%r26)
> - stw %r20, 36(%r26)
> - stw %r21, 40(%r26)
> - stw %r22, 44(%r26)
> - ldw 48(%r25), %r19
> - ldw 52(%r25), %r20
> - ldw 56(%r25), %r21
> - ldw 60(%r25), %r22
> - stw %r19, 48(%r26)
> - stw %r20, 52(%r26)
> - stw %r21, 56(%r26)
> - stw %r22, 60(%r26)
> - ldo 64(%r26), %r26
> +1: LDREG D0(%r25), %r19
> + LDREG D1(%r25), %r20
> + LDREG D2(%r25), %r21
> + LDREG D3(%r25), %r22
> + STREG %r19, D0(%r26)
> + STREG %r20, D1(%r26)
> + STREG %r21, D2(%r26)
> + STREG %r22, D3(%r26)
> + LDREG D4(%r25), %r19
> + LDREG D5(%r25), %r20
> + LDREG D6(%r25), %r21
> + LDREG D7(%r25), %r22
> + STREG %r19, D4(%r26)
> + STREG %r20, D5(%r26)
> + STREG %r21, D6(%r26)
> + STREG %r22, D7(%r26)
> + LDREG D8(%r25), %r19
> + LDREG D9(%r25), %r20
> + LDREG D10(%r25), %r21
> + LDREG D11(%r25), %r22
> + STREG %r19, D8(%r26)
> + STREG %r20, D9(%r26)
> + STREG %r21, D10(%r26)
> + STREG %r22, D11(%r26)
> + LDREG D12(%r25), %r19
> + LDREG D13(%r25), %r20
> + LDREG D14(%r25), %r21
> + LDREG D15(%r25), %r22
> + STREG %r19, D12(%r26)
> + STREG %r20, D13(%r26)
> + STREG %r21, D14(%r26)
> + STREG %r22, D15(%r26)
> + ldo INCR(%r26), %r26
> ADDIB> -1, %r1, 1b
> - ldo 64(%r25), %r25
> + ldo INCR(%r25), %r25
>
> bv %r0(%r2)
> nop
> =========> arch_parisc_kernel_pacache.S.diff5 <=========
>
> And finaly why didn't we have to purge related dtlb entries as we did
> (see #if 0 below) and we do in clear_user_page_asm:
> --- arch/parisc/kernel/pacache.S.New2 2004-12-18 19:44:46.684937345
> +0100
> +++ arch/parisc/kernel/pacache.S 2004-12-18 20:19:51.113544409 +0100
> @@ -338,6 +338,11 @@
> .callinfo NO_CALLS
> .entry
>
> + /* Purge any old translations */
> +
> + pdtlb 0(%r25)
> + pdtlb 0(%r26)
> +
> ldi LN, %r1
>
> /*
> =========> arch_parisc_kernel_pacache.S.diff6 <=========
>
But what's about this one?
What you opinion?
Thanks,
Joel
_______________________________________________
parisc-linux mailing list
parisc-linux@lists.parisc-linux.org
http://lists.parisc-linux.org/mailman/listinfo/parisc-linux
next parent reply other threads:[~2004-12-19 20:27 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20041210190333.GC6653@colo.lackof.org>
[not found] ` <418A811700010466@mail-8-bnl.mail.tiscali.sys>
[not found] ` <20041213180758.GA8705@colo.lackof.org>
[not found] ` <41C34C56.4080508@tiscali.be>
[not found] ` <20041218073036.GA29003@colo.lackof.org>
[not found] ` <41C440A3.6060708@tiscali.be>
[not found] ` <41C4872D.6010705@tiscali.be>
[not found] ` <41C4A35A.7010003@tiscali.be>
[not found] ` <20041219042528.GB15282@colo.lackof.org>
[not found] ` <41C5D761.4030004@tiscali.be>
2004-12-19 20:27 ` Joel Soete [this message]
[not found] <418A80E8000124B5@mail-6-bnl.tiscali.it>
2004-12-27 7:36 ` copy_user_page_asm suggested 64bit improvment [Was: [parisc-linux] clear user page test] Grant Grundler
2004-12-27 10:40 ` Joel Soete
2004-12-27 15:08 ` James Bottomley
2004-12-31 20:26 ` Michael S. Zick
2004-12-31 20:56 ` Grant Grundler
2004-12-31 21:35 ` Michael S. Zick
[not found] ` <20041231225447.GC23592@colo.lackof.org>
2004-12-31 23:56 ` Michael S. Zick
2005-01-12 13:52 ` Michael S. Zick
2005-01-12 15:32 ` Joel Soete
2004-12-31 21:21 ` James Bottomley
2004-12-27 17:34 ` Joel Soete
2004-12-27 18:32 ` Joel Soete
2004-12-30 8:10 ` Grant Grundler
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=41C5E42E.9020602@tiscali.be \
--to=soete.joel@tiscali.be \
--cc=parisc-linux@lists.parisc-linux.org \
--cc=randolph@tausq.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.