All of lore.kernel.org
 help / color / mirror / Atom feed
From: Joel Soete <soete.joel@tiscali.be>
To: parisc-linux <parisc-linux@lists.parisc-linux.org>,
	Randolph Chung <randolph@tausq.org>
Subject: copy_user_page_asm suggested 64bit improvment [Was: [parisc-linux] clear user page test]
Date: Sun, 19 Dec 2004 20:27:26 +0000	[thread overview]
Message-ID: <41C5E42E.9020602@tiscali.be> (raw)
In-Reply-To: <41C5D761.4030004@tiscali.be>

Hello pa*,

Here is the last exchange I had with ggg:

Joel Soete wrote:
> 
> 
> Grant Grundler wrote:
> 
>> On Sat, Dec 18, 2004 at 09:38:34PM +0000, Joel Soete wrote:
>>
>>> interesting results:
>>> on b2k 64bit kernel:
>>> user    18m52.169s
>>> # including all those patch
>>> user    18m46.224s
>>> much better then clear_user_page_asm :-)
>>
>>
>>
>> "much better" is slightly overstated for a 0.62 % improvement.
>> That's a 7 second improvement over a 1132 second time frame.
>>
> Ok with the previous changes of clear_user_page_asm() the test case let 
> us expected some overall improvement (less ticks consumption :))
> but the final time results were absolutly desapointing istr:
> I test it on my b2k and here are the results:
> time make V=1 vmlinux (under 2.6.10-rc3-pa3-cvs)
> real 23m23.239s
> user 18m50.141s
> sys 4m21.203s
> 
> time make V=1 vmlinux (under 2.6.10-rc3-pa4-patch clear_user_...)
> real 23m18.552s
> user 18m50.534s
> sys 4m16.903s
> 
> :_(
> 
>> Sorry - I can't see what changed to bring about this improvement.
>> Can you point that out to me?
> 
> that make sense as you rejected previous patche, I will re-write it as 
> #ifdef __LP64__ so; NP
> 
>> I suspect there might be something else involved - perhaps this
>> difference is just within the noise of the test.
>>
> Ok I will make much more run ;-)
> 
>> BTW, I'll be on the road for the rest of the week...I won't be
>> able to pursue this until after I get back. If there are more
>> changes you want to the file, please post them to the list.
>>
> NP ;-) (have a nice travel and be carefull)
> 
> Thanks again for all,
>     Joel
> 
regarding this subject:
Joel Soete wrote:
 > Grant,
 >
 > still some more thought (and btw additional questions)
 >
 > As we choose to keep the same number of insn in this loop, I was looking
 > for something like:
 > #ifdef __LP64__
 > #define STR    std
 > #else
 > #define STR    stw
 > #endif
 >
 > but figure out that already exist and so why not using addtional
 > displacement #define and re-write loop like:
 > --- arch/parisc/kernel/pacache.S.New    2004-12-18 15:39:11.000000000 +0100
 > +++ arch/parisc/kernel/pacache.S    2004-12-18 19:19:53.862854692 +0100
 > @@ -288,6 +288,49 @@
 >
 >      .procend
 >
 > +#ifdef __LP64__
 > +    /* PREFETCH (Write) has not (yet) been proven to help here */
 > +/* #    define    PREFETCHW_OP    ldd        256(%0), %r0 */
 > +
 > +#define INCR    128    /* Loop's INCRement */
 > +#define LN    32    /* Loop's Number i.e. /* PAGE_SIZE/INCR == 32 */
 > +#define D0    0    /* 1st insn displacement */
 > +#define D1    8    /* 2d insn displacement */
 > +#define D2    16
 > +#define D3    24
 > +#define D4    32
 > +#define D5    40
 > +#define D6    48
 > +#define D7    56
 > +#define D8    64
 > +#define D9    72
 > +#define D10    80
 > +#define D11    88
 > +#define D12    96
 > +#define D13    104
 > +#define D14    112
 > +#define D15    120    /* last insn displacement */
 > +#else    /* !__LP64__ */
 > +#define INCR    64    /* Loop's INCRement */
 > +#define LN    64    /* Loop's Number i.e. /* PAGE_SIZE/INCR == 64 */
 > +#define D0    0    /* 1st insn displacement */
 > +#define D1    4    /* 2d insn displacement */
 > +#define D2    8
 > +#define D3    12
 > +#define D4    16
 > +#define D5    20
 > +#define D6    24
 > +#define D7    28
 > +#define D8    32
 > +#define D9    36
 > +#define D10    40
 > +#define D11    44
 > +#define D12    48
 > +#define D13    52
 > +#define D14    56
 > +#define D15    60    /* last insn displacement */
 > +#endif    /* __LP64__ */
 > +
 >      .export copy_user_page_asm,code
 >
 >  copy_user_page_asm:
 > @@ -502,55 +545,26 @@
 >
 >      pdtlb        0(%r28)
 >
 > -#ifdef __LP64__
 > -    ldi        32, %r1            /* PAGE_SIZE/128 == 32 */
 > -
 > -    /* PREFETCH (Write) has not (yet) been proven to help here */
 > -/* #define    PREFETCHW_OP    ldd        256(%0), %r0 */
 > +    ldi        LN, %r1            /* PAGE_SIZE/INCR == LN */
 >
 > -1:    std        %r0, 0(%r28)
 > -    std        %r0, 8(%r28)
 > -    std        %r0, 16(%r28)
 > -    std        %r0, 24(%r28)
 > -    std        %r0, 32(%r28)
 > -    std        %r0, 40(%r28)
 > -    std        %r0, 48(%r28)
 > -    std        %r0, 56(%r28)
 > -    std        %r0, 64(%r28)
 > -    std        %r0, 72(%r28)
 > -    std        %r0, 80(%r28)
 > -    std        %r0, 88(%r28)
 > -    std        %r0, 96(%r28)
 > -    std        %r0, 104(%r28)
 > -    std        %r0, 112(%r28)
 > -    std        %r0, 120(%r28)
 > +1:    STREG        %r0, D0(%r28)
 > +    STREG        %r0, D1(%r28)
 > +    STREG        %r0, D2(%r28)
 > +    STREG        %r0, D3(%r28)
 > +    STREG        %r0, D4(%r28)
 > +    STREG        %r0, D5(%r28)
 > +    STREG        %r0, D6(%r28)
 > +    STREG        %r0, D7(%r28)
 > +    STREG        %r0, D8(%r28)
 > +    STREG        %r0, D9(%r28)
 > +    STREG        %r0, D10(%r28)
 > +    STREG        %r0, D11(%r28)
 > +    STREG        %r0, D12(%r28)
 > +    STREG        %r0, D13(%r28)
 > +    STREG        %r0, D14(%r28)
 > +    STREG        %r0, D15(%r28)
 >      ADDIB>        -1, %r1, 1b
 > -    ldo        128(%r28), %r28
 > -
 > -#else    /* ! __LP64 */
 > -
 > -    ldi        64, %r1            /* PAGE_SIZE/64 == 64 */
 > -
 > -1:
 > -    stw        %r0, 0(%r28)
 > -    stw        %r0, 4(%r28)
 > -    stw        %r0, 8(%r28)
 > -    stw        %r0, 12(%r28)
 > -    stw        %r0, 16(%r28)
 > -    stw        %r0, 20(%r28)
 > -    stw        %r0, 24(%r28)
 > -    stw        %r0, 28(%r28)
 > -    stw        %r0, 32(%r28)
 > -    stw        %r0, 36(%r28)
 > -    stw        %r0, 40(%r28)
 > -    stw        %r0, 44(%r28)
 > -    stw        %r0, 48(%r28)
 > -    stw        %r0, 52(%r28)
 > -    stw        %r0, 56(%r28)
 > -    stw        %r0, 60(%r28)
 > -    ADDIB>        -1, %r1, 1b
 > -    ldo        64(%r28), %r28
 > -#endif    /* __LP64 */
 > +    ldo        INCR(%r28), %r28
 >
 >      bv        %r0(%r2)
 >      nop
 > =========> arch_parisc_kernel_pacache.S.diff4 <=========
this was definitely rejected:

Grant Grundler wrote:
 > On Sat, Dec 18, 2004 at 07:38:21PM +0000, Joel Soete wrote:
 >
 >>Grant,
 >>
 >>still some more thought (and btw additional questions)
 >>
 >>As we choose to keep the same number of insn in this loop, I was looking
 >>for something like:
 >>#ifdef __LP64__
 >>#define STR	std
 >>#else
 >>#define STR	stw
 >>#endif
 >>
 >>but figure out that already exist and so why not using addtional
 >>displacement #define and re-write loop like:
 >>--- arch/parisc/kernel/pacache.S.New	2004-12-18 15:39:11.000000000 +0100
 >>+++ arch/parisc/kernel/pacache.S	2004-12-18 19:19:53.862854692 +0100
 >>@@ -288,6 +288,49 @@
 >>
 >> 	.procend
 >>
 >>+#ifdef __LP64__
 >>+	/* PREFETCH (Write) has not (yet) been proven to help here */
 >>+/* #	define	PREFETCHW_OP	ldd		256(%0), %r0 */
 >>+
 >>+#define INCR	128	/* Loop's INCRement */
 >>+#define LN	32	/* Loop's Number i.e. /* PAGE_SIZE/INCR == 32 */
 >>+#define D0	0	/* 1st insn displacement */
 >>+#define D1	8	/* 2d insn displacement */
 >
 > ...
 >
 >
 > Sorry - definitely not. That just obscures whats going on.
 >
 > grant
 >

So will have to rewrite this one too
 >
 > mmm but copy_user_page_asm has the same structure and so why not using
 > the same schema:
 > --- arch/parisc/kernel/pacache.S.New1    2004-12-18 19:21:25.503229430
 > +0100
 > +++ arch/parisc/kernel/pacache.S    2004-12-18 19:43:33.184799992 +0100
 > @@ -338,7 +338,7 @@
 >      .callinfo NO_CALLS
 >      .entry
 >
 > -    ldi        64, %r1
 > +    ldi        LN, %r1
 >
 >      /*
 >       * This loop is optimized for PCXL/PCXL2 ldw/ldw and stw/stw
 > @@ -349,43 +349,41 @@
 >       * use ldd/std on a 32 bit kernel.
 >       */
 >
 > -
 > -1:
 > -    ldw        0(%r25), %r19
 > -    ldw        4(%r25), %r20
 > -    ldw        8(%r25), %r21
 > -    ldw        12(%r25), %r22
 > -    stw        %r19, 0(%r26)
 > -    stw        %r20, 4(%r26)
 > -    stw        %r21, 8(%r26)
 > -    stw        %r22, 12(%r26)
 > -    ldw        16(%r25), %r19
 > -    ldw        20(%r25), %r20
 > -    ldw        24(%r25), %r21
 > -    ldw        28(%r25), %r22
 > -    stw        %r19, 16(%r26)
 > -    stw        %r20, 20(%r26)
 > -    stw        %r21, 24(%r26)
 > -    stw        %r22, 28(%r26)
 > -    ldw        32(%r25), %r19
 > -    ldw        36(%r25), %r20
 > -    ldw        40(%r25), %r21
 > -    ldw        44(%r25), %r22
 > -    stw        %r19, 32(%r26)
 > -    stw        %r20, 36(%r26)
 > -    stw        %r21, 40(%r26)
 > -    stw        %r22, 44(%r26)
 > -    ldw        48(%r25), %r19
 > -    ldw        52(%r25), %r20
 > -    ldw        56(%r25), %r21
 > -    ldw        60(%r25), %r22
 > -    stw        %r19, 48(%r26)
 > -    stw        %r20, 52(%r26)
 > -    stw        %r21, 56(%r26)
 > -    stw        %r22, 60(%r26)
 > -    ldo        64(%r26), %r26
 > +1:    LDREG        D0(%r25), %r19
 > +    LDREG        D1(%r25), %r20
 > +    LDREG        D2(%r25), %r21
 > +    LDREG        D3(%r25), %r22
 > +    STREG        %r19, D0(%r26)
 > +    STREG        %r20, D1(%r26)
 > +    STREG        %r21, D2(%r26)
 > +    STREG        %r22, D3(%r26)
 > +    LDREG        D4(%r25), %r19
 > +    LDREG        D5(%r25), %r20
 > +    LDREG        D6(%r25), %r21
 > +    LDREG        D7(%r25), %r22
 > +    STREG        %r19, D4(%r26)
 > +    STREG        %r20, D5(%r26)
 > +    STREG        %r21, D6(%r26)
 > +    STREG        %r22, D7(%r26)
 > +    LDREG        D8(%r25), %r19
 > +    LDREG        D9(%r25), %r20
 > +    LDREG        D10(%r25), %r21
 > +    LDREG        D11(%r25), %r22
 > +    STREG        %r19, D8(%r26)
 > +    STREG        %r20, D9(%r26)
 > +    STREG        %r21, D10(%r26)
 > +    STREG        %r22, D11(%r26)
 > +    LDREG        D12(%r25), %r19
 > +    LDREG        D13(%r25), %r20
 > +    LDREG        D14(%r25), %r21
 > +    LDREG        D15(%r25), %r22
 > +    STREG        %r19, D12(%r26)
 > +    STREG        %r20, D13(%r26)
 > +    STREG        %r21, D14(%r26)
 > +    STREG        %r22, D15(%r26)
 > +    ldo        INCR(%r26), %r26
 >      ADDIB>        -1, %r1, 1b
 > -    ldo        64(%r25), %r25
 > +    ldo        INCR(%r25), %r25
 >
 >      bv        %r0(%r2)
 >      nop
 > =========> arch_parisc_kernel_pacache.S.diff5 <=========
 >
 > And finaly why didn't we have to purge related dtlb entries as we did
 > (see #if 0 below) and we do in clear_user_page_asm:
 > --- arch/parisc/kernel/pacache.S.New2    2004-12-18 19:44:46.684937345
 > +0100
 > +++ arch/parisc/kernel/pacache.S    2004-12-18 20:19:51.113544409 +0100
 > @@ -338,6 +338,11 @@
 >      .callinfo NO_CALLS
 >      .entry
 >
 > +    /* Purge any old translations */
 > +
 > +    pdtlb        0(%r25)
 > +    pdtlb        0(%r26)
 > +
 >      ldi        LN, %r1
 >
 >      /*
 > =========> arch_parisc_kernel_pacache.S.diff6 <=========
 >
But what's about this one?

What you opinion?

Thanks,
	Joel

_______________________________________________
parisc-linux mailing list
parisc-linux@lists.parisc-linux.org
http://lists.parisc-linux.org/mailman/listinfo/parisc-linux

       reply	other threads:[~2004-12-19 20:27 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20041210190333.GC6653@colo.lackof.org>
     [not found] ` <418A811700010466@mail-8-bnl.mail.tiscali.sys>
     [not found]   ` <20041213180758.GA8705@colo.lackof.org>
     [not found]     ` <41C34C56.4080508@tiscali.be>
     [not found]       ` <20041218073036.GA29003@colo.lackof.org>
     [not found]         ` <41C440A3.6060708@tiscali.be>
     [not found]           ` <41C4872D.6010705@tiscali.be>
     [not found]             ` <41C4A35A.7010003@tiscali.be>
     [not found]               ` <20041219042528.GB15282@colo.lackof.org>
     [not found]                 ` <41C5D761.4030004@tiscali.be>
2004-12-19 20:27                   ` Joel Soete [this message]
     [not found] <418A80E8000124B5@mail-6-bnl.tiscali.it>
2004-12-27  7:36 ` copy_user_page_asm suggested 64bit improvment [Was: [parisc-linux] clear user page test] Grant Grundler
2004-12-27 10:40   ` Joel Soete
2004-12-27 15:08     ` James Bottomley
2004-12-31 20:26       ` Michael S. Zick
2004-12-31 20:56         ` Grant Grundler
2004-12-31 21:35           ` Michael S. Zick
     [not found]             ` <20041231225447.GC23592@colo.lackof.org>
2004-12-31 23:56               ` Michael S. Zick
2005-01-12 13:52               ` Michael S. Zick
2005-01-12 15:32                 ` Joel Soete
2004-12-31 21:21         ` James Bottomley
2004-12-27 17:34     ` Joel Soete
2004-12-27 18:32     ` Joel Soete
2004-12-30  8:10   ` Grant Grundler

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=41C5E42E.9020602@tiscali.be \
    --to=soete.joel@tiscali.be \
    --cc=parisc-linux@lists.parisc-linux.org \
    --cc=randolph@tausq.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.