* [PATCH 0/2] *** SUBJECT HERE *** @ 2016-03-13 19:50 Andrew Pinski 2016-03-13 19:50 ` [PATCH 1/2] ARM64:VDSO: Improve gettimeofday, don't use udiv Andrew Pinski 2016-03-13 19:50 ` [PATCH 2/2] ARM64:VDSO: Improve __do_get_tspec, " Andrew Pinski 0 siblings, 2 replies; 6+ messages in thread From: Andrew Pinski @ 2016-03-13 19:50 UTC (permalink / raw) To: linux-arm-kernel *** BLURB HERE *** Andrew Pinski (2): ARM64:VDSO: Improve gettimeofday, don't use udiv ARM64:VDSO: Improve __do_get_tspec, don't use udiv arch/arm64/kernel/vdso/gettimeofday.S | 47 ++++++++++++++++++++++++-------- 1 files changed, 35 insertions(+), 12 deletions(-) -- 1.7.2.5 ^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH 1/2] ARM64:VDSO: Improve gettimeofday, don't use udiv 2016-03-13 19:50 [PATCH 0/2] *** SUBJECT HERE *** Andrew Pinski @ 2016-03-13 19:50 ` Andrew Pinski 2016-03-14 6:55 ` Ard Biesheuvel 2016-03-24 14:36 ` Christopher Covington 2016-03-13 19:50 ` [PATCH 2/2] ARM64:VDSO: Improve __do_get_tspec, " Andrew Pinski 1 sibling, 2 replies; 6+ messages in thread From: Andrew Pinski @ 2016-03-13 19:50 UTC (permalink / raw) To: linux-arm-kernel On many cores, udiv with a large value is slow, expand instead the division out to be what GCC would have generated for the divide by 1000. On ThunderX, the speeds up gettimeofday by 5%. Signed-off-by: Andrew Pinski <apinski@cavium.com> --- arch/arm64/kernel/vdso/gettimeofday.S | 20 ++++++++++++++++---- 1 files changed, 16 insertions(+), 4 deletions(-) diff --git a/arch/arm64/kernel/vdso/gettimeofday.S b/arch/arm64/kernel/vdso/gettimeofday.S index efa79e8..e5caef9 100644 --- a/arch/arm64/kernel/vdso/gettimeofday.S +++ b/arch/arm64/kernel/vdso/gettimeofday.S @@ -64,10 +64,22 @@ ENTRY(__kernel_gettimeofday) bl __do_get_tspec seqcnt_check w9, 1b - /* Convert ns to us. */ - mov x13, #1000 - lsl x13, x13, x12 - udiv x11, x11, x13 + /* Undo the shift. */ + lsr x11, x11, x12 + + /* Convert ns to us (division by 1000 by using multiply high). + * This is how GCC converts the division by 1000 into. + * This is faster than divide on most cores. + */ + mov x13, 63439 + movk x13, 0xe353, lsl 16 + lsr x11, x11, 3 + movk x13, 0x9ba5, lsl 32 + movk x13, 0x20c4, lsl 48 + /* x13 = 0x20c49ba5e353f7cf */ + umulh x11, x11, x13 + lsr x11, x11, 4 + stp x10, x11, [x0, #TVAL_TV_SEC] 2: /* If tz is NULL, return 0. */ -- 1.7.2.5 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH 1/2] ARM64:VDSO: Improve gettimeofday, don't use udiv 2016-03-13 19:50 ` [PATCH 1/2] ARM64:VDSO: Improve gettimeofday, don't use udiv Andrew Pinski @ 2016-03-14 6:55 ` Ard Biesheuvel 2016-03-10 17:00 ` Mark Rutland 2016-03-24 14:36 ` Christopher Covington 1 sibling, 1 reply; 6+ messages in thread From: Ard Biesheuvel @ 2016-03-14 6:55 UTC (permalink / raw) To: linux-arm-kernel On 13 March 2016 at 20:50, Andrew Pinski <apinski@cavium.com> wrote: > On many cores, udiv with a large value is slow, expand instead > the division out to be what GCC would have generated for the > divide by 1000. > > On ThunderX, the speeds up gettimeofday by 5%. > > Signed-off-by: Andrew Pinski <apinski@cavium.com> > --- > arch/arm64/kernel/vdso/gettimeofday.S | 20 ++++++++++++++++---- > 1 files changed, 16 insertions(+), 4 deletions(-) > > diff --git a/arch/arm64/kernel/vdso/gettimeofday.S b/arch/arm64/kernel/vdso/gettimeofday.S > index efa79e8..e5caef9 100644 > --- a/arch/arm64/kernel/vdso/gettimeofday.S > +++ b/arch/arm64/kernel/vdso/gettimeofday.S > @@ -64,10 +64,22 @@ ENTRY(__kernel_gettimeofday) > bl __do_get_tspec > seqcnt_check w9, 1b > > - /* Convert ns to us. */ > - mov x13, #1000 > - lsl x13, x13, x12 > - udiv x11, x11, x13 > + /* Undo the shift. */ > + lsr x11, x11, x12 > + > + /* Convert ns to us (division by 1000 by using multiply high). > + * This is how GCC converts the division by 1000 into. > + * This is faster than divide on most cores. > + */ > + mov x13, 63439 Please don't mix hex and decimal constants > + movk x13, 0xe353, lsl 16 > + lsr x11, x11, 3 > + movk x13, 0x9ba5, lsl 32 > + movk x13, 0x20c4, lsl 48 > + /* x13 = 0x20c49ba5e353f7cf */ Could we clean this up a bit? Something along the lines of .set m, 0x20c49ba5e353f7cf movz x13,#:abs_g3:m movk x13, #:abs:g2_nc:m movk x13, #:abs_g1_nc:m movk x13, #:abs_g0_nc:m Actually, the movz/movk sequence should probably be implemented as a macro in asm/assembler.h, with parameters for the register and the symbol name. I think Mark proposed such a patch at some point > + umulh x11, x11, x13 > + lsr x11, x11, 4 > + > stp x10, x11, [x0, #TVAL_TV_SEC] > 2: > /* If tz is NULL, return 0. */ > -- > 1.7.2.5 > > > _______________________________________________ > linux-arm-kernel mailing list > linux-arm-kernel at lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH 1/2] ARM64:VDSO: Improve gettimeofday, don't use udiv 2016-03-14 6:55 ` Ard Biesheuvel @ 2016-03-10 17:00 ` Mark Rutland 0 siblings, 0 replies; 6+ messages in thread From: Mark Rutland @ 2016-03-10 17:00 UTC (permalink / raw) To: linux-arm-kernel On Mon, Mar 14, 2016 at 07:55:38AM +0100, Ard Biesheuvel wrote: > On 13 March 2016 at 20:50, Andrew Pinski <apinski@cavium.com> wrote: > > + movk x13, 0xe353, lsl 16 > > + lsr x11, x11, 3 > > + movk x13, 0x9ba5, lsl 32 > > + movk x13, 0x20c4, lsl 48 > > + /* x13 = 0x20c49ba5e353f7cf */ > > Could we clean this up a bit? Something along the lines of > > .set m, 0x20c49ba5e353f7cf > movz x13,#:abs_g3:m > movk x13, #:abs:g2_nc:m > movk x13, #:abs_g1_nc:m > movk x13, #:abs_g0_nc:m > > Actually, the movz/movk sequence should probably be implemented as a > macro in asm/assembler.h, with parameters for the register and the > symbol name. Agreed. > I think Mark proposed such a patch at some point That would be [1], which needs the relocations fixed up [2,3] to match the above. I didn't respin that as it turned out to be unnecessary at the time, but I'm more than happy for someone to pick it up. Mark. [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-January/397563.html [2] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-January/397572.html [3] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-January/397573.html ^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH 1/2] ARM64:VDSO: Improve gettimeofday, don't use udiv 2016-03-13 19:50 ` [PATCH 1/2] ARM64:VDSO: Improve gettimeofday, don't use udiv Andrew Pinski 2016-03-14 6:55 ` Ard Biesheuvel @ 2016-03-24 14:36 ` Christopher Covington 1 sibling, 0 replies; 6+ messages in thread From: Christopher Covington @ 2016-03-24 14:36 UTC (permalink / raw) To: linux-arm-kernel On 03/13/2016 03:50 PM, Andrew Pinski wrote: > On many cores, udiv with a large value is slow, expand instead > the division out to be what GCC would have generated for the > divide by 1000. This like checking object code into version control. Why not write in C and let GCC perform the generation? Thanks, Cov -- Qualcomm Innovation Center, Inc. Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project ^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH 2/2] ARM64:VDSO: Improve __do_get_tspec, don't use udiv 2016-03-13 19:50 [PATCH 0/2] *** SUBJECT HERE *** Andrew Pinski 2016-03-13 19:50 ` [PATCH 1/2] ARM64:VDSO: Improve gettimeofday, don't use udiv Andrew Pinski @ 2016-03-13 19:50 ` Andrew Pinski 1 sibling, 0 replies; 6+ messages in thread From: Andrew Pinski @ 2016-03-13 19:50 UTC (permalink / raw) To: linux-arm-kernel In most other targets (x86/tile for an example), the division in __do_get_tspec is converted into a simple loop. The main reason for this is because the result of this division is going to be either 0 or 1. This changes the division to the simple loop and thus speeding up gettimeofday. On ThunderX, this speeds up gettimeofday by 16.6%. Signed-off-by: Andrew Pinski <apinski@cavium.com> --- arch/arm64/kernel/vdso/gettimeofday.S | 27 +++++++++++++++++++-------- 1 files changed, 19 insertions(+), 8 deletions(-) diff --git a/arch/arm64/kernel/vdso/gettimeofday.S b/arch/arm64/kernel/vdso/gettimeofday.S index e5caef9..28f4da7 100644 --- a/arch/arm64/kernel/vdso/gettimeofday.S +++ b/arch/arm64/kernel/vdso/gettimeofday.S @@ -246,14 +246,25 @@ ENTRY(__do_get_tspec) mul x10, x10, x11 /* Use the kernel time to calculate the new timespec. */ - mov x11, #NSEC_PER_SEC_LO16 - movk x11, #NSEC_PER_SEC_HI16, lsl #16 - lsl x11, x11, x12 - add x15, x10, x14 - udiv x14, x15, x11 - add x10, x13, x14 - mul x13, x14, x11 - sub x11, x15, x13 + mov x15, #NSEC_PER_SEC_LO16 + movk x15, #NSEC_PER_SEC_HI16, lsl #16 + lsl x15, x15, x12 + add x11, x10, x14 + mov x10, x13 + + /* + * Use a loop instead of a division as this is most + * likely going to be only giving a 1 or 0 and that is faster + * than a division. + */ + cmp x11, x15 + b.lt 1f +2: + sub x11, x11, x15 + add x10, x10, 1 + cmp x11, x15 + b.ge 2b +1: ret .cfi_endproc -- 1.7.2.5 ^ permalink raw reply related [flat|nested] 6+ messages in thread
end of thread, other threads:[~2016-03-24 14:36 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2016-03-13 19:50 [PATCH 0/2] *** SUBJECT HERE *** Andrew Pinski 2016-03-13 19:50 ` [PATCH 1/2] ARM64:VDSO: Improve gettimeofday, don't use udiv Andrew Pinski 2016-03-14 6:55 ` Ard Biesheuvel 2016-03-10 17:00 ` Mark Rutland 2016-03-24 14:36 ` Christopher Covington 2016-03-13 19:50 ` [PATCH 2/2] ARM64:VDSO: Improve __do_get_tspec, " Andrew Pinski
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).