linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH 1/2] ARM64:VDSO: Improve gettimeofday, don't use udiv
  2016-03-14  6:55   ` Ard Biesheuvel
@ 2016-03-10 17:00     ` Mark Rutland
  0 siblings, 0 replies; 7+ messages in thread
From: Mark Rutland @ 2016-03-10 17:00 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Mar 14, 2016 at 07:55:38AM +0100, Ard Biesheuvel wrote:
> On 13 March 2016 at 20:50, Andrew Pinski <apinski@cavium.com> wrote:
> > +        movk    x13, 0xe353, lsl 16
> > +        lsr     x11, x11, 3
> > +        movk    x13, 0x9ba5, lsl 32
> > +        movk    x13, 0x20c4, lsl 48
> > +       /* x13 = 0x20c49ba5e353f7cf */
> 
> Could we clean this up a bit? Something along the lines of
> 
> .set m, 0x20c49ba5e353f7cf
> movz x13, #:abs_g3:m
> movk x13, #:abs_g2_nc:m
> movk x13, #:abs_g1_nc:m
> movk x13, #:abs_g0_nc:m
> 
> Actually, the movz/movk sequence should probably be implemented as a
> macro in asm/assembler.h, with parameters for the register and the
> symbol name.

Agreed.

> I think Mark proposed such a patch at some point

That would be [1], which needs the relocations fixed up [2,3] to match the
above.

I didn't respin that as it turned out to be unnecessary at the time, but I'm
more than happy for someone to pick it up.

Mark.

[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-January/397563.html
[2] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-January/397572.html
[3] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-January/397573.html

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 0/2] *** SUBJECT HERE ***
@ 2016-03-13 19:50 Andrew Pinski
  2016-03-13 19:50 ` [PATCH 1/2] ARM64:VDSO: Improve gettimeofday, don't use udiv Andrew Pinski
  2016-03-13 19:50 ` [PATCH 2/2] ARM64:VDSO: Improve __do_get_tspec, " Andrew Pinski
  0 siblings, 2 replies; 7+ messages in thread
From: Andrew Pinski @ 2016-03-13 19:50 UTC (permalink / raw)
  To: linux-arm-kernel

*** BLURB HERE ***

Andrew Pinski (2):
  ARM64:VDSO: Improve gettimeofday, don't use udiv
  ARM64:VDSO: Improve __do_get_tspec, don't use udiv

 arch/arm64/kernel/vdso/gettimeofday.S |   47 ++++++++++++++++++++++++--------
 1 files changed, 35 insertions(+), 12 deletions(-)

-- 
1.7.2.5

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/2] ARM64:VDSO: Improve gettimeofday, don't use udiv
  2016-03-13 19:50 [PATCH 0/2] *** SUBJECT HERE *** Andrew Pinski
@ 2016-03-13 19:50 ` Andrew Pinski
  2016-03-14  6:55   ` Ard Biesheuvel
  2016-03-24 14:36   ` Christopher Covington
  2016-03-13 19:50 ` [PATCH 2/2] ARM64:VDSO: Improve __do_get_tspec, " Andrew Pinski
  1 sibling, 2 replies; 7+ messages in thread
From: Andrew Pinski @ 2016-03-13 19:50 UTC (permalink / raw)
  To: linux-arm-kernel

On many cores, udiv with a large value is slow. Instead, expand the
division out to what GCC would have generated for the divide by
1000.

On ThunderX, this speeds up gettimeofday by 5%.

Signed-off-by: Andrew Pinski <apinski@cavium.com>
---
 arch/arm64/kernel/vdso/gettimeofday.S |   20 ++++++++++++++++----
 1 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kernel/vdso/gettimeofday.S b/arch/arm64/kernel/vdso/gettimeofday.S
index efa79e8..e5caef9 100644
--- a/arch/arm64/kernel/vdso/gettimeofday.S
+++ b/arch/arm64/kernel/vdso/gettimeofday.S
@@ -64,10 +64,22 @@ ENTRY(__kernel_gettimeofday)
 	bl	__do_get_tspec
 	seqcnt_check w9, 1b
 
-	/* Convert ns to us. */
-	mov	x13, #1000
-	lsl	x13, x13, x12
-	udiv	x11, x11, x13
+	/* Undo the shift. */
+        lsr     x11, x11, x12
+
+	/* Convert ns to us (division by 1000 by using multiply high).
+	 * This is what GCC converts the division by 1000 into.
+	 * This is faster than divide on most cores.
+	 */
+        mov     x13, 63439
+        movk    x13, 0xe353, lsl 16
+        lsr     x11, x11, 3
+        movk    x13, 0x9ba5, lsl 32
+        movk    x13, 0x20c4, lsl 48
+	/* x13 = 0x20c49ba5e353f7cf */
+        umulh   x11, x11, x13
+        lsr     x11, x11, 4
+
 	stp	x10, x11, [x0, #TVAL_TV_SEC]
 2:
 	/* If tz is NULL, return 0. */
-- 
1.7.2.5
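
The arithmetic in the hunk above can be checked outside the kernel. The
following is only a sketch in C, not code from the patch: the names
DIV1000_MAGIC, umulh64 and div1000_mulhi are made up for illustration, and
unsigned __int128 is a GCC/Clang extension that is assumed to be available.
It mirrors the three steps of the patch (lsr #3, umulh, lsr #4) and uses
the same constant. Note that the decimal 63439 in the first mov is 0xf7cf,
the low 16 bits of that constant.

```c
#include <stdint.h>

/* Multiplier from the patch; roughly 2^68 / 125, rounded up. */
#define DIV1000_MAGIC UINT64_C(0x20c49ba5e353f7cf)

/*
 * Upper 64 bits of a 64x64-bit multiply, which is what umulh computes.
 * unsigned __int128 is a compiler extension (assumption: GCC or Clang).
 */
static uint64_t umulh64(uint64_t a, uint64_t b)
{
	return (uint64_t)(((unsigned __int128)a * b) >> 64);
}

/*
 * Divide by 1000 without a divide instruction:
 * ns / 1000 == (ns / 8) / 125, and the division by 125 is done as a
 * multiply by about 2^68 / 125 followed by a right shift of 68 bits
 * (64 from taking the high half, 4 from the final shift).
 */
static uint64_t div1000_mulhi(uint64_t ns)
{
	return umulh64(ns >> 3, DIV1000_MAGIC) >> 4;
}
```

Comparing div1000_mulhi(n) against n / 1000 for a handful of boundary
values (0, 999, 1000, UINT64_MAX) is a quick sanity check of the constant.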

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/2] ARM64:VDSO: Improve __do_get_tspec, don't use udiv
  2016-03-13 19:50 [PATCH 0/2] *** SUBJECT HERE *** Andrew Pinski
  2016-03-13 19:50 ` [PATCH 1/2] ARM64:VDSO: Improve gettimeofday, don't use udiv Andrew Pinski
@ 2016-03-13 19:50 ` Andrew Pinski
  1 sibling, 0 replies; 7+ messages in thread
From: Andrew Pinski @ 2016-03-13 19:50 UTC (permalink / raw)
  To: linux-arm-kernel

On most other targets (x86 and tile, for example), the division
in __do_get_tspec is converted into a simple loop. The main
reason is that the result of this division is going to be
either 0 or 1.
Change the division to a simple loop, thus speeding up
gettimeofday.

On ThunderX, this speeds up gettimeofday by 16.6%.

Signed-off-by: Andrew Pinski <apinski@cavium.com>
---
 arch/arm64/kernel/vdso/gettimeofday.S |   27 +++++++++++++++++++--------
 1 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/kernel/vdso/gettimeofday.S b/arch/arm64/kernel/vdso/gettimeofday.S
index e5caef9..28f4da7 100644
--- a/arch/arm64/kernel/vdso/gettimeofday.S
+++ b/arch/arm64/kernel/vdso/gettimeofday.S
@@ -246,14 +246,25 @@ ENTRY(__do_get_tspec)
 	mul	x10, x10, x11
 
 	/* Use the kernel time to calculate the new timespec. */
-	mov	x11, #NSEC_PER_SEC_LO16
-	movk	x11, #NSEC_PER_SEC_HI16, lsl #16
-	lsl	x11, x11, x12
-	add	x15, x10, x14
-	udiv	x14, x15, x11
-	add	x10, x13, x14
-	mul	x13, x14, x11
-	sub	x11, x15, x13
+	mov x15, #NSEC_PER_SEC_LO16
+	movk x15, #NSEC_PER_SEC_HI16, lsl #16
+	lsl x15, x15, x12
+	add x11, x10, x14
+	mov x10, x13
+
+	/*
+	 * Use a loop instead of a division: the quotient is most
+	 * likely going to be only 0 or 1, so the loop is faster
+	 * than a division.
+	 */
+	cmp x11, x15
+	b.lt 1f
+2:
+	sub x11, x11, x15
+	add x10, x10, 1
+	cmp x11, x15
+	b.ge 2b
+1:
 
 	ret
 	.cfi_endproc
-- 
1.7.2.5
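
For readers who do not want to trace the registers, the loop in the hunk
above corresponds roughly to the C below. This is a sketch under
assumptions, not the kernel's code: struct ts_sketch and normalize_ts are
hypothetical names, and the nanosecond value is assumed to still carry the
clocksource shift (x12 in the assembly). One difference is deliberate: the
sketch compares unsigned values, whereas b.lt and b.ge in the patch are
signed conditions. The two agree as long as both operands stay below 2^63.

```c
#include <stdint.h>

#define NSEC_PER_SEC UINT64_C(1000000000)

/* Hypothetical result type for this sketch only. */
struct ts_sketch {
	uint64_t sec;
	uint64_t nsec_shifted;	/* nanoseconds, still left-shifted by 'shift' */
};

/*
 * Carry whole seconds out of the shifted nanosecond count by repeated
 * subtraction. The loop body is expected to run zero or one time in the
 * common case, which is why it can beat a udiv plus mul/sub remainder.
 */
static struct ts_sketch normalize_ts(uint64_t sec, uint64_t nsec_shifted,
				     unsigned int shift)
{
	uint64_t limit = NSEC_PER_SEC << shift;

	while (nsec_shifted >= limit) {
		nsec_shifted -= limit;
		sec++;
	}
	return (struct ts_sketch){ sec, nsec_shifted };
}
```

If the input were ever many seconds out of range, the loop would iterate
once per second, so the trade-off only pays off under the 0-or-1
assumption stated in the commit message.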

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 1/2] ARM64:VDSO: Improve gettimeofday, don't use udiv
  2016-03-13 19:50 ` [PATCH 1/2] ARM64:VDSO: Improve gettimeofday, don't use udiv Andrew Pinski
@ 2016-03-14  6:55   ` Ard Biesheuvel
  2016-03-10 17:00     ` Mark Rutland
  2016-03-24 14:36   ` Christopher Covington
  1 sibling, 1 reply; 7+ messages in thread
From: Ard Biesheuvel @ 2016-03-14  6:55 UTC (permalink / raw)
  To: linux-arm-kernel

On 13 March 2016 at 20:50, Andrew Pinski <apinski@cavium.com> wrote:
> On many cores, udiv with a large value is slow. Instead, expand the
> division out to what GCC would have generated for the divide by
> 1000.
>
> On ThunderX, this speeds up gettimeofday by 5%.
>
> Signed-off-by: Andrew Pinski <apinski@cavium.com>
> ---
>  arch/arm64/kernel/vdso/gettimeofday.S |   20 ++++++++++++++++----
>  1 files changed, 16 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm64/kernel/vdso/gettimeofday.S b/arch/arm64/kernel/vdso/gettimeofday.S
> index efa79e8..e5caef9 100644
> --- a/arch/arm64/kernel/vdso/gettimeofday.S
> +++ b/arch/arm64/kernel/vdso/gettimeofday.S
> @@ -64,10 +64,22 @@ ENTRY(__kernel_gettimeofday)
>         bl      __do_get_tspec
>         seqcnt_check w9, 1b
>
> -       /* Convert ns to us. */
> -       mov     x13, #1000
> -       lsl     x13, x13, x12
> -       udiv    x11, x11, x13
> +       /* Undo the shift. */
> +        lsr     x11, x11, x12
> +
> +       /* Convert ns to us (division by 1000 by using multiply high).
> +        * This is what GCC converts the division by 1000 into.
> +        * This is faster than divide on most cores.
> +        */
> +        mov     x13, 63439

Please don't mix hex and decimal constants

> +        movk    x13, 0xe353, lsl 16
> +        lsr     x11, x11, 3
> +        movk    x13, 0x9ba5, lsl 32
> +        movk    x13, 0x20c4, lsl 48
> +       /* x13 = 0x20c49ba5e353f7cf */

Could we clean this up a bit? Something along the lines of

.set m, 0x20c49ba5e353f7cf
movz x13, #:abs_g3:m
movk x13, #:abs_g2_nc:m
movk x13, #:abs_g1_nc:m
movk x13, #:abs_g0_nc:m

Actually, the movz/movk sequence should probably be implemented as a
macro in asm/assembler.h, with parameters for the register and the
symbol name. I think Mark proposed such a patch at some point
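
As an illustration of what the four relocation operators select, here is a
small C model of the movz/movk sequence. It is a sketch only: abs_group,
movk and load_imm64 are hypothetical helper names, and the model ignores
everything about how the assembler or linker actually resolves the
operators. Group n stands for bits [16n+15:16n] of the symbol value, so g3
is the top halfword and g0 the bottom one.

```c
#include <stdint.h>

/* 16-bit group n of a 64-bit value: bits [16n+15:16n]. */
static uint16_t abs_group(uint64_t value, unsigned int n)
{
	return (uint16_t)(value >> (16 * n));
}

/* Model of movk: replace one 16-bit group, keep the other bits. */
static uint64_t movk(uint64_t reg, uint16_t imm, unsigned int n)
{
	uint64_t mask = UINT64_C(0xffff) << (16 * n);

	return (reg & ~mask) | ((uint64_t)imm << (16 * n));
}

/* Model of "movz g3; movk g2; movk g1; movk g0" for the value m. */
static uint64_t load_imm64(uint64_t m)
{
	/* movz writes its group and zeroes the rest of the register. */
	uint64_t reg = (uint64_t)abs_group(m, 3) << 48;

	reg = movk(reg, abs_group(m, 2), 2);
	reg = movk(reg, abs_group(m, 1), 1);
	reg = movk(reg, abs_group(m, 0), 0);
	return reg;
}
```

For m = 0x20c49ba5e353f7cf the groups come out as 0x20c4, 0x9ba5, 0xe353
and 0xf7cf, which are the same halfwords the patch spells out by hand.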

> +        umulh   x11, x11, x13
> +        lsr     x11, x11, 4
> +
>         stp     x10, x11, [x0, #TVAL_TV_SEC]
>  2:
>         /* If tz is NULL, return 0. */
> --
> 1.7.2.5
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/2] ARM64:VDSO: Improve gettimeofday, don't use udiv
  2016-03-13 19:50 ` [PATCH 1/2] ARM64:VDSO: Improve gettimeofday, don't use udiv Andrew Pinski
  2016-03-14  6:55   ` Ard Biesheuvel
@ 2016-03-24 14:36   ` Christopher Covington
  1 sibling, 0 replies; 7+ messages in thread
From: Christopher Covington @ 2016-03-24 14:36 UTC (permalink / raw)
  To: linux-arm-kernel

On 03/13/2016 03:50 PM, Andrew Pinski wrote:
> On many cores, udiv with a large value is slow. Instead, expand the
> division out to what GCC would have generated for the divide by
> 1000.

This is like checking object code into version control. Why not write in C
and let GCC perform the generation?

Thanks,
Cov
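
To make the question concrete, this is roughly what the conversion might
look like in C. It is a sketch, not a proposal for the actual vDSO code:
ns_to_us_udiv and ns_to_us_const are hypothetical names, and whether the
compiler turns the constant divide into a multiply-high depends on the
compiler and optimization level. The first function models the code being
removed (divide the still-shifted value by 1000 << shift), the second
models the patch (undo the shift, then divide by the constant 1000).

```c
#include <stdint.h>

/* Old approach: variable divisor, so a real divide is needed. */
static uint64_t ns_to_us_udiv(uint64_t ns_shifted, unsigned int shift)
{
	return ns_shifted / (UINT64_C(1000) << shift);
}

/*
 * Patch approach expressed in C: shift first, then divide by a
 * compile-time constant, which compilers can usually strength-reduce.
 */
static uint64_t ns_to_us_const(uint64_t ns_shifted, unsigned int shift)
{
	return (ns_shifted >> shift) / 1000;
}
```

Both forms give the same quotient for unsigned inputs, since dividing by
2^shift and then by 1000 with truncation equals a single truncating
division by 1000 * 2^shift.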

-- 
Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 0/2] *** SUBJECT HERE ***
       [not found] <1618560476-28908-1-git-send-email-taozha@codeaurora.org>
@ 2021-04-16  8:11 ` Greg Kroah-Hartman
  0 siblings, 0 replies; 7+ messages in thread
From: Greg Kroah-Hartman @ 2021-04-16  8:11 UTC (permalink / raw)
  To: Tao Zhang
  Cc: Mathieu Poirier, Suzuki K Poulose, Alexander Shishkin, Mike Leach,
	Leo Yan, coresight, linux-arm-kernel, linux-kernel, Tingwei Zhang,
	Mao Jinlong, Yuanfang Zhang

On Fri, Apr 16, 2021 at 04:07:54PM +0800, Tao Zhang wrote:
> *** BLURB HERE ***

Where is the blurb?

And your subject is not ok :(



^ permalink raw reply	[flat|nested] 7+ messages in thread

Thread overview: 7+ messages
2016-03-13 19:50 [PATCH 0/2] *** SUBJECT HERE *** Andrew Pinski
2016-03-13 19:50 ` [PATCH 1/2] ARM64:VDSO: Improve gettimeofday, don't use udiv Andrew Pinski
2016-03-14  6:55   ` Ard Biesheuvel
2016-03-10 17:00     ` Mark Rutland
2016-03-24 14:36   ` Christopher Covington
2016-03-13 19:50 ` [PATCH 2/2] ARM64:VDSO: Improve __do_get_tspec, " Andrew Pinski
     [not found] <1618560476-28908-1-git-send-email-taozha@codeaurora.org>
2021-04-16  8:11 ` [PATCH 0/2] *** SUBJECT HERE *** Greg Kroah-Hartman
