From: will.deacon@arm.com (Will Deacon)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH 1/2] arm64: Refactor vDSO time functions
Date: Fri, 8 Jul 2016 15:11:44 +0100 [thread overview]
Message-ID: <20160708141143.GB6493@arm.com> (raw)
In-Reply-To: <1462797421-33103-2-git-send-email-kevin.brodsky@arm.com>
Hi Kevin,
On Mon, May 09, 2016 at 01:37:00PM +0100, Kevin Brodsky wrote:
> Time functions are directly implemented in assembly in arm64, and it
> is desirable to keep it this way for performance reasons (everything
> fits in registers, so that the stack is not used at all). However, the
> current implementation is quite difficult to read and understand (even
> considering it's assembly). Additionally, due to the structure of
> __kernel_clock_gettime, which heavily uses conditional branches to
> share code between the different clocks, it is difficult to support a
> new clock without making the branches even harder to follow.
>
> This commit completely refactors the structure of clock_gettime (and
> gettimeofday along the way) while keeping exactly the same algorithms.
> We no longer try to share code; instead, macros provide common
> operations. This new approach comes with a number of advantages:
> - In clock_gettime, clock implementations are no longer interspersed,
> making them much more readable. Additionally, macros only use
> registers passed as arguments or reserved with .req, this way it is
> easy to make sure that registers are properly allocated. To avoid a
> large number of branches in a given execution path, a jump table is
> used; a normal execution uses 3 unconditional branches.
> - __do_get_tspec has been replaced with 2 macros (get_ts_clock_mono,
> get_clock_shifted_nsec) and explicit loading of data from the vDSO
> page. Consequently, clock_gettime and gettimeofday are now leaf
> functions, and saving x30 (lr) is no longer necessary.
> - Variables protected by tb_seq_count are now loaded all at once,
> allowing to merge the seqcnt_read macro into seqcnt_check.
> - For CLOCK_REALTIME_COARSE, removed an unused load of the wall to
> monotonic timespec.
> - For CLOCK_MONOTONIC_COARSE, removed a few shift instructions.
>
> Obviously, the downside of sharing less code is an increase in code
> size. However since the vDSO has its own code page, this does not
> really matter, as long as the size of the DSO remains below 4 kB. For
> now this should be all right:
> Before After
> vdso.so size (B) 2776 2936
>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Dave Martin <dave.martin@arm.com>
> Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
> ---
> arch/arm64/kernel/vdso/gettimeofday.S | 282 +++++++++++++++++++---------------
> 1 file changed, 162 insertions(+), 120 deletions(-)
This patch is really hard to read, but after applying it the resulting
code looks really good. Thanks! Just a couple of minor comments inline.
> ENTRY(__kernel_clock_gettime)
> .cfi_startproc
> - cmp w0, #CLOCK_REALTIME
> - ccmp w0, #CLOCK_MONOTONIC, #0x4, ne
> - b.ne 2f
> + cmp w0, #JUMPSLOT_MAX
> + b.hi syscall
> + adr vdso_data, _vdso_data
> + adr x_tmp, jumptable
> + add x_tmp, x_tmp, w0, uxtw #2
> + br x_tmp
> +
> +jumptable:
> + jump_slot jumptable, CLOCK_REALTIME, realtime
> + jump_slot jumptable, CLOCK_MONOTONIC, monotonic
> + b syscall
> + b syscall
> + b syscall
> + jump_slot jumptable, CLOCK_REALTIME_COARSE, realtime_coarse
> + jump_slot jumptable, CLOCK_MONOTONIC_COARSE, monotonic_coarse
The branchiness of this code (into __kernel_clock_gettime, then not taking
the b.hi, the br following by the b in the jumptable) is likely to hurt
most branch predictors. I found that aligning the jumptable and its
subsequent targets helped a bit here.
> +shift_store:
> lsr x11, x11, x12
> +store:
> stp x10, x11, [x1, #TSPEC_TV_SEC]
> mov x0, xzr
> ret
I think it's simpler just to macroise this, which also avoids some of
the branches given that it ends in a ret anyway.
Fixup patch below. If you fold it in, then:
Acked-by: Will Deacon <will.deacon@arm.com>
for the series.
Will
--->8
diff --git a/arch/arm64/kernel/vdso/gettimeofday.S b/arch/arm64/kernel/vdso/gettimeofday.S
index f49b6755058a..85969cd2b2f7 100644
--- a/arch/arm64/kernel/vdso/gettimeofday.S
+++ b/arch/arm64/kernel/vdso/gettimeofday.S
@@ -115,6 +115,15 @@ x_tmp .req x8
9998:
.endm
+ .macro clock_gettime_return, shift=0
+ .if \shift == 1
+ lsr x11, x11, x12
+ .endif
+ stp x10, x11, [x1, #TSPEC_TV_SEC]
+ mov x0, xzr
+ ret
+ .endm
+
.macro jump_slot jumptable, index, label
.if (. - \jumptable) != 4 * (\index)
.error "Jump slot index mismatch"
@@ -180,6 +189,7 @@ ENTRY(__kernel_clock_gettime)
add x_tmp, x_tmp, w0, uxtw #2
br x_tmp
+ ALIGN
jumptable:
jump_slot jumptable, CLOCK_REALTIME, realtime
jump_slot jumptable, CLOCK_MONOTONIC, monotonic
@@ -193,6 +203,7 @@ jumptable:
.error "Wrong jumptable size"
.endif
+ ALIGN
realtime:
seqcnt_acquire
syscall_check fail=syscall
@@ -209,9 +220,9 @@ realtime:
get_clock_shifted_nsec res=x15, cycle_last=x10, mult=x11
get_ts_realtime res_sec=x10, res_nsec=x11, \
clock_nsec=x15, xtime_sec=x13, xtime_nsec=x14, nsec_to_sec=x9
+ clock_gettime_return, shift=1
- b shift_store
-
+ ALIGN
monotonic:
seqcnt_acquire
syscall_check fail=syscall
@@ -232,9 +243,9 @@ monotonic:
clock_nsec=x15, xtime_sec=x13, xtime_nsec=x14, nsec_to_sec=x9
add_ts sec=x10, nsec=x11, ts_sec=x3, ts_nsec=x4, nsec_to_sec=x9
+ clock_gettime_return, shift=1
- b shift_store
-
+ ALIGN
monotonic_raw:
seqcnt_acquire
syscall_check fail=syscall
@@ -254,16 +265,16 @@ monotonic_raw:
clock_nsec=x15, nsec_to_sec=x9
add_ts sec=x10, nsec=x11, ts_sec=x13, ts_nsec=x14, nsec_to_sec=x9
+ clock_gettime_return, shift=1
- b shift_store
-
+ ALIGN
realtime_coarse:
seqcnt_acquire
ldp x10, x11, [vdso_data, #VDSO_XTIME_CRS_SEC]
seqcnt_check fail=realtime_coarse
+ clock_gettime_return
- b store
-
+ ALIGN
monotonic_coarse:
seqcnt_acquire
ldp x10, x11, [vdso_data, #VDSO_XTIME_CRS_SEC]
@@ -273,16 +284,9 @@ monotonic_coarse:
/* Computations are done in (non-shifted) nsecs. */
get_nsec_per_sec res=x9
add_ts sec=x10, nsec=x11, ts_sec=x13, ts_nsec=x14, nsec_to_sec=x9
+ clock_gettime_return
- b store
-
-shift_store:
- lsr x11, x11, x12
-store:
- stp x10, x11, [x1, #TSPEC_TV_SEC]
- mov x0, xzr
- ret
-
+ ALIGN
syscall: /* Syscall fallback. */
mov x8, #__NR_clock_gettime
svc #0
next prev parent reply other threads:[~2016-07-08 14:11 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-05-09 12:36 [PATCH 0/2] arm64: Add support for CLOCK_MONOTONIC_RAW in clock_gettime() vDSO Kevin Brodsky
2016-05-09 12:37 ` [PATCH 1/2] arm64: Refactor vDSO time functions Kevin Brodsky
2016-06-22 13:24 ` Christopher Covington
2016-07-01 13:46 ` Dave Martin
2016-07-04 17:12 ` Will Deacon
2016-07-08 14:11 ` Will Deacon [this message]
2016-07-11 17:31 ` Kevin Brodsky
2016-07-11 17:42 ` Will Deacon
2016-07-12 9:10 ` Kevin Brodsky
2016-05-09 12:37 ` [PATCH 2/2] arm64: Add support for CLOCK_MONOTONIC_RAW in clock_gettime() vDSO Kevin Brodsky
2016-07-01 13:48 ` Dave Martin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160708141143.GB6493@arm.com \
--to=will.deacon@arm.com \
--cc=linux-arm-kernel@lists.infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.