* [PATCH 1/4] Add clock_gettime_ns syscall
2011-12-25 16:50 [PATCH 0/4] clock_gettime_ns and x86-64 optimizations Andy Lutomirski
@ 2011-12-25 16:51 ` Andy Lutomirski
2011-12-27 7:25 ` Richard Cochran
2011-12-28 19:02 ` Arun Sharma
2011-12-25 16:51 ` [PATCH 2/4] x86-64: Add __vdso_clock_gettime_ns vsyscall Andy Lutomirski
` (3 subsequent siblings)
4 siblings, 2 replies; 11+ messages in thread
From: Andy Lutomirski @ 2011-12-25 16:51 UTC (permalink / raw)
To: linux-kernel, Kumar Sundararajan, john stultz, Arun Sharma
Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Richard Cochran,
Andy Lutomirski
On some architectures, clock_gettime is fast enough that converting
between nanoseconds and struct timespec takes a significant amount
of time. Introduce a new syscall that does the same thing but
returns the answer in nanoseconds. 2^64 nanoseconds since the epoch
won't wrap around until the year 2554, and by then we can use
128-bit types.
clock_gettime_ns returns an unsigned nanosecond count. It will wrap
when the time from whatever clock is being read exceeds about 584
years. For CLOCK_MONOTONIC, CLOCK_BOOTTIME, etc, this is unlikely
to be a problem. For CLOCK_REALTIME, user code can either check for
wraparound or switch to 128-bit integers in a little over 500
years.
This interface intentionally does not support sub-nanosecond
precision. For one thing, light only travels about a foot per
nanosecond, so nanoseconds are really pretty good for networking
purposes. For another, 2^64 picoseconds (say) is a short enough
interval to be inconvenient. If anyone needs sub-nanosecond
precision for anything other than profiling, they're welcome to
figure out an appropriate interface. For very precise profiling,
"what time is it" is the wrong question, anyway -- modern CPUs can
reorder things across time scales much longer than a nanosecond.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
---
arch/x86/include/asm/unistd_64.h | 2 ++
include/linux/syscalls.h | 3 +++
include/linux/time.h | 5 +++++
kernel/posix-timers.c | 30 ++++++++++++++++++++++++++++++
4 files changed, 40 insertions(+), 0 deletions(-)
diff --git a/arch/x86/include/asm/unistd_64.h b/arch/x86/include/asm/unistd_64.h
index 2010405..3a48069 100644
--- a/arch/x86/include/asm/unistd_64.h
+++ b/arch/x86/include/asm/unistd_64.h
@@ -683,6 +683,8 @@ __SYSCALL(__NR_sendmmsg, sys_sendmmsg)
__SYSCALL(__NR_setns, sys_setns)
#define __NR_getcpu 309
__SYSCALL(__NR_getcpu, sys_getcpu)
+#define __NR_clock_gettime_ns 310
+__SYSCALL(__NR_clock_gettime_ns, sys_clock_gettime_ns)
#ifndef __NO_STUBS
#define __ARCH_WANT_OLD_READDIR
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 1ff0ec2..89cb897 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -49,6 +49,7 @@ struct statfs;
struct statfs64;
struct __sysctl_args;
struct sysinfo;
+struct timens;
struct timespec;
struct timeval;
struct timex;
@@ -316,6 +317,8 @@ asmlinkage long sys_clock_settime(clockid_t which_clock,
const struct timespec __user *tp);
asmlinkage long sys_clock_gettime(clockid_t which_clock,
struct timespec __user *tp);
+asmlinkage long sys_clock_gettime_ns(clockid_t which_clock,
+ struct timens __user *tp);
asmlinkage long sys_clock_adjtime(clockid_t which_clock,
struct timex __user *tx);
asmlinkage long sys_clock_getres(clockid_t which_clock,
diff --git a/include/linux/time.h b/include/linux/time.h
index b306178..d4488b1 100644
--- a/include/linux/time.h
+++ b/include/linux/time.h
@@ -27,6 +27,11 @@ struct timezone {
int tz_dsttime; /* type of dst correction */
};
+struct timens {
+ u64 ns; /* nanoseconds since the relevant epoch */
+ u64 padding; /* for future expansion (UTC offset? sub-ns?) */
+};
+
#ifdef __KERNEL__
extern struct timezone sys_tz;
diff --git a/kernel/posix-timers.c b/kernel/posix-timers.c
index 4556182..43bc842 100644
--- a/kernel/posix-timers.c
+++ b/kernel/posix-timers.c
@@ -980,6 +980,36 @@ SYSCALL_DEFINE2(clock_gettime, const clockid_t, which_clock,
return error;
}
+SYSCALL_DEFINE2(clock_gettime_ns, const clockid_t, which_clock,
+ struct timens __user *, tp)
+{
+ /*
+ * This implementation isn't as fast as it could be, but the syscall
+ * entry will take much longer than the unnecessary division and
+ * multiplication. Arch-specific implementations can be made faster.
+ */
+
+ struct k_clock *kc = clockid_to_kclock(which_clock);
+ struct timespec kernel_timespec;
+ struct timens timens;
+ int error;
+
+ if (!kc)
+ return -EINVAL;
+
+ error = kc->clock_get(which_clock, &kernel_timespec);
+
+ if (!error) {
+ timens.ns = kernel_timespec.tv_sec * NSEC_PER_SEC
+ + kernel_timespec.tv_nsec;
+ timens.padding = 0;
+
+ error = copy_to_user(tp, &timens, sizeof(timens)) ? -EFAULT : 0;
+ }
+
+ return error;
+}
+
SYSCALL_DEFINE2(clock_adjtime, const clockid_t, which_clock,
struct timex __user *, utx)
{
--
1.7.7.4
* Re: [PATCH 1/4] Add clock_gettime_ns syscall
2011-12-25 16:51 ` [PATCH 1/4] Add clock_gettime_ns syscall Andy Lutomirski
@ 2011-12-27 7:25 ` Richard Cochran
2011-12-28 19:02 ` Arun Sharma
1 sibling, 0 replies; 11+ messages in thread
From: Richard Cochran @ 2011-12-27 7:25 UTC (permalink / raw)
To: Andy Lutomirski
Cc: linux-kernel, Kumar Sundararajan, john stultz, Arun Sharma,
Peter Zijlstra, Ingo Molnar, Thomas Gleixner
On Sun, Dec 25, 2011 at 08:51:00AM -0800, Andy Lutomirski wrote:
>
> This interface intentionally does not support sub-nanosecond
> precision. For one thing, light only travels about a foot per
> nanosecond, so nanoseconds are really pretty good for networking
> purposes. For another, 2^64 picoseconds (say) is a short enough
> interval to be inconvenient. If anyone needs sub-nanosecond
> precision for anything other than profiling, they're welcome to
> figure out an appropriate interface. For very precise profiling,
> "what time is it" is the wrong question, anyway -- modern CPUs can
> reorder things across time scales much longer than a nanosecond.
This paragraph probably should be updated, since you did add a padding
field for future sub-nanosecond reporting.
Also, on re-reading this, it seems silly to me to introduce a function
called "clock_gettime_ns" and then to say that this function cannot
answer the question, what time is it?
If the interface is a profiling only interface, then the name should
reflect this fact.
Thanks,
Richard
* Re: [PATCH 1/4] Add clock_gettime_ns syscall
2011-12-25 16:51 ` [PATCH 1/4] Add clock_gettime_ns syscall Andy Lutomirski
2011-12-27 7:25 ` Richard Cochran
@ 2011-12-28 19:02 ` Arun Sharma
[not found] ` <CALCETrVz1ADNxeLzPmeWXPU5ApfKURH2vnged2A2Vng8-hUxcw@mail.gmail.com>
1 sibling, 1 reply; 11+ messages in thread
From: Arun Sharma @ 2011-12-28 19:02 UTC (permalink / raw)
To: Andy Lutomirski
Cc: linux-kernel, Kumar Sundararajan, john stultz, Peter Zijlstra,
Ingo Molnar, Thomas Gleixner, Richard Cochran
On 12/25/11 8:51 AM, Andy Lutomirski wrote:
> +struct timens {
> + u64 ns; /* nanoseconds since the relevant epoch */
> + u64 padding; /* for future expansion (UTC offset? sub-ns?) */
> +};
>..
> +SYSCALL_DEFINE2(clock_gettime_ns, const clockid_t, which_clock,
> + struct timens __user *, tp)
> +{
How about returning a (signed) long as the time in ns? This way, we save
a store and a load and the value can be passed in registers.
This shouldn't preclude future expansion via extra args.
-Arun
* [PATCH 2/4] x86-64: Add __vdso_clock_gettime_ns vsyscall
2011-12-25 16:50 [PATCH 0/4] clock_gettime_ns and x86-64 optimizations Andy Lutomirski
2011-12-25 16:51 ` [PATCH 1/4] Add clock_gettime_ns syscall Andy Lutomirski
@ 2011-12-25 16:51 ` Andy Lutomirski
2011-12-25 16:51 ` [PATCH 3/4] x86-64: Optimize vdso clock_gettime Andy Lutomirski
` (2 subsequent siblings)
4 siblings, 0 replies; 11+ messages in thread
From: Andy Lutomirski @ 2011-12-25 16:51 UTC (permalink / raw)
To: linux-kernel, Kumar Sundararajan, john stultz, Arun Sharma
Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Richard Cochran,
Andy Lutomirski
This is just for the ABI. The next patch optimizes the implementation.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
---
arch/x86/vdso/vclock_gettime.c | 70 ++++++++++++++++++++++++++++++---------
arch/x86/vdso/vdso.lds.S | 7 ++++
2 files changed, 61 insertions(+), 16 deletions(-)
diff --git a/arch/x86/vdso/vclock_gettime.c b/arch/x86/vdso/vclock_gettime.c
index 6bc0e72..c8a2b46 100644
--- a/arch/x86/vdso/vclock_gettime.c
+++ b/arch/x86/vdso/vclock_gettime.c
@@ -82,7 +82,7 @@ notrace static inline long vgetns(void)
return (v * gtod->clock.mult) >> gtod->clock.shift;
}
-notrace static noinline int do_realtime(struct timespec *ts)
+notrace static noinline void do_realtime(struct timespec *ts)
{
unsigned long seq, ns;
do {
@@ -92,10 +92,9 @@ notrace static noinline int do_realtime(struct timespec *ts)
ns = vgetns();
} while (unlikely(read_seqretry(&gtod->lock, seq)));
timespec_add_ns(ts, ns);
- return 0;
}
-notrace static noinline int do_monotonic(struct timespec *ts)
+notrace static noinline void do_monotonic(struct timespec *ts)
{
unsigned long seq, ns, secs;
do {
@@ -115,11 +114,9 @@ notrace static noinline int do_monotonic(struct timespec *ts)
}
ts->tv_sec = secs;
ts->tv_nsec = ns;
-
- return 0;
}
-notrace static noinline int do_realtime_coarse(struct timespec *ts)
+notrace static noinline void do_realtime_coarse(struct timespec *ts)
{
unsigned long seq;
do {
@@ -127,10 +124,9 @@ notrace static noinline int do_realtime_coarse(struct timespec *ts)
ts->tv_sec = gtod->wall_time_coarse.tv_sec;
ts->tv_nsec = gtod->wall_time_coarse.tv_nsec;
} while (unlikely(read_seqretry(&gtod->lock, seq)));
- return 0;
}
-notrace static noinline int do_monotonic_coarse(struct timespec *ts)
+notrace static noinline void do_monotonic_coarse(struct timespec *ts)
{
unsigned long seq, ns, secs;
do {
@@ -150,25 +146,29 @@ notrace static noinline int do_monotonic_coarse(struct timespec *ts)
}
ts->tv_sec = secs;
ts->tv_nsec = ns;
-
- return 0;
}
notrace int __vdso_clock_gettime(clockid_t clock, struct timespec *ts)
{
switch (clock) {
case CLOCK_REALTIME:
- if (likely(gtod->clock.vclock_mode != VCLOCK_NONE))
- return do_realtime(ts);
+ if (likely(gtod->clock.vclock_mode != VCLOCK_NONE)) {
+ do_realtime(ts);
+ return 0;
+ }
break;
case CLOCK_MONOTONIC:
- if (likely(gtod->clock.vclock_mode != VCLOCK_NONE))
- return do_monotonic(ts);
+ if (likely(gtod->clock.vclock_mode != VCLOCK_NONE)) {
+ do_monotonic(ts);
+ return 0;
+ }
break;
case CLOCK_REALTIME_COARSE:
- return do_realtime_coarse(ts);
+ do_realtime_coarse(ts);
+ return 0;
case CLOCK_MONOTONIC_COARSE:
- return do_monotonic_coarse(ts);
+ do_monotonic_coarse(ts);
+ return 0;
}
return vdso_fallback_gettime(clock, ts);
@@ -176,6 +176,44 @@ notrace int __vdso_clock_gettime(clockid_t clock, struct timespec *ts)
int clock_gettime(clockid_t, struct timespec *)
__attribute__((weak, alias("__vdso_clock_gettime")));
+notrace int __vdso_clock_gettime_ns(clockid_t clock, struct timens *t)
+{
+ /* This implementation is slow. It will be improved later. */
+
+ struct timespec ts;
+ int error;
+
+ switch (clock) {
+ case CLOCK_REALTIME:
+ if (likely(gtod->clock.vclock_mode != VCLOCK_NONE)) {
+ do_realtime(&ts);
+ goto done;
+ }
+ break;
+ case CLOCK_MONOTONIC:
+ if (likely(gtod->clock.vclock_mode != VCLOCK_NONE)) {
+ do_monotonic(&ts);
+ goto done;
+ }
+ break;
+ case CLOCK_REALTIME_COARSE:
+ do_realtime_coarse(&ts);
+ goto done;
+ case CLOCK_MONOTONIC_COARSE:
+ do_monotonic_coarse(&ts);
+ goto done;
+ }
+
+ error = vdso_fallback_gettime(clock, &ts);
+ if (error)
+ return error;
+
+done:
+ t->ns = ts.tv_sec * NSEC_PER_SEC + ts.tv_nsec;
+ t->padding = 0;
+ return 0;
+}
+
notrace int __vdso_gettimeofday(struct timeval *tv, struct timezone *tz)
{
long ret;
diff --git a/arch/x86/vdso/vdso.lds.S b/arch/x86/vdso/vdso.lds.S
index b96b267..238f500 100644
--- a/arch/x86/vdso/vdso.lds.S
+++ b/arch/x86/vdso/vdso.lds.S
@@ -17,6 +17,10 @@
VERSION {
LINUX_2.6 {
global:
+ /*
+ * These are the original vsyscalls. They have weak symbols
+ * without the __vdso_ prefix for ABI compatibility.
+ */
clock_gettime;
__vdso_clock_gettime;
gettimeofday;
@@ -25,6 +29,9 @@ VERSION {
__vdso_getcpu;
time;
__vdso_time;
+
+ /* New vsyscalls are just plain functions. */
+ __vdso_clock_gettime_ns;
local: *;
};
}
--
1.7.7.4
* [PATCH 3/4] x86-64: Optimize vdso clock_gettime
2011-12-25 16:50 [PATCH 0/4] clock_gettime_ns and x86-64 optimizations Andy Lutomirski
2011-12-25 16:51 ` [PATCH 1/4] Add clock_gettime_ns syscall Andy Lutomirski
2011-12-25 16:51 ` [PATCH 2/4] x86-64: Add __vdso_clock_gettime_ns vsyscall Andy Lutomirski
@ 2011-12-25 16:51 ` Andy Lutomirski
2011-12-25 16:51 ` [PATCH 4/4] x86-64: Inline vdso clock_gettime helpers Andy Lutomirski
2011-12-27 7:46 ` [PATCH 0/4] clock_gettime_ns and x86-64 optimizations Richard Cochran
4 siblings, 0 replies; 11+ messages in thread
From: Andy Lutomirski @ 2011-12-25 16:51 UTC (permalink / raw)
To: linux-kernel, Kumar Sundararajan, john stultz, Arun Sharma
Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Richard Cochran,
Andy Lutomirski
This is a small improvement.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
---
arch/x86/include/asm/vgtod.h | 21 +++++++---
arch/x86/kernel/vsyscall_64.c | 25 +++++++++++-
arch/x86/vdso/vclock_gettime.c | 89 ++++++++++++++++++++++-----------------
3 files changed, 89 insertions(+), 46 deletions(-)
diff --git a/arch/x86/include/asm/vgtod.h b/arch/x86/include/asm/vgtod.h
index 815285b..03b4999 100644
--- a/arch/x86/include/asm/vgtod.h
+++ b/arch/x86/include/asm/vgtod.h
@@ -7,11 +7,6 @@
struct vsyscall_gtod_data {
seqlock_t lock;
- /* open coded 'struct timespec' */
- time_t wall_time_sec;
- u32 wall_time_nsec;
-
- struct timezone sys_tz;
struct { /* extract of a clocksource struct */
int vclock_mode;
cycle_t cycle_last;
@@ -19,8 +14,22 @@ struct vsyscall_gtod_data {
u32 mult;
u32 shift;
} clock;
- struct timespec wall_to_monotonic;
+
+ /* open coded 'struct timespec' */
+ time_t wall_time_sec;
+ u32 wall_time_nsec;
+ u32 monotonic_time_nsec;
+ time_t monotonic_time_sec;
+
+ /* Flat counts for clock_ns_get */
+ u64 wall_time_flat_ns;
+ u64 monotonic_time_flat_ns;
+ u64 wall_time_coarse_flat_ns;
+ u64 monotonic_time_coarse_flat_ns;
+
+ struct timezone sys_tz;
struct timespec wall_time_coarse;
+ struct timespec monotonic_time_coarse;
};
extern struct vsyscall_gtod_data vsyscall_gtod_data;
diff --git a/arch/x86/kernel/vsyscall_64.c b/arch/x86/kernel/vsyscall_64.c
index b56c65de..9c2e148 100644
--- a/arch/x86/kernel/vsyscall_64.c
+++ b/arch/x86/kernel/vsyscall_64.c
@@ -91,6 +91,7 @@ void update_vsyscall(struct timespec *wall_time, struct timespec *wtm,
struct clocksource *clock, u32 mult)
{
unsigned long flags;
+ struct timespec monotonic;
write_seqlock_irqsave(&vsyscall_gtod_data.lock, flags);
@@ -100,10 +101,32 @@ void update_vsyscall(struct timespec *wall_time, struct timespec *wtm,
vsyscall_gtod_data.clock.mask = clock->mask;
vsyscall_gtod_data.clock.mult = mult;
vsyscall_gtod_data.clock.shift = clock->shift;
+
vsyscall_gtod_data.wall_time_sec = wall_time->tv_sec;
vsyscall_gtod_data.wall_time_nsec = wall_time->tv_nsec;
- vsyscall_gtod_data.wall_to_monotonic = *wtm;
+
+ monotonic = timespec_add(*wall_time, *wtm);
+ vsyscall_gtod_data.monotonic_time_sec = monotonic.tv_sec;
+ vsyscall_gtod_data.monotonic_time_nsec = monotonic.tv_nsec;
+
vsyscall_gtod_data.wall_time_coarse = __current_kernel_time();
+ vsyscall_gtod_data.monotonic_time_coarse =
+ timespec_add(vsyscall_gtod_data.wall_time_coarse, *wtm);
+
+ /* generate flat data for clock_ns_get */
+ vsyscall_gtod_data.wall_time_flat_ns =
+ vsyscall_gtod_data.wall_time_sec * NSEC_PER_SEC +
+ vsyscall_gtod_data.wall_time_nsec;
+ vsyscall_gtod_data.monotonic_time_flat_ns =
+ vsyscall_gtod_data.monotonic_time_sec * NSEC_PER_SEC +
+ vsyscall_gtod_data.monotonic_time_nsec;
+
+ vsyscall_gtod_data.wall_time_coarse_flat_ns =
+ vsyscall_gtod_data.wall_time_coarse.tv_sec * NSEC_PER_SEC +
+ vsyscall_gtod_data.wall_time_coarse.tv_nsec;
+ vsyscall_gtod_data.monotonic_time_coarse_flat_ns =
+ vsyscall_gtod_data.monotonic_time_coarse.tv_sec * NSEC_PER_SEC +
+ vsyscall_gtod_data.monotonic_time_coarse.tv_nsec;
write_sequnlock_irqrestore(&vsyscall_gtod_data.lock, flags);
}
diff --git a/arch/x86/vdso/vclock_gettime.c b/arch/x86/vdso/vclock_gettime.c
index c8a2b46..fe960ac 100644
--- a/arch/x86/vdso/vclock_gettime.c
+++ b/arch/x86/vdso/vclock_gettime.c
@@ -94,26 +94,36 @@ notrace static noinline void do_realtime(struct timespec *ts)
timespec_add_ns(ts, ns);
}
+notrace static noinline u64 do_realtime_ns(void)
+{
+ unsigned long seq, ns;
+ do {
+ seq = read_seqbegin(&gtod->lock);
+ ns = gtod->wall_time_flat_ns + vgetns();
+ } while (unlikely(read_seqretry(&gtod->lock, seq)));
+ return ns;
+}
+
notrace static noinline void do_monotonic(struct timespec *ts)
{
- unsigned long seq, ns, secs;
+ unsigned long seq, ns;
do {
seq = read_seqbegin(&gtod->lock);
- secs = gtod->wall_time_sec;
- ns = gtod->wall_time_nsec + vgetns();
- secs += gtod->wall_to_monotonic.tv_sec;
- ns += gtod->wall_to_monotonic.tv_nsec;
+ ts->tv_sec = gtod->monotonic_time_sec;
+ ts->tv_nsec = gtod->monotonic_time_nsec;
+ ns = vgetns();
} while (unlikely(read_seqretry(&gtod->lock, seq)));
+ timespec_add_ns(ts, ns);
+}
- /* wall_time_nsec, vgetns(), and wall_to_monotonic.tv_nsec
- * are all guaranteed to be nonnegative.
- */
- while (ns >= NSEC_PER_SEC) {
- ns -= NSEC_PER_SEC;
- ++secs;
- }
- ts->tv_sec = secs;
- ts->tv_nsec = ns;
+notrace static noinline u64 do_monotonic_ns(void)
+{
+ unsigned long seq, ns;
+ do {
+ seq = read_seqbegin(&gtod->lock);
+ ns = gtod->monotonic_time_flat_ns + vgetns();
+ } while (unlikely(read_seqretry(&gtod->lock, seq)));
+ return ns;
}
notrace static noinline void do_realtime_coarse(struct timespec *ts)
@@ -126,26 +136,26 @@ notrace static noinline void do_realtime_coarse(struct timespec *ts)
} while (unlikely(read_seqretry(&gtod->lock, seq)));
}
+notrace static noinline u64 do_realtime_coarse_ns(void)
+{
+ /* This is atomic on x86-64. */
+ return ACCESS_ONCE(gtod->wall_time_coarse_flat_ns);
+}
+
notrace static noinline void do_monotonic_coarse(struct timespec *ts)
{
- unsigned long seq, ns, secs;
+ unsigned long seq;
do {
seq = read_seqbegin(&gtod->lock);
- secs = gtod->wall_time_coarse.tv_sec;
- ns = gtod->wall_time_coarse.tv_nsec;
- secs += gtod->wall_to_monotonic.tv_sec;
- ns += gtod->wall_to_monotonic.tv_nsec;
+ ts->tv_sec = gtod->monotonic_time_coarse.tv_sec;
+ ts->tv_nsec = gtod->monotonic_time_coarse.tv_nsec;
} while (unlikely(read_seqretry(&gtod->lock, seq)));
+}
- /* wall_time_nsec and wall_to_monotonic.tv_nsec are
- * guaranteed to be between 0 and NSEC_PER_SEC.
- */
- if (ns >= NSEC_PER_SEC) {
- ns -= NSEC_PER_SEC;
- ++secs;
- }
- ts->tv_sec = secs;
- ts->tv_nsec = ns;
+notrace static noinline u64 do_monotonic_coarse_ns(void)
+{
+ /* This is atomic on x86-64. */
+ return ACCESS_ONCE(gtod->monotonic_time_coarse_flat_ns);
}
notrace int __vdso_clock_gettime(clockid_t clock, struct timespec *ts)
@@ -178,37 +188,38 @@ int clock_gettime(clockid_t, struct timespec *)
notrace int __vdso_clock_gettime_ns(clockid_t clock, struct timens *t)
{
- /* This implementation is slow. It will be improved later. */
-
struct timespec ts;
int error;
switch (clock) {
case CLOCK_REALTIME:
if (likely(gtod->clock.vclock_mode != VCLOCK_NONE)) {
- do_realtime(&ts);
- goto done;
+ t->ns = do_realtime_ns();
+ t->padding = 0;
+ return 0;
}
break;
case CLOCK_MONOTONIC:
if (likely(gtod->clock.vclock_mode != VCLOCK_NONE)) {
- do_monotonic(&ts);
- goto done;
+ t->ns = do_monotonic_ns();
+ t->padding = 0;
+ return 0;
}
break;
case CLOCK_REALTIME_COARSE:
- do_realtime_coarse(&ts);
- goto done;
+ t->ns = do_realtime_coarse_ns();
+ t->padding = 0;
+ return 0;
case CLOCK_MONOTONIC_COARSE:
- do_monotonic_coarse(&ts);
- goto done;
+ t->ns = do_monotonic_coarse_ns();
+ t->padding = 0;
+ return 0;
}
error = vdso_fallback_gettime(clock, &ts);
if (error)
return error;
-done:
t->ns = ts.tv_sec * NSEC_PER_SEC + ts.tv_nsec;
t->padding = 0;
return 0;
--
1.7.7.4
* [PATCH 4/4] x86-64: Inline vdso clock_gettime helpers
2011-12-25 16:50 [PATCH 0/4] clock_gettime_ns and x86-64 optimizations Andy Lutomirski
` (2 preceding siblings ...)
2011-12-25 16:51 ` [PATCH 3/4] x86-64: Optimize vdso clock_gettime Andy Lutomirski
@ 2011-12-25 16:51 ` Andy Lutomirski
2011-12-27 7:46 ` [PATCH 0/4] clock_gettime_ns and x86-64 optimizations Richard Cochran
4 siblings, 0 replies; 11+ messages in thread
From: Andy Lutomirski @ 2011-12-25 16:51 UTC (permalink / raw)
To: linux-kernel, Kumar Sundararajan, john stultz, Arun Sharma
Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Richard Cochran,
Andy Lutomirski
This is about a 6% speedup on Sandy Bridge.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
---
arch/x86/vdso/vclock_gettime.c | 17 +++++++++--------
1 files changed, 9 insertions(+), 8 deletions(-)
diff --git a/arch/x86/vdso/vclock_gettime.c b/arch/x86/vdso/vclock_gettime.c
index fe960ac..40eead5 100644
--- a/arch/x86/vdso/vclock_gettime.c
+++ b/arch/x86/vdso/vclock_gettime.c
@@ -82,7 +82,8 @@ notrace static inline long vgetns(void)
return (v * gtod->clock.mult) >> gtod->clock.shift;
}
-notrace static noinline void do_realtime(struct timespec *ts)
+/* Code size doesn't matter (vdso is 4k anyway) and this is faster. */
+notrace static void __always_inline do_realtime(struct timespec *ts)
{
unsigned long seq, ns;
do {
@@ -94,7 +95,7 @@ notrace static noinline void do_realtime(struct timespec *ts)
timespec_add_ns(ts, ns);
}
-notrace static noinline u64 do_realtime_ns(void)
+notrace static u64 do_realtime_ns(void)
{
unsigned long seq, ns;
do {
@@ -104,7 +105,7 @@ notrace static noinline u64 do_realtime_ns(void)
return ns;
}
-notrace static noinline void do_monotonic(struct timespec *ts)
+notrace static void do_monotonic(struct timespec *ts)
{
unsigned long seq, ns;
do {
@@ -116,7 +117,7 @@ notrace static noinline void do_monotonic(struct timespec *ts)
timespec_add_ns(ts, ns);
}
-notrace static noinline u64 do_monotonic_ns(void)
+notrace static u64 do_monotonic_ns(void)
{
unsigned long seq, ns;
do {
@@ -126,7 +127,7 @@ notrace static noinline u64 do_monotonic_ns(void)
return ns;
}
-notrace static noinline void do_realtime_coarse(struct timespec *ts)
+notrace static void do_realtime_coarse(struct timespec *ts)
{
unsigned long seq;
do {
@@ -136,13 +137,13 @@ notrace static noinline void do_realtime_coarse(struct timespec *ts)
} while (unlikely(read_seqretry(&gtod->lock, seq)));
}
-notrace static noinline u64 do_realtime_coarse_ns(void)
+notrace static u64 do_realtime_coarse_ns(void)
{
/* This is atomic on x86-64. */
return ACCESS_ONCE(gtod->wall_time_coarse_flat_ns);
}
-notrace static noinline void do_monotonic_coarse(struct timespec *ts)
+notrace static void do_monotonic_coarse(struct timespec *ts)
{
unsigned long seq;
do {
@@ -152,7 +153,7 @@ notrace static noinline void do_monotonic_coarse(struct timespec *ts)
} while (unlikely(read_seqretry(&gtod->lock, seq)));
}
-notrace static noinline u64 do_monotonic_coarse_ns(void)
+notrace static u64 do_monotonic_coarse_ns(void)
{
/* This is atomic on x86-64. */
return ACCESS_ONCE(gtod->monotonic_time_coarse_flat_ns);
--
1.7.7.4
* Re: [PATCH 0/4] clock_gettime_ns and x86-64 optimizations
2011-12-25 16:50 [PATCH 0/4] clock_gettime_ns and x86-64 optimizations Andy Lutomirski
` (3 preceding siblings ...)
2011-12-25 16:51 ` [PATCH 4/4] x86-64: Inline vdso clock_gettime helpers Andy Lutomirski
@ 2011-12-27 7:46 ` Richard Cochran
4 siblings, 0 replies; 11+ messages in thread
From: Richard Cochran @ 2011-12-27 7:46 UTC (permalink / raw)
To: Andy Lutomirski
Cc: linux-kernel, Kumar Sundararajan, john stultz, Arun Sharma,
Peter Zijlstra, Ingo Molnar, Thomas Gleixner
On Sun, Dec 25, 2011 at 08:50:59AM -0800, Andy Lutomirski wrote:
> On x86-64, clock_gettime is so fast that the overhead converting to and
> from nanoseconds is non-negligible. clock_gettime_ns is a different and
> faster interface.
But your data contradict this statement. See below.
> Patch 1 adds the syscall and wires it up on x86-64. Patch 2 implements
> the corresponding vdso entry on x86-64. Patch 3 optimizes the vdso
> call, and patch 4 is a trivial change that speeds up the vdso
> clock_gettime and clock_gettime_ns implementations.
>
> The vdso timings are (on an 800MHz Sandy Bridge mobile):
It would be more informative to describe in more detail how you made
the measurements: for example, pseudo code, the number of trials, and
the min, max, mean, and standard deviation.
> Basic implementation:
>
> realtime 77.4ns
> monotonic 79.2ns
> realtime_coarse 18.1ns
> monotonic_coarse 22.0ns
In order to better understand what you are reporting, I arranged your
numbers into a table:
1. Basic implementation
2. Optimized implementation
3. Inlined (patch 4)
|---------------------+-------+-------+-------|
| | 1. | 2. | 3. |
|---------------------+-------+-------+-------|
| realtime | 77.40 | 78.50 | 73.40 |
| realtime_ns | 84.90 | 77.85 | 73.15 |
|---------------------+-------+-------+-------|
| monotonic | 79.20 | 77.40 | 72.10 |
| monotonic_ns | 85.10 | 77.75 | 72.10 |
|---------------------+-------+-------+-------|
| realtime_coarse | 18.10 | 18.40 | 13.20 |
| realtime_coarse_ns | 19.49 | 18.20 | 14.10 |
|---------------------+-------+-------+-------|
| monotonic_coarse | 22.00 | 19.40 | 15.80 |
| monotonic_coarse_ns | 27.32 | 18.20 | 15.60 |
|---------------------+-------+-------+-------|
Looking down column 3, it appears that the _ns calls are no faster
than their plain counterparts.
So, while the inline patch does improve performance, the new _ns
functions do not really seem worth the trouble.
Thanks,
Richard