linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] perf: POSIX CLOCK_PERF to report current time value
@ 2013-12-10 20:27 David Ahern
  2013-12-10 20:44 ` John Stultz
  2013-12-11 11:25 ` Ingo Molnar
  0 siblings, 2 replies; 6+ messages in thread
From: David Ahern @ 2013-12-10 20:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: David Ahern, Pawel Moll, Ingo Molnar, Frederic Weisbecker,
	Peter Zijlstra, Arnaldo Carvalho de Melo, Mike Galbraith,
	Jiri Olsa, Namhyung Kim, Stephane Eranian, John Stultz, Sonny Rao,
	Thomas Gleixner

From: Pawel Moll <pawel.moll@arm.com>

To co-relate user space events with the perf events stream
a current (as in: "what time(stamp) is it now?") time value
must be made available.

This patch adds a POSIX clock returning the perf_clock()
value and accesible from userspace:

    #include <time.h>

    struct timespec ts;

    clock_gettime(CLOCK_PERF, &ts);

Updated to 3.13

Signed-off-by: Pawel Moll <pawel.moll@arm.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Sonny Rao <sonnyrao@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 include/uapi/linux/time.h |  1 +
 kernel/events/core.c      | 20 ++++++++++++++++++++
 2 files changed, 21 insertions(+)

diff --git a/include/uapi/linux/time.h b/include/uapi/linux/time.h
index e75e1b6ff27f..b07f07914a13 100644
--- a/include/uapi/linux/time.h
+++ b/include/uapi/linux/time.h
@@ -56,6 +56,7 @@ struct itimerval {
 #define CLOCK_BOOTTIME_ALARM		9
 #define CLOCK_SGI_CYCLE			10	/* Hardware specific */
 #define CLOCK_TAI			11
+#define CLOCK_PERF            12
 
 #define MAX_CLOCKS			16
 #define CLOCKS_MASK			(CLOCK_REALTIME | CLOCK_MONOTONIC)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 72348dc192c1..87aa36ac68f7 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -39,6 +39,7 @@
 #include <linux/hw_breakpoint.h>
 #include <linux/mm_types.h>
 #include <linux/cgroup.h>
+#include <linux/posix-timers.h>
 
 #include "internal.h"
 
@@ -295,6 +296,19 @@ static inline u64 perf_clock(void)
 	return local_clock();
 }
 
+static int perf_posix_clock_getres(const clockid_t which_clock,
+		struct timespec *tp)
+{
+	*tp = ns_to_timespec(TICK_NSEC);
+	return 0;
+}
+
+static int perf_posix_clock_get(clockid_t which_clock, struct timespec *tp)
+{
+	*tp = ns_to_timespec(perf_clock());
+	return 0;
+}
+
 static inline struct perf_cpu_context *
 __get_cpu_context(struct perf_event_context *ctx)
 {
@@ -7904,6 +7918,10 @@ perf_cpu_notify(struct notifier_block *self, unsigned long action, void *hcpu)
 
 void __init perf_event_init(void)
 {
+	struct k_clock perf_posix_clock = {
+		.clock_getres = perf_posix_clock_getres,
+		.clock_get = perf_posix_clock_get,
+	};
 	int ret;
 
 	idr_init(&pmu_idr);
@@ -7920,6 +7938,8 @@ void __init perf_event_init(void)
 	ret = init_hw_breakpoint();
 	WARN(ret, "hw_breakpoint initialization failed with: %d", ret);
 
+	posix_timers_register_clock(CLOCK_PERF, &perf_posix_clock);
+
 	/* do not patch jump label more than once per second */
 	jump_label_rate_limit(&perf_sched_events, HZ);
 
-- 
1.8.3.4 (Apple Git-47)


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] perf: POSIX CLOCK_PERF to report current time value
  2013-12-10 20:27 [PATCH] perf: POSIX CLOCK_PERF to report current time value David Ahern
@ 2013-12-10 20:44 ` John Stultz
  2013-12-11 12:07   ` Ingo Molnar
  2013-12-11 11:25 ` Ingo Molnar
  1 sibling, 1 reply; 6+ messages in thread
From: John Stultz @ 2013-12-10 20:44 UTC (permalink / raw)
  To: David Ahern, linux-kernel
  Cc: Pawel Moll, Ingo Molnar, Frederic Weisbecker, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Mike Galbraith, Jiri Olsa, Namhyung Kim,
	Stephane Eranian, Sonny Rao, Thomas Gleixner

On 12/10/2013 12:27 PM, David Ahern wrote:
> From: Pawel Moll <pawel.moll@arm.com>
>
> To co-relate user space events with the perf events stream
> a current (as in: "what time(stamp) is it now?") time value
> must be made available.
>
> This patch adds a POSIX clock returning the perf_clock()
> value and accesible from userspace:
>
>     #include <time.h>
>
>     struct timespec ts;
>
>     clock_gettime(CLOCK_PERF, &ts);

As stated earlier on a few occasions, I really don't like exposing the
perf clock directly to userland like this, because the perf clock is not
a well defined time domain. Instead its basically defined as "whatever
sched_clock does". Over the last few years, the behavior of sched_clock
has changed, so I'm not confident it won't change in the future.

I do understand the desire, but to export a poorly defined clockid to
userland has ABI stability concerns for me.  Particularly when the
performance characteristics will make it prone to abuse.

I'd much rather see perf export CLOCK_MONOTONIC_RAW timestamps, since
that clockid is well defined. Even if the underlying clocksource is
different, I suspect the perf clock could interpolate fairly closely to
CLOCK_MONOTONIC_RAW, as it already does similar interpolation on top of
CLOCK_MONOTONIC in some cases.

Other then this, I've said I'd not object other approaches, such as the
ioctl method or the dynamic clockid. These solutions also have the same
ABI stability concern, but the responsibility for that can be kept
outside the timekeeping code, so the perf folks can handle the potential
ABI stability downside. Additionally the few extra steps required to get
to the time stamping method make it less likely to be abused.

Again, I really do understand the need, and I don't mean to be a pain
here, but as a maintainer I have to push back here.

thanks
-john


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] perf: POSIX CLOCK_PERF to report current time value
  2013-12-10 20:27 [PATCH] perf: POSIX CLOCK_PERF to report current time value David Ahern
  2013-12-10 20:44 ` John Stultz
@ 2013-12-11 11:25 ` Ingo Molnar
  2013-12-11 11:40   ` Ingo Molnar
  1 sibling, 1 reply; 6+ messages in thread
From: Ingo Molnar @ 2013-12-11 11:25 UTC (permalink / raw)
  To: David Ahern
  Cc: linux-kernel, Pawel Moll, Frederic Weisbecker, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Mike Galbraith, Jiri Olsa, Namhyung Kim,
	Stephane Eranian, John Stultz, Sonny Rao, Thomas Gleixner


* David Ahern <dsahern@gmail.com> wrote:

> From: Pawel Moll <pawel.moll@arm.com>
> 
> To co-relate user space events with the perf events stream
> a current (as in: "what time(stamp) is it now?") time value
> must be made available.

So I'm wondering about your earlier approach posted here:

  https://lkml.org/lkml/2011/6/7/636

I'd modify that patch the following way: instead of tracking each 
separate reason, perhaps only track timekeeping_update().

Such a tracepoint, combined with PERF_SAMPLE_TIME, would very 
accurately track external changes to GTOD (xtime).

That leaves us with tracking/correlating the regular flow of time, 
which could be achieved by another tracepoint in:

   kernel/time/timekeeping.c::do_timer()

So only two new tracepoints are needed AFAICS - and the tracepoints 
would obviously be useful for other (debugging) purposes as well. 
Would that solve the wall-clock correlation problem adequately?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] perf: POSIX CLOCK_PERF to report current time value
  2013-12-11 11:25 ` Ingo Molnar
@ 2013-12-11 11:40   ` Ingo Molnar
  0 siblings, 0 replies; 6+ messages in thread
From: Ingo Molnar @ 2013-12-11 11:40 UTC (permalink / raw)
  To: David Ahern
  Cc: linux-kernel, Pawel Moll, Frederic Weisbecker, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Mike Galbraith, Jiri Olsa, Namhyung Kim,
	Stephane Eranian, John Stultz, Sonny Rao, Thomas Gleixner


* Ingo Molnar <mingo@kernel.org> wrote:

> 
> * David Ahern <dsahern@gmail.com> wrote:
> 
> > From: Pawel Moll <pawel.moll@arm.com>
> > 
> > To co-relate user space events with the perf events stream
> > a current (as in: "what time(stamp) is it now?") time value
> > must be made available.
> 
> So I'm wondering about your earlier approach posted here:
> 
>   https://lkml.org/lkml/2011/6/7/636
> 
> I'd modify that patch the following way: instead of tracking each 
> separate reason, perhaps only track timekeeping_update().
> 
> Such a tracepoint, combined with PERF_SAMPLE_TIME, would very 
> accurately track external changes to GTOD (xtime).
> 
> That leaves us with tracking/correlating the regular flow of time, 
> which could be achieved by another tracepoint in:
> 
>    kernel/time/timekeeping.c::do_timer()
> 
> So only two new tracepoints are needed AFAICS - and the tracepoints 
> would obviously be useful for other (debugging) purposes as well. 
> Would that solve the wall-clock correlation problem adequately?

I forgot to mention what data the new tracepoints would trace:

   - timekeeping_update() would trace the 'before' and 'after' 
     (==current) GTOD values

   - do_timer() would trace 'ticks' and the current GTOD value.

'GTOD value' is what gettimeofday() would return, which in the 
do_timer() case ought to be something like this:

   tk->xtime_sec
  (tk->xtime_nsec >> tk->shift) + get_arch_timeoffset()

(Note that this is simple and fast as there's no need to read the 
clock once again, it was just read.)

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] perf: POSIX CLOCK_PERF to report current time value
  2013-12-10 20:44 ` John Stultz
@ 2013-12-11 12:07   ` Ingo Molnar
  2013-12-11 19:37     ` John Stultz
  0 siblings, 1 reply; 6+ messages in thread
From: Ingo Molnar @ 2013-12-11 12:07 UTC (permalink / raw)
  To: John Stultz
  Cc: David Ahern, linux-kernel, Pawel Moll, Frederic Weisbecker,
	Peter Zijlstra, Arnaldo Carvalho de Melo, Mike Galbraith,
	Jiri Olsa, Namhyung Kim, Stephane Eranian, Sonny Rao,
	Thomas Gleixner


* John Stultz <john.stultz@linaro.org> wrote:

> [...]
> 
> I'd much rather see perf export CLOCK_MONOTONIC_RAW timestamps, 
> since that clockid is well defined. [...]

So the problem with that clock is that it does the following for every 
timestamp:

        cycle_now = clock->read(clock);

... which is impossibly slow if something like the HPET is used, which 
is rather common - so this is a non-starter to timestamp perf events 
with. We use the scheduler clock as a reasonable compromise between 
scalability and clock globality.

I can see two solutions:

1)

One approach is what I described in my other reply a few minutes ago: 
track the flow of GTOD, timestamped with the fast perf timestamps, so 
that GTOD can be correlated to the perf clock, if user-space so 
wishes. The correlation is simple so this gets close to the ease of 
use of being able to timestamp GTOD directly.

(That would be useful for other purposes as well, such as 
instrumenting GTOD updates.)

2)

An alternate, rather interesting approach would be to change the 
scheduler clock offset to be influenced by the above events, so that 
it quasi-approximates GTOD and emits natural time of day timestamps.

This already happens partially in the sched-clock slow path, 
kernel/sched/clock.c's sched_clock_local(), it uses scd->tick_gtod 
timestamps to correlate to the monotonic clock. This could be changed 
over to use not get_ktime() but getnstimeofday(), to get true TOD 
timestamps.

The trickier bit is the x86 fast-path, in arch/x86/kernel/tsc.c's 
native_sched_clock(). That relies on __cycles_2_ns() to transform a 
CPU cycles timestamp into (boot time offset) nanoseconds. For that it 
uses the cyc2ns_offset percpu variable. That variable could be updated 
periodically so that it's TOD offset.

My (strong!) preference would be #2, for the simple reason that it 
would make perf timestamps instantly usable and tooling wouldn't have 
to do anything to get true timestamps. We could add a new 
PERF_SAMPLE_TIME_OF_DAY feature bit so that user-space can consciously 
request GTOD timestamps. This feature bit could even be arch 
influenced, so that architectures could convert their perf clocks at 
the pace they desire - which tooling can detect and handle safely.

Thoughts?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] perf: POSIX CLOCK_PERF to report current time value
  2013-12-11 12:07   ` Ingo Molnar
@ 2013-12-11 19:37     ` John Stultz
  0 siblings, 0 replies; 6+ messages in thread
From: John Stultz @ 2013-12-11 19:37 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: David Ahern, linux-kernel, Pawel Moll, Frederic Weisbecker,
	Peter Zijlstra, Arnaldo Carvalho de Melo, Mike Galbraith,
	Jiri Olsa, Namhyung Kim, Stephane Eranian, Sonny Rao,
	Thomas Gleixner

On 12/11/2013 04:07 AM, Ingo Molnar wrote:
> * John Stultz <john.stultz@linaro.org> wrote:
>
>> [...]
>>
>> I'd much rather see perf export CLOCK_MONOTONIC_RAW timestamps, 
>> since that clockid is well defined. [...]
> So the problem with that clock is that it does the following for every 
> timestamp:
>
>         cycle_now = clock->read(clock);
>
> ... which is impossibly slow if something like the HPET is used, which 
> is rather common - so this is a non-starter to timestamp perf events 
> with. We use the scheduler clock as a reasonable compromise between 
> scalability and clock globality.
>
> I can see two solutions:
>
> 1)
>
> One approach is what I described in my other reply a few minutes ago: 
> track the flow of GTOD, timestamped with the fast perf timestamps, so 
> that GTOD can be correlated to the perf clock, if user-space so 
> wishes. The correlation is simple so this gets close to the ease of 
> use of being able to timestamp GTOD directly.
>
> (That would be useful for other purposes as well, such as 
> instrumenting GTOD updates.)
>
> 2)
>
> An alternate, rather interesting approach would be to change the 
> scheduler clock offset to be influenced by the above events, so that 
> it quasi-approximates GTOD and emits natural time of day timestamps.
>
> This already happens partially in the sched-clock slow path, 
> kernel/sched/clock.c's sched_clock_local(), it uses scd->tick_gtod 
> timestamps to correlate to the monotonic clock. This could be changed 
> over to use not get_ktime() but getnstimeofday(), to get true TOD 
> timestamps.
>
> The trickier bit is the x86 fast-path, in arch/x86/kernel/tsc.c's 
> native_sched_clock(). That relies on __cycles_2_ns() to transform a 
> CPU cycles timestamp into (boot time offset) nanoseconds. For that it 
> uses the cyc2ns_offset percpu variable. That variable could be updated 
> periodically so that it's TOD offset.
>
> My (strong!) preference would be #2, for the simple reason that it 
> would make perf timestamps instantly usable and tooling wouldn't have 
> to do anything to get true timestamps.

Right. #2 is basically what I was (probably poorly) trying to describe
as my ideal solution, making perf export what is essentially
CLOCK_MONOTONIC_RAW time (using CLOCK_MONOTONIC_RAW is more likely to be
easier to match then CLOCK_REALTIME/getnstimeofday(), since you don't
have to deal with anyone setting the clock, or frequency adjustments
from NTP).

thanks
-john


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2013-12-11 19:37 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-12-10 20:27 [PATCH] perf: POSIX CLOCK_PERF to report current time value David Ahern
2013-12-10 20:44 ` John Stultz
2013-12-11 12:07   ` Ingo Molnar
2013-12-11 19:37     ` John Stultz
2013-12-11 11:25 ` Ingo Molnar
2013-12-11 11:40   ` Ingo Molnar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).