* [PATCH] perf: POSIX CLOCK_PERF to report current time value
@ 2013-12-10 20:27 David Ahern
2013-12-10 20:44 ` John Stultz
2013-12-11 11:25 ` Ingo Molnar
0 siblings, 2 replies; 6+ messages in thread
From: David Ahern @ 2013-12-10 20:27 UTC (permalink / raw)
To: linux-kernel
Cc: David Ahern, Pawel Moll, Ingo Molnar, Frederic Weisbecker,
Peter Zijlstra, Arnaldo Carvalho de Melo, Mike Galbraith,
Jiri Olsa, Namhyung Kim, Stephane Eranian, John Stultz, Sonny Rao,
Thomas Gleixner
From: Pawel Moll <pawel.moll@arm.com>
To co-relate user space events with the perf events stream
a current (as in: "what time(stamp) is it now?") time value
must be made available.
This patch adds a POSIX clock returning the perf_clock()
value and accesible from userspace:
#include <time.h>
struct timespec ts;
clock_gettime(CLOCK_PERF, &ts);
Updated to 3.13
Signed-off-by: Pawel Moll <pawel.moll@arm.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Sonny Rao <sonnyrao@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
include/uapi/linux/time.h | 1 +
kernel/events/core.c | 20 ++++++++++++++++++++
2 files changed, 21 insertions(+)
diff --git a/include/uapi/linux/time.h b/include/uapi/linux/time.h
index e75e1b6ff27f..b07f07914a13 100644
--- a/include/uapi/linux/time.h
+++ b/include/uapi/linux/time.h
@@ -56,6 +56,7 @@ struct itimerval {
#define CLOCK_BOOTTIME_ALARM 9
#define CLOCK_SGI_CYCLE 10 /* Hardware specific */
#define CLOCK_TAI 11
+#define CLOCK_PERF 12
#define MAX_CLOCKS 16
#define CLOCKS_MASK (CLOCK_REALTIME | CLOCK_MONOTONIC)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 72348dc192c1..87aa36ac68f7 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -39,6 +39,7 @@
#include <linux/hw_breakpoint.h>
#include <linux/mm_types.h>
#include <linux/cgroup.h>
+#include <linux/posix-timers.h>
#include "internal.h"
@@ -295,6 +296,19 @@ static inline u64 perf_clock(void)
return local_clock();
}
+static int perf_posix_clock_getres(const clockid_t which_clock,
+ struct timespec *tp)
+{
+ *tp = ns_to_timespec(TICK_NSEC);
+ return 0;
+}
+
+static int perf_posix_clock_get(clockid_t which_clock, struct timespec *tp)
+{
+ *tp = ns_to_timespec(perf_clock());
+ return 0;
+}
+
static inline struct perf_cpu_context *
__get_cpu_context(struct perf_event_context *ctx)
{
@@ -7904,6 +7918,10 @@ perf_cpu_notify(struct notifier_block *self, unsigned long action, void *hcpu)
void __init perf_event_init(void)
{
+ struct k_clock perf_posix_clock = {
+ .clock_getres = perf_posix_clock_getres,
+ .clock_get = perf_posix_clock_get,
+ };
int ret;
idr_init(&pmu_idr);
@@ -7920,6 +7938,8 @@ void __init perf_event_init(void)
ret = init_hw_breakpoint();
WARN(ret, "hw_breakpoint initialization failed with: %d", ret);
+ posix_timers_register_clock(CLOCK_PERF, &perf_posix_clock);
+
/* do not patch jump label more than once per second */
jump_label_rate_limit(&perf_sched_events, HZ);
--
1.8.3.4 (Apple Git-47)
^ permalink raw reply related [flat|nested] 6+ messages in thread* Re: [PATCH] perf: POSIX CLOCK_PERF to report current time value
2013-12-10 20:27 [PATCH] perf: POSIX CLOCK_PERF to report current time value David Ahern
@ 2013-12-10 20:44 ` John Stultz
2013-12-11 12:07 ` Ingo Molnar
2013-12-11 11:25 ` Ingo Molnar
1 sibling, 1 reply; 6+ messages in thread
From: John Stultz @ 2013-12-10 20:44 UTC (permalink / raw)
To: David Ahern, linux-kernel
Cc: Pawel Moll, Ingo Molnar, Frederic Weisbecker, Peter Zijlstra,
Arnaldo Carvalho de Melo, Mike Galbraith, Jiri Olsa, Namhyung Kim,
Stephane Eranian, Sonny Rao, Thomas Gleixner
On 12/10/2013 12:27 PM, David Ahern wrote:
> From: Pawel Moll <pawel.moll@arm.com>
>
> To co-relate user space events with the perf events stream
> a current (as in: "what time(stamp) is it now?") time value
> must be made available.
>
> This patch adds a POSIX clock returning the perf_clock()
> value and accesible from userspace:
>
> #include <time.h>
>
> struct timespec ts;
>
> clock_gettime(CLOCK_PERF, &ts);
As stated earlier on a few occasions, I really don't like exposing the
perf clock directly to userland like this, because the perf clock is not
a well defined time domain. Instead its basically defined as "whatever
sched_clock does". Over the last few years, the behavior of sched_clock
has changed, so I'm not confident it won't change in the future.
I do understand the desire, but to export a poorly defined clockid to
userland has ABI stability concerns for me. Particularly when the
performance characteristics will make it prone to abuse.
I'd much rather see perf export CLOCK_MONOTONIC_RAW timestamps, since
that clockid is well defined. Even if the underlying clocksource is
different, I suspect the perf clock could interpolate fairly closely to
CLOCK_MONOTONIC_RAW, as it already does similar interpolation on top of
CLOCK_MONOTONIC in some cases.
Other then this, I've said I'd not object other approaches, such as the
ioctl method or the dynamic clockid. These solutions also have the same
ABI stability concern, but the responsibility for that can be kept
outside the timekeeping code, so the perf folks can handle the potential
ABI stability downside. Additionally the few extra steps required to get
to the time stamping method make it less likely to be abused.
Again, I really do understand the need, and I don't mean to be a pain
here, but as a maintainer I have to push back here.
thanks
-john
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] perf: POSIX CLOCK_PERF to report current time value
2013-12-10 20:44 ` John Stultz
@ 2013-12-11 12:07 ` Ingo Molnar
2013-12-11 19:37 ` John Stultz
0 siblings, 1 reply; 6+ messages in thread
From: Ingo Molnar @ 2013-12-11 12:07 UTC (permalink / raw)
To: John Stultz
Cc: David Ahern, linux-kernel, Pawel Moll, Frederic Weisbecker,
Peter Zijlstra, Arnaldo Carvalho de Melo, Mike Galbraith,
Jiri Olsa, Namhyung Kim, Stephane Eranian, Sonny Rao,
Thomas Gleixner
* John Stultz <john.stultz@linaro.org> wrote:
> [...]
>
> I'd much rather see perf export CLOCK_MONOTONIC_RAW timestamps,
> since that clockid is well defined. [...]
So the problem with that clock is that it does the following for every
timestamp:
cycle_now = clock->read(clock);
... which is impossibly slow if something like the HPET is used, which
is rather common - so this is a non-starter to timestamp perf events
with. We use the scheduler clock as a reasonable compromise between
scalability and clock globality.
I can see two solutions:
1)
One approach is what I described in my other reply a few minutes ago:
track the flow of GTOD, timestamped with the fast perf timestamps, so
that GTOD can be correlated to the perf clock, if user-space so
wishes. The correlation is simple so this gets close to the ease of
use of being able to timestamp GTOD directly.
(That would be useful for other purposes as well, such as
instrumenting GTOD updates.)
2)
An alternate, rather interesting approach would be to change the
scheduler clock offset to be influenced by the above events, so that
it quasi-approximates GTOD and emits natural time of day timestamps.
This already happens partially in the sched-clock slow path,
kernel/sched/clock.c's sched_clock_local(), it uses scd->tick_gtod
timestamps to correlate to the monotonic clock. This could be changed
over to use not get_ktime() but getnstimeofday(), to get true TOD
timestamps.
The trickier bit is the x86 fast-path, in arch/x86/kernel/tsc.c's
native_sched_clock(). That relies on __cycles_2_ns() to transform a
CPU cycles timestamp into (boot time offset) nanoseconds. For that it
uses the cyc2ns_offset percpu variable. That variable could be updated
periodically so that it's TOD offset.
My (strong!) preference would be #2, for the simple reason that it
would make perf timestamps instantly usable and tooling wouldn't have
to do anything to get true timestamps. We could add a new
PERF_SAMPLE_TIME_OF_DAY feature bit so that user-space can consciously
request GTOD timestamps. This feature bit could even be arch
influenced, so that architectures could convert their perf clocks at
the pace they desire - which tooling can detect and handle safely.
Thoughts?
Thanks,
Ingo
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: [PATCH] perf: POSIX CLOCK_PERF to report current time value
2013-12-11 12:07 ` Ingo Molnar
@ 2013-12-11 19:37 ` John Stultz
0 siblings, 0 replies; 6+ messages in thread
From: John Stultz @ 2013-12-11 19:37 UTC (permalink / raw)
To: Ingo Molnar
Cc: David Ahern, linux-kernel, Pawel Moll, Frederic Weisbecker,
Peter Zijlstra, Arnaldo Carvalho de Melo, Mike Galbraith,
Jiri Olsa, Namhyung Kim, Stephane Eranian, Sonny Rao,
Thomas Gleixner
On 12/11/2013 04:07 AM, Ingo Molnar wrote:
> * John Stultz <john.stultz@linaro.org> wrote:
>
>> [...]
>>
>> I'd much rather see perf export CLOCK_MONOTONIC_RAW timestamps,
>> since that clockid is well defined. [...]
> So the problem with that clock is that it does the following for every
> timestamp:
>
> cycle_now = clock->read(clock);
>
> ... which is impossibly slow if something like the HPET is used, which
> is rather common - so this is a non-starter to timestamp perf events
> with. We use the scheduler clock as a reasonable compromise between
> scalability and clock globality.
>
> I can see two solutions:
>
> 1)
>
> One approach is what I described in my other reply a few minutes ago:
> track the flow of GTOD, timestamped with the fast perf timestamps, so
> that GTOD can be correlated to the perf clock, if user-space so
> wishes. The correlation is simple so this gets close to the ease of
> use of being able to timestamp GTOD directly.
>
> (That would be useful for other purposes as well, such as
> instrumenting GTOD updates.)
>
> 2)
>
> An alternate, rather interesting approach would be to change the
> scheduler clock offset to be influenced by the above events, so that
> it quasi-approximates GTOD and emits natural time of day timestamps.
>
> This already happens partially in the sched-clock slow path,
> kernel/sched/clock.c's sched_clock_local(), it uses scd->tick_gtod
> timestamps to correlate to the monotonic clock. This could be changed
> over to use not get_ktime() but getnstimeofday(), to get true TOD
> timestamps.
>
> The trickier bit is the x86 fast-path, in arch/x86/kernel/tsc.c's
> native_sched_clock(). That relies on __cycles_2_ns() to transform a
> CPU cycles timestamp into (boot time offset) nanoseconds. For that it
> uses the cyc2ns_offset percpu variable. That variable could be updated
> periodically so that it's TOD offset.
>
> My (strong!) preference would be #2, for the simple reason that it
> would make perf timestamps instantly usable and tooling wouldn't have
> to do anything to get true timestamps.
Right. #2 is basically what I was (probably poorly) trying to describe
as my ideal solution, making perf export what is essentially
CLOCK_MONOTONIC_RAW time (using CLOCK_MONOTONIC_RAW is more likely to be
easier to match then CLOCK_REALTIME/getnstimeofday(), since you don't
have to deal with anyone setting the clock, or frequency adjustments
from NTP).
thanks
-john
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] perf: POSIX CLOCK_PERF to report current time value
2013-12-10 20:27 [PATCH] perf: POSIX CLOCK_PERF to report current time value David Ahern
2013-12-10 20:44 ` John Stultz
@ 2013-12-11 11:25 ` Ingo Molnar
2013-12-11 11:40 ` Ingo Molnar
1 sibling, 1 reply; 6+ messages in thread
From: Ingo Molnar @ 2013-12-11 11:25 UTC (permalink / raw)
To: David Ahern
Cc: linux-kernel, Pawel Moll, Frederic Weisbecker, Peter Zijlstra,
Arnaldo Carvalho de Melo, Mike Galbraith, Jiri Olsa, Namhyung Kim,
Stephane Eranian, John Stultz, Sonny Rao, Thomas Gleixner
* David Ahern <dsahern@gmail.com> wrote:
> From: Pawel Moll <pawel.moll@arm.com>
>
> To co-relate user space events with the perf events stream
> a current (as in: "what time(stamp) is it now?") time value
> must be made available.
So I'm wondering about your earlier approach posted here:
https://lkml.org/lkml/2011/6/7/636
I'd modify that patch the following way: instead of tracking each
separate reason, perhaps only track timekeeping_update().
Such a tracepoint, combined with PERF_SAMPLE_TIME, would very
accurately track external changes to GTOD (xtime).
That leaves us with tracking/correlating the regular flow of time,
which could be achieved by another tracepoint in:
kernel/time/timekeeping.c::do_timer()
So only two new tracepoints are needed AFAICS - and the tracepoints
would obviously be useful for other (debugging) purposes as well.
Would that solve the wall-clock correlation problem adequately?
Thanks,
Ingo
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] perf: POSIX CLOCK_PERF to report current time value
2013-12-11 11:25 ` Ingo Molnar
@ 2013-12-11 11:40 ` Ingo Molnar
0 siblings, 0 replies; 6+ messages in thread
From: Ingo Molnar @ 2013-12-11 11:40 UTC (permalink / raw)
To: David Ahern
Cc: linux-kernel, Pawel Moll, Frederic Weisbecker, Peter Zijlstra,
Arnaldo Carvalho de Melo, Mike Galbraith, Jiri Olsa, Namhyung Kim,
Stephane Eranian, John Stultz, Sonny Rao, Thomas Gleixner
* Ingo Molnar <mingo@kernel.org> wrote:
>
> * David Ahern <dsahern@gmail.com> wrote:
>
> > From: Pawel Moll <pawel.moll@arm.com>
> >
> > To co-relate user space events with the perf events stream
> > a current (as in: "what time(stamp) is it now?") time value
> > must be made available.
>
> So I'm wondering about your earlier approach posted here:
>
> https://lkml.org/lkml/2011/6/7/636
>
> I'd modify that patch the following way: instead of tracking each
> separate reason, perhaps only track timekeeping_update().
>
> Such a tracepoint, combined with PERF_SAMPLE_TIME, would very
> accurately track external changes to GTOD (xtime).
>
> That leaves us with tracking/correlating the regular flow of time,
> which could be achieved by another tracepoint in:
>
> kernel/time/timekeeping.c::do_timer()
>
> So only two new tracepoints are needed AFAICS - and the tracepoints
> would obviously be useful for other (debugging) purposes as well.
> Would that solve the wall-clock correlation problem adequately?
I forgot to mention what data the new tracepoints would trace:
- timekeeping_update() would trace the 'before' and 'after'
(==current) GTOD values
- do_timer() would trace 'ticks' and the current GTOD value.
'GTOD value' is what gettimeofday() would return, which in the
do_timer() case ought to be something like this:
tk->xtime_sec
(tk->xtime_nsec >> tk->shift) + get_arch_timeoffset()
(Note that this is simple and fast as there's no need to read the
clock once again, it was just read.)
Thanks,
Ingo
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2013-12-11 19:37 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-12-10 20:27 [PATCH] perf: POSIX CLOCK_PERF to report current time value David Ahern
2013-12-10 20:44 ` John Stultz
2013-12-11 12:07 ` Ingo Molnar
2013-12-11 19:37 ` John Stultz
2013-12-11 11:25 ` Ingo Molnar
2013-12-11 11:40 ` Ingo Molnar
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).