Message-ID: <51118797.9080800@linaro.org>
Date: Tue, 05 Feb 2013 14:28:39 -0800
From: John Stultz
To: Stephane Eranian
CC: Pawel Moll, Peter Zijlstra, LKML, "mingo@elte.hu", Paul Mackerras, Anton Blanchard, Will Deacon, "ak@linux.intel.com", Pekka Enberg, Steven Rostedt, Robert Richter, tglx
Subject: Re: [RFC] perf: need to expose sched_clock to correlate user samples with kernel samples
References: <1350408232.2336.42.camel@laptop> <1359728280.8360.15.camel@hornet>

On 02/05/2013 02:13 PM, Stephane Eranian wrote:
> On Fri, Feb 1, 2013 at 3:18 PM, Pawel Moll wrote:
>> Hello,
>>
>> I'd like to revive the topic...
>>
>> On Tue, 2012-10-16 at 18:23 +0100, Peter Zijlstra wrote:
>>> On Tue, 2012-10-16 at 12:13 +0200, Stephane Eranian wrote:
>>>> Hi,
>>>>
>>>> There are many situations where we want to correlate events happening at
>>>> the user level with samples recorded in the perf_event kernel sampling buffer.
>>>> For instance, we might want to correlate the call to a function or creation of
>>>> a file with samples. Similarly, when we want to monitor a JVM with jitted code,
>>>> we need to be able to correlate jitted code mappings with perf event samples
>>>> for symbolization.
>>>>
>>>> Perf_events allows timestamping of samples with PERF_SAMPLE_TIME.
>>>> That causes each PERF_RECORD_SAMPLE to include a timestamp
>>>> generated by calling the local_clock() -> sched_clock_cpu() function.
>>>>
>>>> To make correlating user vs. kernel samples easy, we would need to
>>>> access that sched_clock() functionality. However, none of the existing
>>>> clock calls permit this at this point. They all return timestamps which are
>>>> not using the same source and/or offset as sched_clock.
>>>>
>>>> I believe a similar issue exists with the ftrace subsystem.
>>>>
>>>> The problem needs to be addressed in a portable manner. Solutions
>>>> based on reading TSC for the user level to reconstruct sched_clock()
>>>> don't seem appropriate to me.
>>>>
>>>> One possibility to address this limitation would be to extend clock_gettime()
>>>> with a new clock ID, e.g., CLOCK_PERF.
>>>>
>>>> However, I understand that sched_clock_cpu() provides ordering guarantees only
>>>> when invoked on the same CPU repeatedly, i.e., it's not globally synchronized.
>>>> But we already have to deal with this problem when merging samples obtained
>>>> from different CPU sampling buffers in per-thread mode. So this is not
>>>> necessarily a showstopper.
>>>>
>>>> An alternative could be to use uprobes, but that's less practical to set up.
>>>>
>>>> Anyone with better ideas?
>>> You forgot to CC the time people ;-)
>>>
>>> I've no problem with adding CLOCK_PERF (or another/better name).
>>>
>>> Thomas, John?
>> I've just faced the same issue - correlating an event in userspace with
>> data from the perf stream, but to my mind what I want to get is a value
>> returned by perf_clock() _in the current "session" context_.
>>
>> Stephane didn't like the idea of opening a "fake" perf descriptor in
>> order to get the timestamp, but surely one must have the "session"
>> already running to be interested in such data in the first place? So I
>> think the ioctl() idea is not out of place here... How about the simple
>> change below?
>>
> The app requesting the timestamp may not necessarily have an active
> perf session. And by that I mean, it may not be self-monitoring. But it
> could be monitored by an external tool such as perf, without necessarily
> knowing it.
>
> The timestamp is global or at least per-cpu. It is not tied to a particular
> active event.
>
> The thing I did not like about ioctl() is that it now means that the app
> needs to become a user of the perf_event API. It needs to program
> a dummy event just to get a timestamp, as opposed to just calling
> a clock_gettime(CLOCK_PERF) function which guarantees a clock
> source identical to that used by perf_events. In that case, the app
> timestamps its events in such a way that if it was monitored externally,
> that external tool would be able to correlate all the samples because they
> would all have the same time source.
>
> But if people are strongly opposed to the clock_gettime() approach, then
> I can go with the ioctl() because the functionality is definitely needed
> ASAP.

I prefer the ioctl method, since it's less likely to be re-purposed/misused.

Though I'd be most comfortable with finding some way for perf timestamps
to be CLOCK_MONOTONIC based (or maybe CLOCK_MONOTONIC_RAW if it would be
easier), and just avoid altogether adding another time domain that
doesn't really have a clear definition (other than "what perf uses").

thanks
-john
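
The thread's correlation problem can be reduced to one requirement. If the application and perf_events stamp their records from the same clock, a tool can interleave the two streams with an ordinary merge by timestamp. If they do not, the merged order is meaningless. What follows is a minimal C sketch built on stated assumptions. The names `struct timed_event`, `user_timestamp_ns()` and `merge_by_time()` are hypothetical. CLOCK_MONOTONIC stands in for whatever shared clock is eventually chosen. It is NOT the clock perf used for PERF_SAMPLE_TIME at the time of this thread, which was local_clock(). Each input stream is also assumed to be already sorted by timestamp, as a single per-CPU buffer is expected to be.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical record: one timestamped event from either source. */
struct timed_event {
    uint64_t ts_ns;  /* nanoseconds; both streams assumed to share one clock */
    int      source; /* hypothetical tag: 0 = user-level event, 1 = perf sample */
};

/* Take a user-level timestamp. CLOCK_MONOTONIC is an assumed stand-in for
 * a clock shared with perf; it is not what PERF_SAMPLE_TIME used in 2013. */
uint64_t user_timestamp_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Merge two arrays, each sorted by ts_ns, into out (room for na + nb).
 * On equal timestamps the record from 'a' is emitted first.
 * Returns the number of records written. */
size_t merge_by_time(const struct timed_event *a, size_t na,
                     const struct timed_event *b, size_t nb,
                     struct timed_event *out)
{
    size_t i = 0, j = 0, k = 0;

    assert(out != NULL || na + nb == 0);
    while (i < na && j < nb)
        out[k++] = (a[i].ts_ns <= b[j].ts_ns) ? a[i++] : b[j++];
    while (i < na)
        out[k++] = a[i++];
    while (j < nb)
        out[k++] = b[j++];
    return k;
}
```

More than two streams, such as one per CPU, can be merged the same way, either pairwise or with a heap. In every case the result only reflects real ordering when all timestamps come from one time domain, which is the point under discussion above.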