Re: [PATCH 5/5] [WIP] trace-cmd: Add new subcomand "trace-cmd perf"

linux-trace-devel.vger.kernel.org archive mirror

From: Tzvetomir Stoyanov <tz.stoyanov@gmail.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Linux Trace Devel <linux-trace-devel@vger.kernel.org>
Subject: Re: [PATCH 5/5] [WIP] trace-cmd: Add new subcomand "trace-cmd perf"
Date: Fri, 19 Feb 2021 09:16:26 +0200
Message-ID: <CAPpZLN6nPbJoDBHe0Fm8+BECfXHdpNjQFt9gK9mHGAQriiGjEw@mail.gmail.com>
In-Reply-To: <20210218210352.61470b93@oasis.local.home>

Hi Steven,

On Fri, Feb 19, 2021 at 4:03 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Thu,  3 Dec 2020 08:02:26 +0200
> "Tzvetomir Stoyanov (VMware)" <tz.stoyanov@gmail.com> wrote:
>
> > +static int perf_mmap(struct perf_cpu *perf)
> > +{
> > +     mmap_mask = NUM_PAGES * getpagesize() - 1;
> > +
> > +     /* associate a buffer with the file */
> > +     perf->mpage = mmap(NULL, (NUM_PAGES + 1) * getpagesize(),
> > +                     PROT_READ | PROT_WRITE, MAP_SHARED, perf->perf_fd, 0);
> > +     if (perf->mpage == MAP_FAILED)
> > +             return -1;
> > +     return 0;
> > +}
>
> BTW, I found that the above holds the conversions we need for the local
> clock!
>
>         printf("time_shift=%d\n", perf->mpage->time_shift);
>         printf("time_mult=%d\n", perf->mpage->time_mult);
>         printf("time_offset=%lld\n", perf->mpage->time_offset);
>
> Which gives me:
>
> time_shift=31
> time_mult=633046315
> time_offset=-115773323084683
>
> [ one for each CPU ]

This will give us the time shift/mult/offset for each host CPU, right?
Is the local trace clock different for each CPU? Currently, the time
offset is calculated per VCPU, assuming that the host CPU on which the
VCPU runs has no impact on the timestamp synchronization. If the local
clock depends on the CPU, then we should calculate the time offset of
each guest event individually, depending on the host CPU and the VCPU
on which the event happened, as the host task that runs the VCPU can
migrate between CPUs at any time. So, we need to:
  1. Add timesync information for each host CPU in the trace.dat file.
  2. Track the migration between CPUs of each task that runs a VCPU
     and save that information in the trace.dat file.
  3. When calculating the new timestamp of each guest event
     (individually), somehow find out on which host CPU that guest
     event happened.

Points 1 and 2 are doable, but will break the current trace.dat file
option that holds the timesync information.
Point 3 is not clear to me: how can we get such information before the
host and guest events are synchronised?
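
For illustration only, here is a minimal sketch of the lookup that
point 3 would need, assuming the migration records from point 2 are
stored per VCPU, sorted by timestamp, and that the guest event
timestamp is already in a clock comparable with the host one (which is
exactly the open question). The struct vcpu_migration and
host_cpu_at() names are hypothetical and are not part of trace-cmd or
of the trace.dat format.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: from time `ts` on, the VCPU task ran on `host_cpu`. */
struct vcpu_migration {
	uint64_t ts;
	int host_cpu;
};

/*
 * Return the host CPU the VCPU task was on at time `ts`, or -1 if `ts`
 * is earlier than the first record. `recs` must be sorted by ascending ts.
 */
static int host_cpu_at(const struct vcpu_migration *recs, size_t n, uint64_t ts)
{
	size_t lo = 0, hi = n;

	/* Binary search for the first record with recs[i].ts > ts. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (recs[mid].ts <= ts)
			lo = mid + 1;
		else
			hi = mid;
	}
	/* The record just before that one covers `ts`, if there is one. */
	return lo ? recs[lo - 1].host_cpu : -1;
}
```

The returned CPU would then select which per-host-CPU timesync data
(point 1) to apply to that guest event.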

>
> The ftrace local clock is defined by:
>
> u64 notrace trace_clock_local(void)
> {
>         u64 clock;
>         preempt_disable_notrace();
>         clock = sched_clock();
>         preempt_enable_notrace();
>         return clock;
> }
>
> Where
>
> u64 sched_clock(void)
> {
>         if (static_branch_likely(&__use_tsc)) { // true
>                 u64 tsc_now = rdtsc();
>
>                 /* return the value in ns */
>                 return cycles_2_ns(tsc_now);
>         }
>
> and
>
> static __always_inline unsigned long long cycles_2_ns(unsigned long long cyc)
> {
>         struct cyc2ns_data data;
>         unsigned long long ns;
>
>         cyc2ns_read_begin(&data); // <- this is where the data comes from
>
>         ns = data.cyc2ns_offset;
>         ns += mul_u64_u32_shr(cyc, data.cyc2ns_mul, data.cyc2ns_shift);
>
>         cyc2ns_read_end();
>
>         return ns;
> }
>
> __always_inline void cyc2ns_read_begin(struct cyc2ns_data *data)
> {
>         int seq, idx;
>
>         preempt_disable_notrace();
>
>         do {
>                 seq = this_cpu_read(cyc2ns.seq.seqcount.sequence);
>                 idx = seq & 1;
>
>                 data->cyc2ns_offset = this_cpu_read(cyc2ns.data[idx].cyc2ns_offset);
>                 data->cyc2ns_mul    = this_cpu_read(cyc2ns.data[idx].cyc2ns_mul);
>                 data->cyc2ns_shift  = this_cpu_read(cyc2ns.data[idx].cyc2ns_shift);
>
>         } while (unlikely(seq != this_cpu_read(cyc2ns.seq.seqcount.sequence)));
> }
>
> The offset, multiplier and shift from cyc2ns.data[idx] (per CPU) are
> what determine the conversion from x86 cycles to nanoseconds.
>
> Does user space have access to that? Yes! Via perf!
>
> void arch_perf_update_userpage(struct perf_event *event,
>                                struct perf_event_mmap_page *userpg, u64 now)
> {
> [..]
>         cyc2ns_read_begin(&data);
>
>         offset = data.cyc2ns_offset + __sched_clock_offset;
>
>         /*
>          * Internal timekeeping for enabled/running/stopped times
>          * is always in the local_clock domain.
>          */
>         userpg->cap_user_time = 1;
>         userpg->time_mult = data.cyc2ns_mul;
>         userpg->time_shift = data.cyc2ns_shift;
>         userpg->time_offset = offset - now;
>
> Those above values are the ones I printed at the beginning of this
> email.
>
> Hence, we can use x86-tsc as the clock for both the host and guest, and
> then use perf to find out how to convert that to what the 'local' clock
> would produce. At least the multiplier and the shift.
>
> -- Steve
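
As a rough sketch of that conversion in user space, assuming the three
values have already been read from the perf mmap page: the
quotient/remainder split below follows the formula documented for
cap_user_time in the perf_event_mmap_page comments and avoids
overflowing 64 bits. The struct tsc_conv and tsc_to_ns() names are
hypothetical. Note that the perf time_offset is computed as
"offset - now" in the code quoted above, so it may not match the
offset that the ftrace 'local' clock applies; only the multiplier and
the shift are expected to carry over directly.

```c
#include <stdint.h>

/* Hypothetical container for the values read from the perf mmap page. */
struct tsc_conv {
	uint16_t shift;   /* perf_event_mmap_page time_shift */
	uint32_t mult;    /* perf_event_mmap_page time_mult */
	int64_t  offset;  /* perf_event_mmap_page time_offset (see caveat above) */
};

/*
 * Convert TSC cycles to nanoseconds: offset + ((cyc * mult) >> shift),
 * computed in two parts so that the multiplication does not overflow
 * 64 bits. Assumes shift <= 32, as in the values printed above.
 */
static uint64_t tsc_to_ns(const struct tsc_conv *c, uint64_t cyc)
{
	uint64_t quot = cyc >> c->shift;
	uint64_t rem = cyc & (((uint64_t)1 << c->shift) - 1);

	return (uint64_t)c->offset + quot * c->mult +
	       ((rem * c->mult) >> c->shift);
}
```

With time_shift=31 and time_mult=633046315 from the output above, 2^31
cycles convert to 633046315 ns before the offset is added.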



-- 
Tzvetomir (Ceco) Stoyanov
VMware Open Source Technology Center

