From: Marcelo Tosatti <mtosatti@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>, bpf <bpf@vger.kernel.org>,
open list <linux-kernel@vger.kernel.org>,
Nitesh Narayan Lal <nitesh@redhat.com>,
Nicolas Saenz Julienne <nsaenzju@redhat.com>,
Thomas Gleixner <tglx@linutronix.de>,
Peter Xu <peterx@redhat.com>, Andrii Nakryiko <andrii@kernel.org>
Subject: Re: [PATCH bpf-next] bpf: introduce helper bpf_raw_read_cpu_clock
Date: Thu, 7 Oct 2021 06:10:03 -0300
Message-ID: <20211007091003.GA337010@fuller.cnet>
In-Reply-To: <20211007071856.GM174703@worktop.programming.kicks-ass.net>
Hi Peter, Song,
On Thu, Oct 07, 2021 at 09:18:56AM +0200, Peter Zijlstra wrote:
> On Wed, Oct 06, 2021 at 02:37:09PM -0700, Song Liu wrote:
> > On Wed, Oct 6, 2021 at 10:52 AM Marcelo Tosatti <mtosatti@redhat.com> wrote:
> > >
> > >
> > >
> > > Add bpf_raw_read_cpu_clock helper, to read architecture specific
> > > CPU clock. In x86's case, this is the TSC.
> > >
> > > This is necessary to synchronize bpf traces from host and guest bpf-programs
> > > (after subtracting guest tsc-offset from guest timestamps).
> >
> > Trying to understand the use case. So in a host-guest scenario,
> > bpf_ktime_get_ns()
> > will return different values in host and guest, but rdtsc() will give
> > the same value.
> > Is this correct?
>
> No, it will not.

No, but we can find out the delta between the host and guest TSCs.
On x86, you can read the offset through a debugfs file:

	debugfs_create_file("tsc-offset", 0444, debugfs_dentry, vcpu,
			    &vcpu_tsc_offset_fops);

Other architectures can expose their offset in a similar way.
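To make the correction concrete, here is a minimal sketch of mapping a guest TSC timestamp onto the host timeline. It assumes no TSC scaling is in effect, and that the value read from tsc-offset is a signed 64-bit quantity such that guest_tsc = host_tsc + offset. The name guest_tsc_to_host is hypothetical and is not part of the patch.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): convert a guest TSC value
 * to the host TSC timeline by subtracting the guest's TSC offset.
 *
 * Assumptions: no TSC scaling, and guest_tsc = host_tsc + tsc_offset.
 * The subtraction is done in unsigned 64-bit arithmetic so that a
 * negative offset wraps around correctly.
 */
static uint64_t guest_tsc_to_host(uint64_t guest_tsc, int64_t tsc_offset)
{
	return guest_tsc - (uint64_t)tsc_offset;
}
```

With the guest timestamps converted this way, host and guest events can be merged and sorted on a single TSC axis.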
> Also, please explain if any of this stands a chance of
> working for anything other than x86.

Yes, the same pattern repeats on other architectures.

ARM:
  With an offset between guest and host (CNTVCT_EL0):
  https://developer.arm.com/documentation/ddi0595/2020-12/AArch64-Registers/CNTVCT-EL0--Counter-timer-Virtual-Count-register?lang=en
  Without an offset:
  commit 051ff581ce70e822729e9474941f3c206cbf7436

PPC:
  The Time Base Register, read through the mftb instruction:
  https://yhbt.net/lore/all/5f267a8aec5b8199a580c96ab2b1a3c27de4eb09.camel@gmail.com/T/
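As an illustration of how similar the three cases are, here is a userspace sketch that reads the raw counter on each architecture. This is not the helper from the patch: read_cpu_clock is a hypothetical name, and the sketch relies on the GCC/Clang builtins __builtin_ia32_rdtsc() and __builtin_ppc_get_timebase() and on CNTVCT_EL0 being readable from userspace, as it normally is on Linux.

```c
#include <stdint.h>

/*
 * Hypothetical userspace helper, for illustration only: read the
 * architecture-specific free-running counter.
 */
static inline uint64_t read_cpu_clock(void)
{
#if defined(__x86_64__) || defined(__i386__)
	/* x86: Time Stamp Counter */
	return __builtin_ia32_rdtsc();
#elif defined(__aarch64__)
	/* arm64: virtual counter (physical count minus the virtual offset) */
	uint64_t v;

	__asm__ volatile("mrs %0, cntvct_el0" : "=r"(v));
	return v;
#elif defined(__powerpc64__)
	/* ppc64: Time Base Register, as read by mftb */
	return __builtin_ppc_get_timebase();
#else
#error "no raw CPU clock known for this architecture"
#endif
}
```

In each case the value is a counter, not nanoseconds, so any conversion to time has to be done afterwards with the counter frequency.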
> Or even on x86 in the face of
> guest migration.
It won't, but honestly we don't care about tracing at this level across
migration.
> Also, please explain, again, what's wrong with dumping snapshots of
> CLOCK_MONOTONIC{,_RAW} from host and guest and correlating time that
> way?

You can't read the guest and the host clock at the same instant: there
will always be some delay between the two reads, and that delay is not
fixed (it depends on the scheduling of the guest vcpus, for example).
So you need an algorithm to estimate the difference between the clocks,
with non-zero error bounds:
"
Add a driver with gettime method returning hosts realtime clock.
This allows Chrony to synchronize host and guest clocks with
high precision (see results below).

chronyc> sources
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
#* PHC0                          0   3   377     6     +4ns[   +4ns] +/-    3ns
"

With the hardware clock itself (which is usually the base for
CLOCK_MONOTONIC_RAW), there is no such error: the offset is 0 ns,
rather than the 3 to 4 ns shown above.
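To illustrate why the snapshot approach always carries an error bound, here is a sketch of the usual bracketing estimate: read the host clock, then the guest clock, then the host clock again. The guest sample was taken somewhere inside the host window, so the offset is only known to within half the window width. All names are hypothetical; this is not code from chrony or from the patch.

```c
#include <stdint.h>

/* Hypothetical result type: estimated offset and its uncertainty. */
struct clock_offset_estimate {
	int64_t offset_ns;	/* estimated (guest - host) offset */
	uint64_t error_ns;	/* the true offset lies within +/- error_ns */
};

/*
 * Estimate the guest-host clock offset from one bracketed sample.
 * The guest read happened at an unknown point between the two host
 * reads, so assume the midpoint and report half the window as error.
 */
static struct clock_offset_estimate
estimate_offset(uint64_t host_before_ns, uint64_t guest_ns,
		uint64_t host_after_ns)
{
	struct clock_offset_estimate est;
	uint64_t window = host_after_ns - host_before_ns;
	uint64_t host_mid = host_before_ns + window / 2;

	est.offset_ns = (int64_t)(guest_ns - host_mid);
	est.error_ns = (window + 1) / 2;	/* round up */
	return est;
}
```

The error term only reaches zero if the two host reads coincide, which scheduling delays rule out in practice. Subtracting a known counter offset has no such window.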
> And also explain why BPF needs to do this differently than all the other
> tracers.

On x86 we already use:

	echo "x86-tsc" > /sys/kernel/debug/tracing/trace_clock

for this purpose, so it's not as if anything different is being done.