From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: John Stultz <john.stultz@linaro.org>
Cc: linux-kernel <linux-kernel@vger.kernel.org>,
rostedt <rostedt@goodmis.org>,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@kernel.org>,
Thomas Gleixner <tglx@linutronix.de>,
stable <stable@vger.kernel.org>,
Brendan Gregg <bgregg@netflix.com>,
Alexei Starovoitov <alexei.starovoitov@gmail.com>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH] timekeeping: Fix __ktime_get_fast_ns() regression
Date: Wed, 5 Oct 2016 10:57:12 +0000 (UTC) [thread overview]
Message-ID: <473423924.46781.1475665032457.JavaMail.zimbra@efficios.com> (raw)
In-Reply-To: <1475636148-26539-1-git-send-email-john.stultz@linaro.org>
----- On Oct 4, 2016, at 10:55 PM, John Stultz john.stultz@linaro.org wrote:
> In commit 27727df240c7 ("Avoid taking lock in NMI path with
> CONFIG_DEBUG_TIMEKEEPING"), I changed the logic to open-code
> the timekeeping_get_ns() function, but I forgot to include
> the unit conversion from cycles to nanoseconds, breaking the
> function's output, which impacts users like perf.
It also impacts LTTng. I've had reports of timestamp borkage from
LTTng end users yesterday with kernels 4.7.5 and 4.7.4.
Some stable branches sleuthing indicates that the following
kernel versions are affected: 4.8, 4.7.4+, 4.4.20+, 4.1.32+
I plan to add kernel version range checks in lttng-modules to
work-around this mess, but I need an upper bound. Is there any way
we can get this upstream and cherry-picked into stable branches
ASAP ?
Tested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
LTTng kernel traces with Linux 4.8:
[06:37:40.248222037] (+0.000001057) compudjdev power_cpu_idle: { cpu_id = 10 }, { state = 1, cpu_id = 10 }
[06:37:40.248222037] (+0.000000000) compudjdev x86_irq_vectors_local_timer_entry: { cpu_id = 10 }, { vector = 239 }
[06:37:40.248222037] (+0.000000000) compudjdev timer_hrtimer_cancel: { cpu_id = 10 }, { hrtimer = 0xFFFF880FC1692700 }
[06:37:40.248222037] (+0.000000000) compudjdev timer_hrtimer_expire_entry: { cpu_id = 10 }, { hrtimer = 0xFFFF880FC1692700, now = 211036043790, function = 0xFFFFFFFF8116FD40 }
[06:37:40.248222037] (+0.000000000) compudjdev irq_softirq_raise: { cpu_id = 10 }, { vec = 1 }
[06:37:40.248222037] (+0.000000000) compudjdev rcu_utilization: { cpu_id = 10 }, { s = "Start scheduler-tick" }
[06:37:40.248222037] (+0.000000000) compudjdev irq_softirq_raise: { cpu_id = 10 }, { vec = 9 }
[...]
% lttng view | grep "cpu_id = 0" | grep "(+0.000000000)" |wc -l
76156
LTTng kernel traces with Linux 4.8 + this fix applied:
[06:46:52.648061087] (+0.000001455) compudjdev power_cpu_idle: { cpu_id = 23 }, { state = 1, cpu_id = 23 }
[06:46:52.652011654] (+0.003950567) compudjdev x86_irq_vectors_local_timer_entry: { cpu_id = 9 }, { vector = 239 }
[06:46:52.652012324] (+0.000000670) compudjdev timer_hrtimer_cancel: { cpu_id = 9 }, { hrtimer = 0xFFFF880FC1652700 }
[06:46:52.652012654] (+0.000000330) compudjdev timer_hrtimer_expire_entry: { cpu_id = 9 }, { hrtimer = 0xFFFF880FC1652700, now = 91808012323, function = 0xFFFFFFFF8116FD80 }
[06:46:52.652013354] (+0.000000700) compudjdev irq_softirq_raise: { cpu_id = 9 }, { vec = 1 }
[06:46:52.652013784] (+0.000000430) compudjdev rcu_utilization: { cpu_id = 9 }, { s = "Start scheduler-tick" }
% lttng view | grep "cpu_id = 0" | grep "(+0.000000000)" |wc -l
2
Which confirms that this fix addresses the issue.
This issue seems to affect perf, ftrace "mono" clock source, and
eBPF in Linux.
Thanks,
Mathieu
>
> This would result in bogus perf timestamps like:
> swapper 0 [000] 253.427536: 111111111 cpu-clock: ffffffff810a0de6
> native_safe_halt+0x6 ([kernel.kallsyms])
> swapper 0 [000] 254.426573: 111111111 cpu-clock: ffffffff810a0de6
> native_safe_halt+0x6 ([kernel.kallsyms])
> swapper 0 [000] 254.426687: 111111111 cpu-clock: ffffffff810a0de6
> native_safe_halt+0x6 ([kernel.kallsyms])
> swapper 0 [000] 254.426800: 111111111 cpu-clock: ffffffff810a0de6
> native_safe_halt+0x6 ([kernel.kallsyms])
> swapper 0 [000] 254.426905: 111111111 cpu-clock: ffffffff810a0de6
> native_safe_halt+0x6 ([kernel.kallsyms])
> swapper 0 [000] 254.427022: 111111111 cpu-clock: ffffffff810a0de6
> native_safe_halt+0x6 ([kernel.kallsyms])
> swapper 0 [000] 254.427127: 111111111 cpu-clock: ffffffff810a0de6
> native_safe_halt+0x6 ([kernel.kallsyms])
> swapper 0 [000] 254.427239: 111111111 cpu-clock: ffffffff810a0de6
> native_safe_halt+0x6 ([kernel.kallsyms])
> swapper 0 [000] 254.427346: 111111111 cpu-clock: ffffffff810a0de6
> native_safe_halt+0x6 ([kernel.kallsyms])
> swapper 0 [000] 254.427463: 111111111 cpu-clock: ffffffff810a0de6
> native_safe_halt+0x6 ([kernel.kallsyms])
> swapper 0 [000] 255.426572: 111111111 cpu-clock: ffffffff810a0de6
> native_safe_halt+0x6 ([kernel.kallsyms])
>
> Instead of more reasonable expected timestamps like:
> swapper 0 [000] 39.953768: 111111111 cpu-clock: ffffffff810a0de6
> native_safe_halt+0x6 ([kernel.kallsyms])
> swapper 0 [000] 40.064839: 111111111 cpu-clock: ffffffff810a0de6
> native_safe_halt+0x6 ([kernel.kallsyms])
> swapper 0 [000] 40.175956: 111111111 cpu-clock: ffffffff810a0de6
> native_safe_halt+0x6 ([kernel.kallsyms])
> swapper 0 [000] 40.287103: 111111111 cpu-clock: ffffffff810a0de6
> native_safe_halt+0x6 ([kernel.kallsyms])
> swapper 0 [000] 40.398217: 111111111 cpu-clock: ffffffff810a0de6
> native_safe_halt+0x6 ([kernel.kallsyms])
> swapper 0 [000] 40.509324: 111111111 cpu-clock: ffffffff810a0de6
> native_safe_halt+0x6 ([kernel.kallsyms])
> swapper 0 [000] 40.620437: 111111111 cpu-clock: ffffffff810a0de6
> native_safe_halt+0x6 ([kernel.kallsyms])
> swapper 0 [000] 40.731546: 111111111 cpu-clock: ffffffff810a0de6
> native_safe_halt+0x6 ([kernel.kallsyms])
> swapper 0 [000] 40.842654: 111111111 cpu-clock: ffffffff810a0de6
> native_safe_halt+0x6 ([kernel.kallsyms])
> swapper 0 [000] 40.953772: 111111111 cpu-clock: ffffffff810a0de6
> native_safe_halt+0x6 ([kernel.kallsyms])
> swapper 0 [000] 41.064881: 111111111 cpu-clock: ffffffff810a0de6
> native_safe_halt+0x6 ([kernel.kallsyms])
>
> This patch adds the proper use of timekeeping_delta_to_ns()
> to convert the cycle delta to nanoseconds as needed.
>
> Thanks to Brendan and Alexei for finding this quickly after
> the v4.8 release. Unfortunately the problematic commit has
> landed in some -stable trees so they'll need this fix as
> well.
>
> Many apologies for this mistake. I'll be looking to add a
> perf-clock sanity test to the kselftest timers tests soon.
>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: stable <stable@vger.kernel.org>
> Cc: Brendan Gregg <bgregg@netflix.com>
> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> Fixes: 27727df240c7 "timekeeping: Avoid taking lock in NMI path with
> CONFIG_DEBUG_TIMEKEEPING"
> Reported-by: Brendan Gregg <bgregg@netflix.com>
> Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> Signed-off-by: John Stultz <john.stultz@linaro.org>
> ---
> Thomas/Ingo:
> I've reproduced the issue and validated this fixes it, but given my limited perf
> usage so far, I'd appreciate waiting for a Tested-by: from one of the reporters
> before considering for tip/timers/urgent.
>
> kernel/time/timekeeping.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index e07fb09..37dec7e 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -403,8 +403,11 @@ static __always_inline u64 __ktime_get_fast_ns(struct
> tk_fast *tkf)
> tkr = tkf->base + (seq & 0x01);
> now = ktime_to_ns(tkr->base);
>
> - now += clocksource_delta(tkr->read(tkr->clock),
> - tkr->cycle_last, tkr->mask);
> + now += timekeeping_delta_to_ns(tkr,
> + clocksource_delta(
> + tkr->read(tkr->clock),
> + tkr->cycle_last,
> + tkr->mask));
> } while (read_seqcount_retry(&tkf->seq, seq));
>
> return now;
> --
> 1.9.1
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
next prev parent reply other threads:[~2016-10-05 10:56 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-08-23 23:08 [GIT PULL][PATCH 0/2] Timekeeping items for 4.8 via tip/timers/urgent John Stultz
2016-08-23 23:08 ` [PATCH 1/2] timekeeping: Avoid taking lock in NMI path with CONFIG_DEBUG_TIMEKEEPING John Stultz
2016-08-24 7:40 ` [tip:timers/urgent] " tip-bot for John Stultz
2016-10-05 0:30 ` [PATCH 1/2] " Alexei Starovoitov
2016-10-05 0:50 ` John Stultz
2016-10-05 2:55 ` [PATCH] timekeeping: Fix __ktime_get_fast_ns() regression John Stultz
2016-10-05 10:57 ` Mathieu Desnoyers [this message]
2016-10-05 13:48 ` [tip:timers/urgent] " tip-bot for John Stultz
2016-08-23 23:08 ` [PATCH 2/2] timekeeping: Cap array access in timekeeping_debug to protect against invalid sleep times John Stultz
2016-08-24 0:58 ` Chen Yu
2016-08-24 2:54 ` John Stultz
2016-08-24 2:56 ` [RFC][PATCH v2] " John Stultz
2016-08-24 7:41 ` [tip:timers/urgent] timekeeping: Cap array access in timekeeping_debug tip-bot for John Stultz
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=473423924.46781.1475665032457.JavaMail.zimbra@efficios.com \
--to=mathieu.desnoyers@efficios.com \
--cc=alexei.starovoitov@gmail.com \
--cc=bgregg@netflix.com \
--cc=john.stultz@linaro.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@kernel.org \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=stable@vger.kernel.org \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox