From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: 
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1752375AbdCPOBk (ORCPT );
        Thu, 16 Mar 2017 10:01:40 -0400
Received: from mail.kernel.org ([198.145.29.136]:59312 "EHLO mail.kernel.org"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1750899AbdCPOBj (ORCPT );
        Thu, 16 Mar 2017 10:01:39 -0400
Date: Thu, 16 Mar 2017 11:01:03 -0300
From: Arnaldo Carvalho de Melo
To: Peter Zijlstra, Adrian Hunter
Cc: Jiri Olsa, Namhyung Kim, Wang Nan,
        Linux Kernel Mailing List
Subject: 'perf test tsc' failing, bisected to "sched/clock: Provide better clock continuity"
Message-ID: <20170316140103.GU12825@kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
X-Url: http://acmel.wordpress.com
User-Agent: Mutt/1.7.1 (2016-10-04)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: 
X-Mailing-List: linux-kernel@vger.kernel.org

Hi,

	this test entry has been failing for a while:

[root@jouet ~]# perf test -v tsc
55: Convert perf time to TSC                  :
--- start ---
test child forked, pid 3008
mmap size 528384B
1st event perf time 93133455486631 tsc 15369449468752
rdtsc          time 93133464598760 tsc 15369473104358
2nd event perf time 93133455506961 tsc 15369449521485
test child finished with -1
---- end ----
Convert perf time to TSC: FAILED!
[root@jouet ~]#

I bisected it to the following kernel change, ideas?

[acme@felicio linux]$ git bisect good
5680d8094ffa9e5cfc81afdd865027ee6417c263 is the first bad commit
commit 5680d8094ffa9e5cfc81afdd865027ee6417c263
Author: Peter Zijlstra
Date:   Thu Dec 15 13:36:17 2016 +0100

    sched/clock: Provide better clock continuity

    When switching between the unstable and stable variants it is
    currently possible that clock discontinuities occur. And while these
    will mostly be 'small', attempt to do better.

    As observed on my IVB-EP, the sched_clock() is ~1.5s ahead of the
    ktime_get_ns() based timeline at the point of switchover
    (sched_clock_init_late()) after SMP bringup.

    Equally, when the TSC is later found to be unstable -- typically
    because SMM tries to hide its SMI latencies by mucking with the TSC --
    we want to avoid large jumps.

    Since the clocksource watchdog reports the issue after the fact we
    cannot exactly fix up time, but since SMI latencies are typically
    small (~10ns range), the discontinuity is mainly due to drift between
    sched_clock() and ktime_get_ns() (which on my desktop is ~79s over
    24days).

    I dislike this patch because it adds overhead to the good case in
    favour of dealing with badness. But given the widespread failure of
    TSC stability this is worth it.

    Note that in case the TSC makes drastic jumps after SMP bringup we're
    still hosed. There's just not much we can do in that case without
    stupid overhead.

    If we were to somehow expose tsc_clocksource_reliable (which is hard
    because this code is also used on ia64 and parisc) we could avoid
    some of the newly introduced overhead.

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

:040000 040000 152545abe3b879aaa3cf053cdd58ef998c285529 3afcd0a5bc643fdd0fc994ee11cbfd87cfe4c30f M	kernel
[acme@felicio linux]$
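
For reference, what the failing check does: the test mmaps a perf event,
reads the time_zero/time_mult/time_shift parameters exported in struct
perf_event_mmap_page (when cap_user_time_zero is set), converts an rdtsc()
reading taken between two recorded events into perf time, and expects

    1st event perf time <= converted rdtsc time <= 2nd event perf time

In the output above the converted rdtsc time (93133464598760) lands roughly
9 ms *after* the 2nd event's perf time (93133455506961), which suggests the
sched_clock()-based perf timestamps and the TSC-derived userpage parameters
no longer agree after the bisected change.

Below is a minimal standalone sketch of the conversion the test exercises,
following the formula documented in the perf_event_mmap_page comments in
include/uapi/linux/perf_event.h. The struct and helper names here are
illustrative, not necessarily the exact ones used in tools/perf:

    #include <stdint.h>

    /* Illustrative container for the userpage fields the test reads. */
    struct tsc_conv {
            uint16_t time_shift;    /* perf_event_mmap_page::time_shift */
            uint32_t time_mult;     /* perf_event_mmap_page::time_mult  */
            uint64_t time_zero;     /* perf_event_mmap_page::time_zero  */
    };

    /* TSC cycles -> perf time (ns), per the documented userpage formula. */
    static uint64_t tsc_to_perf_time(uint64_t cyc, const struct tsc_conv *tc)
    {
            uint64_t quot = cyc >> tc->time_shift;
            uint64_t rem  = cyc & (((uint64_t)1 << tc->time_shift) - 1);

            return tc->time_zero + quot * tc->time_mult +
                   ((rem * tc->time_mult) >> tc->time_shift);
    }

    /* perf time (ns) -> TSC cycles, the inverse direction. */
    static uint64_t perf_time_to_tsc(uint64_t ns, const struct tsc_conv *tc)
    {
            uint64_t t    = ns - tc->time_zero;
            uint64_t quot = t / tc->time_mult;
            uint64_t rem  = t % tc->time_mult;

            return (quot << tc->time_shift) +
                   (rem << tc->time_shift) / tc->time_mult;
    }

The conversion itself is pure arithmetic on the exported parameters, so if
the kernel offsets sched_clock() (and hence the perf timestamps) without the
userpage's time_zero reflecting that offset, the two timelines drift apart
and the ordering check above fails.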