From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751788AbbJ2HjF (ORCPT ); Thu, 29 Oct 2015 03:39:05 -0400 Received: from szxga03-in.huawei.com ([119.145.14.66]:41716 "EHLO szxga03-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750853AbbJ2HjD (ORCPT ); Thu, 29 Oct 2015 03:39:03 -0400 Subject: Re: [PATCH 2/2] arm64: validate the delta of cycle_now and cycle_last To: Mark Rutland References: <1445952073-7260-1-git-send-email-yangyingliang@huawei.com> <1445952073-7260-3-git-send-email-yangyingliang@huawei.com> <20151027140302.GE3091@leverpostej> CC: , , Thomas Gleixner , , , , Hanjun Guo From: Yang Yingliang Message-ID: <5631CC8F.6050201@huawei.com> Date: Thu, 29 Oct 2015 15:36:47 +0800 User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.2.0 MIME-Version: 1.0 In-Reply-To: <20151027140302.GE3091@leverpostej> Content-Type: text/plain; charset="windows-1252"; format=flowed Content-Transfer-Encoding: 7bit X-Originating-IP: [10.177.19.219] X-CFilter-Loop: Reflected X-Mirapoint-Virus-RAPID-Raw: score=unknown(0), refid=str=0001.0A020205.5631CC9A.00A6,ss=1,re=0.000,recu=0.000,reip=0.000,cl=1,cld=1,fgs=0, ip=0.0.0.0, so=2013-05-26 15:14:31, dmn=2013-03-21 17:37:32 X-Mirapoint-Loop-Id: 4b0bfb77d632a36529a31aa158bc12c5 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2015/10/27 22:03, Mark Rutland wrote: > On Tue, Oct 27, 2015 at 09:21:13PM +0800, Yang Yingliang wrote: >> In multi-core system, if the clock is not sync perfectly, it >> will make cycle_last that recorded by CPU-A is a little more >> than cycle_now that read by CPU-B. > > If that is happening, that sounds like a hardware and/or firmware bug. > > The ARM ARM states the following (where a CPU is a device): > > The system counter must provide a uniform view of system time. More > precisely, it must be impossible for the following sequence of events > to show system time going backwards: > > 1. Device A reads the time from the system counter. > > 2. Device A communicates with another agent in the system, Device B. > > 3. After recognizing the communication from Device A, Device B reads > the time from the system counter. > > Per [1] it seems like the TSC is not architected to provide this guarantee for > x86. > > Are you certain that the CPUs have clocks which are not in sync? > >> With the negative result, >> hrtimer_update_base() return a huge and wrong time. It leads >> to the cpu can not finish the while loop in hrtimer_interrupt() >> until the real nowtime which is returned from ktime_get() catch >> up with the wrong time on clock monotonic base. >> >> I was able to reproudce the problem with calling clock_settime >> and clock_adjtime repeatedly on each cpu. The params of the calls >> is random. > > Could you share your reproducer? https://github.com/kernelslacker/trinity I use this testsuite and I make it call clock_settime and clock_adjtime alternately with random params on each core. > > How long does it take to trigger the issue? It's not certain. It would take half an hour or several hours or more longer. > >> Here is the calltrace: >> >> Jan 01 00:02:29 Linux kernel: INFO: rcu_sched detected stalls on CPUs/tasks: >> Jan 01 00:02:29 Linux kernel: 0: (2 GPs behind) idle=913/1/0 softirq=59289/59291 fqs=488 >> Jan 01 00:02:29 Linux kernel: (detected by 20, t=5252 jiffies, g=35769, c=35768, q=567) >> Jan 01 00:02:29 Linux kernel: Task dump for CPU 0: >> Jan 01 00:02:29 Linux kernel: swapper/0 R running task 0 0 0 0x00000002 >> Jan 01 00:02:29 Linux kernel: Call trace: >> Jan 01 00:02:29 Linux kernel: [] __switch_to+0x74/0x8c >> Jan 01 00:02:29 Linux kernel: rcu_sched kthread starved for 4764 jiffies! >> Jan 01 00:03:32 Linux kernel: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [swapper/0:0] >> Jan 01 00:03:32 Linux kernel: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.1.6+ #184 >> Jan 01 00:03:32 Linux kernel: task: ffffffc00091cdf0 ti: ffffffc000910000 task.ti: ffffffc000910000 >> Jan 01 00:03:32 Linux kernel: PC is at arch_cpu_idle+0x10/0x18 >> Jan 01 00:03:32 Linux kernel: LR is at arch_cpu_idle+0xc/0x18 >> Jan 01 00:03:32 Linux kernel: pc : [] lr : [] pstate: 60000145 >> Jan 01 00:03:32 Linux kernel: sp : ffffffc000913f20 >> Jan 01 00:03:32 Linux kernel: x29: ffffffc000913f20 x28: 000000003f4bbab0 >> Jan 01 00:03:32 Linux kernel: x27: ffffffc00091669c x26: ffffffc00096e000 >> Jan 01 00:03:32 Linux kernel: x25: ffffffc000804000 x24: ffffffc000913f30 >> Jan 01 00:03:32 Linux kernel: x23: 0000000000000001 x22: ffffffc0006817f8 >> Jan 01 00:03:32 Linux kernel: x21: ffffffc0008fdb00 x20: ffffffc000916618 >> Jan 01 00:03:32 Linux kernel: x19: ffffffc000910000 x18: 00000000ffffffff >> Jan 01 00:03:32 Linux kernel: x17: 0000007f9d6f682c x16: ffffffc0001e19d0 >> Jan 01 00:03:32 Linux kernel: x15: 0000000000000061 x14: 0000000000000072 >> Jan 01 00:03:32 Linux kernel: x13: 0000000000000067 x12: ffffffc000682528 >> Jan 01 00:03:32 Linux kernel: x11: 0000000000000005 x10: 00000001000faf9a >> Jan 01 00:03:32 Linux kernel: x9 : ffffffc000913e60 x8 : ffffffc00091d350 >> Jan 01 00:03:32 Linux kernel: x7 : 0000000000000000 x6 : 002b24c4f00026aa >> Jan 01 00:03:32 Linux kernel: x5 : 0000001ffd5c6000 x4 : ffffffc000913ea0 >> Jan 01 00:03:32 Linux kernel: x3 : ffffffdffdec3b44 x2 : ffffffdffdec3b44 >> Jan 01 00:03:32 Linux kernel: x1 : 0000000000000000 x0 : 0000000000000000 > > In this case was CNTVOFF uniform on all CPUs? > > Was the kernel booted at EL2 or EL1N? Was it booted under a hypervisor? At EL1N without a hypervisor. > >> CPU-A updates the cycle_last in do_settimeofday64() under lock and CPU-B >> reads the current cycles which is slightly behind CPU-A to substract the >> cycle_last after unlock, then we get a negative result, after masking it >> comes to a extremely huge value and lead to "hang" in hrtimer_interrupt(). > > It's possible that we have missing ISBs or DSBs somewhere, and we're > encountering a re-ordering that we did not expect. > > It would be very helpful to be able to analyse with your reproducer. I > have a kernel test in a local branch which I've never managed to trigger > a failure with otehr than on systems with a horrifically skewed CNTVOFF. > >> And multi-core system on X86 had already met such problem and Thomas introduce >> a fix which is commit 47001d603375 ("x86: tsc prevent time going backwards"). >> And then Thomas moved the fix code into the core code file of time in >> commit 09ec54429c6d ("clocksource: Move cycle_last validation to core code"). >> Now the validation can be enabled by config CLOCKSOURCE_VALIDATE_LAST_CYCLE. >> I think we can fix the problem on arm64 by selecting the config. This is no >> side effect for systems with counters running properly. > > As above, per [1], I'm not sure that the same rationale applies, and > it's somewhat worrying to mask the issue in this manner. > > Thanks, > Mark. > > [1] https://lkml.org/lkml/2007/8/23/96 > >> >> Signed-off-by: Yang Yingliang >> Cc: Thomas Gleixner >> --- >> arch/arm64/Kconfig | 1 + >> 1 file changed, 1 insertion(+) >> >> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig >> index 07d1811..6a53926 100644 >> --- a/arch/arm64/Kconfig >> +++ b/arch/arm64/Kconfig >> @@ -30,6 +30,7 @@ config ARM64 >> select GENERIC_ALLOCATOR >> select GENERIC_CLOCKEVENTS >> select GENERIC_CLOCKEVENTS_BROADCAST >> + select CLOCKSOURCE_VALIDATE_LAST_CYCLE >> select GENERIC_CPU_AUTOPROBE >> select GENERIC_EARLY_IOREMAP >> select GENERIC_IDLE_POLL_SETUP >> -- >> 2.5.0 >> >> >> >> _______________________________________________ >> linux-arm-kernel mailing list >> linux-arm-kernel@lists.infradead.org >> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel >> > > . >