Date: Tue, 9 Jun 2026 14:34:10 -0500
From: Dimitri Sivanich
To: Jiri Wiesner
Cc: Thomas Gleixner, Linux Kernel Mailing List, Steve Wahl, Justin Ernst,
    Kyle Meyer, Russ Anderson, Ingo Molnar, Borislav Petkov, Dave Hansen,
    x86@kernel.org, "H. Peter Anvin", "Peter Zijlstra (Intel)",
    Ilpo Järvinen, Marco Elver, "Guilherme G. Piccoli",
    Nikunj A Dadhania, "Xin Li (Intel)", Dimitri Sivanich
Subject: Re: [PATCH v4 0/2] x86/tsc: Exempt recent UV systems from clocksource watchdog checks to avoid false positives.
References: <87lddcv9vd.ffs@tglx>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jun 09, 2026 at 11:59:26AM +0200, Jiri Wiesner wrote:
> On Thu, May 21, 2026 at 09:08:34PM -0500, Dimitri Sivanich wrote:
> > On Thu, May 21, 2026 at 09:30:14PM +0200, Thomas Gleixner wrote:
> > > On Thu, May 21 2026 at 08:17, Dimitri Sivanich wrote:
> > > > HPE UV hardware and firmware are designed to ensure a reliable and
> > > > synchronized TSC mechanism. Comparing the TSC against secondary
> > > > clocksources can result in false positives due to variable access
> > > > latency caused by system traffic.
>
> I do not think that the access latency of the reference clocksource, sgi_rtc in this case, is the cause of the false positives. I think sgi_rtc really experiences time skew. More details below.

Jiri, FYI, there was a firmware regression affecting sgi_rtc on our
Sapphire Rapids-based UV systems that would have caused the results
you saw. It has since been fixed.

> > > > The best course of action against these false positives has
> > > > been found to be simply disabling watchdog checking of the TSC.
> > > >
> > > > Commits [1] and [2] were introduced to avoid an issue where the
> > > > TSC is falsely declared unstable, by exempting qualified platforms
> > > > with up to 4 sockets from TSC clocksource watchdog checking.
> > > > Extend that exemption to include recent and future UV platforms.
> > >
> > > Jiri asked you in the V3 submission:
> > >
> > > "A new implementation of the clocksource watchdog has been merged
> > > into the upstream kernel. One of the changes made by the new
> > > clocksource watchdog implementation is that reference clocksource
> > > reads are made on the boot CPU only. Perhaps the sgi_rtc
> > > clocksource would work well with this implementation. So, testing
> > > is needed in order to find out whether this patch has any future in
> > > upstream Linux. Dimitri, would you be able to run tests on UV
> > > systems to check if the new clocksource watchdog implementation
> > > works and the hardware limitations of sgi_rtc do not get in the
> > > way?"
> > >
> > > This question is still not answered by you, and it has been
> > > confirmed that the new watchdog works flawlessly on a 1920-thread,
> > > 16-socket system under massive load and system traffic.
> >
> > I tested a 7.1-rc4 kernel on a 2048-thread, 16-socket system and,
> > while under test, the TSC did get marked as unstable after a series
> > of "sgi_rtc read timed out" warnings.
>
> The new clocksource watchdog implementation acts on time skew only if the time between two reference clocksource readouts does not exceed 50 us. The threshold for evaluating time skew (based on SHIFT_500PPM) is 244 us for a 500 ms interval, plus the measured reference clocksource readout latency. If the comparison to the reference clocksource fails on CPU 0, the time skew between the clocksource being checked and the reference clocksource must be at least 244 us. The clocksource watchdog cannot distinguish which of the two clocksources is skewed, so it must assume that the clocksource being checked is the skewed one.
>
> In the past, I worked on a bug where a customer with an HPE UV machine reported degraded performance and switches to the HPET. That kernel had the old clocksource watchdog implementation. I created a debugging kernel with the HPET as a second watchdog (not affecting the watchdog's decisions) and got this result:
>
> clocksource: timekeeping watchdog on CPU118: Marking clocksource 'tsc' as unstable because the skew is too large:
> clocksource: 'sgi_rtc' wd_nsec: 511302794 wd_now: 1cb50e4c4b wd_last: 1ca7097111 mask: ffffffffffffff
> clocksource: 'hpet' wd2_nsec: 512005960 wd2_now: 65892719 wd2_last: 64c5d684 mask: ffffffff
> clocksource: 'tsc' cs_nsec: 512006458 cs_now: 86b5982cb1 cs_last: 867581bbab mask: ffffffffffffffff
> clocksource: 'tsc' skewed 703664 ns (0 ms) over watchdog 'sgi_rtc' interval of 511302794 ns (511 ms)
> clocksource: 'tsc' is current clocksource.
> tsc: Marking TSC unstable due to clocksource watchdog
> clocksource: Checking clocksource tsc synchronization from CPU 610 to CPUs 0-609,611-767.
> clocksource: Switched to clocksource sgi_rtc
>
> The intervals measured by the TSC and the HPET match very well; the sgi_rtc is off. I find it hard to believe that both the TSC and the HPET would be skewed, both reporting a longer interval, while sgi_rtc was correct. I think sgi_rtc was skewed.
>
> There are several solutions to work around the hardware limitation of sgi_rtc:
> 1. Disable the clocksource watchdog
> 2. Decrease the rating of sgi_rtc
> 3. Disable sgi_rtc
>
> Solution 3, in the form of a nouvrtc parameter, was previously rejected on this mailing list. The disadvantage of this solution is that every customer would have to pass the nouvrtc parameter to the kernel to avoid false positives from the clocksource watchdog, which makes no sense from the point of view of OS support (as done by e.g. SUSE).
> --
> Jiri Wiesner
> SUSE Labs
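The interval figures in the debug log quoted above can be cross-checked with a short sketch. It is illustrative only and rests on assumptions that are not stated in the thread. First, it assumes SHIFT_500PPM is a right shift by 11 (2^-11 is about 488 ppm), which reproduces the quoted 244 us for a 500 ms interval. Second, it assumes the HPET runs at 25 MHz (40 ns per tick). Third, the functions skew_threshold_ns() and marks_unstable() are a hypothetical model of Jiri's description of the new watchdog, not the kernel's code. The log itself came from the old watchdog, so applying the new rule to it is only a what-if. The counter delta follows the `(now - last) & mask` style used by the kernel's clocksource code.

```python
# Sketch: re-check the figures from the watchdog debug log quoted above.
# ASSUMPTIONS (not from the thread): SHIFT_500PPM == 11, HPET at 25 MHz.
# skew_threshold_ns() and marks_unstable() are hypothetical models.

ASSUMED_SHIFT_500PPM = 11      # 2**-11 ~= 488 ppm
ASSUMED_HPET_NS_PER_TICK = 40  # 25 MHz HPET
MAX_READOUT_NS = 50_000        # 50 us readout window described in the thread


def counter_delta(now: int, last: int, mask: int) -> int:
    """Elapsed ticks of a free-running counter, tolerating one wraparound."""
    return (now - last) & mask


def skew_threshold_ns(interval_ns: int, readout_ns: int) -> int:
    """Hypothetical threshold: ~500 ppm of the interval plus readout latency."""
    return (interval_ns >> ASSUMED_SHIFT_500PPM) + readout_ns


def marks_unstable(cs_ns: int, wd_ns: int, readout_ns: int) -> bool:
    """Hypothetical rule: ignore slow reference readouts; otherwise flag
    the checked clocksource if its skew exceeds the threshold."""
    if readout_ns > MAX_READOUT_NS:
        return False  # readout too slow to trust; skip this check
    return abs(cs_ns - wd_ns) > skew_threshold_ns(wd_ns, readout_ns)


# Raw HPET counter values and mask from the log, converted at 40 ns/tick.
hpet_ticks = counter_delta(0x65892719, 0x64C5D684, 0xFFFFFFFF)
hpet_ns = hpet_ticks * ASSUMED_HPET_NS_PER_TICK

# Nanosecond intervals exactly as printed in the log.
rtc_ns = 511302794
tsc_ns = 512006458

print("hpet ticks:", hpet_ticks, "->", hpet_ns, "ns")
print("tsc - sgi_rtc:", tsc_ns - rtc_ns, "ns")
print("tsc - hpet:", tsc_ns - hpet_ns, "ns")
print("threshold for 500 ms:", 500_000_000 >> ASSUMED_SHIFT_500PPM, "ns")
```

With these inputs, the HPET delta of 12800149 ticks converts to exactly the logged 512005960 ns. The TSC and HPET intervals differ by 498 ns, while the TSC and sgi_rtc intervals differ by the logged 703664 ns. This is consistent with the reading that sgi_rtc, not the TSC, was the outlier.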