* [PATCH v4 0/2] x86/tsc: Exempt recent UV systems from clocksource watchdog checks to avoid false positives.
@ 2026-05-21 13:17 Dimitri Sivanich
From: Dimitri Sivanich @ 2026-05-21 13:17 UTC
To: Linux Kernel Mailing List
Cc: Jiri Wiesner, Steve Wahl, Justin Ernst, Kyle Meyer, Russ Anderson,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Peter Zijlstra (Intel), Ilpo Järvinen,
Marco Elver, Guilherme G. Piccoli, Nikunj A Dadhania,
Xin Li (Intel), Dimitri Sivanich
HPE UV hardware and firmware is designed to ensure a reliable and
synchronized TSC mechanism. Comparing the TSC against secondary
clocksources can result in false positives due to variable access
latency caused by system traffic. The best course of action against
these false positives has been found to simply disable watchdog
checking of the TSC.
Commits [1] and [2] were introduced to avoid an issue where the TSC
is falsely declared unstable by exempting qualified platforms of up
to 4-sockets from TSC clocksource watchdog checking. Extend that
exemption to include recent and future UV platforms.
[1] commit b50db7095fe0 ("x86/tsc: Disable clocksource watchdog for TSC on qualified platorms")
[2] commit 233756a640be ("x86/tsc: Extend watchdog check exemption to 4-Sockets platform")
Dimitri Sivanich (2):
Expose the uv_hub_type() interface
Disable clocksource watchdog checking on recent and future UV
platforms.
arch/x86/include/asm/uv/uv_hub.h | 7 +++++++
arch/x86/kernel/tsc.c | 27 +++++++++++++++++++++------
2 files changed, 28 insertions(+), 6 deletions(-)
--
2.43.0

* [PATCH v4 1/2] x86/platform/uv: Expose the uv_hub_type() interface
@ 2026-05-21 13:20 Dimitri Sivanich
From: Dimitri Sivanich @ 2026-05-21 13:20 UTC
To: Linux Kernel Mailing List
Cc: Jiri Wiesner, Steve Wahl, Justin Ernst, Kyle Meyer, Russ Anderson,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Peter Zijlstra (Intel), Ilpo Järvinen,
Marco Elver, Guilherme G. Piccoli, Nikunj A Dadhania,
Xin Li (Intel), Dimitri Sivanich

Expose the uv_hub_type() interface for use in non-UV specific code.

Signed-off-by: Dimitri Sivanich <sivanich@hpe.com>
---
 arch/x86/include/asm/uv/uv_hub.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/include/asm/uv/uv_hub.h b/arch/x86/include/asm/uv/uv_hub.h
index ea877fd83114..5cffc5b9e989 100644
--- a/arch/x86/include/asm/uv/uv_hub.h
+++ b/arch/x86/include/asm/uv/uv_hub.h
@@ -209,10 +209,17 @@ static inline struct uv_hub_info_s *uv_cpu_hub_info(int cpu)
 	return (struct uv_hub_info_s *)uv_cpu_info_per(cpu)->p_uv_hub_info;
 }
 
+#ifdef CONFIG_X86_UV
 static inline int uv_hub_type(void)
 {
 	return uv_hub_info->hub_type;
 }
+#else
+static inline int uv_hub_type(void)
+{
+	return 0;
+}
+#endif
 
 static inline __init void uv_hub_type_set(int uvmask)
 {
--
2.43.0

* [PATCH v4 2/2] x86/tsc: Disable clocksource watchdog checking on recent and future UV platforms.
@ 2026-05-21 13:23 Dimitri Sivanich
From: Dimitri Sivanich @ 2026-05-21 13:23 UTC
To: Linux Kernel Mailing List
Cc: Jiri Wiesner, Steve Wahl, Justin Ernst, Kyle Meyer, Russ Anderson,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Peter Zijlstra (Intel), Ilpo Järvinen,
Marco Elver, Guilherme G. Piccoli, Nikunj A Dadhania,
Xin Li (Intel), Dimitri Sivanich

HPE UV hardware and firmware is designed to ensure a reliable and
synchronized TSC mechanism. Comparing the TSC against secondary
clocksources can result in false positives due to variable access
latency caused by system traffic. The best course of action against
these false positives has been found to simply disable watchdog
checking of the TSC.

Commits [1] and [2] were introduced to avoid an issue where the TSC
is falsely declared unstable by exempting qualified platforms of up
to 4-sockets from TSC clocksource watchdog checking. Extend that
exemption to include recent and future UV platforms.

[1] 'commit b50db7095fe0 ("x86/tsc: Disable clocksource watchdog for TSC on qualified platorms")'
[2] 'commit 233756a640be ("x86/tsc: Extend watchdog check exemption to 4-Sockets platform")'

Signed-off-by: Dimitri Sivanich <sivanich@hpe.com>
---
 arch/x86/kernel/tsc.c | 27 +++++++++++++++++++++------
 1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index c5110eb554bc..08e8e5511749 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -32,6 +32,7 @@
 #include <asm/i8259.h>
 #include <asm/msr.h>
 #include <asm/topology.h>
+#include <asm/uv/uv_hub.h>
 #include <asm/uv/uv.h>
 #include <asm/sev.h>
 
@@ -1228,6 +1229,20 @@ static void __init tsc_disable_clocksource_watchdog(void)
 	clocksource_tsc.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
 }
 
+static bool __init platform_is_exempt_from_watchdog(void)
+{
+	/* Platforms with no more than 4 packages are exempt */
+	if (topology_max_packages() <= 4)
+		return true;
+
+#ifdef CONFIG_X86_64
+	/* Recent UV systems are exempt */
+	if (is_uvy_hub())
+		return true;
+#endif
+	return false;
+}
+
 static void __init check_system_tsc_reliable(void)
 {
 #if defined(CONFIG_MGEODEGX1) || defined(CONFIG_MGEODE_LX) || defined(CONFIG_X86_GENERIC)
@@ -1246,17 +1261,17 @@ static void __init check_system_tsc_reliable(void)
 		tsc_clocksource_reliable = 1;
 
 	/*
-	 * Disable the clocksource watchdog when the system has:
-	 *  - TSC running at constant frequency
-	 *  - TSC which does not stop in C-States
-	 *  - the TSC_ADJUST register which allows to detect even minimal
+	 * Disable the clocksource watchdog when the system:
+	 *  - has TSC running at constant frequency
+	 *  - has TSC which does not stop in C-States
+	 *  - has the TSC_ADJUST register which allows to detect even minimal
 	 *    modifications
-	 *  - not more than four packages
+	 *  - is exempt from running the clocksource watchdog
 	 */
 	if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
 	    boot_cpu_has(X86_FEATURE_NONSTOP_TSC) &&
 	    boot_cpu_has(X86_FEATURE_TSC_ADJUST) &&
-	    topology_max_packages() <= 4)
+	    platform_is_exempt_from_watchdog())
 		tsc_disable_clocksource_watchdog();
 }
 
--
2.43.0
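
The condition that patch 2/2 changes can be modelled in user space to see which combinations end up with the watchdog disabled. This is only a minimal sketch under stated assumptions: `struct tsc_platform`, `model_platform_is_exempt()` and `model_watchdog_disabled()` are hypothetical names, the boolean fields stand in for the `boot_cpu_has()` checks, and `uvy_hub` stands in for `is_uvy_hub()`. It is not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the inputs consulted by the patched check. */
struct tsc_platform {
	bool constant_tsc;	/* models X86_FEATURE_CONSTANT_TSC */
	bool nonstop_tsc;	/* models X86_FEATURE_NONSTOP_TSC */
	bool tsc_adjust;	/* models X86_FEATURE_TSC_ADJUST */
	int max_packages;	/* models topology_max_packages() */
	bool uvy_hub;		/* models is_uvy_hub() */
};

/* Mirrors the shape of platform_is_exempt_from_watchdog() in the patch. */
static inline bool model_platform_is_exempt(const struct tsc_platform *p)
{
	assert(p->max_packages >= 1);
	if (p->max_packages <= 4)
		return true;
	return p->uvy_hub;
}

/* The exemption only matters when all three TSC features are present. */
static inline bool model_watchdog_disabled(const struct tsc_platform *p)
{
	return p->constant_tsc && p->nonstop_tsc && p->tsc_adjust &&
	       model_platform_is_exempt(p);
}
```

In this model a 16-package system with `uvy_hub` set and all three TSC features has the watchdog disabled, while the same system without `uvy_hub`, or without `tsc_adjust`, keeps it enabled.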

* Re: [PATCH v4 0/2] x86/tsc: Exempt recent UV systems from clocksource watchdog checks to avoid false positives.
@ 2026-05-21 19:30 Thomas Gleixner
From: Thomas Gleixner @ 2026-05-21 19:30 UTC
To: Dimitri Sivanich, Linux Kernel Mailing List
Cc: Jiri Wiesner, Steve Wahl, Justin Ernst, Kyle Meyer, Russ Anderson,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Peter Zijlstra (Intel), Ilpo Järvinen, Marco Elver,
Guilherme G. Piccoli, Nikunj A Dadhania, Xin Li (Intel),
Dimitri Sivanich

On Thu, May 21 2026 at 08:17, Dimitri Sivanich wrote:
> HPE UV hardware and firmware is designed to ensure a reliable and
> synchronized TSC mechanism. Comparing the TSC against secondary
> clocksources can result in false positives due to variable access
> latency caused by system traffic. The best course of action against
> these false positives has been found to simply disable watchdog
> checking of the TSC.
>
> Commits [1] and [2] were introduced to avoid an issue where the TSC
> is falsely declared unstable by exempting qualified platforms of up
> to 4-sockets from TSC clocksource watchdog checking. Extend that
> exemption to include recent and future UV platforms.

Jiri asked you in the V3 submission:

"A new implementation of the clocksource watchdog has been merged into
the upstream kernel. One of the changes made by the new clocksource
watchdog implementation is that reference clocksource reads are made
on the boot CPU only. Perhaps, the sgi_rtc clocksource would work well
with this implementation. So, testing is needed in order to find out
if this patch are any future in the upstream Linux. Dimitri, would you
be able to run tests on UV systems to check if the new clocksource
watchdog implementation works and the hardware limitations of sgi_rtc
do not get in the way?"

This question is still not answered by you and it has been confirmed
that the new watchdog works flawlessly on a 1920 threads 16 socket
system under massive load and system traffic.

So you do not even have the courtesy to test, you just go and make the
same claims you made before based on the original watchdog
implementation.

Feel free to ignore people, but then don't be surprised when people
ignore you as well.

Thanks,

tglx

* Re: [PATCH v4 0/2] x86/tsc: Exempt recent UV systems from clocksource watchdog checks to avoid false positives.
@ 2026-05-22 2:08 Dimitri Sivanich
From: Dimitri Sivanich @ 2026-05-22 2:08 UTC
To: Thomas Gleixner
Cc: Linux Kernel Mailing List, Jiri Wiesner, Steve Wahl, Justin Ernst,
Kyle Meyer, Russ Anderson, Ingo Molnar, Borislav Petkov, Dave Hansen,
x86, H. Peter Anvin, Peter Zijlstra (Intel), Ilpo Järvinen,
Marco Elver, Guilherme G. Piccoli, Nikunj A Dadhania,
Xin Li (Intel), Dimitri Sivanich

On Thu, May 21, 2026 at 09:30:14PM +0200, Thomas Gleixner wrote:
> On Thu, May 21 2026 at 08:17, Dimitri Sivanich wrote:
> > HPE UV hardware and firmware is designed to ensure a reliable and
> > synchronized TSC mechanism. Comparing the TSC against secondary
> > clocksources can result in false positives due to variable access
> > latency caused by system traffic. The best course of action against
> > these false positives has been found to simply disable watchdog
> > checking of the TSC.
> >
> > Commits [1] and [2] were introduced to avoid an issue where the TSC
> > is falsely declared unstable by exempting qualified platforms of up
> > to 4-sockets from TSC clocksource watchdog checking. Extend that
> > exemption to include recent and future UV platforms.
>
> Jiri asked you in the V3 submission:
>
> "A new implementation of the clocksource watchdog has been merged into
> the upstream kernel. One of the changes made by the new clocksource
> watchdog implementation is that reference clocksource reads are made
> on the boot CPU only. Perhaps, the sgi_rtc clocksource would work well
> with this implementation. So, testing is needed in order to find out
> if this patch are any future in the upstream Linux. Dimitri, would you
> be able to run tests on UV systems to check if the new clocksource
> watchdog implementation works and the hardware limitations of sgi_rtc
> do not get in the way?"
>
> This question is still not answered by you and it has been confirmed
> that the new watchdog works flawlessly on a 1920 threads 16 socket
> system under massive load and system traffic.

I tested a 7.1-rc4 kernel on a 2048 thread 16 socket system and, while
under test, the TSC did get marked as unstable after a series of "sgi_rtc
read timed out" warnings.

>
> So you do not even have the courtesy to test, you just go and make the
> same claims you made before based on the original watchdog
> implementation.
>
> Feel free to ignore people, but then don't be surprised when people
> ignore you as well.
>
> Thanks,
>
> tglx

* Re: [PATCH v4 0/2] x86/tsc: Exempt recent UV systems from clocksource watchdog checks to avoid false positives.
@ 2026-06-09 9:59 Jiri Wiesner
From: Jiri Wiesner @ 2026-06-09 9:59 UTC
To: Dimitri Sivanich
Cc: Thomas Gleixner, Linux Kernel Mailing List, Steve Wahl, Justin Ernst,
Kyle Meyer, Russ Anderson, Ingo Molnar, Borislav Petkov, Dave Hansen,
x86, H. Peter Anvin, Peter Zijlstra (Intel), Ilpo Järvinen,
Marco Elver, Guilherme G. Piccoli, Nikunj A Dadhania,
Xin Li (Intel), Dimitri Sivanich

On Thu, May 21, 2026 at 09:08:34PM -0500, Dimitri Sivanich wrote:
> On Thu, May 21, 2026 at 09:30:14PM +0200, Thomas Gleixner wrote:
> > On Thu, May 21 2026 at 08:17, Dimitri Sivanich wrote:
> > > HPE UV hardware and firmware is designed to ensure a reliable and
> > > synchronized TSC mechanism. Comparing the TSC against secondary
> > > clocksources can result in false positives due to variable access
> > > latency caused by system traffic.

I do not think that the access latency of the reference clocksource,
sgi_rtc in this case, is the cause of the false positives. I think
sgi_rtc really experiences time skew. More details below.

> > > The best course of action against
> > > these false positives has been found to simply disable watchdog
> > > checking of the TSC.
> > >
> > > Commits [1] and [2] were introduced to avoid an issue where the TSC
> > > is falsely declared unstable by exempting qualified platforms of up
> > > to 4-sockets from TSC clocksource watchdog checking. Extend that
> > > exemption to include recent and future UV platforms.
> >
> > Jiri asked you in the V3 submission:
> >
> > "A new implementation of the clocksource watchdog has been merged into
> > the upstream kernel. One of the changes made by the new clocksource
> > watchdog implementation is that reference clocksource reads are made
> > on the boot CPU only. Perhaps, the sgi_rtc clocksource would work well
> > with this implementation. So, testing is needed in order to find out
> > if this patch are any future in the upstream Linux. Dimitri, would you
> > be able to run tests on UV systems to check if the new clocksource
> > watchdog implementation works and the hardware limitations of sgi_rtc
> > do not get in the way?"
> >
> > This question is still not answered by you and it has been confirmed
> > that the new watchdog works flawlessly on a 1920 threads 16 socket
> > system under massive load and system traffic.
>
> I tested a 7.1-rc4 kernel on a 2048 thread 16 socket system and, while
> under test, the TSC did get marked as unstable after a series of "sgi_rtc
> read timed out" warnings.

The new clocksource watchdog implementation makes sure to act on time
skew only if the time between two reference clocksource readouts does
not exceed 50 us. The threshold for evaluating time skew (based on
SHIFT_500PPM) is 244 us for a 500 ms interval plus the measured
reference clocksource readout latency. If the comparison to the
reference clocksource fails on CPU 0 the time skew between the
clocksource being checked and the reference clocksource must be at
least 244 us. The clocksource watchdog cannot distinguish which of the
clocksources is skewed, and it must make the assumption that the
clocksource being checked is skewed.

In the past, I worked on a bug where a customer with an HPE UV machine
reported degraded performance and switches to the HPET. This kernel had
the old clocksource watchdog implementation. I created a debugging
kernel with the HPET as a second watchdog (not affecting the decisions
by the watchdog) and got this result:
> clocksource: timekeeping watchdog on CPU118: Marking clocksource 'tsc' as unstable because the skew is too large:
> clocksource: 'sgi_rtc' wd_nsec: 511302794 wd_now: 1cb50e4c4b wd_last: 1ca7097111 mask: ffffffffffffff
> clocksource: 'hpet' wd2_nsec: 512005960 wd2_now: 65892719 wd2_last: 64c5d684 mask: ffffffff
> clocksource: 'tsc' cs_nsec: 512006458 cs_now: 86b5982cb1 cs_last: 867581bbab mask: ffffffffffffffff
> clocksource: 'tsc' skewed 703664 ns (0 ms) over watchdog 'sgi_rtc' interval of 511302794 ns (511 ms)
> clocksource: 'tsc' is current clocksource.
> tsc: Marking TSC unstable due to clocksource watchdog
> clocksource: Checking clocksource tsc synchronization from CPU 610 to CPUs 0-609,611-767.
> clocksource: Switched to clocksource sgi_rtc

The intervals measured by the TSC and the HPET match very well; the
sgi_rtc is off. I find it hard to believe that both the TSC and the
HPET would be skewed - both reporting a longer interval - while sgi_rtc
was correct. I think sgi_rtc was skewed.

There are several solutions to work around the hardware limitation of sgi_rtc:
1. Disable the clocksource watchdog
2. Decrease the rating of sgi_rtc
3. Disable sgi_rtc

Solution 3 in the form of a nouvrtc parameter was previously rejected
on this mailing list. The disadvantage of the solution is that each
customer would have to pass the nouvrtc parameter to the kernel to
avoid false positives by the clocksource watchdog, which makes no sense
from the POV of OS support (as done by e.g. SUSE).
--
Jiri Wiesner
SUSE Labs
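
The threshold arithmetic and the log figures in the message above can be checked with a short user-space sketch. It rests on assumptions: the shift value 11 for SHIFT_500PPM is inferred from the "244 us for a 500 ms interval" figure rather than quoted from kernel source, the helper names are hypothetical, and the measured readout latency that the real check adds to the threshold is left out. The log itself comes from the old watchdog implementation, so applying the new threshold to it is illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: SHIFT_500PPM is 11, i.e. a factor of 1/2048 (about 488 ppm).
 * Inferred from the 244 us / 500 ms figure, not taken from kernel source. */
#define SHIFT_500PPM_ASSUMED 11

/* Hypothetical helper: allowed skew for one watchdog interval, without the
 * reference readout latency term mentioned in the message. */
static inline int64_t skew_threshold_ns(int64_t interval_ns)
{
	assert(interval_ns >= 0);
	return interval_ns >> SHIFT_500PPM_ASSUMED;
}

/* Hypothetical helper: absolute difference between two measured intervals. */
static inline int64_t interval_skew_ns(int64_t cs_nsec, int64_t wd_nsec)
{
	int64_t d = cs_nsec - wd_nsec;

	return d < 0 ? -d : d;
}
```

With the values from the log, `interval_skew_ns(512006458, 511302794)` gives 703664 ns, matching the "skewed 703664 ns" line, while the TSC and HPET intervals differ by only 498 ns. Under the assumed shift, `skew_threshold_ns(500000000)` is 244140 ns, so the sgi_rtc comparison is well outside the bound and the HPET comparison well inside it.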

* Re: [PATCH v4 0/2] x86/tsc: Exempt recent UV systems from clocksource watchdog checks to avoid false positives.
@ 2026-06-09 19:34 Dimitri Sivanich
From: Dimitri Sivanich @ 2026-06-09 19:34 UTC
To: Jiri Wiesner
Cc: Thomas Gleixner, Linux Kernel Mailing List, Steve Wahl, Justin Ernst,
Kyle Meyer, Russ Anderson, Ingo Molnar, Borislav Petkov, Dave Hansen,
x86, H. Peter Anvin, Peter Zijlstra (Intel), Ilpo Järvinen,
Marco Elver, Guilherme G. Piccoli, Nikunj A Dadhania,
Xin Li (Intel), Dimitri Sivanich

On Tue, Jun 09, 2026 at 11:59:26AM +0200, Jiri Wiesner wrote:
> On Thu, May 21, 2026 at 09:08:34PM -0500, Dimitri Sivanich wrote:
> > On Thu, May 21, 2026 at 09:30:14PM +0200, Thomas Gleixner wrote:
> > > On Thu, May 21 2026 at 08:17, Dimitri Sivanich wrote:
> > > > HPE UV hardware and firmware is designed to ensure a reliable and
> > > > synchronized TSC mechanism. Comparing the TSC against secondary
> > > > clocksources can result in false positives due to variable access
> > > > latency caused by system traffic.
>
> I do not think that the access latency of the reference clocksource,
> sgi_rtc in this case, is the cause of the false positives. I think
> sgi_rtc really experiences time skew. More details below.

Jiri,

FYI, there was a firmware regression that impacted sgi_rtc on our Sapphire
Rapids based UV systems, that would've caused the results that you saw.
That has since been fixed.

>
> > > > The best course of action against
> > > > these false positives has been found to simply disable watchdog
> > > > checking of the TSC.
> > > >
> > > > Commits [1] and [2] were introduced to avoid an issue where the TSC
> > > > is falsely declared unstable by exempting qualified platforms of up
> > > > to 4-sockets from TSC clocksource watchdog checking. Extend that
> > > > exemption to include recent and future UV platforms.
> > >
> > > Jiri asked you in the V3 submission:
> > >
> > > "A new implementation of the clocksource watchdog has been merged into
> > > the upstream kernel. One of the changes made by the new clocksource
> > > watchdog implementation is that reference clocksource reads are made
> > > on the boot CPU only. Perhaps, the sgi_rtc clocksource would work well
> > > with this implementation. So, testing is needed in order to find out
> > > if this patch are any future in the upstream Linux. Dimitri, would you
> > > be able to run tests on UV systems to check if the new clocksource
> > > watchdog implementation works and the hardware limitations of sgi_rtc
> > > do not get in the way?"
> > >
> > > This question is still not answered by you and it has been confirmed
> > > that the new watchdog works flawlessly on a 1920 threads 16 socket
> > > system under massive load and system traffic.
> >
> > I tested a 7.1-rc4 kernel on a 2048 thread 16 socket system and, while
> > under test, the TSC did get marked as unstable after a series of "sgi_rtc
> > read timed out" warnings.
>
> The new clocksource watchdog implementation makes sure to act on time
> skew only if the time between two reference clocksource readouts does
> not exceed 50 us. The threshold for evaluating time skew (based on
> SHIFT_500PPM) is 244 us for a 500 ms interval plus the measured
> reference clocksource readout latency. If the comparison to the
> reference clocksource fails on CPU 0 the time skew between the
> clocksource being checked and the reference clocksource must be at
> least 244 us. The clocksource watchdog cannot distinguish which of the
> clocksources is skewed, and it must make the assumption that the
> clocksource being checked is skewed.
>
> In the past, I worked on a bug where a customer with an HPE UV machine
> reported degraded performance and switches to the HPET. This kernel had
> the old clocksource watchdog implementation. I created a debugging
> kernel with the HPET as a second watchdog (not affecting the decisions
> by the watchdog) and got this result:
> > clocksource: timekeeping watchdog on CPU118: Marking clocksource 'tsc' as unstable because the skew is too large:
> > clocksource: 'sgi_rtc' wd_nsec: 511302794 wd_now: 1cb50e4c4b wd_last: 1ca7097111 mask: ffffffffffffff
> > clocksource: 'hpet' wd2_nsec: 512005960 wd2_now: 65892719 wd2_last: 64c5d684 mask: ffffffff
> > clocksource: 'tsc' cs_nsec: 512006458 cs_now: 86b5982cb1 cs_last: 867581bbab mask: ffffffffffffffff
> > clocksource: 'tsc' skewed 703664 ns (0 ms) over watchdog 'sgi_rtc' interval of 511302794 ns (511 ms)
> > clocksource: 'tsc' is current clocksource.
> > tsc: Marking TSC unstable due to clocksource watchdog
> > clocksource: Checking clocksource tsc synchronization from CPU 610 to CPUs 0-609,611-767.
> > clocksource: Switched to clocksource sgi_rtc
>
> The intervals measured by the TSC and the HPET match very well; the
> sgi_rtc is off. I find it hard to believe that both the TSC and the
> HPET would be skewed - both reporting a longer interval - while sgi_rtc
> was correct. I think sgi_rtc was skewed.
>
> There are several solutions to work around the hardware limitation of sgi_rtc:
> 1. Disable the clocksource watchdog
> 2. Decrease the rating of sgi_rtc
> 3. Disable sgi_rtc
>
> Solution 3 in the form of a nouvrtc parameter was previously rejected
> on this mailing list. The disadvantage of the solution is that each
> customer would have to pass the nouvrtc parameter to the kernel to
> avoid false positives by the clocksource watchdog, which makes no sense
> from the POV of OS support (as done by e.g. SUSE).
> --
> Jiri Wiesner
> SUSE Labs

* Re: [PATCH v4 0/2] x86/tsc: Exempt recent UV systems from clocksource watchdog checks to avoid false positives.
@ 2026-06-10 10:21 Jiri Wiesner
From: Jiri Wiesner @ 2026-06-10 10:21 UTC
To: Dimitri Sivanich
Cc: Thomas Gleixner, Linux Kernel Mailing List, Steve Wahl, Justin Ernst,
Kyle Meyer, Russ Anderson, Ingo Molnar, Borislav Petkov, Dave Hansen,
x86, H. Peter Anvin, Peter Zijlstra (Intel), Ilpo Järvinen,
Marco Elver, Guilherme G. Piccoli, Nikunj A Dadhania,
Xin Li (Intel), Dimitri Sivanich

On Tue, Jun 09, 2026 at 02:34:10PM -0500, Dimitri Sivanich wrote:
> On Tue, Jun 09, 2026 at 11:59:26AM +0200, Jiri Wiesner wrote:
> > On Thu, May 21, 2026 at 09:08:34PM -0500, Dimitri Sivanich wrote:
> > > On Thu, May 21, 2026 at 09:30:14PM +0200, Thomas Gleixner wrote:
> > > > On Thu, May 21 2026 at 08:17, Dimitri Sivanich wrote:
> > > > > HPE UV hardware and firmware is designed to ensure a reliable and
> > > > > synchronized TSC mechanism. Comparing the TSC against secondary
> > > > > clocksources can result in false positives due to variable access
> > > > > latency caused by system traffic.
> >
> > I do not think that the access latency of the reference clocksource,
> > sgi_rtc in this case, is the cause of the false positives. I think
> > sgi_rtc really experiences time skew. More details below.
>
> FYI, there was a firmware regression that impacted sgi_rtc on our Sapphire
> Rapids based UV systems, that would've caused the results that you saw.
> That has since been fixed.

It may not seem so but I was actually trying to help make the argument
for why this little patchset disabling the clocksource watchdog on UV
systems should be merged. I still think the view that the false
positives are caused by the access latency of sgi_rtc contradicts the
way the new clocksource watchdog implementation works. Details are
below.

> > > I tested a 7.1-rc4 kernel on a 2048 thread 16 socket system and, while
> > > under test, the TSC did get marked as unstable after a series of "sgi_rtc
> > > read timed out" warnings.
> >
> > The new clocksource watchdog implementation makes sure to act on time
> > skew only if the time between two reference clocksource readouts does
> > not exceed 50 us. The threshold for evaluating time skew (based on
> > SHIFT_500PPM) is 244 us for a 500 ms interval plus the measured
> > reference clocksource readout latency. If the comparison to the
> > reference clocksource fails on CPU 0 the time skew between the
> > clocksource being checked and the reference clocksource must be at
> > least 244 us. The clocksource watchdog cannot distinguish which of the
> > clocksources is skewed, and it must make the assumption that the
> > clocksource being checked is skewed.

The code executes:

	scoped_guard(irq) {
		wd_ts0 = watchdog->read(watchdog);
		cs_ts = cs->read(cs);
		wd_ts1 = watchdog->read(watchdog);
	}

which I imagine works in this way:
* send a request to sgi_rtc
* read sgi_rtc timestamp (wd_ts0)
* return timestamp from sgi_rtc
* read TSC timestamp on the CPU
* send a request to sgi_rtc
* read sgi_rtc timestamp (wd_ts1)
* return timestamp from sgi_rtc

The code calculates wd_seq by subtracting wd_ts0 from wd_ts1. wd_seq
includes the time needed to (assuming the sgi_rtc timestamp read takes
zero time):
* return timestamp from sgi_rtc
* read TSC timestamp on the CPU
* send a request to sgi_rtc

The code guarantees that wd_seq is less than 50 us, which means the
latency between reading wd_ts0 and reading cs_ts must also be less than
50 us. In other words, when a clocksource fails to pass the frequency
check there is provable time skew. Either the TSC got skewed, which
produced most of the measured time skew, or sgi_rtc got skewed (maybe
both got skewed to a considerable degree but that seems unlikely). The
case described below provides an example of an sgi_rtc clocksource
experiencing time skew. If the below example was caused by a firmware
bug that has since been resolved, there needs to be another bug causing
the time skew of sgi_rtc, based on the reported false positive of the
new clocksource watchdog implementation on a UV system. Otherwise, we
will have to blame the TSC and the clocksource watchdog should not be
disabled on UV systems.

> > In the past, I worked on a bug where a customer with an HPE UV machine
> > reported degraded performance and switches to the HPET. This kernel had
> > the old clocksource watchdog implementation. I created a debugging
> > kernel with the HPET as a second watchdog (not affecting the decisions
> > by the watchdog) and got this result:
> > > clocksource: timekeeping watchdog on CPU118: Marking clocksource 'tsc' as unstable because the skew is too large:
> > > clocksource: 'sgi_rtc' wd_nsec: 511302794 wd_now: 1cb50e4c4b wd_last: 1ca7097111 mask: ffffffffffffff
> > > clocksource: 'hpet' wd2_nsec: 512005960 wd2_now: 65892719 wd2_last: 64c5d684 mask: ffffffff
> > > clocksource: 'tsc' cs_nsec: 512006458 cs_now: 86b5982cb1 cs_last: 867581bbab mask: ffffffffffffffff
> > > clocksource: 'tsc' skewed 703664 ns (0 ms) over watchdog 'sgi_rtc' interval of 511302794 ns (511 ms)
> > > clocksource: 'tsc' is current clocksource.
> > > tsc: Marking TSC unstable due to clocksource watchdog
> > > clocksource: Checking clocksource tsc synchronization from CPU 610 to CPUs 0-609,611-767.
> > > clocksource: Switched to clocksource sgi_rtc

The machine has 8 CPU sockets with Intel(R) Xeon(R) Platinum 8468H
(family: 0x6, model: 0x8f, stepping: 0x8), cpu ucode 0x2b000620:
[ 0.000000] DMI: HPE Compute Scale-up Server 3200/Compute Scale-up Server 3200, BIOS Bundle:1.60.88-20250613_054523 SFW:009.040.012.000.25060304
[ 0.675393] UV: Found UV500 hub
[ 0.679294] UV: UVsystab: Revision:500
The test was carried out on 22nd May, 2025.

> > The intervals measured by the TSC and the HPET match very well; the
> > sgi_rtc is off. I find it hard to believe that both the TSC and the
> > HPET would be skewed - both reporting a longer interval - while sgi_rtc
> > was correct. I think sgi_rtc was skewed.
> >
> > There are several solutions to work around the hardware limitation of sgi_rtc:
> > 1. Disable the clocksource watchdog
> > 2. Decrease the rating of sgi_rtc

Solution 1 is actionable but I think we should be sure it is made for
the right reasons.
--
Jiri Wiesner
SUSE Labs
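
The bracketing argument in the message above can be modelled in user space. This is a minimal sketch, assuming all timestamps are already converted to nanoseconds and taking the 50 us bound from the message; `struct wd_sample`, `wd_seq_ns()` and `wd_sample_usable()` are hypothetical names and not the kernel's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 50 us limit on the span of the two reference reads. */
#define WD_SEQ_MAX_NS 50000ULL

/* One bracketed readout: reference, checked clocksource, reference. */
struct wd_sample {
	uint64_t wd_ts0;	/* reference read before the checked clocksource */
	uint64_t cs_ts;		/* read of the clocksource being checked */
	uint64_t wd_ts1;	/* reference read after the checked clocksource */
};

/* Time spanned by the two reference reads (counter wraparound ignored). */
static inline uint64_t wd_seq_ns(const struct wd_sample *s)
{
	assert(s->wd_ts1 >= s->wd_ts0);
	return s->wd_ts1 - s->wd_ts0;
}

/* A sample is only acted upon when the bracket is tight. Since cs_ts was
 * read inside the bracket, its misalignment against wd_ts0 is bounded by
 * the same span. */
static inline bool wd_sample_usable(const struct wd_sample *s)
{
	return wd_seq_ns(s) < WD_SEQ_MAX_NS;
}
```

In this model a bracket of 30 us is accepted and one of 60 us is discarded. Any skew reported from an accepted sample can therefore be attributed to readout latency only up to the bracket width, which is far smaller than a skew of several hundred microseconds.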