From: Thomas Gleixner <tglx@linutronix.de>
To: Feng Tang <feng.tang@intel.com>, Ingo Molnar <mingo@redhat.com>,
Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@intel.com>,
"H . Peter Anvin" <hpa@zytor.com>,
Peter Zijlstra <peterz@infradead.org>,
x86@kernel.org, linux-kernel@vger.kernel.org
Cc: rui.zhang@intel.com, tim.c.chen@intel.com,
Xiongfeng Wang <wangxiongfeng2@huawei.com>,
Feng Tang <feng.tang@intel.com>, Yu Liao <liaoyu15@huawei.com>
Subject: Re: [PATCH v2] x86/tsc: Extend watchdog check exemption to 4-Sockets platform
Date: Wed, 19 Oct 2022 11:18:43 +0200 [thread overview]
Message-ID: <87tu40p3ws.ffs@tglx> (raw)
In-Reply-To: <20221013131200.973649-1-feng.tang@intel.com>
On Thu, Oct 13 2022 at 21:12, Feng Tang wrote:
> There is report again that the tsc clocksource on a 4 sockets x86
> Skylake server was wrongly judged as 'unstable' by 'jiffies' watchdog,
> and disabled [1].
>
> Commit b50db7095fe0 ("x86/tsc: Disable clocksource watchdog for TSC
> on qualified platorms") was introduce to deal with these false
> alarms of tsc unstable issues, covering qualified platforms for 2
> sockets or smaller ones.
>
> Extend the exemption to 4 sockets to fix the issue.
>
> We also got similar reports on 8 sockets platform from internal test,
> but as Peter pointed out, there was tsc sync issues for 8-sockets
> platform, and it'd better be handled architecture by architecture,
> instead of directly changing the threshold to 8 here.
>
> Rui also proposed another way to disable 'jiffies' as clocksource
> watchdog [2], which can also solve this specific problem in an
> architecture independent way, with one limitation that some tsc false
> alarms are reported by other watchdogs like HPET in post-boot time,
> while 'jiffies' is mostly used in boot phase before hardware
> clocksources are initialized.
HPET is initialized early, but if HPET is disabled or not advertised
then the only other hardware clocksource is PMTIMER which is initialized
late via fs_initcall. PMTIMER is initialized late due to broken Pentium
era chipsets which are sorted with PCI quirks. For anything else we can
initialize it early. Something like the below.
I'm sure I said this more than once, but I'm happy to repeat myself
forever:
Instead of proliferating lousy hacks, can the X86 vendors finaly get
their act together and provide some architected information whether
the TSC is trustworthy or not?
Thanks,
tglx
---
--- a/arch/x86/kernel/time.c
+++ b/arch/x86/kernel/time.c
@@ -10,6 +10,7 @@
*
*/
+#include <linux/acpi_pmtmr.h>
#include <linux/clocksource.h>
#include <linux/clockchips.h>
#include <linux/interrupt.h>
@@ -75,6 +76,14 @@ static void __init setup_default_timer_i
void __init hpet_time_init(void)
{
if (!hpet_enable()) {
+ /*
+ * Some Pentium chipsets have broken HPETs and need
+ * PCI quirks to run before init.
+ */
+ if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
+ boot_cpu_data.family != 5)
+ init_acpi_pm_clocksource();
+
if (!pit_timer_init())
return;
}
--- a/drivers/clocksource/acpi_pm.c
+++ b/drivers/clocksource/acpi_pm.c
@@ -30,6 +30,7 @@
* in arch/i386/kernel/acpi/boot.c
*/
u32 pmtmr_ioport __read_mostly;
+static bool pmtmr_initialized __init_data;
static inline u32 read_pmtmr(void)
{
@@ -142,7 +143,7 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_SE
* Some boards have the PMTMR running way too fast. We check
* the PMTMR rate against PIT channel 2 to catch these cases.
*/
-static int verify_pmtmr_rate(void)
+static int __init verify_pmtmr_rate(void)
{
u64 value1, value2;
unsigned long count, delta;
@@ -172,14 +173,18 @@ static int verify_pmtmr_rate(void)
/* Number of reads we try to get two different values */
#define ACPI_PM_READ_CHECKS 10000
-static int __init init_acpi_pm_clocksource(void)
+int __init init_acpi_pm_clocksource(void)
{
u64 value1, value2;
unsigned int i, j = 0;
+ int ret;
if (!pmtmr_ioport)
return -ENODEV;
+ if (pmtmr_initialized)
+ return 0;
+
/* "verify" this timing source: */
for (j = 0; j < ACPI_PM_MONOTONICITY_CHECKS; j++) {
udelay(100 * j);
@@ -210,10 +215,11 @@ static int __init init_acpi_pm_clocksour
return -ENODEV;
}
- return clocksource_register_hz(&clocksource_acpi_pm,
- PMTMR_TICKS_PER_SEC);
+ ret = clocksource_register_hz(&clocksource_acpi_pm, PMTMR_TICKS_PER_SEC);
+ if (!ret)
+ pmtimer_initialized = true;
+ return ret;
}
-
/* We use fs_initcall because we want the PCI fixups to have run
* but we still need to load before device_initcall
*/
--- a/include/linux/acpi_pmtmr.h
+++ b/include/linux/acpi_pmtmr.h
@@ -13,6 +13,8 @@
/* Overrun value */
#define ACPI_PM_OVRRUN (1<<24)
+extern int __init init_acpi_pm_clocksource(void);
+
#ifdef CONFIG_X86_PM_TIMER
extern u32 acpi_pm_read_verified(void);
next prev parent reply other threads:[~2022-10-19 11:06 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-10-13 13:12 [PATCH v2] x86/tsc: Extend watchdog check exemption to 4-Sockets platform Feng Tang
2022-10-13 16:02 ` Dave Hansen
2022-10-14 0:37 ` Feng Tang
2022-10-14 1:14 ` Feng Tang
2022-10-19 9:18 ` Thomas Gleixner [this message]
2022-10-20 4:44 ` Feng Tang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87tu40p3ws.ffs@tglx \
--to=tglx@linutronix.de \
--cc=bp@alien8.de \
--cc=dave.hansen@intel.com \
--cc=feng.tang@intel.com \
--cc=hpa@zytor.com \
--cc=liaoyu15@huawei.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
--cc=rui.zhang@intel.com \
--cc=tim.c.chen@intel.com \
--cc=wangxiongfeng2@huawei.com \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.