* Regression: unable to boot after commit bd9240a18edf ("x86/apic: Add TSC_DEADLINE quirk due to errata") - Surface Pro 4 SKL
@ 2017-11-28 2:08 Zhang Rui
2017-11-28 8:14 ` Peter Zijlstra
0 siblings, 1 reply; 20+ messages in thread
From: Zhang Rui @ 2017-11-28 2:08 UTC (permalink / raw)
To: LKML, linux-x86, peterz, tglx; +Cc: Len Brown, Chen, Yu C
Hi, All,
My Surface Pro 4 is unable to boot after 4.12. The symptom is that
kernel freezes during boot, and the last message in the screen is
loading the initrd image. And I have bisected it to this commit
commit bd9240a18edfbfa72e957fc2ba831cf1f13ea073 (refs/bisect/bad)
Author: Peter Zijlstra <peterz@infradead.org>
AuthorDate: Wed May 31 17:52:03 2017 +0200
Commit: Thomas Gleixner <tglx@linutronix.de>
CommitDate: Sun Jun 4 21:55:53 2017 +0200
x86/apic: Add TSC_DEADLINE quirk due to errata
Due to errata it is possible for the TSC_DEADLINE timer to
misbehave
after using TSC_ADJUST. A microcode update is available to fix this
situation.
Avoid using the TSC_DEADLINE timer if it is affected by this issue
and
report the required microcode version.
[ tglx: Renamed function to apic_check_deadline_errata() ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: kevin.b.stanton@intel.com
Link: http://lkml.kernel.org/r/20170531155306.050849877@infradead.o
rg
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Currently, I'm using v4.14 kernel with the following workaround on top.
---
arch/x86/kernel/apic/apic.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index ff89177..cd419d4 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -596,7 +596,7 @@ static const struct x86_cpu_id deadline_match[] = {
DEADLINE_MODEL_MATCH_REV ( INTEL_FAM6_BROADWELL_CORE, 0x25),
DEADLINE_MODEL_MATCH_REV ( INTEL_FAM6_BROADWELL_GT3E, 0x17),
- DEADLINE_MODEL_MATCH_REV ( INTEL_FAM6_SKYLAKE_MOBILE, 0xb2),
+// DEADLINE_MODEL_MATCH_REV ( INTEL_FAM6_SKYLAKE_MOBILE, 0xb2),
DEADLINE_MODEL_MATCH_REV ( INTEL_FAM6_SKYLAKE_DESKTOP, 0xb2),
DEADLINE_MODEL_MATCH_REV ( INTEL_FAM6_KABYLAKE_MOBILE, 0x52),
--
2.7.4
I'm reading the related code but have not figured out the root cause yet.
Anyone suggestions?
thanks,
rui
^ permalink raw reply related [flat|nested] 20+ messages in thread* Re: Regression: unable to boot after commit bd9240a18edf ("x86/apic: Add TSC_DEADLINE quirk due to errata") - Surface Pro 4 SKL 2017-11-28 2:08 Regression: unable to boot after commit bd9240a18edf ("x86/apic: Add TSC_DEADLINE quirk due to errata") - Surface Pro 4 SKL Zhang Rui @ 2017-11-28 8:14 ` Peter Zijlstra 2017-11-28 8:22 ` Zhang Rui 0 siblings, 1 reply; 20+ messages in thread From: Peter Zijlstra @ 2017-11-28 8:14 UTC (permalink / raw) To: Zhang Rui; +Cc: LKML, linux-x86, tglx, Len Brown, Chen, Yu C On Tue, Nov 28, 2017 at 10:08:53AM +0800, Zhang Rui wrote: > Hi, All, > > My Surface Pro 4 is unable to boot after 4.12. The symptom is that > kernel freezes during boot, and the last message in the screen is > loading the initrd image. And I have bisected it to this commit > +// DEADLINE_MODEL_MATCH_REV ( INTEL_FAM6_SKYLAKE_MOBILE, 0xb2), And what microcode version do you run? Have you installed the latest microcode package and updated your initrd to include it? My skylake is running 0xba. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Regression: unable to boot after commit bd9240a18edf ("x86/apic: Add TSC_DEADLINE quirk due to errata") - Surface Pro 4 SKL 2017-11-28 8:14 ` Peter Zijlstra @ 2017-11-28 8:22 ` Zhang Rui 2017-11-28 9:34 ` Thomas Gleixner 2017-11-28 9:35 ` Peter Zijlstra 0 siblings, 2 replies; 20+ messages in thread From: Zhang Rui @ 2017-11-28 8:22 UTC (permalink / raw) To: Peter Zijlstra; +Cc: LKML, linux-x86, tglx, Len Brown, Chen, Yu C On Tue, 2017-11-28 at 09:14 +0100, Peter Zijlstra wrote: > On Tue, Nov 28, 2017 at 10:08:53AM +0800, Zhang Rui wrote: > > > > Hi, All, > > > > My Surface Pro 4 is unable to boot after 4.12. The symptom is that > > kernel freezes during boot, and the last message in the screen is > > loading the initrd image. And I have bisected it to this commit > > > > +// DEADLINE_MODEL_MATCH_REV ( INTEL_FAM6_SKYLAKE_MOBILE, > > 0xb2), > And what microcode version do you run? Have you installed the latest > microcode package and updated your initrd to include it? My skylake > is > running 0xba. No, I didn't upgrade my microcode. $ cat /proc/cpuinfo ... processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 78 model name : Intel(R) Core(TM) i7-6650U CPU @ 2.20GHz stepping : 3 microcode : 0x9e ... I suppose the problem should be gone if I upgrade the microcode. But the real problem to me is that the system FREEZES after kernel upgrade. thanks, rui ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Regression: unable to boot after commit bd9240a18edf ("x86/apic: Add TSC_DEADLINE quirk due to errata") - Surface Pro 4 SKL 2017-11-28 8:22 ` Zhang Rui @ 2017-11-28 9:34 ` Thomas Gleixner 2017-11-28 10:59 ` Zhang Rui 2017-11-28 9:35 ` Peter Zijlstra 1 sibling, 1 reply; 20+ messages in thread From: Thomas Gleixner @ 2017-11-28 9:34 UTC (permalink / raw) To: Zhang Rui; +Cc: Peter Zijlstra, LKML, linux-x86, Len Brown, Chen, Yu C On Tue, 28 Nov 2017, Zhang Rui wrote: > On Tue, 2017-11-28 at 09:14 +0100, Peter Zijlstra wrote: > > On Tue, Nov 28, 2017 at 10:08:53AM +0800, Zhang Rui wrote: > > > > > > Hi, All, > > > > > > My Surface Pro 4 is unable to boot after 4.12. The symptom is that > > > kernel freezes during boot, and the last message in the screen is > > > loading the initrd image. And I have bisected it to this commit > > > > > > +// DEADLINE_MODEL_MATCH_REV ( INTEL_FAM6_SKYLAKE_MOBILE, > > > 0xb2), > > And what microcode version do you run? Have you installed the latest > > microcode package and updated your initrd to include it? My skylake > > is > > running 0xba. > > No, I didn't upgrade my microcode. > > $ cat /proc/cpuinfo > ... > processor : 0 > vendor_id : GenuineIntel > cpu family : 6 > model : 78 > model name : Intel(R) Core(TM) i7-6650U CPU @ 2.20GHz > stepping : 3 > microcode : 0x9e > ... > > I suppose the problem should be gone if I upgrade the microcode. > But the real problem to me is that the system FREEZES after kernel > upgrade. Confused. So the match disables the deadline timer due to 0x9e < 0xb2. And not having deadline timer makes the boot fail despite the fact that deadline timer is borked with microcode < 0xb2. Can you verify by adding 'lapic=notscdeadline' to the command line of a 'working' kernel? That should cause a freeze as well then. Thanks, tglx ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Regression: unable to boot after commit bd9240a18edf ("x86/apic: Add TSC_DEADLINE quirk due to errata") - Surface Pro 4 SKL 2017-11-28 9:34 ` Thomas Gleixner @ 2017-11-28 10:59 ` Zhang Rui 2017-11-28 12:36 ` Peter Zijlstra 0 siblings, 1 reply; 20+ messages in thread From: Zhang Rui @ 2017-11-28 10:59 UTC (permalink / raw) To: Thomas Gleixner; +Cc: Peter Zijlstra, LKML, linux-x86, Len Brown, Chen, Yu C On Tue, 2017-11-28 at 10:34 +0100, Thomas Gleixner wrote: > On Tue, 28 Nov 2017, Zhang Rui wrote: > > > > > On Tue, 2017-11-28 at 09:14 +0100, Peter Zijlstra wrote: > > > > > > On Tue, Nov 28, 2017 at 10:08:53AM +0800, Zhang Rui wrote: > > > > > > > > > > > > Hi, All, > > > > > > > > My Surface Pro 4 is unable to boot after 4.12. The symptom is > > > > that > > > > kernel freezes during boot, and the last message in the screen > > > > is > > > > loading the initrd image. And I have bisected it to this commit > > > > > > > > +// DEADLINE_MODEL_MATCH_REV ( > > > > INTEL_FAM6_SKYLAKE_MOBILE, > > > > 0xb2), > > > And what microcode version do you run? Have you installed the > > > latest > > > microcode package and updated your initrd to include it? My > > > skylake > > > is > > > running 0xba. > > No, I didn't upgrade my microcode. > > > > $ cat /proc/cpuinfo > > ... > > processor : 0 > > vendor_id : GenuineIntel > > cpu family : 6 > > model : 78 > > model name : Intel(R) Core(TM) i7-6650U CPU @ 2.20GHz > > stepping : 3 > > microcode : 0x9e > > ... > > > > I suppose the problem should be gone if I upgrade the microcode. > > But the real problem to me is that the system FREEZES after kernel > > upgrade. > Confused. So the match disables the deadline timer due to 0x9e < > 0xb2. And > not having deadline timer makes the boot fail despite the fact that > deadline timer is borked with microcode < 0xb2. > > Can you verify by adding 'lapic=notscdeadline' to the command line of > a > 'working' kernel? That should cause a freeze as well then. > yes. Tried 4.4 distro and 4.12 vanilla kernel, kernel always freezes with boot option "notscdeadline"/"lapic=notscdeadline". thanks, rui > Thanks, > > tglx ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Regression: unable to boot after commit bd9240a18edf ("x86/apic: Add TSC_DEADLINE quirk due to errata") - Surface Pro 4 SKL 2017-11-28 10:59 ` Zhang Rui @ 2017-11-28 12:36 ` Peter Zijlstra 2017-11-29 14:44 ` Zhang Rui 0 siblings, 1 reply; 20+ messages in thread From: Peter Zijlstra @ 2017-11-28 12:36 UTC (permalink / raw) To: Zhang Rui; +Cc: Thomas Gleixner, LKML, linux-x86, Len Brown, Chen, Yu C On Tue, Nov 28, 2017 at 06:59:01PM +0800, Zhang Rui wrote: > > > > > My Surface Pro 4 is unable to boot after 4.12. The symptom is > yes. Tried 4.4 distro and 4.12 vanilla kernel, kernel always freezes > with boot option "notscdeadline"/"lapic=notscdeadline". Then for some mysterious reason, your Surface thing has a borked LAPIC. No idea how or why.. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Regression: unable to boot after commit bd9240a18edf ("x86/apic: Add TSC_DEADLINE quirk due to errata") - Surface Pro 4 SKL 2017-11-28 12:36 ` Peter Zijlstra @ 2017-11-29 14:44 ` Zhang Rui 2017-12-18 20:28 ` Peter Zijlstra 0 siblings, 1 reply; 20+ messages in thread From: Zhang Rui @ 2017-11-29 14:44 UTC (permalink / raw) To: Peter Zijlstra; +Cc: Thomas Gleixner, LKML, linux-x86, Len Brown, Chen, Yu C On Tue, 2017-11-28 at 13:36 +0100, Peter Zijlstra wrote: > On Tue, Nov 28, 2017 at 06:59:01PM +0800, Zhang Rui wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > My Surface Pro 4 is unable to boot after 4.12. The symptom > > > > > > is > > > > yes. Tried 4.4 distro and 4.12 vanilla kernel, kernel always > > freezes > > with boot option "notscdeadline"/"lapic=notscdeadline". > Then for some mysterious reason, your Surface thing has a borked > LAPIC. > I agree. But however, on this platform, forcing a cpu microcode upgrade after upgrading kernel does not seem like a good solution to users. Thus I still like to dig into the problem further to see if we can workaround this "regression" in software. As the kernel freezes at a very early stage, do you have any advice on how to narrow down the problem? thanks, rui ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Regression: unable to boot after commit bd9240a18edf ("x86/apic: Add TSC_DEADLINE quirk due to errata") - Surface Pro 4 SKL 2017-11-29 14:44 ` Zhang Rui @ 2017-12-18 20:28 ` Peter Zijlstra 2017-12-19 10:48 ` Zhang Rui 0 siblings, 1 reply; 20+ messages in thread From: Peter Zijlstra @ 2017-12-18 20:28 UTC (permalink / raw) To: Zhang Rui; +Cc: Thomas Gleixner, LKML, linux-x86, Len Brown, Chen, Yu C Hi, can you see if this makes you Surface boot? I tested it on my IVB by making has_legacy_pic() return unconditional true. [ 0.024000] tsc: Unable to calibrate against PIT [ 0.025000] tsc: using HPET reference calibration [ 0.026000] tsc: Detected 2792.451 MHz processor --- diff --git a/arch/x86/include/asm/i8259.h b/arch/x86/include/asm/i8259.h index c8376b40e882..e2cfc4b52ee4 100644 --- a/arch/x86/include/asm/i8259.h +++ b/arch/x86/include/asm/i8259.h @@ -69,6 +69,11 @@ struct legacy_pic { extern struct legacy_pic *legacy_pic; extern struct legacy_pic null_legacy_pic; +static inline bool has_legacy_pic(void) +{ + return legacy_pic == &null_legacy_pic; +} + static inline int nr_legacy_irqs(void) { return legacy_pic->nr_legacy_irqs; diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c index 8ea117f8142e..2afc623b2280 100644 --- a/arch/x86/kernel/tsc.c +++ b/arch/x86/kernel/tsc.c @@ -25,6 +25,7 @@ #include <asm/geode.h> #include <asm/apic.h> #include <asm/intel-family.h> +#include <asm/i8259.h> unsigned int __read_mostly cpu_khz; /* TSC clocks / usec, not used here */ EXPORT_SYMBOL(cpu_khz); @@ -363,6 +364,15 @@ static unsigned long pit_calibrate_tsc(u32 latch, unsigned long ms, int loopmin) unsigned long tscmin, tscmax; int pitcnt; + if (!has_legacy_pic()) { + udelay(10 * USEC_PER_MSEC); + udelay(10 * USEC_PER_MSEC); + udelay(10 * USEC_PER_MSEC); + udelay(10 * USEC_PER_MSEC); + udelay(10 * USEC_PER_MSEC); + return ULONG_MAX; + } + /* Set the Gate high, disable speaker */ outb((inb(0x61) & ~0x02) | 0x01, 0x61); @@ -487,6 +497,9 @@ static unsigned long quick_pit_calibrate(void) u64 tsc, delta; unsigned long d1, d2; + if (!has_legacy_pic()) + return 0; + /* Set the Gate high, disable speaker */ outb((inb(0x61) & ~0x02) | 0x01, 0x61); ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: Regression: unable to boot after commit bd9240a18edf ("x86/apic: Add TSC_DEADLINE quirk due to errata") - Surface Pro 4 SKL 2017-12-18 20:28 ` Peter Zijlstra @ 2017-12-19 10:48 ` Zhang Rui 2017-12-19 13:15 ` Peter Zijlstra 2017-12-19 15:23 ` Peter Zijlstra 0 siblings, 2 replies; 20+ messages in thread From: Zhang Rui @ 2017-12-19 10:48 UTC (permalink / raw) To: Peter Zijlstra; +Cc: Thomas Gleixner, LKML, linux-x86, Len Brown, Chen, Yu C On Mon, 2017-12-18 at 21:28 +0100, Peter Zijlstra wrote: > Hi, can you see if this makes you Surface boot? > No, it does not boot. > I tested it on my IVB by making has_legacy_pic() return unconditional > true. > > [ 0.024000] tsc: Unable to calibrate against PIT > [ 0.025000] tsc: using HPET reference calibration > [ 0.026000] tsc: Detected 2792.451 MHz processor > > --- > > > diff --git a/arch/x86/include/asm/i8259.h > b/arch/x86/include/asm/i8259.h > index c8376b40e882..e2cfc4b52ee4 100644 > --- a/arch/x86/include/asm/i8259.h > +++ b/arch/x86/include/asm/i8259.h > @@ -69,6 +69,11 @@ struct legacy_pic { > extern struct legacy_pic *legacy_pic; > extern struct legacy_pic null_legacy_pic; > > +static inline bool has_legacy_pic(void) > +{ > + return legacy_pic == &null_legacy_pic; > +} > + shouldn't this be return legacy_pic == &default_legacy_pic; ? thanks, rui > static inline int nr_legacy_irqs(void) > { > return legacy_pic->nr_legacy_irqs; > diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c > index 8ea117f8142e..2afc623b2280 100644 > --- a/arch/x86/kernel/tsc.c > +++ b/arch/x86/kernel/tsc.c > @@ -25,6 +25,7 @@ > #include <asm/geode.h> > #include <asm/apic.h> > #include <asm/intel-family.h> > +#include <asm/i8259.h> > > unsigned int __read_mostly cpu_khz; /* TSC clocks / usec, not > used here */ > EXPORT_SYMBOL(cpu_khz); > @@ -363,6 +364,15 @@ static unsigned long pit_calibrate_tsc(u32 > latch, unsigned long ms, int loopmin) > unsigned long tscmin, tscmax; > int pitcnt; > > + if (!has_legacy_pic()) { > + udelay(10 * USEC_PER_MSEC); > + udelay(10 * USEC_PER_MSEC); > + udelay(10 * USEC_PER_MSEC); > + udelay(10 * USEC_PER_MSEC); > + udelay(10 * USEC_PER_MSEC); > + return ULONG_MAX; > + } > + > /* Set the Gate high, disable speaker */ > outb((inb(0x61) & ~0x02) | 0x01, 0x61); > > @@ -487,6 +497,9 @@ static unsigned long quick_pit_calibrate(void) > u64 tsc, delta; > unsigned long d1, d2; > > + if (!has_legacy_pic()) > + return 0; > + > /* Set the Gate high, disable speaker */ > outb((inb(0x61) & ~0x02) | 0x01, 0x61); > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Regression: unable to boot after commit bd9240a18edf ("x86/apic: Add TSC_DEADLINE quirk due to errata") - Surface Pro 4 SKL 2017-12-19 10:48 ` Zhang Rui @ 2017-12-19 13:15 ` Peter Zijlstra 2017-12-19 15:23 ` Zhang Rui 2017-12-19 15:23 ` Peter Zijlstra 1 sibling, 1 reply; 20+ messages in thread From: Peter Zijlstra @ 2017-12-19 13:15 UTC (permalink / raw) To: Zhang Rui; +Cc: Thomas Gleixner, LKML, linux-x86, Len Brown, Chen, Yu C On Tue, Dec 19, 2017 at 06:48:24PM +0800, Zhang Rui wrote: > On Mon, 2017-12-18 at 21:28 +0100, Peter Zijlstra wrote: > > Hi, can you see if this makes you Surface boot? > > > No, it does not boot. Bah, staring at the lapic calibrate now, that is a bit of a mess.. > > I tested it on my IVB by making has_legacy_pic() return unconditional > > true. > > > > [ 0.024000] tsc: Unable to calibrate against PIT > > [ 0.025000] tsc: using HPET reference calibration > > [ 0.026000] tsc: Detected 2792.451 MHz processor > > > > --- > > > > > > diff --git a/arch/x86/include/asm/i8259.h > > b/arch/x86/include/asm/i8259.h > > index c8376b40e882..e2cfc4b52ee4 100644 > > --- a/arch/x86/include/asm/i8259.h > > +++ b/arch/x86/include/asm/i8259.h > > @@ -69,6 +69,11 @@ struct legacy_pic { > > extern struct legacy_pic *legacy_pic; > > extern struct legacy_pic null_legacy_pic; > > > > +static inline bool has_legacy_pic(void) > > +{ > > + return legacy_pic == &null_legacy_pic; > > +} > > + > shouldn't this be > return legacy_pic == &default_legacy_pic; > ? != &null, but yes, I mess that up. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Regression: unable to boot after commit bd9240a18edf ("x86/apic: Add TSC_DEADLINE quirk due to errata") - Surface Pro 4 SKL 2017-12-19 13:15 ` Peter Zijlstra @ 2017-12-19 15:23 ` Zhang Rui 0 siblings, 0 replies; 20+ messages in thread From: Zhang Rui @ 2017-12-19 15:23 UTC (permalink / raw) To: Peter Zijlstra; +Cc: Thomas Gleixner, LKML, linux-x86, Len Brown, Chen, Yu C On Tue, 2017-12-19 at 14:15 +0100, Peter Zijlstra wrote: > On Tue, Dec 19, 2017 at 06:48:24PM +0800, Zhang Rui wrote: > > > > On Mon, 2017-12-18 at 21:28 +0100, Peter Zijlstra wrote: > > > > > > Hi, can you see if this makes you Surface boot? > > > > > No, it does not boot. > Bah, staring at the lapic calibrate now, that is a bit of a mess.. > > > > > > > > > I tested it on my IVB by making has_legacy_pic() return > > > unconditional > > > true. > > > > > > [ 0.024000] tsc: Unable to calibrate against PIT > > > [ 0.025000] tsc: using HPET reference calibration > > > [ 0.026000] tsc: Detected 2792.451 MHz processor > > > > > > --- > > > > > > > > > diff --git a/arch/x86/include/asm/i8259.h > > > b/arch/x86/include/asm/i8259.h > > > index c8376b40e882..e2cfc4b52ee4 100644 > > > --- a/arch/x86/include/asm/i8259.h > > > +++ b/arch/x86/include/asm/i8259.h > > > @@ -69,6 +69,11 @@ struct legacy_pic { > > > extern struct legacy_pic *legacy_pic; > > > extern struct legacy_pic null_legacy_pic; > > > > > > +static inline bool has_legacy_pic(void) > > > +{ > > > + return legacy_pic == &null_legacy_pic; > > > +} > > > + > > shouldn't this be > > return legacy_pic == &default_legacy_pic; > > ? > != &null, but yes, I mess that up. I see. I have changed that and the platform can not boot neither. thanks, rui ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Regression: unable to boot after commit bd9240a18edf ("x86/apic: Add TSC_DEADLINE quirk due to errata") - Surface Pro 4 SKL 2017-12-19 10:48 ` Zhang Rui 2017-12-19 13:15 ` Peter Zijlstra @ 2017-12-19 15:23 ` Peter Zijlstra 2017-12-19 15:31 ` Peter Zijlstra 2017-12-19 15:42 ` Zhang Rui 1 sibling, 2 replies; 20+ messages in thread From: Peter Zijlstra @ 2017-12-19 15:23 UTC (permalink / raw) To: Zhang Rui; +Cc: Thomas Gleixner, LKML, linux-x86, Len Brown, Chen, Yu C On Tue, Dec 19, 2017 at 06:48:24PM +0800, Zhang Rui wrote: > On Mon, 2017-12-18 at 21:28 +0100, Peter Zijlstra wrote: > > Hi, can you see if this makes you Surface boot? > > > No, it does not boot. So I'm confused on the lapic calibration. That stuff uses global_clock_event, which is initially the i8253 (PIT), but because !PIC this thing won't be there either on your platform. Then we initialize I/O APIC, and your machine has: [ 0.000000] IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-119 [ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) [ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level) [ 0.000000] ACPI: IRQ0 used by override. [ 0.000000] ACPI: IRQ9 used by override. So your ACPI table has an override for IRQ2 and routes it to IRQ0. Then we initialize HPET, and we _always_ do hpet_enable_legacy_int(), which sets the LegacyRouting bit. The HPET document says: If the ENABLE_CNF bit and the LEG_RT_CNF bit are both set, then the interrupts will be routed as follows: Timer 0 will be routed to IRQ0 in Non-APIC or IRQ2 in the I/O APIC Timer 1 will be routed to IRQ8 in Non-APIC or IRQ8 in the I/O APIC Timer 2-n will be routed as per the routing in the timer n config registers. If the LegacyReplacement Route bit is set, the individual routing bits for timers 0 and 1 (APIC or FSB) will have no impact. And then we set global_clock_event to &hpet_clockevent. At this point that _SHOULD_ work afaict, even without actual PIC present. Sometime after that we call into calibrate_APIC_clock() -- because !TSC_DEADLINE -- and this is where you get stuck, because global_clock_event is not in fact delivering interrupts. Thomas may have more clue, we'll have to wait for him to have a time-slot available. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Regression: unable to boot after commit bd9240a18edf ("x86/apic: Add TSC_DEADLINE quirk due to errata") - Surface Pro 4 SKL 2017-12-19 15:23 ` Peter Zijlstra @ 2017-12-19 15:31 ` Peter Zijlstra 2017-12-19 15:43 ` Zhang Rui 2017-12-19 15:42 ` Zhang Rui 1 sibling, 1 reply; 20+ messages in thread From: Peter Zijlstra @ 2017-12-19 15:31 UTC (permalink / raw) To: Zhang Rui; +Cc: Thomas Gleixner, LKML, linux-x86, Len Brown, Chen, Yu C On Tue, Dec 19, 2017 at 04:23:07PM +0100, Peter Zijlstra wrote: > On Tue, Dec 19, 2017 at 06:48:24PM +0800, Zhang Rui wrote: > > On Mon, 2017-12-18 at 21:28 +0100, Peter Zijlstra wrote: > > > Hi, can you see if this makes you Surface boot? > > > > > No, it does not boot. > > So I'm confused on the lapic calibration. > > That stuff uses global_clock_event, which is initially the i8253 (PIT), > but because !PIC this thing won't be there either on your platform. > > Then we initialize I/O APIC, and your machine has: > > [ 0.000000] IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-119 > [ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) > [ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level) > [ 0.000000] ACPI: IRQ0 used by override. > [ 0.000000] ACPI: IRQ9 used by override. > > So your ACPI table has an override for IRQ2 and routes it to IRQ0. > > Then we initialize HPET, and we _always_ do hpet_enable_legacy_int(), > which sets the LegacyRouting bit. The HPET document says: > > If the ENABLE_CNF bit and the LEG_RT_CNF bit are both set, then the > interrupts will be routed as follows: > > Timer 0 will be routed to IRQ0 in Non-APIC or IRQ2 in the I/O APIC > Timer 1 will be routed to IRQ8 in Non-APIC or IRQ8 in the I/O APIC > Timer 2-n will be routed as per the routing in the timer n config registers. > > If the LegacyReplacement Route bit is set, the individual routing bits > for timers 0 and 1 (APIC or FSB) will have no impact. > > And then we set global_clock_event to &hpet_clockevent. > > At this point that _SHOULD_ work afaict, even without actual PIC > present. > > Sometime after that we call into calibrate_APIC_clock() -- because > !TSC_DEADLINE -- and this is where you get stuck, because > global_clock_event is not in fact delivering interrupts. > > Thomas may have more clue, we'll have to wait for him to have a > time-slot available. What does your /proc/interrupts look like (on a tsc-deadline boot) ? 0: is the HPET, LOC: is the lapic/tsc-deadline On my SKL desktop I get 21 PIT/HPET ticks on CPU0 before lapic takes over. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Regression: unable to boot after commit bd9240a18edf ("x86/apic: Add TSC_DEADLINE quirk due to errata") - Surface Pro 4 SKL 2017-12-19 15:31 ` Peter Zijlstra @ 2017-12-19 15:43 ` Zhang Rui 0 siblings, 0 replies; 20+ messages in thread From: Zhang Rui @ 2017-12-19 15:43 UTC (permalink / raw) To: Peter Zijlstra; +Cc: Thomas Gleixner, LKML, linux-x86, Len Brown, Chen, Yu C On Tue, 2017-12-19 at 16:31 +0100, Peter Zijlstra wrote: > On Tue, Dec 19, 2017 at 04:23:07PM +0100, Peter Zijlstra wrote: > > > > On Tue, Dec 19, 2017 at 06:48:24PM +0800, Zhang Rui wrote: > > > > > > On Mon, 2017-12-18 at 21:28 +0100, Peter Zijlstra wrote: > > > > > > > > Hi, can you see if this makes you Surface boot? > > > > > > > No, it does not boot. > > So I'm confused on the lapic calibration. > > > > That stuff uses global_clock_event, which is initially the i8253 > > (PIT), > > but because !PIC this thing won't be there either on your platform. > > > > Then we initialize I/O APIC, and your machine has: > > > > [ 0.000000] IOAPIC[0]: apic_id 2, version 32, address > > 0xfec00000, GSI 0-119 > > [ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl > > dfl) > > [ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high > > level) > > [ 0.000000] ACPI: IRQ0 used by override. > > [ 0.000000] ACPI: IRQ9 used by override. > > > > So your ACPI table has an override for IRQ2 and routes it to IRQ0. > > > > Then we initialize HPET, and we _always_ do > > hpet_enable_legacy_int(), > > which sets the LegacyRouting bit. The HPET document says: > > > > If the ENABLE_CNF bit and the LEG_RT_CNF bit are both set, then > > the > > interrupts will be routed as follows: > > > > Timer 0 will be routed to IRQ0 in Non-APIC or IRQ2 in the I/O > > APIC > > Timer 1 will be routed to IRQ8 in Non-APIC or IRQ8 in the I/O > > APIC > > Timer 2-n will be routed as per the routing in the timer n > > config registers. > > > > If the LegacyReplacement Route bit is set, the individual routing > > bits > > for timers 0 and 1 (APIC or FSB) will have no impact. > > > > And then we set global_clock_event to &hpet_clockevent. > > > > At this point that _SHOULD_ work afaict, even without actual PIC > > present. > > > > Sometime after that we call into calibrate_APIC_clock() -- because > > !TSC_DEADLINE -- and this is where you get stuck, because > > global_clock_event is not in fact delivering interrupts. > > > > Thomas may have more clue, we'll have to wait for him to have a > > time-slot available. > What does your /proc/interrupts look like (on a tsc-deadline boot) ? > > 0: is the HPET, LOC: is the lapic/tsc-deadline > > On my SKL desktop I get 21 PIT/HPET ticks on CPU0 before lapic takes > over. No irq0. LOC: 220071 210079 184892 176494 Local timer interrupts thanks, rui ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Regression: unable to boot after commit bd9240a18edf ("x86/apic: Add TSC_DEADLINE quirk due to errata") - Surface Pro 4 SKL 2017-12-19 15:23 ` Peter Zijlstra 2017-12-19 15:31 ` Peter Zijlstra @ 2017-12-19 15:42 ` Zhang Rui 2017-12-19 16:01 ` Peter Zijlstra 1 sibling, 1 reply; 20+ messages in thread From: Zhang Rui @ 2017-12-19 15:42 UTC (permalink / raw) To: Peter Zijlstra; +Cc: Thomas Gleixner, LKML, linux-x86, Len Brown, Chen, Yu C On Tue, 2017-12-19 at 16:23 +0100, Peter Zijlstra wrote: > On Tue, Dec 19, 2017 at 06:48:24PM +0800, Zhang Rui wrote: > > > > On Mon, 2017-12-18 at 21:28 +0100, Peter Zijlstra wrote: > > > > > > Hi, can you see if this makes you Surface boot? > > > > > No, it does not boot. > So I'm confused on the lapic calibration. > > That stuff uses global_clock_event, which is initially the i8253 > (PIT), > but because !PIC this thing won't be there either on your platform. > > Then we initialize I/O APIC, and your machine has: > > [ 0.000000] IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, > GSI 0-119 > [ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl > dfl) > [ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high > level) > [ 0.000000] ACPI: IRQ0 used by override. > [ 0.000000] ACPI: IRQ9 used by override. > > So your ACPI table has an override for IRQ2 and routes it to IRQ0. > > Then we initialize HPET, and we _always_ do hpet_enable_legacy_int(), > which sets the LegacyRouting bit. Right. > The HPET document says: > > If the ENABLE_CNF bit and the LEG_RT_CNF bit are both set, then the > interrupts will be routed as follows: > > Timer 0 will be routed to IRQ0 in Non-APIC or IRQ2 in the I/O > APIC But AFAICS, the HPET emulated timer interrupts goes to IRQ0 on all the machines I have tested, but on this MS Surface Pro 4, there is no irq 0 row in /proc/interrupts. $ cat /proc/interrupts CPU0 CPU1 CPU2 CPU3 8: 0 1 0 0 IR-IO-APIC 8- edge rtc0 9: 476 144 4573 132 IR-IO-APIC 9- fasteoi acpi 14: 0 0 0 0 IR-IO-APIC 14- edge INT344B:00 16: 646 4971 63574 3973 IR-IO-APIC 16- fasteoi idma64.0, MRVL_PCIE, i2c_designware.0 17: 0 0 0 0 IR-IO-APIC 17- fasteoi idma64.1, i2c_designware.1 18: 0 0 0 0 IR-IO-APIC 18- fasteoi idma64.2, i2c_designware.2 19: 0 0 0 0 IR-IO-APIC 19- fasteoi idma64.3, i2c_designware.3 Maybe I need to check if IRQ0 is overridden on other platforms, if no, registering to IRQ2 for HPET Timer 0 could help in this case? > Timer 1 will be routed to IRQ8 in Non-APIC or IRQ8 in the I/O > APIC > Timer 2-n will be routed as per the routing in the timer n config > registers. > > If the LegacyReplacement Route bit is set, the individual routing > bits > for timers 0 and 1 (APIC or FSB) will have no impact. > > And then we set global_clock_event to &hpet_clockevent. Yes, I can confirm this. > > At this point that _SHOULD_ work afaict, even without actual PIC > present. so IOAPIC is ready when we calibrating Lapic timer? > > Sometime after that we call into calibrate_APIC_clock() -- because > !TSC_DEADLINE -- and this is where you get stuck, because > global_clock_event is not in fact delivering interrupts. right. > > Thomas may have more clue, we'll have to wait for him to have a > time-slot available. okay. thanks, rui ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Regression: unable to boot after commit bd9240a18edf ("x86/apic: Add TSC_DEADLINE quirk due to errata") - Surface Pro 4 SKL 2017-12-19 15:42 ` Zhang Rui @ 2017-12-19 16:01 ` Peter Zijlstra 2017-12-19 17:23 ` Peter Zijlstra 0 siblings, 1 reply; 20+ messages in thread From: Peter Zijlstra @ 2017-12-19 16:01 UTC (permalink / raw) To: Zhang Rui; +Cc: Thomas Gleixner, LKML, linux-x86, Len Brown, Chen, Yu C On Tue, Dec 19, 2017 at 11:42:41PM +0800, Zhang Rui wrote: > On Tue, 2017-12-19 at 16:23 +0100, Peter Zijlstra wrote: > > [ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) > > [ 0.000000] ACPI: IRQ0 used by override. > > > > So your ACPI table has an override for IRQ2 and routes it to IRQ0. ^ this > > The HPET document says: > > > > If the ENABLE_CNF bit and the LEG_RT_CNF bit are both set, then the > > interrupts will be routed as follows: > > > > Timer 0 will be routed to IRQ0 in Non-APIC or IRQ2 in the I/O APIC > > But AFAICS, the HPET emulated timer interrupts goes to IRQ0 Right, so see that ACPI override, that routes I/O APIC IRQ2 to IRQ0, or it _should_. Clearly something is messed up here.. but I've no idea what. That whole IRQ routing stuff is confusing. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Regression: unable to boot after commit bd9240a18edf ("x86/apic: Add TSC_DEADLINE quirk due to errata") - Surface Pro 4 SKL 2017-12-19 16:01 ` Peter Zijlstra @ 2017-12-19 17:23 ` Peter Zijlstra 2017-12-20 14:08 ` Zhang Rui 0 siblings, 1 reply; 20+ messages in thread From: Peter Zijlstra @ 2017-12-19 17:23 UTC (permalink / raw) To: Zhang Rui; +Cc: Thomas Gleixner, LKML, linux-x86, Len Brown, Chen, Yu C On Tue, Dec 19, 2017 at 05:01:55PM +0100, Peter Zijlstra wrote: > On Tue, Dec 19, 2017 at 11:42:41PM +0800, Zhang Rui wrote: > > On Tue, 2017-12-19 at 16:23 +0100, Peter Zijlstra wrote: > > > > [ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) > > > [ 0.000000] ACPI: IRQ0 used by override. > > > > > > So your ACPI table has an override for IRQ2 and routes it to IRQ0. > > ^ this > > > > The HPET document says: > > > > > > If the ENABLE_CNF bit and the LEG_RT_CNF bit are both set, then the > > > interrupts will be routed as follows: > > > > > > Timer 0 will be routed to IRQ0 in Non-APIC or IRQ2 in the I/O APIC > > > > But AFAICS, the HPET emulated timer interrupts goes to IRQ0 > > Right, so see that ACPI override, that routes I/O APIC IRQ2 to IRQ0, or > it _should_. > > Clearly something is messed up here.. but I've no idea what. That whole > IRQ routing stuff is confusing. Does this help? diff --git a/arch/x86/kernel/time.c b/arch/x86/kernel/time.c index 749d189f8cd4..45675072771c 100644 --- a/arch/x86/kernel/time.c +++ b/arch/x86/kernel/time.c @@ -69,8 +69,6 @@ static struct irqaction irq0 = { static void __init setup_default_timer_irq(void) { - if (!nr_legacy_irqs()) - return; setup_irq(0, &irq0); } ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: Regression: unable to boot after commit bd9240a18edf ("x86/apic: Add TSC_DEADLINE quirk due to errata") - Surface Pro 4 SKL 2017-12-19 17:23 ` Peter Zijlstra @ 2017-12-20 14:08 ` Zhang Rui 2017-12-20 14:48 ` Peter Zijlstra 0 siblings, 1 reply; 20+ messages in thread From: Zhang Rui @ 2017-12-20 14:08 UTC (permalink / raw) To: Peter Zijlstra; +Cc: Thomas Gleixner, LKML, linux-x86, Len Brown, Chen, Yu C On Tue, 2017-12-19 at 18:23 +0100, Peter Zijlstra wrote: > On Tue, Dec 19, 2017 at 05:01:55PM +0100, Peter Zijlstra wrote: > > > > On Tue, Dec 19, 2017 at 11:42:41PM +0800, Zhang Rui wrote: > > > > > > On Tue, 2017-12-19 at 16:23 +0100, Peter Zijlstra wrote: > > > > > > > > > > > [ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 > > > > dfl dfl) > > > > [ 0.000000] ACPI: IRQ0 used by override. > > > > > > > > So your ACPI table has an override for IRQ2 and routes it to > > > > IRQ0. > > ^ this > > > > > > > > > > > > > The HPET document says: > > > > > > > > If the ENABLE_CNF bit and the LEG_RT_CNF bit are both set, > > > > then the > > > > interrupts will be routed as follows: > > > > > > > > Timer 0 will be routed to IRQ0 in Non-APIC or IRQ2 in the > > > > I/O APIC > > > But AFAICS, the HPET emulated timer interrupts goes to IRQ0 > > Right, so see that ACPI override, that routes I/O APIC IRQ2 to > > IRQ0, or > > it _should_. > > > > Clearly something is messed up here.. but I've no idea what. That > > whole > > IRQ routing stuff is confusing. > Does this help? > No. thanks, rui > diff --git a/arch/x86/kernel/time.c b/arch/x86/kernel/time.c > index 749d189f8cd4..45675072771c 100644 > --- a/arch/x86/kernel/time.c > +++ b/arch/x86/kernel/time.c > @@ -69,8 +69,6 @@ static struct irqaction irq0 = { > > static void __init setup_default_timer_irq(void) > { > - if (!nr_legacy_irqs()) > - return; > setup_irq(0, &irq0); > } > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Regression: unable to boot after commit bd9240a18edf ("x86/apic: Add TSC_DEADLINE quirk due to errata") - Surface Pro 4 SKL 2017-12-20 14:08 ` Zhang Rui @ 2017-12-20 14:48 ` Peter Zijlstra 0 siblings, 0 replies; 20+ messages in thread From: Peter Zijlstra @ 2017-12-20 14:48 UTC (permalink / raw) To: Zhang Rui; +Cc: Thomas Gleixner, LKML, linux-x86, Len Brown, Chen, Yu C On Wed, Dec 20, 2017 at 10:08:21PM +0800, Zhang Rui wrote: > > Does this help? > > > No. Bah!, does this at least get you a IRQ0 line in /proc/interrupts? > > diff --git a/arch/x86/kernel/time.c b/arch/x86/kernel/time.c > > index 749d189f8cd4..45675072771c 100644 > > --- a/arch/x86/kernel/time.c > > +++ b/arch/x86/kernel/time.c > > @@ -69,8 +69,6 @@ static struct irqaction irq0 = { > > > > static void __init setup_default_timer_irq(void) > > { > > - if (!nr_legacy_irqs()) > > - return; > > setup_irq(0, &irq0); > > } > > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Regression: unable to boot after commit bd9240a18edf ("x86/apic: Add TSC_DEADLINE quirk due to errata") - Surface Pro 4 SKL 2017-11-28 8:22 ` Zhang Rui 2017-11-28 9:34 ` Thomas Gleixner @ 2017-11-28 9:35 ` Peter Zijlstra 1 sibling, 0 replies; 20+ messages in thread From: Peter Zijlstra @ 2017-11-28 9:35 UTC (permalink / raw) To: Zhang Rui; +Cc: LKML, linux-x86, tglx, Len Brown, Chen, Yu C On Tue, Nov 28, 2017 at 04:22:15PM +0800, Zhang Rui wrote: > On Tue, 2017-11-28 at 09:14 +0100, Peter Zijlstra wrote: > > On Tue, Nov 28, 2017 at 10:08:53AM +0800, Zhang Rui wrote: > > > > > > Hi, All, > > > > > > My Surface Pro 4 is unable to boot after 4.12. The symptom is that > > > kernel freezes during boot, and the last message in the screen is > > > loading the initrd image. And I have bisected it to this commit > > > > > > +// DEADLINE_MODEL_MATCH_REV ( INTEL_FAM6_SKYLAKE_MOBILE, > > > 0xb2), > > And what microcode version do you run? Have you installed the latest > > microcode package and updated your initrd to include it? My skylake > > is > > running 0xba. > > No, I didn't upgrade my microcode. > > $ cat /proc/cpuinfo > ... > processor : 0 > vendor_id : GenuineIntel > cpu family : 6 > model : 78 > model name : Intel(R) Core(TM) i7-6650U CPU @ 2.20GHz > stepping : 3 > microcode : 0x9e > ... > > I suppose the problem should be gone if I upgrade the microcode. > But the real problem to me is that the system FREEZES after kernel > upgrade. Yes, that is weird and unexpected. The patch basically kills TSC_DEADLINE usage on older microcode. This makes the kernel then use the old LAPIC timer mode. Does your machine boot with "notscdeadline" ? Which _should_ be the same. ^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2017-12-20 14:48 UTC | newest]
Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-11-28 2:08 Regression: unable to boot after commit bd9240a18edf ("x86/apic: Add TSC_DEADLINE quirk due to errata") - Surface Pro 4 SKL Zhang Rui
2017-11-28 8:14 ` Peter Zijlstra
2017-11-28 8:22 ` Zhang Rui
2017-11-28 9:34 ` Thomas Gleixner
2017-11-28 10:59 ` Zhang Rui
2017-11-28 12:36 ` Peter Zijlstra
2017-11-29 14:44 ` Zhang Rui
2017-12-18 20:28 ` Peter Zijlstra
2017-12-19 10:48 ` Zhang Rui
2017-12-19 13:15 ` Peter Zijlstra
2017-12-19 15:23 ` Zhang Rui
2017-12-19 15:23 ` Peter Zijlstra
2017-12-19 15:31 ` Peter Zijlstra
2017-12-19 15:43 ` Zhang Rui
2017-12-19 15:42 ` Zhang Rui
2017-12-19 16:01 ` Peter Zijlstra
2017-12-19 17:23 ` Peter Zijlstra
2017-12-20 14:08 ` Zhang Rui
2017-12-20 14:48 ` Peter Zijlstra
2017-11-28 9:35 ` Peter Zijlstra
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.