* [PATCH] x86: serialize LVTT and TSC_DEADLINE write @ 2015-07-31 22:11 Shaohua Li 2015-08-01 10:10 ` Thomas Gleixner 0 siblings, 1 reply; 11+ messages in thread From: Shaohua Li @ 2015-07-31 22:11 UTC (permalink / raw) To: x86, linux-kernel Cc: Kernel-team, Suresh Siddha, Thomas Gleixner, H. Peter Anvin, Ingo Molnar, stable, v3.7+ We saw a strange issue with local APIC timer. Some random CPU doesn't receive any local APIC timer interrupt, which causes different issues. The cpu uses TSC-Deadline mode for local APIC timer and APIC is in xAPIC mode. When this happens, manually writing TSC_DEADLINE MSR can trigger interrupt again and the system goes normal. Currently we only see this issue in E5-2660 v2 and E5-2680 v2 CPU. Compiler version seems mattering too, it's quite easy to reproduce the issue with v4.7 gcc. Since the local APIC timer interrupt number is 0, we either lose the first interrupt or TSC_DEADLINE MSR isn't set correctly. After some debugging, we believe it's the serialize issue described in Intel SDM. In xAPIC mode, write to APIC LVTT and write to TSC_DEADLINE isn't serialized. Debug shows read TSC_DEADLINE MSR followed the very first MSR write returns 0 in the buggy cpu. The patch uses the algorithm Intel SDM described. The issue only happens in xAPIC mode, but it doesn't bother to check the APIC mode I guess. Without this patch, we see the issue after ~5 reboots. With it, we don't see it after 24hr reboot test. Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org v3.7+ Signed-off-by: Shaohua Li <shli@fb.com> --- arch/x86/kernel/apic/apic.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c index dcb5285..b7890b3 100644 --- a/arch/x86/kernel/apic/apic.c +++ b/arch/x86/kernel/apic/apic.c @@ -336,6 +336,22 @@ static void __setup_APIC_LVTT(unsigned int clocks, int oneshot, int irqen) apic_write(APIC_LVTT, lvtt_value); if (lvtt_value & APIC_LVT_TIMER_TSCDEADLINE) { + u64 msr; + + /* + * See Intel SDM: TSC-Deadline Mode chapter. In xAPIC mode, + * writing APIC LVTT and TSC_DEADLINE MSR isn't serialized. + * This uses the algorithm described in Intel SDM to serialize + * the two writes + * */ + while (1) { + wrmsrl(MSR_IA32_TSC_DEADLINE, -1L); + rdmsrl(MSR_IA32_TSC_DEADLINE, msr); + if (msr) + break; + } + wrmsrl(MSR_IA32_TSC_DEADLINE, 0); + printk_once(KERN_DEBUG "TSC deadline timer enabled\n"); return; } -- 1.8.5.6 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH] x86: serialize LVTT and TSC_DEADLINE write 2015-07-31 22:11 [PATCH] x86: serialize LVTT and TSC_DEADLINE write Shaohua Li @ 2015-08-01 10:10 ` Thomas Gleixner 2015-08-02 15:49 ` Shaohua Li 0 siblings, 1 reply; 11+ messages in thread From: Thomas Gleixner @ 2015-08-01 10:10 UTC (permalink / raw) To: Shaohua Li Cc: x86, linux-kernel, Kernel-team, Suresh Siddha, H. Peter Anvin, Ingo Molnar, stable, v3.7+ On Fri, 31 Jul 2015, Shaohua Li wrote: > @@ -336,6 +336,22 @@ static void __setup_APIC_LVTT(unsigned int clocks, int oneshot, int irqen) > apic_write(APIC_LVTT, lvtt_value); > > if (lvtt_value & APIC_LVT_TIMER_TSCDEADLINE) { > + u64 msr; > + > + /* > + * See Intel SDM: TSC-Deadline Mode chapter. In xAPIC mode, > + * writing APIC LVTT and TSC_DEADLINE MSR isn't serialized. > + * This uses the algorithm described in Intel SDM to serialize > + * the two writes > + * */ > + while (1) { > + wrmsrl(MSR_IA32_TSC_DEADLINE, -1L); > + rdmsrl(MSR_IA32_TSC_DEADLINE, msr); > + if (msr) > + break; > + } > + wrmsrl(MSR_IA32_TSC_DEADLINE, 0); I think this is exceptionally silly. A proper fence after the apic_write() should have the same effect. Thanks, tglx ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] x86: serialize LVTT and TSC_DEADLINE write 2015-08-01 10:10 ` Thomas Gleixner @ 2015-08-02 15:49 ` Shaohua Li 2015-08-02 19:41 ` Thomas Gleixner 0 siblings, 1 reply; 11+ messages in thread From: Shaohua Li @ 2015-08-02 15:49 UTC (permalink / raw) To: Thomas Gleixner Cc: x86, linux-kernel, Kernel-team, Suresh Siddha, H. Peter Anvin, Ingo Molnar, stable On Sat, Aug 01, 2015 at 12:10:41PM +0200, Thomas Gleixner wrote: > On Fri, 31 Jul 2015, Shaohua Li wrote: > > @@ -336,6 +336,22 @@ static void __setup_APIC_LVTT(unsigned int clocks, int oneshot, int irqen) > > apic_write(APIC_LVTT, lvtt_value); > > > > if (lvtt_value & APIC_LVT_TIMER_TSCDEADLINE) { > > + u64 msr; > > + > > + /* > > + * See Intel SDM: TSC-Deadline Mode chapter. In xAPIC mode, > > + * writing APIC LVTT and TSC_DEADLINE MSR isn't serialized. > > + * This uses the algorithm described in Intel SDM to serialize > > + * the two writes > > + * */ > > + while (1) { > > + wrmsrl(MSR_IA32_TSC_DEADLINE, -1L); > > + rdmsrl(MSR_IA32_TSC_DEADLINE, msr); > > + if (msr) > > + break; > > + } > > + wrmsrl(MSR_IA32_TSC_DEADLINE, 0); > > > I think this is exceptionally silly. A proper fence after the > apic_write() should have the same effect. Not sure what happens in the hardware, I could have a try of fence, but I'd prefer using the algorithm Intel described. This is not a fast path, the loop will exit immediately regardless the issue occurs anyway. Thanks, Shaohua ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] x86: serialize LVTT and TSC_DEADLINE write 2015-08-02 15:49 ` Shaohua Li @ 2015-08-02 19:41 ` Thomas Gleixner 2015-08-03 23:58 ` Shaohua Li 0 siblings, 1 reply; 11+ messages in thread From: Thomas Gleixner @ 2015-08-02 19:41 UTC (permalink / raw) To: Shaohua Li Cc: x86, linux-kernel, Kernel-team, Suresh Siddha, H. Peter Anvin, Ingo Molnar, stable [-- Attachment #1: Type: TEXT/PLAIN, Size: 1920 bytes --] On Sun, 2 Aug 2015, Shaohua Li wrote: > On Sat, Aug 01, 2015 at 12:10:41PM +0200, Thomas Gleixner wrote: > > On Fri, 31 Jul 2015, Shaohua Li wrote: > > > @@ -336,6 +336,22 @@ static void __setup_APIC_LVTT(unsigned int clocks, int oneshot, int irqen) > > > apic_write(APIC_LVTT, lvtt_value); > > > > > > if (lvtt_value & APIC_LVT_TIMER_TSCDEADLINE) { > > > + u64 msr; > > > + > > > + /* > > > + * See Intel SDM: TSC-Deadline Mode chapter. In xAPIC mode, > > > + * writing APIC LVTT and TSC_DEADLINE MSR isn't serialized. > > > + * This uses the algorithm described in Intel SDM to serialize > > > + * the two writes > > > + * */ > > > + while (1) { > > > + wrmsrl(MSR_IA32_TSC_DEADLINE, -1L); > > > + rdmsrl(MSR_IA32_TSC_DEADLINE, msr); > > > + if (msr) > > > + break; > > > + } > > > + wrmsrl(MSR_IA32_TSC_DEADLINE, 0); > > > > > > I think this is exceptionally silly. A proper fence after the > > apic_write() should have the same effect. > > Not sure what happens in the hardware, I could have a try of fence, but > I'd prefer using the algorithm Intel described. This is not a fast path, s/algorithm/voodoo/ > the loop will exit immediately regardless the issue occurs anyway. Well, the SDM also says: "To allow for efficient access to the APIC registers in x2APIC mode, the serializing semantics of WRMSR are relaxed when writing to the APIC registers. Thus, system software should not use “WRMSR to APIC registers in x2APIC mode” as a serializing instruction. Read and write accesses to the APIC registers will occur in program order. A WRMSR to an APIC register may complete before all preceding stores are globally visible; software can prevent this by inserting a serializing instruction, an SFENCE, or an MFENCE before the WRMSR." And that's what happens here. The write to the LVT has not yet hit the APIC, so the WRMSR has no effect. Thanks, tglx ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] x86: serialize LVTT and TSC_DEADLINE write 2015-08-02 19:41 ` Thomas Gleixner @ 2015-08-03 23:58 ` Shaohua Li 2015-08-05 8:44 ` Ingo Molnar 0 siblings, 1 reply; 11+ messages in thread From: Shaohua Li @ 2015-08-03 23:58 UTC (permalink / raw) To: Thomas Gleixner Cc: x86, linux-kernel, Kernel-team, Suresh Siddha, H. Peter Anvin, Ingo Molnar, stable, ak, lenb, fenghua.yu On Sun, Aug 02, 2015 at 09:41:08PM +0200, Thomas Gleixner wrote: > On Sun, 2 Aug 2015, Shaohua Li wrote: > > > On Sat, Aug 01, 2015 at 12:10:41PM +0200, Thomas Gleixner wrote: > > > On Fri, 31 Jul 2015, Shaohua Li wrote: > > > > @@ -336,6 +336,22 @@ static void __setup_APIC_LVTT(unsigned int clocks, int oneshot, int irqen) > > > > apic_write(APIC_LVTT, lvtt_value); > > > > > > > > if (lvtt_value & APIC_LVT_TIMER_TSCDEADLINE) { > > > > + u64 msr; > > > > + > > > > + /* > > > > + * See Intel SDM: TSC-Deadline Mode chapter. In xAPIC mode, > > > > + * writing APIC LVTT and TSC_DEADLINE MSR isn't serialized. > > > > + * This uses the algorithm described in Intel SDM to serialize > > > > + * the two writes > > > > + * */ > > > > + while (1) { > > > > + wrmsrl(MSR_IA32_TSC_DEADLINE, -1L); > > > > + rdmsrl(MSR_IA32_TSC_DEADLINE, msr); > > > > + if (msr) > > > > + break; > > > > + } > > > > + wrmsrl(MSR_IA32_TSC_DEADLINE, 0); > > > > > > > > > I think this is exceptionally silly. A proper fence after the > > > apic_write() should have the same effect. > > > > Not sure what happens in the hardware, I could have a try of fence, but > > I'd prefer using the algorithm Intel described. This is not a fast path, > > s/algorithm/voodoo/ > > > the loop will exit immediately regardless the issue occurs anyway. > > Well, the SDM also says: > > "To allow for efficient access to the APIC registers in x2APIC mode, > the serializing semantics of WRMSR are relaxed when writing to the > APIC registers. Thus, system software should not use “WRMSR to APIC > registers in x2APIC mode” as a serializing instruction. Read and write > accesses to the APIC registers will occur in program order. A WRMSR to > an APIC register may complete before all preceding stores are globally > visible; software can prevent this by inserting a serializing > instruction, an SFENCE, or an MFENCE before the WRMSR." > > And that's what happens here. The write to the LVT has not yet hit the > APIC, so the WRMSR has no effect. What you quoted is for x2APIC, I didn't see similar description for xAPIC. Tested mfence here, it does work. But I'm not convinced it's the right thing. the xAPIC access is memory mapped IO, mfence is nothing related to it. Anyway, cc-ed more intel people, hope they can share some insights. Thanks, Shaohua ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] x86: serialize LVTT and TSC_DEADLINE write 2015-08-03 23:58 ` Shaohua Li @ 2015-08-05 8:44 ` Ingo Molnar 2015-08-05 16:25 ` Shaohua Li 0 siblings, 1 reply; 11+ messages in thread From: Ingo Molnar @ 2015-08-05 8:44 UTC (permalink / raw) To: Shaohua Li Cc: Thomas Gleixner, x86, linux-kernel, Kernel-team, Suresh Siddha, H. Peter Anvin, stable, ak, lenb, fenghua.yu * Shaohua Li <shli@fb.com> wrote: > On Sun, Aug 02, 2015 at 09:41:08PM +0200, Thomas Gleixner wrote: > > On Sun, 2 Aug 2015, Shaohua Li wrote: > > > > > On Sat, Aug 01, 2015 at 12:10:41PM +0200, Thomas Gleixner wrote: > > > > On Fri, 31 Jul 2015, Shaohua Li wrote: > > > > > @@ -336,6 +336,22 @@ static void __setup_APIC_LVTT(unsigned int clocks, int oneshot, int irqen) > > > > > apic_write(APIC_LVTT, lvtt_value); > > > > > > > > > > if (lvtt_value & APIC_LVT_TIMER_TSCDEADLINE) { > > > > > + u64 msr; > > > > > + > > > > > + /* > > > > > + * See Intel SDM: TSC-Deadline Mode chapter. In xAPIC mode, > > > > > + * writing APIC LVTT and TSC_DEADLINE MSR isn't serialized. > > > > > + * This uses the algorithm described in Intel SDM to serialize > > > > > + * the two writes > > > > > + * */ > > > > > + while (1) { > > > > > + wrmsrl(MSR_IA32_TSC_DEADLINE, -1L); > > > > > + rdmsrl(MSR_IA32_TSC_DEADLINE, msr); > > > > > + if (msr) > > > > > + break; > > > > > + } > > > > > + wrmsrl(MSR_IA32_TSC_DEADLINE, 0); > > > > > > > > > > > > I think this is exceptionally silly. A proper fence after the > > > > apic_write() should have the same effect. > > > > > > Not sure what happens in the hardware, I could have a try of fence, but > > > I'd prefer using the algorithm Intel described. This is not a fast path, > > > > s/algorithm/voodoo/ > > > > > the loop will exit immediately regardless the issue occurs anyway. > > > > Well, the SDM also says: > > > > "To allow for efficient access to the APIC registers in x2APIC mode, > > the serializing semantics of WRMSR are relaxed when writing to the > > APIC registers. Thus, system software should not use “WRMSR to APIC > > registers in x2APIC mode” as a serializing instruction. Read and write > > accesses to the APIC registers will occur in program order. A WRMSR to > > an APIC register may complete before all preceding stores are globally > > visible; software can prevent this by inserting a serializing > > instruction, an SFENCE, or an MFENCE before the WRMSR." > > > > And that's what happens here. The write to the LVT has not yet hit the > > APIC, so the WRMSR has no effect. > > What you quoted is for x2APIC, I didn't see similar description for > xAPIC. > > Tested mfence here, it does work. But I'm not convinced it's the right thing. > the xAPIC access is memory mapped IO, mfence is nothing related to it. [...] Huh? Can you cite the SDM that says that MFENCE will have no effect on instructions that deal with accesses that are not to system RAM (i.e. are memory mapped IO)? Thanks, Ingo ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] x86: serialize LVTT and TSC_DEADLINE write 2015-08-05 8:44 ` Ingo Molnar @ 2015-08-05 16:25 ` Shaohua Li 2015-09-09 3:39 ` Andi Kleen 0 siblings, 1 reply; 11+ messages in thread From: Shaohua Li @ 2015-08-05 16:25 UTC (permalink / raw) To: Ingo Molnar Cc: Thomas Gleixner, x86, linux-kernel, Kernel-team, Suresh Siddha, H. Peter Anvin, stable, ak, lenb, fenghua.yu On Wed, Aug 05, 2015 at 10:44:24AM +0200, Ingo Molnar wrote: > > * Shaohua Li <shli@fb.com> wrote: > > > On Sun, Aug 02, 2015 at 09:41:08PM +0200, Thomas Gleixner wrote: > > > On Sun, 2 Aug 2015, Shaohua Li wrote: > > > > > > > On Sat, Aug 01, 2015 at 12:10:41PM +0200, Thomas Gleixner wrote: > > > > > On Fri, 31 Jul 2015, Shaohua Li wrote: > > > > > > @@ -336,6 +336,22 @@ static void __setup_APIC_LVTT(unsigned int clocks, int oneshot, int irqen) > > > > > > apic_write(APIC_LVTT, lvtt_value); > > > > > > > > > > > > if (lvtt_value & APIC_LVT_TIMER_TSCDEADLINE) { > > > > > > + u64 msr; > > > > > > + > > > > > > + /* > > > > > > + * See Intel SDM: TSC-Deadline Mode chapter. In xAPIC mode, > > > > > > + * writing APIC LVTT and TSC_DEADLINE MSR isn't serialized. > > > > > > + * This uses the algorithm described in Intel SDM to serialize > > > > > > + * the two writes > > > > > > + * */ > > > > > > + while (1) { > > > > > > + wrmsrl(MSR_IA32_TSC_DEADLINE, -1L); > > > > > > + rdmsrl(MSR_IA32_TSC_DEADLINE, msr); > > > > > > + if (msr) > > > > > > + break; > > > > > > + } > > > > > > + wrmsrl(MSR_IA32_TSC_DEADLINE, 0); > > > > > > > > > > > > > > > I think this is exceptionally silly. A proper fence after the > > > > > apic_write() should have the same effect. > > > > > > > > Not sure what happens in the hardware, I could have a try of fence, but > > > > I'd prefer using the algorithm Intel described. This is not a fast path, > > > > > > s/algorithm/voodoo/ > > > > > > > the loop will exit immediately regardless the issue occurs anyway. > > > > > > Well, the SDM also says: > > > > > > "To allow for efficient access to the APIC registers in x2APIC mode, > > > the serializing semantics of WRMSR are relaxed when writing to the > > > APIC registers. Thus, system software should not use “WRMSR to APIC > > > registers in x2APIC mode” as a serializing instruction. Read and write > > > accesses to the APIC registers will occur in program order. A WRMSR to > > > an APIC register may complete before all preceding stores are globally > > > visible; software can prevent this by inserting a serializing > > > instruction, an SFENCE, or an MFENCE before the WRMSR." > > > > > > And that's what happens here. The write to the LVT has not yet hit the > > > APIC, so the WRMSR has no effect. > > > > What you quoted is for x2APIC, I didn't see similar description for > > xAPIC. > > > > Tested mfence here, it does work. But I'm not convinced it's the right thing. > > the xAPIC access is memory mapped IO, mfence is nothing related to it. [...] > > Huh? > > Can you cite the SDM that says that MFENCE will have no effect on instructions > that deal with accesses that are not to system RAM (i.e. are memory mapped IO)? Hmm, I didn't mean mfence can't serialize the instructions. For a true IO, a serialization can't guarantee device finishes the IO, we generally read some safe IO registers to wait IO finish. I completely don't know if this case fits here though. Thanks, Shaohua ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] x86: serialize LVTT and TSC_DEADLINE write 2015-08-05 16:25 ` Shaohua Li @ 2015-09-09 3:39 ` Andi Kleen 2015-09-09 4:13 ` Shaohua Li 2015-09-09 7:35 ` [PATCH] x86: serialize LVTT and TSC_DEADLINE write Thomas Gleixner 0 siblings, 2 replies; 11+ messages in thread From: Andi Kleen @ 2015-09-09 3:39 UTC (permalink / raw) To: Shaohua Li Cc: Ingo Molnar, Thomas Gleixner, x86, linux-kernel, Kernel-team, Suresh Siddha, H. Peter Anvin, stable, lenb, fenghua.yu > Hmm, I didn't mean mfence can't serialize the instructions. For a true > IO, a serialization can't guarantee device finishes the IO, we generally > read some safe IO registers to wait IO finish. I completely don't know > if this case fits here though. Sorry for the late answer. We (Intel) analyzed this case in detail and can confirm that the following sequence 1. Memory-mapped write to LVT Timer Register, setting bits 18:17 to 10b. 23. MFENCE. 4. WRMSR to the IA32_TSC_DEADLINE MSR the desired deadline. has the same effect as the loop algorithm described in the SDM on all Intel CPUs. So it's fine to use MFENCE here. -Andi ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] x86: serialize LVTT and TSC_DEADLINE write 2015-09-09 3:39 ` Andi Kleen @ 2015-09-09 4:13 ` Shaohua Li 2015-09-14 20:06 ` [tip:x86/urgent] x86/apic: Serialize LVTT and TSC_DEADLINE writes tip-bot for Shaohua Li 2015-09-09 7:35 ` [PATCH] x86: serialize LVTT and TSC_DEADLINE write Thomas Gleixner 1 sibling, 1 reply; 11+ messages in thread From: Shaohua Li @ 2015-09-09 4:13 UTC (permalink / raw) To: Andi Kleen Cc: Ingo Molnar, Thomas Gleixner, x86, linux-kernel, Kernel-team, Suresh Siddha, H. Peter Anvin, stable, lenb, fenghua.yu On Tue, Sep 08, 2015 at 08:39:37PM -0700, Andi Kleen wrote: > > Hmm, I didn't mean mfence can't serialize the instructions. For a true > > IO, a serialization can't guarantee device finishes the IO, we generally > > read some safe IO registers to wait IO finish. I completely don't know > > if this case fits here though. > > Sorry for the late answer. We (Intel) analyzed this case in detail and > can confirm that the following sequence > > 1. Memory-mapped write to LVT Timer Register, setting bits 18:17 to 10b. > 23. MFENCE. > 4. WRMSR to the IA32_TSC_DEADLINE MSR the desired deadline. > > has the same effect as the loop algorithm described in the SDM on all Intel > CPUs. So it's fine to use MFENCE here. Thank you very much, Andi! Here is the updated patch. >From b5cdd3c14b0589e5d5a72d607814d9e808d05765 Mon Sep 17 00:00:00 2001 Message-Id: <b5cdd3c14b0589e5d5a72d607814d9e808d05765.1441771848.git.shli@fb.com> From: Shaohua Li <shli@fb.com> Date: Thu, 30 Jul 2015 16:24:43 -0700 Subject: [PATCH] x86: serialize LVTT and TSC_DEADLINE write We saw a strange issue with local APIC timer. Some random CPU doesn't receive any local APIC timer interrupt, which causes different issues. The cpu uses TSC-Deadline mode for local APIC timer and APIC is in xAPIC mode. When this happens, manually writing TSC_DEADLINE MSR can trigger interrupt again and the system goes normal. Currently we only see this issue in E5-2660 v2 and E5-2680 v2 CPU. Compiler version seems mattering too, it's quite easy to reproduce the issue with v4.7 gcc. Since the local APIC timer interrupt number is 0, we either lose the first interrupt or TSC_DEADLINE MSR isn't set correctly. After some debugging, we believe it's the serialize issue described in Intel SDM. In xAPIC mode, write to APIC LVTT and write to TSC_DEADLINE isn't serialized. Debug shows read TSC_DEADLINE MSR followed the very first MSR write returns 0 in the buggy cpu. Thanks Andi Kleen confirms a 'mfence' can do the serialization here. 'mfence' instruction should be supported by CPU with TSC_DEADLINE, so we don't need to check the instruction availability. Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: stable@vger.kernel.org v3.7+ Signed-off-by: Shaohua Li <shli@fb.com> --- arch/x86/kernel/apic/apic.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c index 3ca3e46..e3fd4ab 100644 --- a/arch/x86/kernel/apic/apic.c +++ b/arch/x86/kernel/apic/apic.c @@ -336,6 +336,13 @@ static void __setup_APIC_LVTT(unsigned int clocks, int oneshot, int irqen) apic_write(APIC_LVTT, lvtt_value); if (lvtt_value & APIC_LVT_TIMER_TSCDEADLINE) { + /* + * See Intel SDM: TSC-Deadline Mode chapter. In xAPIC mode, + * writing APIC LVTT and TSC_DEADLINE MSR isn't serialized. + * According to Intel, mfence can do the serialization here. + * */ + asm volatile("mfence" : : : "memory"); + printk_once(KERN_DEBUG "TSC deadline timer enabled\n"); return; } -- 1.8.1 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* [tip:x86/urgent] x86/apic: Serialize LVTT and TSC_DEADLINE writes 2015-09-09 4:13 ` Shaohua Li @ 2015-09-14 20:06 ` tip-bot for Shaohua Li 0 siblings, 0 replies; 11+ messages in thread From: tip-bot for Shaohua Li @ 2015-09-14 20:06 UTC (permalink / raw) To: linux-tip-commits Cc: hpa, Kernel-team, mingo, fenghua.yu, tglx, ak, lenb, linux-kernel, shli Commit-ID: 5d7c631d926b59aa16f3c56eaeb83f1036c81dc7 Gitweb: http://git.kernel.org/tip/5d7c631d926b59aa16f3c56eaeb83f1036c81dc7 Author: Shaohua Li <shli@fb.com> AuthorDate: Thu, 30 Jul 2015 16:24:43 -0700 Committer: Thomas Gleixner <tglx@linutronix.de> CommitDate: Mon, 14 Sep 2015 18:29:59 +0200 x86/apic: Serialize LVTT and TSC_DEADLINE writes The APIC LVTT register is MMIO mapped but the TSC_DEADLINE register is an MSR. The write to the TSC_DEADLINE MSR is not serializing, so it's not guaranteed that the write to LVTT has reached the APIC before the TSC_DEADLINE MSR is written. In such a case the write to the MSR is ignored and as a consequence the local timer interrupt never fires. The SDM decribes this issue for xAPIC and x2APIC modes. The serialization methods recommended by the SDM differ. xAPIC: "1. Memory-mapped write to LVT Timer Register, setting bits 18:17 to 10b. 2. WRMSR to the IA32_TSC_DEADLINE MSR a value much larger than current time-stamp counter. 3. If RDMSR of the IA32_TSC_DEADLINE MSR returns zero, go to step 2. 4. WRMSR to the IA32_TSC_DEADLINE MSR the desired deadline." x2APIC: "To allow for efficient access to the APIC registers in x2APIC mode, the serializing semantics of WRMSR are relaxed when writing to the APIC registers. Thus, system software should not use 'WRMSR to APIC registers in x2APIC mode' as a serializing instruction. Read and write accesses to the APIC registers will occur in program order. A WRMSR to an APIC register may complete before all preceding stores are globally visible; software can prevent this by inserting a serializing instruction, an SFENCE, or an MFENCE before the WRMSR." The xAPIC method is to just wait for the memory mapped write to hit the LVTT by checking whether the MSR write has reached the hardware. There is no reason why a proper MFENCE after the memory mapped write would not do the same. Andi Kleen confirmed that MFENCE is sufficient for the xAPIC case as well. Issue MFENCE before writing to the TSC_DEADLINE MSR. This can be done unconditionally as all CPUs which have TSC_DEADLINE also have MFENCE support. [ tglx: Massaged the changelog ] Signed-off-by: Shaohua Li <shli@fb.com> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: <Kernel-team@fb.com> Cc: <lenb@kernel.org> Cc: <fenghua.yu@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: stable@vger.kernel.org #v3.7+ Link: http://lkml.kernel.org/r/20150909041352.GA2059853@devbig257.prn2.facebook.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> --- arch/x86/kernel/apic/apic.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c index 3ca3e46..24e94ce 100644 --- a/arch/x86/kernel/apic/apic.c +++ b/arch/x86/kernel/apic/apic.c @@ -336,6 +336,13 @@ static void __setup_APIC_LVTT(unsigned int clocks, int oneshot, int irqen) apic_write(APIC_LVTT, lvtt_value); if (lvtt_value & APIC_LVT_TIMER_TSCDEADLINE) { + /* + * See Intel SDM: TSC-Deadline Mode chapter. In xAPIC mode, + * writing to the APIC LVTT and TSC_DEADLINE MSR isn't serialized. + * According to Intel, MFENCE can do the serialization here. + */ + asm volatile("mfence" : : : "memory"); + printk_once(KERN_DEBUG "TSC deadline timer enabled\n"); return; } ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH] x86: serialize LVTT and TSC_DEADLINE write 2015-09-09 3:39 ` Andi Kleen 2015-09-09 4:13 ` Shaohua Li @ 2015-09-09 7:35 ` Thomas Gleixner 1 sibling, 0 replies; 11+ messages in thread From: Thomas Gleixner @ 2015-09-09 7:35 UTC (permalink / raw) To: Andi Kleen Cc: Shaohua Li, Ingo Molnar, x86, linux-kernel, Kernel-team, Suresh Siddha, H. Peter Anvin, stable, lenb, fenghua.yu Andi, On Tue, 8 Sep 2015, Andi Kleen wrote: > > Hmm, I didn't mean mfence can't serialize the instructions. For a true > > IO, a serialization can't guarantee device finishes the IO, we generally > > read some safe IO registers to wait IO finish. I completely don't know > > if this case fits here though. > > Sorry for the late answer. We (Intel) analyzed this case in detail and > can confirm that the following sequence > > 1. Memory-mapped write to LVT Timer Register, setting bits 18:17 to 10b. > 23. MFENCE. > 4. WRMSR to the IA32_TSC_DEADLINE MSR the desired deadline. > > has the same effect as the loop algorithm described in the SDM on all Intel > CPUs. So it's fine to use MFENCE here. Thanks for the confirmation. It would have been really surprising if that wouldn't work. Thanks, tglx ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2015-09-14 20:07 UTC | newest] Thread overview: 11+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2015-07-31 22:11 [PATCH] x86: serialize LVTT and TSC_DEADLINE write Shaohua Li 2015-08-01 10:10 ` Thomas Gleixner 2015-08-02 15:49 ` Shaohua Li 2015-08-02 19:41 ` Thomas Gleixner 2015-08-03 23:58 ` Shaohua Li 2015-08-05 8:44 ` Ingo Molnar 2015-08-05 16:25 ` Shaohua Li 2015-09-09 3:39 ` Andi Kleen 2015-09-09 4:13 ` Shaohua Li 2015-09-14 20:06 ` [tip:x86/urgent] x86/apic: Serialize LVTT and TSC_DEADLINE writes tip-bot for Shaohua Li 2015-09-09 7:35 ` [PATCH] x86: serialize LVTT and TSC_DEADLINE write Thomas Gleixner
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).