* [PATCH v1] Misc fix to hypervisor. @ 2014-06-04 13:37 Konrad Rzeszutek Wilk 2014-06-04 13:37 ` [PATCH v1] xen/mce: Don't spam the console with "CPUx: Temperature z" (v2) Konrad Rzeszutek Wilk 0 siblings, 1 reply; 5+ messages in thread From: Konrad Rzeszutek Wilk @ 2014-06-04 13:37 UTC (permalink / raw) To: xen-devel, chegger, jinsong.liu, keir, jbeulich This patch came about as I was working on a different bug and crashed dom0 quite often. That caused an infinite loop which made the CPUs hot and the firmware decided to tell me about that. In fact it told me so often that I got fed up with it. As such this patch adds a state machine to this so it will only print this when the state has changed. xen/arch/x86/cpu/mcheck/mce_intel.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) ^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH v1] xen/mce: Don't spam the console with "CPUx: Temperature z" (v2) 2014-06-04 13:37 [PATCH v1] Misc fix to hypervisor Konrad Rzeszutek Wilk @ 2014-06-04 13:37 ` Konrad Rzeszutek Wilk 2014-06-04 13:49 ` Andrew Cooper 2014-06-04 14:49 ` Jan Beulich 0 siblings, 2 replies; 5+ messages in thread From: Konrad Rzeszutek Wilk @ 2014-06-04 13:37 UTC (permalink / raw) To: xen-devel, chegger, jinsong.liu, keir, jbeulich; +Cc: Konrad Rzeszutek Wilk If the machine has been quite busy it ends up with these messages printed on the hypervisor console: (XEN) CPU3: Temperature/speed normal (XEN) CPU1: Temperature/speed normal (XEN) CPU0: Temperature/speed normal (XEN) CPU1: Temperature/speed normal (XEN) CPU0: Temperature/speed normal (XEN) CPU2: Temperature/speed normal (XEN) CPU3: Temperature/speed normal (XEN) CPU0: Temperature/speed normal (XEN) CPU2: Temperature/speed normal (XEN) CPU3: Temperature/speed normal (XEN) CPU1: Temperature/speed normal (XEN) CPU0: Temperature above threshold (XEN) CPU0: Running in modulated clock mode (XEN) CPU1: Temperature/speed normal (XEN) CPU2: Temperature/speed normal (XEN) CPU3: Temperature/speed normal While the state changes are important, the non-altered state information is not needed. As such add a latch mechanism to only print the information if it has changed since the last update. This was observed on Intel DQ67SW, BIOS SWQ6710H.86A.0066.2012.1105.1504 11/05/2012 CC: Jan Beulich <jbeulich@suse.com> CC: Keir Fraser <keir@xen.org> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [v2: Redo per Daniel and Boris's review] --- xen/arch/x86/cpu/mcheck/mce_intel.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/xen/arch/x86/cpu/mcheck/mce_intel.c b/xen/arch/x86/cpu/mcheck/mce_intel.c index b32fdb2..a5caf47 100644 --- a/xen/arch/x86/cpu/mcheck/mce_intel.c +++ b/xen/arch/x86/cpu/mcheck/mce_intel.c @@ -49,11 +49,13 @@ static int __read_mostly nr_intel_ext_msrs; #define INTEL_SRAR_INSTR_FETCH 0x150 #ifdef CONFIG_X86_MCE_THERMAL +#define MCE_RING 0x1 static void intel_thermal_interrupt(struct cpu_user_regs *regs) { uint64_t msr_content; unsigned int cpu = smp_processor_id(); static DEFINE_PER_CPU(s_time_t, next); + static DEFINE_PER_CPU(int, last_state) = -1; ack_APIC_irq(); @@ -62,10 +64,13 @@ static void intel_thermal_interrupt(struct cpu_user_regs *regs) per_cpu(next, cpu) = NOW() + MILLISECS(5000); rdmsrl(MSR_IA32_THERM_STATUS, msr_content); - if (msr_content & 0x1) { + if ( __get_cpu_var(last_state) == (msr_content & MCE_RING) ) + return; + per_cpu(last_state, cpu) = msr_content & MCE_RING; + if ( msr_content & MCE_RING ) + { printk(KERN_EMERG "CPU%d: Temperature above threshold\n", cpu); - printk(KERN_EMERG "CPU%d: Running in modulated clock mode\n", - cpu); + printk(KERN_EMERG "CPU%d: Running in modulated clock mode\n", cpu); add_taint(TAINT_MACHINE_CHECK); } else { printk(KERN_INFO "CPU%d: Temperature/speed normal\n", cpu); -- 1.9.3 ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH v1] xen/mce: Don't spam the console with "CPUx: Temperature z" (v2) 2014-06-04 13:37 ` [PATCH v1] xen/mce: Don't spam the console with "CPUx: Temperature z" (v2) Konrad Rzeszutek Wilk @ 2014-06-04 13:49 ` Andrew Cooper 2014-06-04 14:49 ` Jan Beulich 1 sibling, 0 replies; 5+ messages in thread From: Andrew Cooper @ 2014-06-04 13:49 UTC (permalink / raw) To: Konrad Rzeszutek Wilk; +Cc: xen-devel, chegger, keir, jbeulich, jinsong.liu On 04/06/14 14:37, Konrad Rzeszutek Wilk wrote: > If the machine has been quite busy it ends up with these > messages printed on the hypervisor console: > > (XEN) CPU3: Temperature/speed normal > (XEN) CPU1: Temperature/speed normal > (XEN) CPU0: Temperature/speed normal > (XEN) CPU1: Temperature/speed normal > (XEN) CPU0: Temperature/speed normal > (XEN) CPU2: Temperature/speed normal > (XEN) CPU3: Temperature/speed normal > (XEN) CPU0: Temperature/speed normal > (XEN) CPU2: Temperature/speed normal > (XEN) CPU3: Temperature/speed normal > (XEN) CPU1: Temperature/speed normal > (XEN) CPU0: Temperature above threshold > (XEN) CPU0: Running in modulated clock mode > (XEN) CPU1: Temperature/speed normal > (XEN) CPU2: Temperature/speed normal > (XEN) CPU3: Temperature/speed normal > > While the state changes are important, the non-altered > state information is not needed. As such add a latch > mechanism to only print the information if it has > changed since the last update. > > This was observed on Intel DQ67SW, > BIOS SWQ6710H.86A.0066.2012.1105.1504 11/05/2012 > > CC: Jan Beulich <jbeulich@suse.com> > CC: Keir Fraser <keir@xen.org> > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> > [v2: Redo per Daniel and Boris's review] > --- > xen/arch/x86/cpu/mcheck/mce_intel.c | 11 ++++++++--- > 1 file changed, 8 insertions(+), 3 deletions(-) > > diff --git a/xen/arch/x86/cpu/mcheck/mce_intel.c b/xen/arch/x86/cpu/mcheck/mce_intel.c > index b32fdb2..a5caf47 100644 > --- a/xen/arch/x86/cpu/mcheck/mce_intel.c > +++ b/xen/arch/x86/cpu/mcheck/mce_intel.c > @@ -49,11 +49,13 @@ static int __read_mostly nr_intel_ext_msrs; > #define INTEL_SRAR_INSTR_FETCH 0x150 > > #ifdef CONFIG_X86_MCE_THERMAL > +#define MCE_RING 0x1 > static void intel_thermal_interrupt(struct cpu_user_regs *regs) > { > uint64_t msr_content; > unsigned int cpu = smp_processor_id(); > static DEFINE_PER_CPU(s_time_t, next); > + static DEFINE_PER_CPU(int, last_state) = -1; Percpu areas are all 0'd during init. This default value won't stick on anything other than the BSP, and that is only because of the current implementation. > > ack_APIC_irq(); > > @@ -62,10 +64,13 @@ static void intel_thermal_interrupt(struct cpu_user_regs *regs) > > per_cpu(next, cpu) = NOW() + MILLISECS(5000); > rdmsrl(MSR_IA32_THERM_STATUS, msr_content); > - if (msr_content & 0x1) { > + if ( __get_cpu_var(last_state) == (msr_content & MCE_RING) ) > + return; > + per_cpu(last_state, cpu) = msr_content & MCE_RING; You have an asymmetry in accessing last_state. As both of them end up using RELOC_HIDE(), it is more efficient to use int *this_last_state = &per_cpu(last_state, cpu); > + if ( msr_content & MCE_RING ) > + { > printk(KERN_EMERG "CPU%d: Temperature above threshold\n", cpu); > - printk(KERN_EMERG "CPU%d: Running in modulated clock mode\n", > - cpu); > + printk(KERN_EMERG "CPU%d: Running in modulated clock mode\n", cpu); As you are fixing this up, cpu is unsigned so should use %u rather than %d ~Andrew > add_taint(TAINT_MACHINE_CHECK); > } else { > printk(KERN_INFO "CPU%d: Temperature/speed normal\n", cpu); ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v1] xen/mce: Don't spam the console with "CPUx: Temperature z" (v2) 2014-06-04 13:37 ` [PATCH v1] xen/mce: Don't spam the console with "CPUx: Temperature z" (v2) Konrad Rzeszutek Wilk 2014-06-04 13:49 ` Andrew Cooper @ 2014-06-04 14:49 ` Jan Beulich 2014-06-04 18:50 ` Konrad Rzeszutek Wilk 1 sibling, 1 reply; 5+ messages in thread From: Jan Beulich @ 2014-06-04 14:49 UTC (permalink / raw) To: Konrad Rzeszutek Wilk; +Cc: jinsong.liu, chegger, keir, xen-devel >>> On 04.06.14 at 15:37, <konrad.wilk@oracle.com> wrote: > If the machine has been quite busy it ends up with these > messages printed on the hypervisor console: > > (XEN) CPU3: Temperature/speed normal > (XEN) CPU1: Temperature/speed normal > (XEN) CPU0: Temperature/speed normal > (XEN) CPU1: Temperature/speed normal > (XEN) CPU0: Temperature/speed normal > (XEN) CPU2: Temperature/speed normal > (XEN) CPU3: Temperature/speed normal > (XEN) CPU0: Temperature/speed normal > (XEN) CPU2: Temperature/speed normal > (XEN) CPU3: Temperature/speed normal > (XEN) CPU1: Temperature/speed normal > (XEN) CPU0: Temperature above threshold > (XEN) CPU0: Running in modulated clock mode > (XEN) CPU1: Temperature/speed normal > (XEN) CPU2: Temperature/speed normal > (XEN) CPU3: Temperature/speed normal > > While the state changes are important, the non-altered > state information is not needed. As such add a latch > mechanism to only print the information if it has > changed since the last update. But isn't the interrupt supposed to happen only when state changes in the first place? Jan ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v1] xen/mce: Don't spam the console with "CPUx: Temperature z" (v2) 2014-06-04 14:49 ` Jan Beulich @ 2014-06-04 18:50 ` Konrad Rzeszutek Wilk 0 siblings, 0 replies; 5+ messages in thread From: Konrad Rzeszutek Wilk @ 2014-06-04 18:50 UTC (permalink / raw) To: Jan Beulich; +Cc: jinsong.liu, chegger, keir, xen-devel On Wed, Jun 04, 2014 at 03:49:52PM +0100, Jan Beulich wrote: > >>> On 04.06.14 at 15:37, <konrad.wilk@oracle.com> wrote: > > If the machine has been quite busy it ends up with these > > messages printed on the hypervisor console: > > > > (XEN) CPU3: Temperature/speed normal > > (XEN) CPU1: Temperature/speed normal > > (XEN) CPU0: Temperature/speed normal > > (XEN) CPU1: Temperature/speed normal > > (XEN) CPU0: Temperature/speed normal > > (XEN) CPU2: Temperature/speed normal > > (XEN) CPU3: Temperature/speed normal > > (XEN) CPU0: Temperature/speed normal > > (XEN) CPU2: Temperature/speed normal > > (XEN) CPU3: Temperature/speed normal > > (XEN) CPU1: Temperature/speed normal > > (XEN) CPU0: Temperature above threshold > > (XEN) CPU0: Running in modulated clock mode > > (XEN) CPU1: Temperature/speed normal > > (XEN) CPU2: Temperature/speed normal > > (XEN) CPU3: Temperature/speed normal > > > > While the state changes are important, the non-altered > > state information is not needed. As such add a latch > > mechanism to only print the information if it has > > changed since the last update. > > But isn't the interrupt supposed to happen only when state changes > in the first place? HA! That is what I thought too, but this machine keeps on triggering this interrupt. > > Jan > ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2014-06-04 18:51 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-06-04 13:37 [PATCH v1] Misc fix to hypervisor Konrad Rzeszutek Wilk 2014-06-04 13:37 ` [PATCH v1] xen/mce: Don't spam the console with "CPUx: Temperature z" (v2) Konrad Rzeszutek Wilk 2014-06-04 13:49 ` Andrew Cooper 2014-06-04 14:49 ` Jan Beulich 2014-06-04 18:50 ` Konrad Rzeszutek Wilk
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.