* [PATCH] x86: Don't print number of MCE banks for every CPU
@ 2009-10-15 21:21 Roland Dreier
2009-10-16 7:20 ` Ingo Molnar
` (2 more replies)
0 siblings, 3 replies; 17+ messages in thread
From: Roland Dreier @ 2009-10-15 21:21 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin; +Cc: linux-kernel, x86
The MCE initialization code explicitly says it doesn't handle asymmetric
configurations where different CPUs support different numbers of MCE
banks, and it prints a big warning in that case. Therefore, printing
the "mce: CPU supports <x> MCE banks" message into the kernel log for
every CPU is pure redundancy that clutters the log significantly for
systems with lots of CPUs.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
---
arch/x86/kernel/cpu/mcheck/mce.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index b1598a9..721a77c 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1214,7 +1214,8 @@ static int __cpuinit mce_cap_init(void)
 	rdmsrl(MSR_IA32_MCG_CAP, cap);

 	b = cap & MCG_BANKCNT_MASK;
-	printk(KERN_INFO "mce: CPU supports %d MCE banks\n", b);
+	if (!banks)
+		printk(KERN_INFO "mce: CPU supports %d MCE banks\n", b);

 	if (b > MAX_NR_BANKS) {
 		printk(KERN_WARNING
^ permalink raw reply related [flat|nested] 17+ messages in thread

* Re: [PATCH] x86: Don't print number of MCE banks for every CPU
2009-10-15 21:21 [PATCH] x86: Don't print number of MCE banks for every CPU Roland Dreier
@ 2009-10-16 7:20 ` Ingo Molnar
2009-10-16 7:22 ` [tip:x86/urgent] " tip-bot for Roland Dreier
2009-10-27 19:42 ` [PATCH] " Mike Travis
2 siblings, 0 replies; 17+ messages in thread
From: Ingo Molnar @ 2009-10-16 7:20 UTC (permalink / raw)
To: Roland Dreier
Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, linux-kernel, x86

* Roland Dreier <rdreier@cisco.com> wrote:

> The MCE initialization code explicitly says it doesn't handle asymmetric
> configurations where different CPUs support different numbers of MCE
> banks, and it prints a big warning in that case. Therefore, printing
> the "mce: CPU supports <x> MCE banks" message into the kernel log for
> every CPU is pure redundancy that clutters the log significantly for
> systems with lots of CPUs.
>
> Signed-off-by: Roland Dreier <rolandd@cisco.com>
> ---
> arch/x86/kernel/cpu/mcheck/mce.c | 3 ++-
> 1 files changed, 2 insertions(+), 1 deletions(-)

Applied, thanks Roland!

Ingo

^ permalink raw reply [flat|nested] 17+ messages in thread
* [tip:x86/urgent] x86: Don't print number of MCE banks for every CPU
2009-10-15 21:21 [PATCH] x86: Don't print number of MCE banks for every CPU Roland Dreier
2009-10-16 7:20 ` Ingo Molnar
@ 2009-10-16 7:22 ` tip-bot for Roland Dreier
2009-10-27 19:42 ` [PATCH] " Mike Travis
2 siblings, 0 replies; 17+ messages in thread
From: tip-bot for Roland Dreier @ 2009-10-16 7:22 UTC (permalink / raw)
To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, rdreier, rolandd, tglx, mingo

Commit-ID: 93ae5012a79b11e7fc855b52c7ce1e16fe1540b0
Gitweb: http://git.kernel.org/tip/93ae5012a79b11e7fc855b52c7ce1e16fe1540b0
Author: Roland Dreier <rdreier@cisco.com>
AuthorDate: Thu, 15 Oct 2009 14:21:14 -0700
Committer: Ingo Molnar <mingo@elte.hu>
CommitDate: Fri, 16 Oct 2009 09:20:03 +0200

x86: Don't print number of MCE banks for every CPU

The MCE initialization code explicitly says it doesn't handle asymmetric
configurations where different CPUs support different numbers of MCE
banks, and it prints a big warning in that case. Therefore, printing
the "mce: CPU supports <x> MCE banks" message into the kernel log for
every CPU is pure redundancy that clutters the log significantly for
systems with lots of CPUs.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
LKML-Reference: <adaeip473qt.fsf@cisco.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
arch/x86/kernel/cpu/mcheck/mce.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index b1598a9..721a77c 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1214,7 +1214,8 @@ static int __cpuinit mce_cap_init(void)
 	rdmsrl(MSR_IA32_MCG_CAP, cap);

 	b = cap & MCG_BANKCNT_MASK;
-	printk(KERN_INFO "mce: CPU supports %d MCE banks\n", b);
+	if (!banks)
+		printk(KERN_INFO "mce: CPU supports %d MCE banks\n", b);

 	if (b > MAX_NR_BANKS) {
 		printk(KERN_WARNING

^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH] x86: Don't print number of MCE banks for every CPU
2009-10-15 21:21 [PATCH] x86: Don't print number of MCE banks for every CPU Roland Dreier
2009-10-16 7:20 ` Ingo Molnar
2009-10-16 7:22 ` [tip:x86/urgent] " tip-bot for Roland Dreier
@ 2009-10-27 19:42 ` Mike Travis
2009-10-27 20:53 ` Mike Travis
2 siblings, 1 reply; 17+ messages in thread
From: Mike Travis @ 2009-10-27 19:42 UTC (permalink / raw)
To: Roland Dreier
Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, linux-kernel, x86

Hi Roland,

I've found that I'm getting one of these lines for every cpu:

mce: CPU supports 0 MCE banks

Regards,
Mike

Roland Dreier wrote:
> The MCE initialization code explicitly says it doesn't handle asymmetric
> configurations where different CPUs support different numbers of MCE
> banks, and it prints a big warning in that case. Therefore, printing
> the "mce: CPU supports <x> MCE banks" message into the kernel log for
> every CPU is pure redundancy that clutters the log significantly for
> systems with lots of CPUs.
>
> Signed-off-by: Roland Dreier <rolandd@cisco.com>
> ---
> arch/x86/kernel/cpu/mcheck/mce.c | 3 ++-
> 1 files changed, 2 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index b1598a9..721a77c 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -1214,7 +1214,8 @@ static int __cpuinit mce_cap_init(void)
>  	rdmsrl(MSR_IA32_MCG_CAP, cap);
>
>  	b = cap & MCG_BANKCNT_MASK;
> -	printk(KERN_INFO "mce: CPU supports %d MCE banks\n", b);
> +	if (!banks)
> +		printk(KERN_INFO "mce: CPU supports %d MCE banks\n", b);
>
>  	if (b > MAX_NR_BANKS) {
>  		printk(KERN_WARNING
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

^ permalink raw reply [flat|nested] 17+ messages in thread
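A likely explanation for the report above is that the `if (!banks)` guard can only suppress the message once some CPU has stored a nonzero bank count. The following is a minimal userspace sketch of that behaviour, not the kernel code itself. It assumes that `banks` is a file-scope counter that the init path sets to `b` after the print, and that `MCG_BANKCNT_MASK` is `0xff`; `sim_mce_cap_init()` and `sim_boot()` are hypothetical names used only for illustration.

```c
#include <stdio.h>

#define MCG_BANKCNT_MASK 0xff	/* assumed: low byte of MCG_CAP is the bank count */

static int banks;	/* assumed global: 0 until a CPU reports a nonzero count */

/*
 * Hypothetical stand-in for the patched init path. Returns 1 if the
 * "CPU supports ..." line was printed for this CPU, 0 otherwise.
 */
int sim_mce_cap_init(unsigned long long cap)
{
	int b = (int)(cap & MCG_BANKCNT_MASK);
	int printed = 0;

	if (!banks) {
		printf("mce: CPU supports %d MCE banks\n", b);
		printed = 1;
	}
	banks = b;	/* stays 0 when the CPU reports no banks */
	return printed;
}

/* Bring up ncpus identical CPUs and count the resulting log lines. */
int sim_boot(int ncpus, unsigned long long cap)
{
	int cpu, lines = 0;

	banks = 0;
	for (cpu = 0; cpu < ncpus; cpu++)
		lines += sim_mce_cap_init(cap);
	return lines;
}
```

Under these assumptions, `sim_boot(384, 22)` prints the line once, while `sim_boot(384, 0)` prints it for every CPU, which matches the simulator symptom.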
* Re: [PATCH] x86: Don't print number of MCE banks for every CPU
2009-10-27 19:42 ` [PATCH] " Mike Travis
@ 2009-10-27 20:53 ` Mike Travis
2009-10-28 4:07 ` [PATCH] x86, mce: disable MCE if cpu has no MCE banks Hidetoshi Seto
2009-10-28 4:26 ` [PATCH] x86: Don't print number of MCE banks for every CPU Roland Dreier
0 siblings, 2 replies; 17+ messages in thread
From: Mike Travis @ 2009-10-27 20:53 UTC (permalink / raw)
To: Roland Dreier
Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, linux-kernel, x86

Mike Travis wrote:
> Hi Roland,
>
> I've found that I'm getting one of these lines for every cpu:
>
> mce: CPU supports 0 MCE banks
>

A bit more info. The data above was from our simulator which
apparently is not simulating mce very well. On a live system
I get 383 lines (for 383 additional cpus) with what appears to be
redundant lines...

[ 4.882085] CPU 1 MCA banks SHD:0 SHD:1 CMCI:2 CMCI:3 CMCI:5 SHD:6 SHD:7 SHD:8 SHD:9 SHD:12 SHD:13 SHD:14 SHD:15 SHD:16 SHD:17 SHD:18 SHD:19 SHD:20 SHD:21
[ 4.978893] CPU 2 MCA banks SHD:0 SHD:1 CMCI:2 CMCI:3 CMCI:5 SHD:6 SHD:7 SHD:8 SHD:9 SHD:12 SHD:13 SHD:14 SHD:15 SHD:16 SHD:17 SHD:18 SHD:19 SHD:20 SHD:21
...
[ 4.978893] CPU 2 MCA banks SHD:0 SHD:1 CMCI:2 CMCI:3 CMCI:5 SHD:6 SHD:7 SHD:8 SHD:9 SHD:12 SHD:13 SHD:14 SHD:15 SHD:16 SHD:17 SHD:18 SHD:19 SHD:20 SHD:21

> Regards,
> Mike
>
> Roland Dreier wrote:
>> The MCE initialization code explicitly says it doesn't handle asymmetric
>> configurations where different CPUs support different numbers of MCE
>> banks, and it prints a big warning in that case. Therefore, printing
>> the "mce: CPU supports <x> MCE banks" message into the kernel log for
>> every CPU is pure redundancy that clutters the log significantly for
>> systems with lots of CPUs.
>>
>> Signed-off-by: Roland Dreier <rolandd@cisco.com>
>> ---
>> arch/x86/kernel/cpu/mcheck/mce.c | 3 ++-
>> 1 files changed, 2 insertions(+), 1 deletions(-)
>>
>> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c
>> b/arch/x86/kernel/cpu/mcheck/mce.c
>> index b1598a9..721a77c 100644
>> --- a/arch/x86/kernel/cpu/mcheck/mce.c
>> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
>> @@ -1214,7 +1214,8 @@ static int __cpuinit mce_cap_init(void)
>>  	rdmsrl(MSR_IA32_MCG_CAP, cap);
>>
>>  	b = cap & MCG_BANKCNT_MASK;
>> -	printk(KERN_INFO "mce: CPU supports %d MCE banks\n", b);
>> +	if (!banks)
>> +		printk(KERN_INFO "mce: CPU supports %d MCE banks\n", b);
>>
>>  	if (b > MAX_NR_BANKS) {
>>  		printk(KERN_WARNING
>> --
>> To unsubscribe from this list: send the line "unsubscribe
>> linux-kernel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at http://www.tux.org/lkml/
>>

^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH] x86, mce: disable MCE if cpu has no MCE banks
2009-10-27 20:53 ` Mike Travis
@ 2009-10-28 4:07 ` Hidetoshi Seto
2009-10-28 5:24 ` Andi Kleen
2009-10-28 4:26 ` [PATCH] x86: Don't print number of MCE banks for every CPU Roland Dreier
1 sibling, 1 reply; 17+ messages in thread
From: Hidetoshi Seto @ 2009-10-28 4:07 UTC (permalink / raw)
To: Mike Travis
Cc: Roland Dreier, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, linux-kernel, x86, Andi Kleen

Mike Travis wrote:
>
> Mike Travis wrote:
>> Hi Roland,
>>
>> I've found that I'm getting one of these lines for every cpu:
>>
>> mce: CPU supports 0 MCE banks

I believe my patch at the end of this mail will solve this issue.

> A bit more info. The data above was from our simulator which
> apparently is not simulating mce very well. On a live system
> I get 383 lines (for 383 additional cpus) with what appears to be
> redundant lines...
>
> [ 4.882085] CPU 1 MCA banks SHD:0 SHD:1 CMCI:2 CMCI:3 CMCI:5 SHD:6
> SHD:7 SHD:8 SHD:9 SHD:12 SHD:13 SHD:14 SHD:15 SHD:16 SHD:17 SHD:18
> SHD:19 SHD:20 SHD:21
> [ 4.978893] CPU 2 MCA banks SHD:0 SHD:1 CMCI:2 CMCI:3 CMCI:5 SHD:6
> SHD:7 SHD:8 SHD:9 SHD:12 SHD:13 SHD:14 SHD:15 SHD:16 SHD:17 SHD:18
> SHD:19 SHD:20 SHD:21
> ...
> [ 4.978893] CPU 2 MCA banks SHD:0 SHD:1 CMCI:2 CMCI:3 CMCI:5 SHD:6
> SHD:7 SHD:8 SHD:9 SHD:12 SHD:13 SHD:14 SHD:15 SHD:16 SHD:17 SHD:18
> SHD:19 SHD:20 SHD:21

Hum, I suppose the line for CPU 0 was slightly different from others,
because SHD means "this bank is shared bank and controlled by other".
Maybe:
CPU 0 MCA banks CMCI:0 CMCI:1 CMCI:2 CMCI:3 CMCI:5 ... CMCI:21

But I agree that we could do some work on these messages...
Is it better to change the message level to debug from info?
How about changing the format like:
CPU 0 MCA banks map : CCCC PCCC CCPP CCCC CCCC CC
CPU 1 MCA banks map : ssCC PCss ssPP ssss ssss ss
:

If there are no complaints, I'll make another patch to do so.

Thanks,
H.Seto

===

Subject: [PATCH] x86, mce: disable MCE if cpu has no MCE banks

If cpu has no MCE banks (e.g. simulated processor on VMs), it is better
to disable MCE support on the system since we cannot handle MCE well.

Reported-by: Mike Travis <travis@sgi.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
---
arch/x86/kernel/cpu/mcheck/mce.c | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 8080170..29055ab 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1228,6 +1228,10 @@ static int __cpuinit __mcheck_cpu_cap_init(void)
 	rdmsrl(MSR_IA32_MCG_CAP, cap);

 	b = cap & MCG_BANKCNT_MASK;
+	if (!b) {
+		pr_info("MCE: no MCE banks - not enabling MCE support.\n");
+		return -ENODEV;
+	}
 	if (!banks)
 		printk(KERN_INFO "mce: CPU supports %d MCE banks\n", b);

--
1.6.5.2

^ permalink raw reply related [flat|nested] 17+ messages in thread
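The compact "banks map" format proposed in the message above could be produced roughly as follows. This is a userspace sketch under stated assumptions, not code from any posted patch: `enum bank_owner` and `format_bank_map()` are hypothetical names, and reading `P` as a polled bank, `C` as a CMCI-owned bank and `s` as a bank shared with (and controlled by) another CPU is an interpretation of the example lines.

```c
#include <stddef.h>

/* Hypothetical ownership states, one per MCA bank. */
enum bank_owner { BANK_POLL, BANK_CMCI, BANK_SHARED };

/*
 * Write one character per bank ('P', 'C' or 's') into buf, inserting a
 * space after every fourth bank. Returns the string length, or -1 if
 * buf is too small to hold the map and its terminating NUL.
 */
int format_bank_map(const enum bank_owner *owner, int nbanks,
		    char *buf, size_t len)
{
	static const char sym[] = { 'P', 'C', 's' };
	size_t pos = 0;
	int i;

	for (i = 0; i < nbanks; i++) {
		if (i && i % 4 == 0) {
			if (pos + 1 >= len)
				return -1;
			buf[pos++] = ' ';
		}
		if (pos + 1 >= len)
			return -1;
		buf[pos++] = sym[owner[i]];
	}
	if (pos >= len)
		return -1;
	buf[pos] = '\0';
	return (int)pos;
}
```

With 22 banks where banks 4, 10 and 11 are polled and the rest are CMCI-owned, this yields the `CCCC PCCC CCPP CCCC CCCC CC` string shown for CPU 0. The fixed width per bank keeps lines from different CPUs visually aligned, which is the main appeal of the format.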
* Re: [PATCH] x86, mce: disable MCE if cpu has no MCE banks
2009-10-28 4:07 ` [PATCH] x86, mce: disable MCE if cpu has no MCE banks Hidetoshi Seto
@ 2009-10-28 5:24 ` Andi Kleen
2009-10-28 6:26 ` Hidetoshi Seto
2009-10-28 12:03 ` Valdis.Kletnieks
0 siblings, 2 replies; 17+ messages in thread
From: Andi Kleen @ 2009-10-28 5:24 UTC (permalink / raw)
To: Hidetoshi Seto
Cc: Mike Travis, Roland Dreier, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, linux-kernel, x86

Hidetoshi Seto wrote:
> Mike Travis wrote:
>> Mike Travis wrote:
>>> Hi Roland,
>>>
>>> I've found that I'm getting one of these lines for every cpu:
>>>
>>> mce: CPU supports 0 MCE banks

That message can be just removed I think. I don't see much value in it
because the value is in sysfs and when you see the CPU type you can easily
determine it anyways.

I don't think the patch below really solves the problem because they
would have the same noise problem back once they switch from the simulator
to a real box which has banks.

> Hum, I suppose the line for CPU 0 was slightly different from others,
> because SHD means "this bank is shared bank and controlled by other".
> Maybe:
> CPU 0 MCA banks CMCI:0 CMCI:1 CMCI:2 CMCI:3 CMCI:5 ... CMCI:21
>
> But I agree that we could do some work on these messages...
> Is it better to change the message level to debug from info?

Can be made INFO yes, but I would prefer not removing them
from the dmesg for now.

Perhaps they could be also compressed a bit like SRAT.

-Andi

^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] x86, mce: disable MCE if cpu has no MCE banks
2009-10-28 5:24 ` Andi Kleen
@ 2009-10-28 6:26 ` Hidetoshi Seto
2009-10-28 6:48 ` Andi Kleen
2009-10-28 12:03 ` Valdis.Kletnieks
1 sibling, 1 reply; 17+ messages in thread
From: Hidetoshi Seto @ 2009-10-28 6:26 UTC (permalink / raw)
To: Andi Kleen
Cc: Mike Travis, Roland Dreier, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, linux-kernel, x86

Andi Kleen wrote:
> Hidetoshi Seto wrote:
>> Mike Travis wrote:
>>> Mike Travis wrote:
>>>> Hi Roland,
>>>>
>>>> I've found that I'm getting one of these lines for every cpu:
>>>>
>>>> mce: CPU supports 0 MCE banks
>
> That message can be just removed I think. I don't see much value in it
> because the value is in sysfs and when you see the CPU type you can easily
> determine it anyways.
>
> I don't think the patch below really solves the problem because they
> would have the same noise problem back once they switch from the simulator
> to a real box which has banks.

If the box has more than 0 banks, then the line above will appear
only once, for CPU 0. Only on the simulator, with an MCE-capable
processor with no banks, does this message become unacceptable noise,
because it appears for every cpu.

Anyway I think my patch is nice to have, to avoid unexpected behavior on
an uncertain environment.

Without disabling, what can we do on MCE with no bank?
I found that do_machine_check() does nothing if banks==0 ... it is better
to let system to panic with "Machine check from unknown source"?

>> Hum, I suppose the line for CPU 0 was slightly different from others,
>> because SHD means "this bank is shared bank and controlled by other".
>> Maybe:
>> CPU 0 MCA banks CMCI:0 CMCI:1 CMCI:2 CMCI:3 CMCI:5 ... CMCI:21
>>
>> But I agree that we could do some work on these messages...
>> Is it better to change the message level to debug from info?
>
> Can be made INFO yes, but I would prefer not removing them
> from the dmesg for now.
>
> Perhaps they could be also compressed a bit like SRAT.

Like SRAT? I could not catch the meaning ... For example?

Thanks,
H.Seto

^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] x86, mce: disable MCE if cpu has no MCE banks
2009-10-28 6:26 ` Hidetoshi Seto
@ 2009-10-28 6:48 ` Andi Kleen
2009-10-28 8:18 ` Hidetoshi Seto
2009-10-28 17:12 ` Roland Dreier
0 siblings, 2 replies; 17+ messages in thread
From: Andi Kleen @ 2009-10-28 6:48 UTC (permalink / raw)
To: Hidetoshi Seto
Cc: Mike Travis, Roland Dreier, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, linux-kernel, x86

Hidetoshi Seto wrote:
>
> Without disabling, what can we do on MCE with no bank?

Nothing, but is it really worth adding a special case?

> I found that do_machine_check() does nothing if banks==0 ... it is better
> to let system to panic with "Machine check from unknown source"?

IMHO yes. In this case the system must be very confused and panic is the
best you can do. Otherwise it won't do anything interesting anyways.

>
>>> Hum, I suppose the line for CPU 0 was slightly different from others,
>>> because SHD means "this bank is shared bank and controlled by other".
>>> Maybe:
>>> CPU 0 MCA banks CMCI:0 CMCI:1 CMCI:2 CMCI:3 CMCI:5 ... CMCI:21
>>>
>>> But I agree that we could do some work on these messages...
>>> Is it better to change the message level to debug from info?
>> Can be made INFO yes, but I would prefer not removing them
>> from the dmesg for now.
>>
>> Perhaps they could be also compressed a bit like SRAT.
>
> Like SRAT? I could not catch the meaning ... For example?

See the recent patches from David Rientjes in the same original thread.

-Andi

^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] x86, mce: disable MCE if cpu has no MCE banks
2009-10-28 6:48 ` Andi Kleen
@ 2009-10-28 8:18 ` Hidetoshi Seto
2009-10-28 17:09 ` Mike Travis
2009-10-28 17:12 ` Roland Dreier
1 sibling, 1 reply; 17+ messages in thread
From: Hidetoshi Seto @ 2009-10-28 8:18 UTC (permalink / raw)
To: Andi Kleen
Cc: Mike Travis, Roland Dreier, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, linux-kernel, x86

Andi Kleen wrote:
> Hidetoshi Seto wrote:
>> Without disabling, what can we do on MCE with no bank?
>
> Nothing, but is it really worth adding a special case?

If the question were:
- is it really worth to support this special environment,
"MCE-capable but no MCE banks" ?
then I'd like to say no.

So I suggested to disable MCE on this uncertain environment.
Or we will end up adding more codes for special cases...

>> I found that do_machine_check() does nothing if banks==0 ... it is better
>> to let system to panic with "Machine check from unknown source"?
>
> IMHO yes. In this case the system must be very confused and panic is the
> best you can do. Otherwise it won't do anything interesting anyways.

Agreed, but this is also a special case.
Not depending on the real number of banks, confused system could fail to
get the value from memory... Humm, in theory MCE handler must be
implemented carefully, but I bet the confused value will not be always 0,
... is it worth to do?

>>>> Hum, I suppose the line for CPU 0 was slightly different from others,
>>>> because SHD means "this bank is shared bank and controlled by other".
>>>> Maybe:
>>>> CPU 0 MCA banks CMCI:0 CMCI:1 CMCI:2 CMCI:3 CMCI:5 ... CMCI:21
>>>>
>>>> But I agree that we could do some work on these messages...
>>>> Is it better to change the message level to debug from info?
>>> Can be made INFO yes, but I would prefer not removing them
>>> from the dmesg for now.
>>>
>>> Perhaps they could be also compressed a bit like SRAT.
>>
>> Like SRAT? I could not catch the meaning ... For example?
>
> See the recent patches from David Rientjes in the same original thread.

I found it, thanks.

So I suppose your idea is like:
CPU 0 MCA banks CMCI:{0-3,5-9,12-21} POLL:{4,10,11}
CPU 1 MCA banks SHD:{0,1,6-9,12-21} CMCI:{2,3,5} POLL:{4,10,11}
right?

IMHO the format I suggested is better to read, as far as banks is
not so big number.
CPU 0 MCA banks map : CCCC PCCC CCPP CCCC CCCC CC
CPU 1 MCA banks map : ssCC PCss ssPP ssss ssss ss

Thanks,
H.Seto

^ permalink raw reply [flat|nested] 17+ messages in thread
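For comparison, the range-compressed style guessed at in the message above (`SHD:{0,1,6-9,12-21}`) could be generated along these lines. This is only an illustrative userspace sketch: `format_bank_ranges()` is a hypothetical helper, the input is assumed to be a sorted list of bank numbers for one ownership type, and the rule that a run of exactly two banks prints as `0,1` rather than `0-1` is inferred from the example lines.

```c
#include <stdio.h>

/*
 * Format a sorted array of bank numbers as "{0,1,6-9,12-21}".
 * Runs of three or more consecutive banks collapse to "first-last".
 * Returns the number of characters written, or -1 on truncation.
 */
int format_bank_ranges(const int *bank, int n, char *buf, size_t len)
{
	size_t pos;
	int i = 0, ret;

	ret = snprintf(buf, len, "{");
	if (ret < 0 || (size_t)ret >= len)
		return -1;
	pos = (size_t)ret;

	while (i < n) {
		int start = bank[i], end = start;
		const char *sep = pos > 1 ? "," : "";

		/* Extend the run while the next bank number is consecutive. */
		while (i + 1 < n && bank[i + 1] == end + 1)
			end = bank[++i];
		i++;

		if (end - start >= 2)
			ret = snprintf(buf + pos, len - pos, "%s%d-%d",
				       sep, start, end);
		else if (end == start + 1)
			ret = snprintf(buf + pos, len - pos, "%s%d,%d",
				       sep, start, end);
		else
			ret = snprintf(buf + pos, len - pos, "%s%d",
				       sep, start);
		if (ret < 0 || (size_t)ret >= len - pos)
			return -1;
		pos += (size_t)ret;
	}

	ret = snprintf(buf + pos, len - pos, "}");
	if (ret < 0 || (size_t)ret >= len - pos)
		return -1;
	return (int)(pos + (size_t)ret);
}
```

The trade-off against the fixed-width map is that this output stays short even for many banks, but the per-bank column alignment across CPUs is lost.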
* Re: [PATCH] x86, mce: disable MCE if cpu has no MCE banks
2009-10-28 8:18 ` Hidetoshi Seto
@ 2009-10-28 17:09 ` Mike Travis
0 siblings, 0 replies; 17+ messages in thread
From: Mike Travis @ 2009-10-28 17:09 UTC (permalink / raw)
To: Hidetoshi Seto
Cc: Andi Kleen, Roland Dreier, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, linux-kernel, x86

Hidetoshi Seto wrote:
> Andi Kleen wrote:
>> Hidetoshi Seto wrote:
>>> Without disabling, what can we do on MCE with no bank?
>> Nothing, but is it really worth adding a special case?
>
> If the question were:
> - is it really worth to support this special environment,
> "MCE-capable but no MCE banks" ?
> then I'd like to say no.
>
> So I suggested to disable MCE on this uncertain environment.
> Or we will end up adding more codes for special cases...
>
>>> I found that do_machine_check() does nothing if banks==0 ... it is better
>>> to let system to panic with "Machine check from unknown source"?
>> IMHO yes. In this case the system must be very confused and panic is the
>> best you can do. Otherwise it won't do anything interesting anyways.
>
> Agreed, but this is also a special case.
> Not depending on the real number of banks, confused system could fail to
> get the value from memory... Humm, in theory MCE handler must be
> implemented carefully, but I bet the confused value will not be always 0,
> ... is it worth to do?
>
>>>>> Hum, I suppose the line for CPU 0 was slightly different from others,
>>>>> because SHD means "this bank is shared bank and controlled by other".
>>>>> Maybe:
>>>>> CPU 0 MCA banks CMCI:0 CMCI:1 CMCI:2 CMCI:3 CMCI:5 ... CMCI:21
>>>>>
>>>>> But I agree that we could do some work on these messages...
>>>>> Is it better to change the message level to debug from info?
>>>> Can be made INFO yes, but I would prefer not removing them
>>>> from the dmesg for now.
>>>>
>>>> Perhaps they could be also compressed a bit like SRAT.
>>> Like SRAT? I could not catch the meaning ... For example?
>> See the recent patches from David Rientjes in the same original thread.
>
> I found it, thanks.
>
> So I suppose your idea is like:
> CPU 0 MCA banks CMCI:{0-3,5-9,12-21} POLL:{4,10,11}
> CPU 1 MCA banks SHD:{0,1,6-9,12-21} CMCI:{2,3,5} POLL:{4,10,11}
> right?
>
> IMHO the format I suggested is better to read, as far as banks is
> not so big number.
> CPU 0 MCA banks map : CCCC PCCC CCPP CCCC CCCC CC
> CPU 1 MCA banks map : ssCC PCss ssPP ssss ssss ss
>
>
> Thanks,
> H.Seto

The problem comes up when you have a whole bunch of cpus, and the lines
become redundant. Can you compress the lines so that cpus with the same
given mappings are printed on one line?

Thanks,
Mike

^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] x86, mce: disable MCE if cpu has no MCE banks
2009-10-28 6:48 ` Andi Kleen
2009-10-28 8:18 ` Hidetoshi Seto
@ 2009-10-28 17:12 ` Roland Dreier
2009-10-28 17:37 ` Mike Travis
1 sibling, 1 reply; 17+ messages in thread
From: Roland Dreier @ 2009-10-28 17:12 UTC (permalink / raw)
To: Andi Kleen
Cc: Hidetoshi Seto, Mike Travis, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, linux-kernel, x86

> Perhaps they could be also compressed a bit like SRAT.

Seems like a good idea... but I wonder what the best way to represent
things is. For example I have a 2-socket Nehalem system that shows:

2 times: MCA banks CMCI:2 CMCI:3 CMCI:5 CMCI:6 SHD:8
6 times: MCA banks CMCI:2 CMCI:3 CMCI:5 SHD:6 SHD:8
8 times: MCA banks SHD:2 SHD:3 SHD:5 SHD:6 SHD:8

presumably the first line is once per package, the next line is for the
first sibling in all the other cores in a package, and the last line is
for the SMT siblings of all the cores.

But would we want to accumulate all the different combinations of banks
along with a CPU mask and then print something like:

CPUs 0 4: MCA banks CMCI:2 CMCI:3 CMCI:5 CMCI:6 SHD:8
CPUs 1 2 3 5 6 7: MCA banks CMCI:2 CMCI:3 CMCI:5 SHD:6 SHD:8
CPUs 8 9 10 11 12 13 14 15: MCA banks SHD:2 SHD:3 SHD:5 SHD:6 SHD:8

of course output like that is going to lead to super-long lines on a
64-thread system.

Also I'm not sure of a clean way to implement this; unlike the SRAT
stuff, we need to deal with CPU hotplug so all this at best could be
__cpuinitdata, ie we can't discard it in most configs.

However the "MCA banks" output definitely is annoying on a 64-thread
system -- the amount of output is far greater than the utility of said
output. So ideas on the best way to reduce this would be appreciated.

Thanks,
Roland

^ permalink raw reply [flat|nested] 17+ messages in thread
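The accumulation idea floated in the message above (collect each distinct bank combination together with the CPUs that report it, then print one line per combination) might look roughly like this in plain userspace C. It is a sketch under stated assumptions rather than a kernel implementation: `struct bank_group`, `group_cpus()` and `print_groups()` are hypothetical names, the per-CPU description is assumed to be available as a string, and CPU hotplug and the `__cpuinitdata` concern raised above are ignored entirely.

```c
#include <stdio.h>
#include <string.h>

#define MAX_CPUS 64	/* arbitrary limit for this sketch */

struct bank_group {
	const char *banks;	/* shared "MCA banks ..." description */
	int cpu[MAX_CPUS];	/* CPUs reporting exactly this description */
	int ncpus;
};

/*
 * Sort CPUs into groups with identical bank descriptions, in order of
 * first appearance. Returns the number of groups, or -1 on overflow.
 */
int group_cpus(const char *const *desc, int ncpus,
	       struct bank_group *grp, int max_groups)
{
	int ngroups = 0, cpu, g;

	for (cpu = 0; cpu < ncpus; cpu++) {
		/* Look for an existing group with the same description. */
		for (g = 0; g < ngroups; g++)
			if (strcmp(grp[g].banks, desc[cpu]) == 0)
				break;
		if (g == ngroups) {
			if (ngroups == max_groups)
				return -1;
			grp[g].banks = desc[cpu];
			grp[g].ncpus = 0;
			ngroups++;
		}
		if (grp[g].ncpus == MAX_CPUS)
			return -1;
		grp[g].cpu[grp[g].ncpus++] = cpu;
	}
	return ngroups;
}

/* Print one line per group, e.g. "CPUs 0 4: MCA banks CMCI:2 ...". */
void print_groups(const struct bank_group *grp, int ngroups)
{
	int g, i;

	for (g = 0; g < ngroups; g++) {
		printf("CPUs");
		for (i = 0; i < grp[g].ncpus; i++)
			printf(" %d", grp[g].cpu[i]);
		printf(": %s\n", grp[g].banks);
	}
}
```

For the 16-thread example in the message, this collapses sixteen lines into three. The CPU lists are still printed in full here, so the long-line concern on larger systems remains.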
* Re: [PATCH] x86, mce: disable MCE if cpu has no MCE banks
2009-10-28 17:12 ` Roland Dreier
@ 2009-10-28 17:37 ` Mike Travis
2009-10-28 18:03 ` Roland Dreier
0 siblings, 1 reply; 17+ messages in thread
From: Mike Travis @ 2009-10-28 17:37 UTC (permalink / raw)
To: Roland Dreier
Cc: Andi Kleen, Hidetoshi Seto, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, linux-kernel, x86

Roland Dreier wrote:
> > Perhaps they could be also compressed a bit like SRAT.
>
> Seems like a good idea... but I wonder what the best way to represent
> things is. For example I have a 2-socket Nehalem system that shows:
>
> 2 times: MCA banks CMCI:2 CMCI:3 CMCI:5 CMCI:6 SHD:8
> 6 times: MCA banks CMCI:2 CMCI:3 CMCI:5 SHD:6 SHD:8
> 8 times: MCA banks SHD:2 SHD:3 SHD:5 SHD:6 SHD:8
>
> presumably the first line is once per package, the next line is for the
> first sibling in all the other cores in a package, and the last line is
> for the SMT siblings of all the cores.
>
> But would we want to accumulate all the different combinations of banks
> along with a CPU mask and then print something like:
>
> CPUs 0 4: MCA banks CMCI:2 CMCI:3 CMCI:5 CMCI:6 SHD:8
> CPUs 1 2 3 5 6 7: MCA banks CMCI:2 CMCI:3 CMCI:5 SHD:6 SHD:8
> CPUs 8 9 10 11 12 13 14 15: MCA banks SHD:2 SHD:3 SHD:5 SHD:6 SHD:8

Or use a cpumask and cpulist_scnprintf which condenses the cpu list nicely.

>
> of course output like that is going to lead to super-long lines on a
> 64-thread system.
>
> Also I'm not sure of a clean way to implement this; unlike the SRAT
> stuff, we need to deal with CPU hotplug so all this at best could be
> __cpuinitdata, ie we can't discard it in most configs.
>
> However the "MCA banks" output definitely is annoying on a 64-thread
> system -- the amount of output is far greater than the utility of said
> output. So ideas on the best way to reduce this would be appreciated.
>
> Thanks,
> Roland

^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] x86, mce: disable MCE if cpu has no MCE banks
2009-10-28 17:37 ` Mike Travis
@ 2009-10-28 18:03 ` Roland Dreier
0 siblings, 0 replies; 17+ messages in thread
From: Roland Dreier @ 2009-10-28 18:03 UTC (permalink / raw)
To: Mike Travis
Cc: Andi Kleen, Hidetoshi Seto, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, linux-kernel, x86

> > But would we want to accumulate all the different combinations of banks
> > along with a CPU mask and then print something like:
> >
> > CPUs 0 4: MCA banks CMCI:2 CMCI:3 CMCI:5 CMCI:6 SHD:8
> > CPUs 1 2 3 5 6 7: MCA banks CMCI:2 CMCI:3 CMCI:5 SHD:6 SHD:8
> > CPUs 8 9 10 11 12 13 14 15: MCA banks SHD:2 SHD:3 SHD:5 SHD:6 SHD:8
>
> Or use a cpumask and cpulist_scnprintf which condenses the cpu list nicely.

Thanks! I didn't know about that API. However with that said I think
the real issue is whether that style of output is a good idea, no matter
how nicely the CPU list is formatted :)

- R.

^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] x86, mce: disable MCE if cpu has no MCE banks
2009-10-28 5:24 ` Andi Kleen
2009-10-28 6:26 ` Hidetoshi Seto
@ 2009-10-28 12:03 ` Valdis.Kletnieks
2009-10-28 13:44 ` Andi Kleen
1 sibling, 1 reply; 17+ messages in thread
From: Valdis.Kletnieks @ 2009-10-28 12:03 UTC (permalink / raw)
To: Andi Kleen
Cc: Hidetoshi Seto, Mike Travis, Roland Dreier, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, linux-kernel, x86

[-- Attachment #1: Type: text/plain, Size: 472 bytes --]

On Wed, 28 Oct 2009 06:24:45 BST, Andi Kleen said:

> >>> mce: CPU supports 0 MCE banks
>
> That message can be just removed I think. I don't see much value in it
> because the value is in sysfs and when you see the CPU type you can easily
> determine it anyways.

Maybe it should only print a message if it finds an unexpected number of banks?
"Hey dood - we're on a Core3.5 and there should be 6 banks here, but the
hardware says there's only 4. What's up with that?"

[-- Attachment #2: Type: application/pgp-signature, Size: 227 bytes --]

^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] x86, mce: disable MCE if cpu has no MCE banks
2009-10-28 12:03 ` Valdis.Kletnieks
@ 2009-10-28 13:44 ` Andi Kleen
0 siblings, 0 replies; 17+ messages in thread
From: Andi Kleen @ 2009-10-28 13:44 UTC (permalink / raw)
To: Valdis.Kletnieks
Cc: Hidetoshi Seto, Mike Travis, Roland Dreier, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, linux-kernel, x86

Valdis.Kletnieks@vt.edu wrote:
> On Wed, 28 Oct 2009 06:24:45 BST, Andi Kleen said:
>
>>>>> mce: CPU supports 0 MCE banks
>> That message can be just removed I think. I don't see much value in it
>> because the value is in sysfs and when you see the CPU type you can easily
>> determine it anyways.
>
> Maybe it should only print a message if it finds an unexpected number of banks?
> "Hey dood - we're on a Core3.5 and there should be 6 banks here, but the
> hardware says there's only 4. What's up with that?"

The kernel doesn't know what number of banks are expected, just humans do.

-Andi

^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] x86: Don't print number of MCE banks for every CPU
2009-10-27 20:53 ` Mike Travis
2009-10-28 4:07 ` [PATCH] x86, mce: disable MCE if cpu has no MCE banks Hidetoshi Seto
@ 2009-10-28 4:26 ` Roland Dreier
1 sibling, 0 replies; 17+ messages in thread
From: Roland Dreier @ 2009-10-28 4:26 UTC (permalink / raw)
To: Mike Travis
Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, linux-kernel, x86

> [ 4.882085] CPU 1 MCA banks SHD:0 SHD:1 CMCI:2 CMCI:3 CMCI:5 SHD:6 SHD:7 SHD:8 SHD:9 SHD:12 SHD:13 SHD:14 SHD:15 SHD:16 SHD:17 SHD:18 SHD:19 SHD:20 SHD:21

Yes, we should probably kill that debug output as well, that was on my
list of things to do.

- R.

^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2009-10-28 18:03 UTC | newest]

Thread overview: 17+ messages
2009-10-15 21:21 [PATCH] x86: Don't print number of MCE banks for every CPU Roland Dreier
2009-10-16 7:20 ` Ingo Molnar
2009-10-16 7:22 ` [tip:x86/urgent] " tip-bot for Roland Dreier
2009-10-27 19:42 ` [PATCH] " Mike Travis
2009-10-27 20:53   ` Mike Travis
2009-10-28 4:07     ` [PATCH] x86, mce: disable MCE if cpu has no MCE banks Hidetoshi Seto
2009-10-28 5:24       ` Andi Kleen
2009-10-28 6:26         ` Hidetoshi Seto
2009-10-28 6:48           ` Andi Kleen
2009-10-28 8:18             ` Hidetoshi Seto
2009-10-28 17:09               ` Mike Travis
2009-10-28 17:12             ` Roland Dreier
2009-10-28 17:37               ` Mike Travis
2009-10-28 18:03                 ` Roland Dreier
2009-10-28 12:03         ` Valdis.Kletnieks
2009-10-28 13:44           ` Andi Kleen
2009-10-28 4:26     ` [PATCH] x86: Don't print number of MCE banks for every CPU Roland Dreier