* [PATCH] x86/mce: Clear a useless global variable in mce.c
@ 2014-05-17 8:45 Chen Yucong
2014-05-17 9:58 ` Borislav Petkov
0 siblings, 1 reply; 9+ messages in thread
From: Chen Yucong @ 2014-05-17 8:45 UTC (permalink / raw)
To: tony.luck; +Cc: bp, linux-kernel, linux-edac, Chen Yucong
This patch is just used to remove a useless global variable mce_entry
and relative operations in mce.c.
Signed-off-by: Chen Yucong <slaoub@gmail.com>
---
arch/x86/include/asm/mce.h | 2 --
arch/x86/kernel/cpu/mcheck/mce.c | 5 -----
2 files changed, 7 deletions(-)
diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 6e4ce2d..958b90f 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -176,8 +176,6 @@ int mce_available(struct cpuinfo_x86 *c);
DECLARE_PER_CPU(unsigned, mce_exception_count);
DECLARE_PER_CPU(unsigned, mce_poll_count);
-extern atomic_t mce_entry;
-
typedef DECLARE_BITMAP(mce_banks_t, MAX_NR_BANKS);
DECLARE_PER_CPU(mce_banks_t, mce_poll_banks);
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 68317c8..8f520a1 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -60,8 +60,6 @@ static DEFINE_MUTEX(mce_chrdev_read_mutex);
#define SPINUNIT 100 /* 100ns */
-atomic_t mce_entry;
-
DEFINE_PER_CPU(unsigned, mce_exception_count);
struct mce_bank *mce_banks __read_mostly;
@@ -1041,8 +1039,6 @@ void do_machine_check(struct pt_regs *regs, long error_code)
DECLARE_BITMAP(valid_banks, MAX_NR_BANKS);
char *msg = "Unknown";
- atomic_inc(&mce_entry);
-
this_cpu_inc(mce_exception_count);
if (!cfg->banks)
@@ -1172,7 +1168,6 @@ void do_machine_check(struct pt_regs *regs, long error_code)
mce_report_event(regs);
mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
out:
- atomic_dec(&mce_entry);
sync_core();
}
EXPORT_SYMBOL_GPL(do_machine_check);
--
1.7.10.4
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] x86/mce: Clear a useless global variable in mce.c
2014-05-17 8:45 Chen Yucong
@ 2014-05-17 9:58 ` Borislav Petkov
2014-05-19 0:08 ` Chen Yucong
0 siblings, 1 reply; 9+ messages in thread
From: Borislav Petkov @ 2014-05-17 9:58 UTC (permalink / raw)
To: Chen Yucong; +Cc: tony.luck, linux-kernel, linux-edac
On Sat, May 17, 2014 at 04:45:24PM +0800, Chen Yucong wrote:
> This patch is just used to remove a useless global variable mce_entry
> and relative operations in mce.c.
Well, I can see from the diff below what you're saying here but a commit
message should contain information which explains *why* you're doing the
change and not *what* you're doing - that we can see.
IOW, you could instead do some git history research and write in your
commit message why this mce_entry got unused and quote commits which
removed it, why they removed it and why we want to remove the remains of
it now.
In any case, the change itself is correct, I would like to have a commit
message which explains why we're removing it.
Thanks.
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH] x86/mce: Clear a useless global variable in mce.c
@ 2014-05-17 12:05 Chen Yucong
2014-05-19 17:59 ` Luck, Tony
0 siblings, 1 reply; 9+ messages in thread
From: Chen Yucong @ 2014-05-17 12:05 UTC (permalink / raw)
To: tony.luck; +Cc: bp, linux-kernel, linux-edac, Chen Yucong
The mce_entry, which was defined as a global variable in mce.c, was
used in nmi_watchdog_tick().But nmi_watchdog_tick() has been discarded
in the new generic nmi_watchdog implementation - commit 1fb9d6ad2766.
So, of course, there is no need for the mce_entry.
This patch is just used to remove a useless global variable mce_entry
and relative operations in mce.c.
Signed-off-by: Chen Yucong <slaoub@gmail.com>
---
arch/x86/include/asm/mce.h | 2 --
arch/x86/kernel/cpu/mcheck/mce.c | 5 -----
2 files changed, 7 deletions(-)
diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 6e4ce2d..958b90f 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -176,8 +176,6 @@ int mce_available(struct cpuinfo_x86 *c);
DECLARE_PER_CPU(unsigned, mce_exception_count);
DECLARE_PER_CPU(unsigned, mce_poll_count);
-extern atomic_t mce_entry;
-
typedef DECLARE_BITMAP(mce_banks_t, MAX_NR_BANKS);
DECLARE_PER_CPU(mce_banks_t, mce_poll_banks);
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 68317c8..8f520a1 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -60,8 +60,6 @@ static DEFINE_MUTEX(mce_chrdev_read_mutex);
#define SPINUNIT 100 /* 100ns */
-atomic_t mce_entry;
-
DEFINE_PER_CPU(unsigned, mce_exception_count);
struct mce_bank *mce_banks __read_mostly;
@@ -1041,8 +1039,6 @@ void do_machine_check(struct pt_regs *regs, long error_code)
DECLARE_BITMAP(valid_banks, MAX_NR_BANKS);
char *msg = "Unknown";
- atomic_inc(&mce_entry);
-
this_cpu_inc(mce_exception_count);
if (!cfg->banks)
@@ -1172,7 +1168,6 @@ void do_machine_check(struct pt_regs *regs, long error_code)
mce_report_event(regs);
mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
out:
- atomic_dec(&mce_entry);
sync_core();
}
EXPORT_SYMBOL_GPL(do_machine_check);
--
1.7.10.4
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] x86/mce: Clear a useless global variable in mce.c
2014-05-17 9:58 ` Borislav Petkov
@ 2014-05-19 0:08 ` Chen Yucong
0 siblings, 0 replies; 9+ messages in thread
From: Chen Yucong @ 2014-05-19 0:08 UTC (permalink / raw)
To: Borislav Petkov; +Cc: tony.luck, linux-kernel, linux-edac
On Sat, 2014-05-17 at 11:58 +0200, Borislav Petkov wrote:
> On Sat, May 17, 2014 at 04:45:24PM +0800, Chen Yucong wrote:
> > This patch is just used to remove a useless global variable mce_entry
> > and relative operations in mce.c.
>
> Well, I can see from the diff below what you're saying here but a commit
> message should contain information which explains *why* you're doing the
> change and not *what* you're doing - that we can see.
>
> IOW, you could instead do some git history research and write in your
> commit message why this mce_entry got unused and quote commits which
> removed it, why they removed it and why we want to remove the remains of
> it now.
>
> In any case, the change itself is correct, I would like to have a commit
> message which explains why we're removing it.
>
> Thanks.
>
The mce_entry, which was defined as a global variable in mce.c, was
used in nmi_watchdog_tick(). But nmi_watchdog_tick() has been discarded
in the new generic nmi_watchdog implementation - commit 1fb9d6ad2766.
So, of course, there is no need for the mce_entry.
This patch is just used to remove a useless global variable mce_entry
and relative operations in mce.c.
Signed-off-by: Chen Yucong <slaoub@gmail.com>
---
arch/x86/include/asm/mce.h | 2 --
arch/x86/kernel/cpu/mcheck/mce.c | 5 -----
2 files changed, 7 deletions(-)
diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 6e4ce2d..958b90f 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -176,8 +176,6 @@ int mce_available(struct cpuinfo_x86 *c);
DECLARE_PER_CPU(unsigned, mce_exception_count);
DECLARE_PER_CPU(unsigned, mce_poll_count);
-extern atomic_t mce_entry;
-
typedef DECLARE_BITMAP(mce_banks_t, MAX_NR_BANKS);
DECLARE_PER_CPU(mce_banks_t, mce_poll_banks);
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c
b/arch/x86/kernel/cpu/mcheck/mce.c
index 68317c8..8f520a1 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -60,8 +60,6 @@ static DEFINE_MUTEX(mce_chrdev_read_mutex);
#define SPINUNIT 100 /* 100ns */
-atomic_t mce_entry;
-
DEFINE_PER_CPU(unsigned, mce_exception_count);
struct mce_bank *mce_banks __read_mostly;
@@ -1041,8 +1039,6 @@ void do_machine_check(struct pt_regs *regs, long
error_code)
DECLARE_BITMAP(valid_banks, MAX_NR_BANKS);
char *msg = "Unknown";
- atomic_inc(&mce_entry);
-
this_cpu_inc(mce_exception_count);
if (!cfg->banks)
@@ -1172,7 +1168,6 @@ void do_machine_check(struct pt_regs *regs, long
error_code)
mce_report_event(regs);
mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
out:
- atomic_dec(&mce_entry);
sync_core();
}
EXPORT_SYMBOL_GPL(do_machine_check);
--
1.7.10.4
^ permalink raw reply related [flat|nested] 9+ messages in thread
* RE: [PATCH] x86/mce: Clear a useless global variable in mce.c
2014-05-17 12:05 [PATCH] x86/mce: Clear a useless global variable in mce.c Chen Yucong
@ 2014-05-19 17:59 ` Luck, Tony
2014-05-19 18:15 ` Borislav Petkov
0 siblings, 1 reply; 9+ messages in thread
From: Luck, Tony @ 2014-05-19 17:59 UTC (permalink / raw)
To: Chen Yucong
Cc: bp@alien8.de, linux-kernel@vger.kernel.org,
linux-edac@vger.kernel.org
- atomic_inc(&mce_entry);
-
I have used this in the past (in conjunction with an external debugger) to
diagnose problems (not all cpus showing up in the machine check handler).
But I suppose these can also be diagnosed from the "Timeout synchronizing ..."
message from mce_timed_out() [though with a bit less precision ... we know
that some cpus didn't show up, but we don't have a count of how many did,
or how many are missing.
If we print the value of "mce_callin" somewhere in mce_timed_out() ...
then I think we'd have equivalent functionality (in fact better - because
we don't need the external debugger to peek at mce_entry).
-Tony
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] x86/mce: Clear a useless global variable in mce.c
2014-05-19 17:59 ` Luck, Tony
@ 2014-05-19 18:15 ` Borislav Petkov
2014-05-19 22:06 ` Luck, Tony
0 siblings, 1 reply; 9+ messages in thread
From: Borislav Petkov @ 2014-05-19 18:15 UTC (permalink / raw)
To: Luck, Tony
Cc: Chen Yucong, linux-kernel@vger.kernel.org,
linux-edac@vger.kernel.org
On Mon, May 19, 2014 at 05:59:23PM +0000, Luck, Tony wrote:
> - atomic_inc(&mce_entry);
> -
>
> I have used this in the past (in conjunction with an external debugger) to
> diagnose problems (not all cpus showing up in the machine check handler).
>
> But I suppose these can also be diagnosed from the "Timeout synchronizing ..."
> message from mce_timed_out() [though with a bit less precision ... we know
> that some cpus didn't show up, but we don't have a count of how many did,
> or how many are missing.
>
> If we print the value of "mce_callin" somewhere in mce_timed_out() ...
> then I think we'd have equivalent functionality (in fact better - because
> we don't need the external debugger to peek at mce_entry).
Right, I was thinking about it and this is something maybe you guys
should decide: do we want to panic by default in mce_timed_out if some
cores didn't show up?
I'm looking at this snippet:
/* CHECKME: Make panic default for 1 too? */
if (mca_cfg.tolerant < 1)
mce_panic("Timeout synchronizing machine check over CPUs",
NULL, NULL);
and since we have .tolerant=1 by default...
I mean, does the machine even recover after some of the cores have gone
into the weeds in #MC? Provided, of course, we don't have a no-way-out
MCE and we can resume execution.
Or is the box so hammered that there's no turning back?
Concerning mce_entry, I don't care all that much - if it is really
useful, you might slap a comment saying so and keep it, for all I care.
Thanks.
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
^ permalink raw reply [flat|nested] 9+ messages in thread
* RE: [PATCH] x86/mce: Clear a useless global variable in mce.c
2014-05-19 18:15 ` Borislav Petkov
@ 2014-05-19 22:06 ` Luck, Tony
2014-05-20 10:02 ` Borislav Petkov
0 siblings, 1 reply; 9+ messages in thread
From: Luck, Tony @ 2014-05-19 22:06 UTC (permalink / raw)
To: Borislav Petkov
Cc: Chen Yucong, linux-kernel@vger.kernel.org,
linux-edac@vger.kernel.org
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 902 bytes --]
> I mean, does the machine even recover after some of the cores have gone
> into the weeds in #MC? Provided, of course, we don't have a no-way-out
> MCE and we can resume execution.
I doubt there is any hope for recovery if not all processors show up ... things
have to be already very broken for the machine check to be blocked.
> Or is the box so hammered that there's no turning back?
I think so.
> Concerning mce_entry, I don't care all that much - if it is really
> useful, you might slap a comment saying so and keep it, for all I care.
I'm OK with it going - but as I said before I'd like to see mce_callin printed
(so I can tell if just one cpu showed up, just the cpus from one socket, or
some other significant number).
-Tony
ÿôèº{.nÇ+·®+%Ëÿ±éݶ\x17¥wÿº{.nÇ+·¥{±þG«éÿ{ayº\x1dÊÚë,j\a¢f£¢·hïêÿêçz_è®\x03(éÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?¨èÚ&£ø§~á¶iOæ¬z·vØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?I¥
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] x86/mce: Clear a useless global variable in mce.c
2014-05-19 22:06 ` Luck, Tony
@ 2014-05-20 10:02 ` Borislav Petkov
2014-05-20 17:46 ` Tony Luck
0 siblings, 1 reply; 9+ messages in thread
From: Borislav Petkov @ 2014-05-20 10:02 UTC (permalink / raw)
To: Luck, Tony
Cc: Chen Yucong, linux-kernel@vger.kernel.org,
linux-edac@vger.kernel.org
On Mon, May 19, 2014 at 10:06:38PM +0000, Luck, Tony wrote:
> I doubt there is any hope for recovery if not all processors show up
> ... things have to be already very broken for the machine check to be
> blocked.
Good, so this whole babble about the potential of a timeout and whatever
is all beside the point.
What we want to do is if any of the cores are stuck - monarch or not -
we want to panic the hell out of this box and not do anything further.
So only the tolerant check would need adjusting.
> I'm OK with it going - but as I said before I'd like to see mce_callin
> printed (so I can tell if just one cpu showed up, just the cpus from
> one socket, or some other significant number).
I don't think you want to do this unconditionally, do you? Rather maybe
mce_timed_out dumps the order variable before the box panics :-)
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] x86/mce: Clear a useless global variable in mce.c
2014-05-20 10:02 ` Borislav Petkov
@ 2014-05-20 17:46 ` Tony Luck
0 siblings, 0 replies; 9+ messages in thread
From: Tony Luck @ 2014-05-20 17:46 UTC (permalink / raw)
To: Borislav Petkov
Cc: Chen Yucong, linux-kernel@vger.kernel.org,
linux-edac@vger.kernel.org
On Tue, May 20, 2014 at 3:02 AM, Borislav Petkov <bp@alien8.de> wrote:
>> I'm OK with it going - but as I said before I'd like to see mce_callin
>> printed (so I can tell if just one cpu showed up, just the cpus from
>> one socket, or some other significant number).
>
> I don't think you want to do this unconditionally, do you? Rather maybe
> mce_timed_out dumps the order variable before the box panics :-)
Agreed - not unconditionally - just when we have a problem. Dumping
"order" in the timeout case would be a good place.
-Tony
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2014-05-20 17:46 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-05-17 12:05 [PATCH] x86/mce: Clear a useless global variable in mce.c Chen Yucong
2014-05-19 17:59 ` Luck, Tony
2014-05-19 18:15 ` Borislav Petkov
2014-05-19 22:06 ` Luck, Tony
2014-05-20 10:02 ` Borislav Petkov
2014-05-20 17:46 ` Tony Luck
-- strict thread matches above, loose matches on Subject: below --
2014-05-17 8:45 Chen Yucong
2014-05-17 9:58 ` Borislav Petkov
2014-05-19 0:08 ` Chen Yucong
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox