* [PATCH] powerpc/mce: fix off by one errors in mce event handling
@ 2015-05-11 0:48 Daniel Axtens
2015-05-11 7:05 ` Mahesh J Salgaonkar
2015-05-11 10:16 ` Michael Ellerman
0 siblings, 2 replies; 3+ messages in thread
From: Daniel Axtens @ 2015-05-11 0:48 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Mahesh Salgaonkar, Daniel Axtens, stable
Before 69111bac42f5 ("powerpc: Replace __get_cpu_var uses"), in
save_mce_event, index got the value of mce_nest_count, and
mce_nest_count was incremented *after* index was set.
However, that patch changed the behaviour so that mce_nest_count was
incremented *before* setting index.
This causes an off-by-one error, as get_mce_event sets index as
mce_nest_count - 1 before reading mce_event. Thus get_mce_event reads
bogus data, causing warnings like
"Machine Check Exception, Unknown event version 0 !"
and breaking MCE handling.
Restore the old behaviour and unbreak MCE handling by moving the
increment to after index is set.
The same broken change occurred in machine_check_queue_event (which set
a queue read by machine_check_process_queued_event). Fix that too,
unbreaking printing of MCE information.
Fixes: 69111bac42f5 ("powerpc: Replace __get_cpu_var uses")
CC: stable@vger.kernel.org
CC: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>
---
arch/powerpc/kernel/mce.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
index 15c99b6..f774b64 100644
--- a/arch/powerpc/kernel/mce.c
+++ b/arch/powerpc/kernel/mce.c
@@ -73,8 +73,9 @@ void save_mce_event(struct pt_regs *regs, long handled,
 		    uint64_t nip, uint64_t addr)
 {
 	uint64_t srr1;
-	int index = __this_cpu_inc_return(mce_nest_count);
+	int index = __this_cpu_read(mce_nest_count);
 	struct machine_check_event *mce = this_cpu_ptr(&mce_event[index]);
+	__this_cpu_inc(mce_nest_count);

 	/*
 	 * Return if we don't have enough space to log mce event.
@@ -184,7 +185,8 @@ void machine_check_queue_event(void)
 	if (!get_mce_event(&evt, MCE_EVENT_RELEASE))
 		return;

-	index = __this_cpu_inc_return(mce_queue_count);
+	index = __this_cpu_read(mce_queue_count);
+	__this_cpu_inc(mce_queue_count);
 	/* If queue is full, just return for now. */
 	if (index >= MAX_MC_EVT) {
 		__this_cpu_dec(mce_queue_count);
--
2.1.4
* Re: [PATCH] powerpc/mce: fix off by one errors in mce event handling
2015-05-11 0:48 [PATCH] powerpc/mce: fix off by one errors in mce event handling Daniel Axtens
@ 2015-05-11 7:05 ` Mahesh J Salgaonkar
2015-05-11 10:16 ` Michael Ellerman
1 sibling, 0 replies; 3+ messages in thread
From: Mahesh J Salgaonkar @ 2015-05-11 7:05 UTC (permalink / raw)
To: Daniel Axtens; +Cc: linuxppc-dev, stable
On 2015-05-11 10:48:32 Mon, Daniel Axtens wrote:
> Before 69111bac42f5 ("powerpc: Replace __get_cpu_var uses"), in
> save_mce_event, index got the value of mce_nest_count, and
> mce_nest_count was incremented *after* index was set.
>
> However, that patch changed the behaviour so that mce_nest_count was
> incremented *before* setting index.
>
> This causes an off-by-one error, as get_mce_event sets index as
> mce_nest_count - 1 before reading mce_event. Thus get_mce_event reads
> bogus data, causing warnings like
> "Machine Check Exception, Unknown event version 0 !"
> and breaking MCE handling.
>
> Restore the old behaviour and unbreak MCE handling by moving the
> increment to after index is set.
>
> The same broken change occurred in machine_check_queue_event (which set
> a queue read by machine_check_process_queued_event). Fix that too,
> unbreaking printing of MCE information.
>
> Fixes: 69111bac42f5 ("powerpc: Replace __get_cpu_var uses")
> CC: stable@vger.kernel.org
> CC: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> Signed-off-by: Daniel Axtens <dja@axtens.net>
Acked-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> ---
> arch/powerpc/kernel/mce.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
> index 15c99b6..f774b64 100644
> --- a/arch/powerpc/kernel/mce.c
> +++ b/arch/powerpc/kernel/mce.c
> @@ -73,8 +73,9 @@ void save_mce_event(struct pt_regs *regs, long handled,
> uint64_t nip, uint64_t addr)
> {
> uint64_t srr1;
> - int index = __this_cpu_inc_return(mce_nest_count);
> + int index = __this_cpu_read(mce_nest_count);
> struct machine_check_event *mce = this_cpu_ptr(&mce_event[index]);
> + __this_cpu_inc(mce_nest_count);
>
> /*
> * Return if we don't have enough space to log mce event.
> @@ -184,7 +185,8 @@ void machine_check_queue_event(void)
> if (!get_mce_event(&evt, MCE_EVENT_RELEASE))
> return;
>
> - index = __this_cpu_inc_return(mce_queue_count);
> + index = __this_cpu_read(mce_queue_count);
> + __this_cpu_inc(mce_queue_count);
> /* If queue is full, just return for now. */
> if (index >= MAX_MC_EVT) {
> __this_cpu_dec(mce_queue_count);
> --
> 2.1.4
>
--
Mahesh J Salgaonkar
* Re: powerpc/mce: fix off by one errors in mce event handling
2015-05-11 0:48 [PATCH] powerpc/mce: fix off by one errors in mce event handling Daniel Axtens
2015-05-11 7:05 ` Mahesh J Salgaonkar
@ 2015-05-11 10:16 ` Michael Ellerman
1 sibling, 0 replies; 3+ messages in thread
From: Michael Ellerman @ 2015-05-11 10:16 UTC (permalink / raw)
To: Daniel Axtens, linuxppc-dev; +Cc: Mahesh Salgaonkar, stable, Daniel Axtens
Stable folks please ignore this patch.
Comments below.
On Mon, 2015-05-11 at 00:48:32 UTC, Daniel Axtens wrote:
> Before 69111bac42f5 ("powerpc: Replace __get_cpu_var uses"), in
> save_mce_event, index got the value of mce_nest_count, and
> mce_nest_count was incremented *after* index was set.
>
> However, that patch changed the behaviour so that mce_nest_count was
> incremented *before* setting index.
>
> This causes an off-by-one error, as get_mce_event sets index as
> mce_nest_count - 1 before reading mce_event. Thus get_mce_event reads
> bogus data, causing warnings like
> "Machine Check Exception, Unknown event version 0 !"
> and breaking MCE handling.
>
> Restore the old behaviour and unbreak MCE handling by moving the
> increment to after index is set.
>
> The same broken change occurred in machine_check_queue_event (which set
> a queue read by machine_check_process_queued_event). Fix that too,
> unbreaking printing of MCE information.
>
> Fixes: 69111bac42f5 ("powerpc: Replace __get_cpu_var uses")
> CC: stable@vger.kernel.org
> CC: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> Signed-off-by: Daniel Axtens <dja@axtens.net>
> Acked-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> ---
> arch/powerpc/kernel/mce.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
> index 15c99b6..f774b64 100644
> --- a/arch/powerpc/kernel/mce.c
> +++ b/arch/powerpc/kernel/mce.c
> @@ -73,8 +73,9 @@ void save_mce_event(struct pt_regs *regs, long handled,
> uint64_t nip, uint64_t addr)
> {
> uint64_t srr1;
> - int index = __this_cpu_inc_return(mce_nest_count);
> + int index = __this_cpu_read(mce_nest_count);
> struct machine_check_event *mce = this_cpu_ptr(&mce_event[index]);
> + __this_cpu_inc(mce_nest_count);
As we discussed offline, this looks racy against another machine check coming
in, i.e. if another machine check comes in after mce_nest_count is loaded but
before it's incremented and stored, we might lose an increment.
But the original code also looks racy, just maybe with a smaller window.
Fixing it properly might be a bit involved though, so for a fix for stable we
might want to just do:
- int index = __this_cpu_inc_return(mce_nest_count);
+ int index = __this_cpu_inc_return(mce_nest_count) - 1;
Which will hopefully generate a ld/addi/std that is at least minimal in its
exposure to the race.
Thoughts?
cheers