From: Michael Ellerman <mpe@ellerman.id.au>
To: Daniel Axtens <dja@axtens.net>, linuxppc-dev@ozlabs.org
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>,
stable@vger.kernel.org, Daniel Axtens <dja@axtens.net>
Subject: Re: powerpc/mce: fix off by one errors in mce event handling
Date: Mon, 11 May 2015 20:16:37 +1000 (AEST)
Message-ID: <20150511101637.3ADB614016A@ozlabs.org>
In-Reply-To: <1431305312-9955-1-git-send-email-dja@axtens.net>
Stable folks please ignore this patch.
Comments below.
On Mon, 2015-05-11 at 00:48:32 UTC, Daniel Axtens wrote:
> Before 69111bac42f5 ("powerpc: Replace __get_cpu_var uses"), in
> save_mce_event, index got the value of mce_nest_count, and
> mce_nest_count was incremented *after* index was set.
>
> However, that patch changed the behaviour so that mce_nest_count was
> incremented *before* setting index.
>
> This causes an off-by-one error, as get_mce_event sets index as
> mce_nest_count - 1 before reading mce_event. Thus get_mce_event reads
> bogus data, causing warnings like
> "Machine Check Exception, Unknown event version 0 !"
> and breaking MCE handling.
>
> Restore the old behaviour and unbreak MCE handling by moving the
> increment to after index is set.
>
> The same broken change occurred in machine_check_queue_event (which set
> a queue read by machine_check_process_queued_event). Fix that too,
> unbreaking printing of MCE information.
>
> Fixes: 69111bac42f5 ("powerpc: Replace __get_cpu_var uses")
> CC: stable@vger.kernel.org
> CC: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> Signed-off-by: Daniel Axtens <dja@axtens.net>
> Acked-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> ---
> arch/powerpc/kernel/mce.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
> index 15c99b6..f774b64 100644
> --- a/arch/powerpc/kernel/mce.c
> +++ b/arch/powerpc/kernel/mce.c
> @@ -73,8 +73,9 @@ void save_mce_event(struct pt_regs *regs, long handled,
>  		    uint64_t nip, uint64_t addr)
>  {
>  	uint64_t srr1;
> -	int index = __this_cpu_inc_return(mce_nest_count);
> +	int index = __this_cpu_read(mce_nest_count);
>  	struct machine_check_event *mce = this_cpu_ptr(&mce_event[index]);
> +	__this_cpu_inc(mce_nest_count);

As we discussed offline, this looks racy against another machine check coming
in, i.e. if another machine check comes in after mce_nest_count is loaded but
before it's incremented and stored, we might lose an increment.

But the original code also looks racy, just maybe with a smaller window.
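
To make that window concrete, here is a minimal user-space sketch of the lost increment. It is not kernel code: the plain static counter stands in for the per-CPU mce_nest_count, and simulate(), outer_machine_check() and nested_machine_check() are hypothetical helpers invented for illustration. The nested handler is also simplified to run to completion without releasing its event.

```c
/* Stand-in for the per-CPU mce_nest_count (assumption: a plain int). */
static int mce_nest_count;

/* Hypothetical nested machine check: takes a slot and bumps the counter. */
static int nested_machine_check(void)
{
	int index = mce_nest_count;

	mce_nest_count = index + 1;
	return index;
}

/*
 * Hypothetical outer handler. The load and the store of the counter are
 * written out separately so a nested machine check can be placed in the
 * window between them.
 */
static int outer_machine_check(int interrupt_in_window, int *nested_index)
{
	int loaded = mce_nest_count;		/* load */

	if (interrupt_in_window)
		*nested_index = nested_machine_check();

	mce_nest_count = loaded + 1;		/* add + store of the stale value */
	return loaded;
}

/*
 * Runs one outer machine check from a zeroed counter and returns the final
 * counter value. *nested_index stays -1 if no nested machine check ran.
 */
static int simulate(int interrupt_in_window, int *outer_index,
		    int *nested_index)
{
	mce_nest_count = 0;
	*nested_index = -1;
	*outer_index = outer_machine_check(interrupt_in_window, nested_index);
	return mce_nest_count;
}
```

With interrupt_in_window set, both handlers pick slot 0 and the counter ends at 1 even though two events were recorded, which is the lost increment described above.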

Fixing it properly might be a bit involved though, so for a fix for stable we
might want to just do:

-	int index = __this_cpu_inc_return(mce_nest_count);
+	int index = __this_cpu_inc_return(mce_nest_count) - 1;

Which will hopefully generate a ld/addi/std that is at least minimal in its
exposure to the race.
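
As a sanity check on the indexing alone (ignoring the race), the three variants can be compared in a small user-space sketch. Everything here is an assumption for illustration: nest_count is a plain int standing in for the per-CPU counter, inc_return() only mimics the increment-then-return behaviour of __this_cpu_inc_return(), and get_index() models get_mce_event reading at mce_nest_count - 1 as described in the commit message.

```c
/* Stand-in for the per-CPU mce_nest_count (assumption: a plain int). */
static int nest_count;

/* Mimics __this_cpu_inc_return(): increment, then return the new value. */
static int inc_return(int *counter)
{
	return ++*counter;
}

/* Behaviour after 69111bac42f5: index is the already-incremented value. */
static int save_index_broken(void)
{
	return inc_return(&nest_count);
}

/* The patch under review: read the counter, then increment separately. */
static int save_index_patch(void)
{
	int index = nest_count;

	nest_count++;
	return index;
}

/* The one-line alternative suggested for stable. */
static int save_index_suggested(void)
{
	return inc_return(&nest_count) - 1;
}

/* Reader side: get_mce_event uses mce_nest_count - 1 as its index. */
static int get_index(void)
{
	return nest_count - 1;
}

/* Returns 1 if the slot written by save() is the slot the reader looks at. */
static int slots_match(int (*save)(void))
{
	nest_count = 0;
	return save() == get_index();
}
```

slots_match() fails for save_index_broken and succeeds for both save_index_patch and save_index_suggested, so the two fixes agree on the slot and differ only in how wide the race window is.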
Thoughts?
cheers