public inbox for linux-kernel@vger.kernel.org
From: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
To: Borislav Petkov <bp@alien8.de>
Cc: <dougthompson@xmission.com>, <linux-edac@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] EDAC, MCE, AMD: Fix code to prevent NULL dereference
Date: Tue, 18 Feb 2014 12:27:19 -0600
Message-ID: <5303A607.7090309@amd.com>
In-Reply-To: <20140218084636.GA24465@pd.tnic>

On 2/18/2014 2:46 AM, Borislav Petkov wrote:
> Ok, let's try a simpler thing. Only build-tested here:
>
>   
> +	if (!fam_ops)
> +		return NOTIFY_DONE;
> +
>   	if (amd_filter_mce(m))
>   		return NOTIFY_STOP;
>   
> @@ -816,10 +819,10 @@ static int __init mce_amd_init(void)
>   	struct cpuinfo_x86 *c = &boot_cpu_data;
>   
>   	if (c->x86_vendor != X86_VENDOR_AMD)
> -		return 0;
> +		return -ENODEV;
>   
>   	if (c->x86 < 0xf || c->x86 > 0x16)
> -		return 0;
> +		return -ENODEV;
>   
>   	fam_ops = kzalloc(sizeof(struct amd_decoder_ops), GFP_KERNEL);
>   	if (!fam_ops)
> @@ -874,6 +877,7 @@ static int __init mce_amd_init(void)
>   	default:
>   		printk(KERN_WARNING "Huh? What family is it: 0x%x?!\n", c->x86);
>   		kfree(fam_ops);
> +		fam_ops = NULL;
>   		return -EINVAL;
>   	}
>   
>

This works. The drawback is that when fam_ops is NULL you no longer get
the output of the more generic error decoding that happens after the
'switch' in amd_decode_mce:

     pr_emerg(HW_ERR "Error Status: %s\n", decode_error_status(m));
     [...]
     amd_decode_err_code(m->status & 0xffff);

A quick fix is to move that chunk of code so that it runs before the
'switch'. I tried this on a local machine. Here are some sample outputs:

on unsupported h/w:
[   46.822828] [Hardware Error]: Error Status: Uncorrected, software containable error.
[   46.822846] [Hardware Error]: CPU:0 (15:30:0) MC0_STATUS[-|UE|-|-|-|-|-]: 0xa000000000010f0f
[   46.822858] [Hardware Error]: cache level: L3/GEN, mem/io: GEN, mem-tx: GEN, part-proc: GEN (timed out)

on supported h/w (an MC0 error):
[   84.305292] [Hardware Error]: Error Status: Uncorrected, software containable error.
[   84.305312] [Hardware Error]: CPU:0 (15:30:0) MC0_STATUS[-|UE|-|-|-|-|-]: 0xa000000000010f0f
[   84.305327] [Hardware Error]: cache level: L3/GEN, mem/io: GEN, mem-tx: GEN, part-proc: GEN (timed out)
[   84.305343] [Hardware Error]: MC0 Error:  Internal error condition type 1.

on supported h/w (an MC4 ECC error):
[  128.942878] [Hardware Error]: Error Status: System Fatal error.
[  128.942897] [Hardware Error]: CPU:0 (15:30:0) MC4_STATUS[-|UE|-|PCC|AddrV|-|-|UECC]: 0xa600200000080a23
[  128.942914] [Hardware Error]: MC4_ADDR: 0x0000000000000000
[  128.942922] [Hardware Error]: cache level: L3/GEN, mem/io: MEM, mem-tx: WR, part-proc: RES (no timeout)
[  128.942939] [Hardware Error]: MC4 Error (node 0): DRAM ECC error detected on the NB.
[  128.942971] EDAC MC0: 1 UE on mc#0csrow#2channel#0 (csrow:2 channel:0 page:0x0 offset:0x0 grain:0)
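
To make the rearrangement described above concrete, here is a rough
userspace model of it, not the actual kernel change. It assumes the
NULL check sits right before the 'switch' rather than at the top of the
function, so the generic lines are always printed. Every name in it
(mce_model, decoder_ops_model, fam_ops_model, decode_generic,
decode_mce_model, demo_mc0) is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins; these are NOT the kernel's struct mce or
 * struct amd_decoder_ops. */
struct mce_model {
	uint64_t status;
	int bank;
};

struct decoder_ops_model {
	void (*mc0_mce)(uint16_t ec);
};

/* Stays NULL on unsupported families, mirroring fam_ops in the thread. */
static struct decoder_ops_model *fam_ops_model;

/* Counters so the behaviour can be checked without parsing output. */
static int generic_calls;
static int family_calls;

/* Generic decoding: needs nothing family-specific. */
static void decode_generic(const struct mce_model *m)
{
	printf("[model] status: 0x%016llx\n", (unsigned long long)m->status);
	printf("[model] error code: 0x%04x\n", (unsigned)(m->status & 0xffff));
	generic_calls++;
}

/* Example family-specific handler for bank 0. */
static void demo_mc0(uint16_t ec)
{
	printf("[model] MC0 error, ec=0x%04x\n", (unsigned)ec);
	family_calls++;
}

/* Returns 0 if only the generic part ran, 1 if the switch was reached. */
static int decode_mce_model(const struct mce_model *m)
{
	assert(m != NULL);

	/* Moved up: runs whether or not family ops exist. */
	decode_generic(m);

	/* Assumed placement of the NULL check: guards only the switch. */
	if (!fam_ops_model)
		return 0;

	switch (m->bank) {
	case 0:
		fam_ops_model->mc0_mce((uint16_t)(m->status & 0xffff));
		break;
	default:
		break;
	}
	return 1;
}
```

With fam_ops_model left NULL, a call to decode_mce_model() prints only
the generic lines, which matches the "unsupported h/w" case above; once
it points at a struct with mc0_mce set, the bank-specific line is
printed as well.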

A word about your earlier suggestion of using amd_notifier_call_chain in
mce_amd_inj: the changes would need to be more involved.
- First, x86_mce_decoder_chain is defined in mce.c, so we'd need to
  move it to somewhere in asm/mce.h.
- We'd need to include notifier.h in asm/mce.h (when I tried this, I
  got a build error saying there are multiple definitions of
  'x86_mce_decoder_chain'; I haven't figured out why yet).
- i_mce would need to become a pointer type, which in turn means
  changing how we reference the struct members in the code.

I'm not sure we want this many changes, not to mention touching common
mce code. A simpler solution might be to rearrange the code in
amd_decode_mce and use your hunk.
Thoughts?

Thanks,
-Aravind.


