From: Borislav Petkov <bp@alien8.de>
To: ruiv.wang@gmail.com
Cc: linux-kernel@vger.kernel.org, tony.luck@intel.com,
	gong.chen@linux.intel.com, rui.y.wang@intel.com
Subject: Re: [PATCH v3] x86/mce: Try printing all machine check banks known before panic
Date: Wed, 19 Nov 2014 11:29:54 +0100
Message-ID: <20141119102954.GA5617@pd.tnic>
In-Reply-To: <1416388961-24159-1-git-send-email-ruiv.wang@gmail.com>

On Wed, Nov 19, 2014 at 05:22:41PM +0800, ruiv.wang@gmail.com wrote:
> From: Rui Wang <rui.y.wang@intel.com>
> 
> There are cases when a machine check panics without giving any information
> about the error:
> 
> [  177.806166] Kernel panic - not syncing: Machine check from unknown source
> 
> No information besides that it is a machine check. This happens in two cases:
> 1) The CPU logs the error with the MCi_STATUS.EN bit set to zero, and Linux
>    ignores EN=0 entries (as it should).

Well, I guess we shouldn't anymore. Apparently the hw forgets to set the
bit when raising an MCE, so we should then disregard that bit in
mce-severity too: either delete the rule below or grade it as higher
severity based on, I dunno, b0rked hardware family/model/stepping or
whatever bit we set...

        MCESEV(
                NO, "Not enabled",
                BITCLR(MCI_STATUS_EN)
                ),
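To make the "delete that piece or grade it higher" option concrete, here is a minimal userspace sketch of a first-match severity table modeled on the MCESEV/BITCLR entry quoted above. It assumes a simplified severity ladder and a hypothetical trust_en switch standing in for whatever family/model/stepping quirk would be used; the names enum sev, struct rule, is_en_rule and grade() are illustrative and are not the kernel's code. Only the MCI_STATUS_* bit positions follow the architectural IA32_MCi_STATUS layout.

```c
#include <stddef.h>
#include <stdint.h>

/* IA32_MCi_STATUS bits, same positions as the kernel's MCI_STATUS_* macros. */
#define MCI_STATUS_VAL (1ULL << 63) /* register holds a valid error */
#define MCI_STATUS_UC  (1ULL << 61) /* uncorrected error */
#define MCI_STATUS_EN  (1ULL << 60) /* error reporting enabled */
#define MCI_STATUS_PCC (1ULL << 57) /* processor context corrupt */

/* Simplified severity ladder (hypothetical; the kernel has more levels). */
enum sev { SEV_NO, SEV_KEEP, SEV_SOME, SEV_UC, SEV_PANIC };

/* A rule matches when (status & mask) == result; the first match wins. */
struct rule {
	uint64_t mask, result;
	enum sev sev;
	const char *msg;
	int is_en_rule; /* hypothetical marker for the "Not enabled" rule */
};

#define BITCLR(x) .mask = (x), .result = 0
#define BITSET(x) .mask = (x), .result = (x)

static const struct rule rules[] = {
	{ BITCLR(MCI_STATUS_VAL), .sev = SEV_NO,    .msg = "Invalid" },
	{ BITCLR(MCI_STATUS_EN),  .sev = SEV_NO,    .msg = "Not enabled",
	  .is_en_rule = 1 },
	{ BITSET(MCI_STATUS_PCC), .sev = SEV_PANIC, .msg = "Processor context corrupt" },
	{ BITSET(MCI_STATUS_UC),  .sev = SEV_UC,    .msg = "Uncorrected" },
	{ .mask = 0, .result = 0, .sev = SEV_SOME,  .msg = "No match" }, /* catch-all */
};

/*
 * Grade one bank's status. With trust_en == 0 (a hypothetical quirk, e.g.
 * keyed off family/model/stepping) the "Not enabled" rule is skipped, so an
 * EN=0 entry is graded by its other bits instead of being discarded.
 */
static enum sev grade(uint64_t status, int trust_en, const char **msg)
{
	for (size_t i = 0; i < sizeof(rules) / sizeof(rules[0]); i++) {
		const struct rule *r = &rules[i];

		if (!trust_en && r->is_en_rule)
			continue;
		if ((status & r->mask) == r->result) {
			*msg = r->msg;
			return r->sev;
		}
	}
	*msg = "No match"; /* not reached: the last rule always matches */
	return SEV_SOME;
}
```

With trust_en set, a status of VAL|UC|PCC with EN clear (the case described in the patch) grades as SEV_NO ("Not enabled"); with trust_en cleared, the same status falls through to the PCC rule and grades as SEV_PANIC.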

> 2) In normal processing the MCE handler ignores banks that do not contain fatal
>    or unrecoverable errors (these would later be found and logged by the CMCI
>    handler). If we panic, these will never be logged, but could be important
>    to diagnose the problem.

Well, we do this:

                /*
                 * Non uncorrected or non signaled errors are handled by
                 * machine_check_poll. Leave them alone, unless this panics.
                 */
                if (!(m.status & (cfg->ser ? MCI_STATUS_S : MCI_STATUS_UC)) &&
                        !no_way_out)
                        continue;
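The skip condition above can be isolated as a pure predicate, which shows why no_way_out decides whether a corrected bank gets logged before a panic. This is a sketch, assuming ser mirrors cfg->ser and no_way_out is whatever the severity grading produced; the function name bank_left_for_poll() is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define MCI_STATUS_UC (1ULL << 61) /* uncorrected error */
#define MCI_STATUS_S  (1ULL << 56) /* error signaled via #MC (SER only) */

/*
 * Returns true when the handler would "continue" past this bank and leave
 * it to machine_check_poll(). ser: software error recovery supported.
 * no_way_out: the grading already decided we are going to panic.
 */
static bool bank_left_for_poll(uint64_t status, bool ser, bool no_way_out)
{
	uint64_t must_have = ser ? MCI_STATUS_S : MCI_STATUS_UC;

	return !(status & must_have) && !no_way_out;
}
```

A corrected bank (neither S nor UC set) is skipped unless no_way_out is true, so changing how mce-severity grades banks changes which banks are printed before the panic.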

so no_way_out gets indirectly controlled by mce-severity too. So I guess
mce-severity would need adjusting instead of adding more stuff to the #MC
handler.

Btw, the panic message comes from

        /*
         * No machine check event found. Must be some external
         * source or one CPU is hung. Panic.
         */
        if (global_worst <= MCE_KEEP_SEVERITY && mca_cfg.tolerant < 3)
                mce_panic("Machine check from unknown source", NULL, NULL);
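The "unknown source" panic follows from the grading: if every bank on every CPU grades at KEEP or lower (for example because the EN=0 rule turned them all into NO), the global worst severity never rises and the check above fires. A minimal model, assuming global_worst is simply the maximum severity across CPUs; the enum is modeled on the kernel's ordering, and worst_of() and unknown_source_panic() are hypothetical helpers.

```c
#include <stdbool.h>

/* Modeled on the kernel's severity ordering; names are illustrative. */
enum sev { SEV_NO, SEV_KEEP, SEV_SOME, SEV_AO, SEV_UC, SEV_AR, SEV_PANIC };

/* Hypothetical helper: worst severity across the per-CPU gradings. */
static enum sev worst_of(const enum sev *per_cpu, int ncpus)
{
	enum sev worst = SEV_NO;

	for (int i = 0; i < ncpus; i++)
		if (per_cpu[i] > worst)
			worst = per_cpu[i];
	return worst;
}

/* True when the quoted check would panic with "unknown source". */
static bool unknown_source_panic(enum sev global_worst, int tolerant)
{
	return global_worst <= SEV_KEEP && tolerant < 3;
}
```

If an EN=0 bank were graded higher instead, global_worst would exceed KEEP and the handler would take the normal path that reports the bank.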

so fixing mce_severity is what should happen here instead, IMO.

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.