From: Andi Kleen <andi@firstfloor.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Tony Luck <tony.luck@intel.com>, linux-kernel@vger.kernel.org
Subject: Re: [git pull] machine check recovery fix
Date: Mon, 21 May 2012 16:32:10 -0700
Message-ID: <m2mx518611.fsf@firstfloor.org>
In-Reply-To: <CA+55aFygZYVpM7Agy+Vi+Majaa=7GLzeS8Mp9v3jiq-t+_=hLQ@mail.gmail.com> (Linus Torvalds's message of "Thu, 17 May 2012 20:33:44 -0700")
Linus Torvalds <torvalds@linux-foundation.org> writes:
>
> In fact, it's *all* crap. Because it shouldn't check "m->cs" and
> "m->ip" at all, because what matters is not which instruction caused
> the MCE, but whether the *return* address is in kernel mode or not!
No, it matters which instruction caused the error, because it's the
one which saw the data corruption. If that was not in the kernel, you
can safely just return, because the kernel is completely fine
and the instruction can be restarted. It's just like an interrupt.
In the cases where this cannot be determined, the MCE code
only uses the address and does not use cs/ip.
> Maybe the error that triggered the MCE happened in user mode, but
> asynchronously, so the return address is in kernel mode. So the whole
> "error_context()" thing is testing entirely the wrong thing.
EIPV==1 means the error IP is valid.
The asynchronous cases never handle this.
Yes, the logic is rather hairy, but mainly because the whole problem
is.
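The rule argued for above (trust the saved cs/ip only when EIPV is set, and then ask whether the faulting instruction ran in user or kernel mode) can be sketched in a few lines of C. This is a minimal standalone model, not the kernel's error_context(): the struct, enum and function names are hypothetical, and only the bit positions (RIPV is bit 0 and EIPV is bit 1 of IA32_MCG_STATUS) come from the architecture.

```c
#include <stdint.h>

/* IA32_MCG_STATUS bits (architectural). */
#define MCG_STATUS_RIPV (1ULL << 0) /* restart IP valid */
#define MCG_STATUS_EIPV (1ULL << 1) /* error IP valid */

/* Hypothetical result type for this sketch. */
enum mce_ctx { CTX_USER, CTX_KERNEL, CTX_UNKNOWN };

/* Hypothetical, trimmed-down stand-in for the fields of an MCE record. */
struct mce_sample {
    uint64_t mcgstatus; /* copy of IA32_MCG_STATUS */
    uint16_t cs;        /* code segment saved with the error */
};

/*
 * Classify where the faulting instruction ran.
 * Without EIPV the saved cs/ip do not identify the instruction that
 * saw the error, so the caller must not base a decision on them.
 */
static enum mce_ctx classify_error_context(const struct mce_sample *m)
{
    if (!(m->mcgstatus & MCG_STATUS_EIPV))
        return CTX_UNKNOWN;

    /* The low two bits of CS are the privilege level; 3 is user mode. */
    return (m->cs & 3) == 3 ? CTX_USER : CTX_KERNEL;
}
```

A caller would treat CTX_UNKNOWN as "fall back to the reported address only", which is the case described above where the context cannot be determined.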
> That "is it in kernel mode" check also seems to not know about vm86
> mode. Let's hope those MCE's can never happen on an instruction in
> vm86 mode, because then the CS check is crap too.
I fixed the VM86 thing a long time ago, but unfortunately it was never
merged. Not that it matters much, because the systems which
have recoverable machine checks usually have far too much memory
for 32-bit kernels.
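For reference, the vm86 concern is that in virtual-8086 mode CS holds a real-mode style segment value, so its low two bits say nothing about privilege. A check that also looks at the VM flag (bit 17 of EFLAGS) avoids that. The following is only an illustrative sketch and not the unmerged fix mentioned above; the function name is hypothetical, and it assumes the saved flags for the faulting context are available alongside CS.

```c
#include <stdbool.h>
#include <stdint.h>

#define SEGMENT_RPL_MASK 0x3u        /* low two bits of a selector */
#define USER_RPL         0x3u        /* ring 3 */
#define X86_EFLAGS_VM    (1u << 17)  /* virtual-8086 mode flag */

/*
 * Hypothetical helper: true if the saved context belongs to user space.
 * vm86 code always counts as user mode, whatever CS happens to contain.
 */
static bool saved_context_is_user(uint16_t cs, uint32_t eflags)
{
    if (eflags & X86_EFLAGS_VM)
        return true;
    return (cs & SEGMENT_RPL_MASK) == USER_RPL;
}
```

With a vm86 segment value such as 0x1000, the CS test alone would report kernel mode; the VM flag test is what catches it.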
-Andi
--
ak@linux.intel.com -- Speaking for myself only