From: "Luck, Tony" <tony.luck@intel.com>
To: Jue Wang <juew@google.com>
Cc: "bp@alien8.de" <bp@alien8.de>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"luto@kernel.org" <luto@kernel.org>,
	"naoya.horiguchi@nec.com" <naoya.horiguchi@nec.com>,
	"x86@kernel.org" <x86@kernel.org>,
	"yaoaili@kingsoft.com" <yaoaili@kingsoft.com>
Subject: RE: [PATCH 4/4] x86/mce: Avoid infinite loop for copy from user recovery
Date: Mon, 19 Apr 2021 21:41:33 +0000
Message-ID: <c2241025107a4f168070348b21d7bb78@intel.com>
In-Reply-To: <CAPcxDJ6SgSagJrF7u576WUb6p7Hg7+beYVoCpJ86Ocsb-mCHmQ@mail.gmail.com>

>> But there are places in the kernel where the code assumes that this
>> EFAULT return was simply because of a page fault. The code takes some
>> action to fix that, and then retries the access. This results in a second
>> machine check.
>
> What about return EHWPOISON instead of EFAULT and update the callers
> to handle EHWPOISON explicitly: i.e., not retry but give up on the page?

That seems like a good idea to me. I got some pushback when I started
down this path earlier with some patches to the futex code, but back then
I wasn't using an error return of EHWPOISON ... possibly the code would
look less hacky with that explicitly called out.

The futex case was specifically for code using pagefault_disable(). Likely
all the other callers would need to be audited (but there are only a few dozen
places, so not too big of a deal).
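
As a rough illustration of the EHWPOISON idea discussed above, here is a
minimal user-space sketch, not kernel code. Every name in it
(fake_page, fake_copy_from_user, fake_fault_in, read_with_retry) is
hypothetical and exists only for this example. It also simplifies the
real interface: the kernel's copy_from_user() returns a count of bytes
not copied rather than an errno, so the sketch models a helper that
reports -EFAULT or -EHWPOISON directly. The point is only the caller's
loop: retry after a plain fault, give up at once on poison.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Linux defines EHWPOISON in its errno headers; fall back for other systems. */
#ifndef EHWPOISON
#define EHWPOISON 133
#endif

/* Hypothetical model of a user page: present, not yet faulted in, or poisoned. */
enum page_state { PAGE_OK, PAGE_NOT_PRESENT, PAGE_POISONED };

struct fake_page {
	enum page_state state;
	char data[8];
};

/*
 * Hypothetical stand-in for a user-copy helper. A poisoned page is
 * reported as -EHWPOISON so the caller can tell it apart from an
 * ordinary page fault (-EFAULT).
 */
static int fake_copy_from_user(char *dst, const struct fake_page *src, size_t len)
{
	if (src->state == PAGE_POISONED)
		return -EHWPOISON;
	if (src->state == PAGE_NOT_PRESENT)
		return -EFAULT;
	memcpy(dst, src->data, len);
	return 0;
}

/* Hypothetical fixup: faulting in a missing page makes it present. */
static void fake_fault_in(struct fake_page *page)
{
	if (page->state == PAGE_NOT_PRESENT)
		page->state = PAGE_OK;
}

/*
 * Caller pattern: retry only when the failure was a plain fault.
 * Any other result, including -EHWPOISON, is returned immediately,
 * so a poisoned page is touched once and never retried.
 * *attempts counts how many times the copy was tried.
 */
static int read_with_retry(char *dst, struct fake_page *src, size_t len, int *attempts)
{
	for (;;) {
		int ret;

		(*attempts)++;
		ret = fake_copy_from_user(dst, src, len);
		if (ret != -EFAULT)
			return ret;
		fake_fault_in(src);
	}
}
```

With a not-present page, read_with_retry() copies on its second attempt;
with a poisoned page it returns -EHWPOISON after a single attempt. If the
helper reported poison as -EFAULT instead, the same loop would spin
forever, which is the failure mode this patch is about.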

> My main concern is that the strong assumption that the kernel can't hit more
> than a fixed number of poisoned cache lines before returning to user space
> may simply not be true.

Agreed.

> When a DIMM goes bad, it can easily affect an entire bank or an entire RAM
> device chip. Even with memory interleaving, it's possible that a kernel control
> path touches lots of poisoned cache lines in the buffer it is working through.

These larger failures have other problems ... dozens of unrelated pages
may be affected. In a perfect world Linux would be told on the first error
that this is just one of many errors ... and be given a list. But in the real
world that isn't likely to happen :-(

-Tony
