linux-mm.kvack.org archive mirror
From: Michal Hocko <mhocko@kernel.org>
To: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	Oscar Salvador <OSalvador@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Dan Williams <dan.j.williams@gmail.com>,
	Pavel Tatashin <pasha.tatashin@soleen.com>,
	linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>,
	Stable tree <stable@vger.kernel.org>
Subject: Re: [RFC PATCH] hwpoison, memory_hotplug: allow hwpoisoned pages to be offlined
Date: Wed, 5 Dec 2018 13:29:18 +0100
Message-ID: <20181205122918.GL1286@dhcp22.suse.cz>
In-Reply-To: <20181203100309.14784-1-mhocko@kernel.org>

On Mon 03-12-18 11:03:09, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> We have received a bug report that an injected MCE about faulty memory
> prevents memory offlining from succeeding. The underlying reason is
> that the HWPoison page has an elevated reference count and the
> migration keeps failing. There are two problems with that. First of
> all, it is dubious to migrate the poisoned page because we know that
> accessing that memory may fail. Secondly, it doesn't make any sense to
> migrate potentially broken content and carry the memory corruption
> over to a new location.
> 
> Oscar has found out that it is the elevated reference count from
> memory_failure that is confusing the offlining path. HWPoisoned pages
> are isolated from the LRU list but __offline_pages might still try to
> migrate them if there are any preceding migratable pages in the pfn
> range. Such a migration would fail due to the reference count, but
> the migration code would put the page back on the LRU list. This is
> quite wrong in itself, but it would also make scan_movable_pages
> stumble over it again without any way out.
> 
> This means that hotremove with hwpoisoned pages has never really
> worked (except by luck). HWPoisoning really needs larger surgery,
> but an immediate and backportable fix is to skip over these pages during
> offlining. Even if they are still mapped for some reason,
> try_to_unmap should turn those mappings into hwpoison ptes and cause
> SIGBUS on access. Nobody should really be touching the content of
> these pages, so it should be safe to ignore them even when there is a
> pending reference count.
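
For illustration only, the loop described in the quoted changelog can be
sketched as a toy user-space model. This is not kernel code and not the
patch itself: struct toy_page and the toy_* functions are hypothetical
names, and the model only assumes what the changelog states (a hwpoisoned
page is off the LRU, holds an extra reference, and is put back on the LRU
when its migration fails).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy user-space model, NOT kernel code. All names are hypothetical. */
struct toy_page {
    bool on_lru;    /* true: the scan treats the page as still movable */
    bool hwpoison;  /* true: page was marked bad by memory_failure */
    int refcount;   /* > 1 models the extra reference held on the page */
};

/* Models the scan step: index of the first page still on the LRU, or -1. */
static int toy_scan_movable(const struct toy_page *p, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (p[i].on_lru)
            return (int)i;
    return -1;
}

/* Models the migrate step over the index range [start, n). */
static void toy_migrate_range(struct toy_page *p, size_t n, size_t start,
                              bool skip_hwpoison)
{
    for (size_t i = start; i < n; i++) {
        if (skip_hwpoison && p[i].hwpoison)
            continue;               /* proposed fix: leave the page alone */
        if (p[i].refcount > 1)
            p[i].on_lru = true;     /* migration fails, page is put back */
        else
            p[i].on_lru = false;    /* page migrated away */
    }
}

/* Returns the number of migration rounds used, or -1 after max_rounds. */
static int toy_offline(struct toy_page *p, size_t n, bool skip_hwpoison,
                       int max_rounds)
{
    assert(p != NULL && max_rounds > 0);
    for (int round = 0; round < max_rounds; round++) {
        int first = toy_scan_movable(p, n);
        if (first < 0)
            return round;           /* nothing left to migrate */
        toy_migrate_range(p, n, (size_t)first, skip_hwpoison);
    }
    return -1;                      /* gave up: the scan keeps finding a page */
}
```

With three pages (movable, hwpoisoned with refcount 2 and off the LRU,
movable), toy_offline() with skip_hwpoison == false never drains the range
and gives up after max_rounds, while skip_hwpoison == true finishes after
a single migration round. Whether the real offlining path still behaves
this way on current upstream is exactly what is questioned below.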

After some more thinking, I am not really sure the above reasoning is
still true with the current upstream kernel. Maybe I just managed to
confuse myself, so please hold off on this patch for now. Testing by
Oscar has shown that this patch helps, but the changelog might need to
be updated.
-- 
Michal Hocko
SUSE Labs
