linux-mm.kvack.org archive mirror
From: David Hildenbrand <david@redhat.com>
To: Naoya Horiguchi <naoya.horiguchi@linux.dev>, linux-mm@kvack.org
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Miaohe Lin <linmiaohe@huawei.com>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Yang Shi <shy828301@gmail.com>,
	Oscar Salvador <osalvador@suse.de>,
	Muchun Song <songmuchun@bytedance.com>,
	Naoya Horiguchi <naoya.horiguchi@nec.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH v1 0/4] mm, hwpoison: improve handling workload related to hugetlb and memory_hotplug
Date: Wed, 27 Apr 2022 12:48:16 +0200
Message-ID: <54399815-10fe-9d43-7ada-7ddb55e798cb@redhat.com>
In-Reply-To: <20220427042841.678351-1-naoya.horiguchi@linux.dev>

On 27.04.22 06:28, Naoya Horiguchi wrote:
> Hi,
> 
> This patchset addresses some issues on the workload related to hwpoison,
> hugetlb, and memory_hotplug.  The problem in memory hotremove reported by
> Miaohe Lin [1] is mentioned in 2/4.  This patch depends on "storing raw
> error info" functionality provided by 1/4. This patch also provides a delayed
> dissolve function.
> 
> Patch 3/4 adjusts unpoison to the new semantics of HPageMigratable for
> hwpoisoned hugepages, and 4/4 fixes the inconsistent counter issue.
> 
> [1] https://lore.kernel.org/linux-mm/20220421135129.19767-1-linmiaohe@huawei.com/
> 
> Please let me know if you have any suggestions and comments.
> 

Hi,

I already raised some time ago that I don't quite see the value of
allowing memory offlining with poisoned pages.

1) It overcomplicates the offlining code and seems to be partially
   broken
2) It happens rarely (ever?), so do we even care?
3) Once the memory is offline, we can re-online it and lose the HWPoison
   information. The memory can then happily be used again.

3) can happen easily if our DIMM consists of multiple memory blocks and
offlining of some memory block fails -> we'll re-online all already
offlined ones. We'll happily reuse previously HWPoisoned pages, which
feels more dangerous to me than just leaving the DIMM around (and
eventually hwpoisoning all pages on it such that it won't get used
anymore?).
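To make the rollback scenario concrete, here is a toy userspace model in C. It is not kernel code: `struct toy_block`, `toy_offline_dimm()` and the other names are hypothetical, and the model simply assumes (as described above) that onlining a block re-initializes its pages and therefore clears any HWPoison marker.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model, not kernel code. */
#define PAGES_PER_BLOCK 4
#define NR_BLOCKS 3

struct toy_page {
    bool hwpoison;          /* page was marked as hardware-poisoned */
};

struct toy_block {
    bool online;
    bool offline_fails;     /* simulate an offlining failure for this block */
    struct toy_page pages[PAGES_PER_BLOCK];
};

static void toy_offline_block(struct toy_block *b)
{
    b->online = false;
}

/* Assumption: onlining starts from freshly initialized pages. */
static void toy_online_block(struct toy_block *b)
{
    for (size_t i = 0; i < PAGES_PER_BLOCK; i++)
        b->pages[i].hwpoison = false;   /* the poison marker is lost */
    b->online = true;
}

/*
 * Offline all memory blocks of a DIMM. If one block fails, roll back
 * by re-onlining every block that was already offlined.
 */
static bool toy_offline_dimm(struct toy_block *blocks, size_t n)
{
    assert(blocks != NULL);
    for (size_t i = 0; i < n; i++) {
        if (blocks[i].offline_fails) {
            for (size_t j = 0; j < i; j++)
                toy_online_block(&blocks[j]);
            return false;
        }
        toy_offline_block(&blocks[i]);
    }
    return true;
}
```

If block 0 holds a poisoned page and block 2 refuses to go offline, the rollback brings block 0 back online with the poisoned page looking perfectly usable.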

So maybe we should just fail offlining once we stumble over a hwpoisoned
page?
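A minimal sketch of the "fail on hwpoison" policy, again as a hypothetical userspace toy rather than a patch against the real offlining code: `toy_offline_range()` and `struct toy_page` are made-up names, and returning -EBUSY is an assumption chosen for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model, not kernel code. */
struct toy_page {
    bool hwpoison;
};

/*
 * Proposed policy: scan the range before doing anything else and refuse
 * as soon as a hwpoisoned page is found, so the poison marker can never
 * be dropped by an offline/online cycle.
 */
static int toy_offline_range(struct toy_page *pages, size_t start, size_t end)
{
    assert(pages != NULL && start <= end);
    for (size_t pfn = start; pfn < end; pfn++) {
        if (pages[pfn].hwpoison)
            return -EBUSY;
    }
    /* The actual offlining work (isolation, migration) is omitted here. */
    return 0;
}
```

The check runs before any migration, so a range with poisoned pages simply stays online and stays marked, at the cost of the DIMM no longer being removable.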

Yes, we would disallow removing a semi-broken DIMM from the system that
was onlined MOVABLE. I wonder if we really need that and how often it
happens in real life. Most systems I am aware of don't allow replacing
individual DIMMs, only complete NUMA nodes. Hm.

-- 
Thanks,

David / dhildenb


Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=54399815-10fe-9d43-7ada-7ddb55e798cb@redhat.com \
    --to=david@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=linmiaohe@huawei.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mike.kravetz@oracle.com \
    --cc=naoya.horiguchi@linux.dev \
    --cc=naoya.horiguchi@nec.com \
    --cc=osalvador@suse.de \
    --cc=shy828301@gmail.com \
    --cc=songmuchun@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).