From: Qian Cai <cai@lca.pw>
To: nao.horiguchi@gmail.com
Cc: tony.luck@intel.com, david@redhat.com, catalin.marinas@arm.com,
	zeil@yandex-team.ru, naoya.horiguchi@nec.com,
	linux-kernel@vger.kernel.org, mhocko@kernel.org,
	linux-mm@kvack.org, aneesh.kumar@linux.vnet.ibm.com,
	akpm@linux-foundation.org, osalvador@suse.de, will@kernel.org,
	linux-arm-kernel@lists.infradead.org, mike.kravetz@oracle.com
Subject: Re: [PATCH v6 00/12] HWPOISON: soft offline rework
Date: Mon, 10 Aug 2020 11:22:55 -0400
Message-ID: <20200810152254.GC5307@lca.pw>
In-Reply-To: <20200806184923.7007-1-nao.horiguchi@gmail.com>

On Thu, Aug 06, 2020 at 06:49:11PM +0000, nao.horiguchi@gmail.com wrote:
> Hi,
> 
> This patchset is the latest version of the soft offline rework patchset,
> targeted for v5.9.
> 
> Since v5, I dropped some patches which tweak refcount handling in
> madvise_inject_error() to avoid the "unknown refcount page" error.
> I have not confirmed the fix (the error didn't reproduce with v5 in my
> environment), but this change surely calls soft_offline_page() after holding
> a refcount, so the error should not happen any more.

With this patchset, arm64 still suffers from premature allocation failures of
512M hugepages.

# git clone https://gitlab.com/cailca/linux-mm
# cd linux-mm; make
# ./random 1
- start: migrate_huge_offline
- use NUMA nodes 0,1.
- mmap and free 2147483648 bytes hugepages on node 0
- mmap and free 2147483648 bytes hugepages on node 1
madvise: Cannot allocate memory

[  292.456538][ T3685] soft offline: 0x8a000: hugepage isolation failed: 0, page count 2, type 7ffff80001000e (referenced|uptodate|dirty|head)
[  292.469113][ T3685] Soft offlining pfn 0x8c000 at process virtual address 0xffff60000000
[  292.983855][ T3685] Soft offlining pfn 0x88000 at process virtual address 0xffff40000000
[  293.271369][ T3685] Soft offlining pfn 0x8a000 at process virtual address 0xffff60000000
[  293.834030][ T3685] Soft offlining pfn 0xa000 at process virtual address 0xffff40000000
[  293.851378][ T3685] soft offline: 0xa000: hugepage migration failed -12, type 7ffff80001000e (referenced|uptodate|dirty|head)
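
The two failure lines above can be cross-checked mechanically. Below is a minimal
sketch, not part of the original test suite, that parses such a line, maps the
return code to an errno name, and decodes the low bits of the "type" word. The
bit positions in ASSUMED_FLAG_BITS are an assumption chosen to agree with the
flag names the kernel printed here; they depend on kernel version and config.
The helper names are hypothetical.

```python
import errno
import re

# Assumption: low page-flag bit positions, chosen to agree with the names
# printed in the log above. They vary with kernel version and config.
ASSUMED_FLAG_BITS = {
    0: "locked",
    1: "referenced",
    2: "uptodate",
    3: "dirty",
    4: "lru",
    5: "active",
    16: "head",
}

# Matches both "isolation failed: 0, page count 2, type ..." and
# "migration failed -12, type ..." as they appear in the log.
LINE_RE = re.compile(
    r"soft offline: (0x[0-9a-f]+): hugepage (isolation|migration) failed:? "
    r"(-?\d+),.*type ([0-9a-f]+) \(([a-z|]+)\)"
)


def decode_flags(type_hex):
    """Return the assumed flag names whose bits are set in the type word."""
    value = int(type_hex, 16)
    return [
        name
        for bit, name in sorted(ASSUMED_FLAG_BITS.items())
        if (value >> bit) & 1
    ]


def parse_failure(line):
    """Parse one 'soft offline: ... failed' line; return None if no match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    pfn, stage, code, type_hex, printed = m.groups()
    code = int(code)
    decoded = decode_flags(type_hex)
    return {
        "pfn": int(pfn, 16),
        "stage": stage,
        "code": code,
        # Negative kernel return codes are -errno; 0 has no errno name.
        "errno_name": errno.errorcode.get(-code) if code < 0 else None,
        "flags": decoded,
        # True when the decoded bits agree with what the kernel printed.
        "flags_match": decoded == printed.split("|"),
    }
```

Under these assumptions, the -12 in the migration failure corresponds to ENOMEM,
which is consistent with the "Cannot allocate memory" reported by madvise.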

The fresh-booted system still had 40G+ memory free before running the test.
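
For scale, here is a small sketch of the arithmetic behind the numbers above.
The 512M hugepage size and the 2147483648-byte mappings come from the report;
the 64K base page size is an assumption (it is not stated here, but it is the
arm64 configuration in which a 512M hugepage is PMD-sized). The function names
are hypothetical.

```python
MIB = 1 << 20

HUGEPAGE_SIZE = 512 * MIB    # from the report: 512M hugepages
MAPPING_BYTES = 2147483648   # from the test output, per NUMA node
BASE_PAGE_SIZE = 64 * 1024   # assumption: 64K base pages (not stated above)

# pfns taken from the "Soft offlining pfn ..." log lines
LOGGED_PFNS = [0x8C000, 0x88000, 0x8A000, 0xA000]


def hugepages_needed(mapping_bytes, hugepage_size=HUGEPAGE_SIZE):
    """Number of hugepages needed to back a mapping (rounded up)."""
    return -(-mapping_bytes // hugepage_size)


def pfns_per_hugepage(hugepage_size=HUGEPAGE_SIZE, base_page_size=BASE_PAGE_SIZE):
    """How many base-page frame numbers one hugepage spans."""
    return hugepage_size // base_page_size


def is_hugepage_aligned(pfn):
    """True if pfn sits on a hugepage boundary under the assumed sizes."""
    return pfn % pfns_per_hugepage() == 0


if __name__ == "__main__":
    print("hugepages per mapping:", hugepages_needed(MAPPING_BYTES))
    print("pfns per hugepage:", hex(pfns_per_hugepage()))
    print("all logged pfns aligned:", all(map(is_hugepage_aligned, LOGGED_PFNS)))
```

With these sizes each mapping needs only 4 hugepages, and the logged pfns all
fall on 0x2000-pfn boundaries, which matches the 64K base page assumption.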

Reverting the following commits allowed the test to run successfully over and over again.

"mm, hwpoison: remove recalculating hpage"
"mm,hwpoison-inject: don't pin for hwpoison_filter"
"mm,hwpoison: Un-export get_hwpoison_page and make it static"
"mm,hwpoison: kill put_hwpoison_page"
"mm,hwpoison: unify THP handling for hard and soft offline"
"mm,hwpoison: rework soft offline for free pages"
"mm,hwpoison: rework soft offline for in-use pages"
"mm,hwpoison: refactor soft_offline_huge_page and __soft_offline_page"

i.e., it is not enough to revert only the following:

mm,hwpoison: double-check page count in __get_any_page()
mm,hwpoison: introduce MF_MSG_UNSPLIT_THP
mm,hwpoison: return 0 if the page is already poisoned in soft-offline

> 
> Dropped patches
> - mm,madvise: call soft_offline_page() without MF_COUNT_INCREASED
> - mm,madvise: Refactor madvise_inject_error
> - mm,hwpoison: remove MF_COUNT_INCREASED
> - mm,hwpoison: remove flag argument from soft offline functions
> 
> Thanks,
> Naoya Horiguchi
> 
> Quoting cover letter of v5:
> ----
> Main focus of this series is to stabilize soft offline.  Historically soft
> offlined pages have suffered from racy conditions because PageHWPoison is
> used a little too aggressively, which (directly or indirectly) invades
> other mm code which cares little about hwpoison.  This results in unexpected
> behavior or kernel panic, which is very far from soft offline's "do not
> disturb userspace or other kernel component" policy.
> 
> Main point of this change set is to contain the target page "via buddy allocator",
> where we first free the target page as we do for normal pages, and remove it
> from buddy only when we confirm that it has reached the free list. There is surely
> a race window with page allocation, but that's fine because someone really wants
> that page and the page is still working, so soft offline can happily give up.
> 
> v4 from Oscar tries to handle the race around reallocation, but that part
> still seems to be a work in progress, so I decided to separate it from the
> changes for v5.9.  Thank you for your contribution, Oscar.
> 
> ---
> Previous versions:
>   v1: https://lore.kernel.org/linux-mm/1541746035-13408-1-git-send-email-n-horiguchi@ah.jp.nec.com/
>   v2: https://lore.kernel.org/linux-mm/20191017142123.24245-1-osalvador@suse.de/
>   v3: https://lore.kernel.org/linux-mm/20200624150137.7052-1-nao.horiguchi@gmail.com/
>   v4: https://lore.kernel.org/linux-mm/20200716123810.25292-1-osalvador@suse.de/
>   v5: https://lore.kernel.org/linux-mm/20200805204354.GA16406@hori.linux.bs1.fc.nec.co.jp/T/#t
> ---
> Summary:
> 
> Naoya Horiguchi (5):
>       mm,hwpoison: cleanup unused PageHuge() check
>       mm, hwpoison: remove recalculating hpage
>       mm,hwpoison-inject: don't pin for hwpoison_filter
>       mm,hwpoison: introduce MF_MSG_UNSPLIT_THP
>       mm,hwpoison: double-check page count in __get_any_page()
> 
> Oscar Salvador (7):
>       mm,hwpoison: Un-export get_hwpoison_page and make it static
>       mm,hwpoison: Kill put_hwpoison_page
>       mm,hwpoison: Unify THP handling for hard and soft offline
>       mm,hwpoison: Rework soft offline for free pages
>       mm,hwpoison: Rework soft offline for in-use pages
>       mm,hwpoison: Refactor soft_offline_huge_page and __soft_offline_page
>       mm,hwpoison: Return 0 if the page is already poisoned in soft-offline
> 
>  include/linux/mm.h         |   3 +-
>  include/linux/page-flags.h |   6 +-
>  include/ras/ras_event.h    |   3 +
>  mm/hwpoison-inject.c       |  18 +--
>  mm/madvise.c               |   5 -
>  mm/memory-failure.c        | 307 +++++++++++++++++++++------------------------
>  mm/migrate.c               |  11 +-
>  mm/page_alloc.c            |  60 +++++++--
>  8 files changed, 203 insertions(+), 210 deletions(-)


Thread overview: 31+ messages
2020-08-06 18:49 [PATCH v6 00/12] HWPOISON: soft offline rework nao.horiguchi
2020-08-06 18:49 ` [PATCH v6 01/12] mm,hwpoison: cleanup unused PageHuge() check nao.horiguchi
2020-08-06 18:49 ` [PATCH v6 02/12] mm, hwpoison: remove recalculating hpage nao.horiguchi
2020-08-06 18:49 ` [PATCH v6 03/12] mm,hwpoison-inject: don't pin for hwpoison_filter nao.horiguchi
2020-08-06 18:49 ` [PATCH v6 04/12] mm,hwpoison: Un-export get_hwpoison_page and make it static nao.horiguchi
2020-08-06 18:49 ` [PATCH v6 05/12] mm,hwpoison: Kill put_hwpoison_page nao.horiguchi
2020-08-06 18:49 ` [PATCH v6 06/12] mm,hwpoison: Unify THP handling for hard and soft offline nao.horiguchi
2020-08-06 18:49 ` [PATCH v6 07/12] mm,hwpoison: Rework soft offline for free pages nao.horiguchi
2020-08-06 18:49 ` [PATCH v6 08/12] mm,hwpoison: Rework soft offline for in-use pages nao.horiguchi
2020-09-18  7:58   ` osalvador
2020-09-19  0:23     ` Andrew Morton
2020-09-19  8:26       ` osalvador
2020-08-06 18:49 ` [PATCH v6 09/12] mm,hwpoison: Refactor soft_offline_huge_page and __soft_offline_page nao.horiguchi
2020-08-06 18:49 ` [PATCH v6 10/12] mm,hwpoison: Return 0 if the page is already poisoned in soft-offline nao.horiguchi
2020-08-06 18:49 ` [PATCH v6 11/12] mm,hwpoison: introduce MF_MSG_UNSPLIT_THP nao.horiguchi
2020-08-06 18:49 ` [PATCH v6 12/12] mm,hwpoison: double-check page count in __get_any_page() nao.horiguchi
2020-08-24 12:21   ` Oscar Salvador
2020-08-10 15:22 ` Qian Cai [this message]
2020-08-10 15:22   ` [PATCH v6 00/12] HWPOISON: soft offline rework Qian Cai
2020-08-11  3:11   ` HORIGUCHI NAOYA(堀口 直也)
2020-08-11  3:11     ` HORIGUCHI NAOYA(堀口 直也)
2020-08-11  3:45     ` Qian Cai
2020-08-11  3:45       ` Qian Cai
2020-08-11  3:56       ` HORIGUCHI NAOYA(堀口 直也)
2020-08-11  3:56         ` HORIGUCHI NAOYA(堀口 直也)
2020-08-11 17:39     ` Qian Cai
2020-08-11 17:39       ` Qian Cai
2020-08-11 19:32       ` Naoya Horiguchi
2020-08-11 19:32         ` Naoya Horiguchi
2020-08-11 22:06         ` Qian Cai
2020-08-11 22:06           ` Qian Cai
