From: Oscar Salvador <osalvador@suse.de>
To: Jane Chu <jane.chu@oracle.com>
Cc: linmiaohe@huawei.com, nao.horiguchi@gmail.com,
akpm@linux-foundation.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 5/5] mm/memory-failure: send SIGBUS in the event of thp split fail
Date: Thu, 16 May 2024 14:47:01 +0200 [thread overview]
Message-ID: <ZkYARVW2cOZcsFYB@localhost.localdomain> (raw)
In-Reply-To: <20240510062602.901510-6-jane.chu@oracle.com>
On Fri, May 10, 2024 at 12:26:02AM -0600, Jane Chu wrote:
> When handle hwpoison in a RDMA longterm pinned thp page,
> try_to_split_thp_page() will fail. And at this point, there is
> little else the kernel could do except sending a SIGBUS to
> the user process, thus give it a chance to recover.
Well, it does need to be a RDMA longterm pinned, right?
Anything holding an extra refcount can already make us bite the dust, so
I would not make it that specific.
> Signed-off-by: Jane Chu <jane.chu@oracle.com>
> ---
> mm/memory-failure.c | 31 ++++++++++++++++++++++++++-----
> 1 file changed, 26 insertions(+), 5 deletions(-)
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 2fa884d8b5a3..15bb1c0c42e8 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1697,7 +1697,7 @@ static int identify_page_state(unsigned long pfn, struct page *p,
> return page_action(ps, p, pfn);
> }
>
> -static int try_to_split_thp_page(struct page *page)
> +static int try_to_split_thp_page(struct page *page, bool release)
> {
> int ret;
>
> @@ -1705,7 +1705,7 @@ static int try_to_split_thp_page(struct page *page)
> ret = split_huge_page(page);
> unlock_page(page);
>
> - if (unlikely(ret))
> + if (ret && release)
> put_page(page);
I would document whhen and when not we can release the page.
E.g: we cannot release it if there are still processes mapping the thp.
> +static int kill_procs_now(struct page *p, unsigned long pfn, int flags,
> + struct folio *folio)
> +{
> + LIST_HEAD(tokill);
> +
> + collect_procs(folio, p, &tokill, flags & MF_ACTION_REQUIRED);
> + kill_procs(&tokill, true, pfn, flags);
> +
> + return -EHWPOISON;
You are returning -EHWPOISON here,
> +}
> +
> /**
> * memory_failure - Handle memory failure of a page.
> * @pfn: Page Number of the corrupted page
> @@ -2313,8 +2331,11 @@ int memory_failure(unsigned long pfn, int flags)
> * page is a valid handlable page.
> */
> folio_set_has_hwpoisoned(folio);
> - if (try_to_split_thp_page(p) < 0) {
> - res = action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED);
> + if (try_to_split_thp_page(p, false) < 0) {
> + pr_err("%#lx: thp split failed\n", pfn);
> + res = kill_procs_now(p, pfn, flags, folio);
> + put_page(p);
> + res = action_result(pfn, MF_MSG_UNSPLIT_THP, MF_FAILED);
just to overwrite it here with action_result(). Which one do we need?
I think we would need -EBUSY here, right? So I would drop the retcode
from kill_procs_now.
Also, do we want the extra pr_err() here.
action_result() will already provide us the pfn and the
action_page_types which will be "unsplit thp". Is not that clear enough?
I would drop that.
--
Oscar Salvador
SUSE Labs
next prev parent reply other threads:[~2024-05-16 12:47 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-05-10 6:25 [PATCH v2 0/5] Enhance soft hwpoison handling and injection Jane Chu
2024-05-10 6:25 ` [PATCH v2 1/5] mm/memory-failure: try to send SIGBUS even if unmap failed Jane Chu
2024-05-11 7:01 ` Miaohe Lin
2024-05-10 6:25 ` [PATCH v2 2/5] mm/madvise: Add MF_ACTION_REQUIRED to madvise(MADV_HWPOISON) Jane Chu
2024-05-15 11:44 ` Oscar Salvador
2024-05-10 6:26 ` [PATCH v2 3/5] mm/memory-failure: improve memory failure action_result messages Jane Chu
2024-05-16 9:46 ` Oscar Salvador
2024-05-20 18:30 ` Jane Chu
2024-05-10 6:26 ` [PATCH v2 4/5] mm/memory-failure: move hwpoison_filter() higher up Jane Chu
2024-05-11 8:29 ` Miaohe Lin
2024-05-20 18:15 ` Jane Chu
2024-05-16 10:11 ` Oscar Salvador
2024-05-10 6:26 ` [PATCH v2 5/5] mm/memory-failure: send SIGBUS in the event of thp split fail Jane Chu
2024-05-16 12:47 ` Oscar Salvador [this message]
2024-05-20 18:48 ` Jane Chu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZkYARVW2cOZcsFYB@localhost.localdomain \
--to=osalvador@suse.de \
--cc=akpm@linux-foundation.org \
--cc=jane.chu@oracle.com \
--cc=linmiaohe@huawei.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=nao.horiguchi@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.