From: Jane Chu <jane.chu@oracle.com>
To: Oscar Salvador <osalvador@suse.de>
Cc: linmiaohe@huawei.com, nao.horiguchi@gmail.com,
akpm@linux-foundation.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 3/5] mm/memory-failure: improve memory failure action_result messages
Date: Mon, 20 May 2024 11:30:38 -0700 [thread overview]
Message-ID: <c886c628-551a-4bda-a68f-87549dfc6831@oracle.com> (raw)
In-Reply-To: <ZkXV3wDlfo3rzN1X@localhost.localdomain>
On 5/16/2024 2:46 AM, Oscar Salvador wrote:
> On Fri, May 10, 2024 at 12:26:00AM -0600, Jane Chu wrote:
>> Added two explicit MF_MSG messages describing failure in get_hwpoison_page.
>> Attemped to document the definition of various action names, and made a few
>> adjustment to the action_result() calls.
>>
>> Signed-off-by: Jane Chu <jane.chu@oracle.com>
> ...
>> +/*
>> + * MF_IGNORED - Either the m-f() handler did nothing to recover, or it did
> "or if it"
>> + * something, then caught in a race condition which renders the effort sort
> "it was caught"
>
> I would also add to MF_IGNORED that we mark the page hwpoisoned anyway.
Thanks! Will fix, and add comments.
>
>> @@ -1018,7 +1034,7 @@ static int me_unknown(struct page_state *ps, struct page *p)
>> {
>> pr_err("%#lx: Unknown page state\n", page_to_pfn(p));
>> unlock_page(p);
>> - return MF_FAILED;
>> + return MF_IGNORED;
>> }
> I was confused because I saw you replaced all MF_MSG_UNKNOWN, so I
> wondered how we can end up here until I saw this is a catch-all in case
> we fail to make sense of the page->flags.
> While you are improving this, I would suggest to add a little comment
> above the function explaining how we can reach it.
Yes, it's a catch-all versus something that suppose to happen, will add
comments.
>
>> /*
>> @@ -2055,6 +2071,7 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
>> if (flags & MF_ACTION_REQUIRED) {
>> folio = page_folio(p);
>> res = kill_accessing_process(current, folio_pfn(folio), flags);
>> + return action_result(pfn, MF_MSG_ALREADY_POISONED, MF_FAILED);
> I do not understand why are you doing this.
>
> First of all, why is this considered a failure? We did not fail this
> time, did we? We went right ahead and kill the process which was
> re-accessing the hwpoisoned page. Is that considered a failure?
Given that the goal for the m-f() handler is to isolate the poisoned
page, so in this case,
even if the kernel has killed the page, nothing else could be done to
prevent the page from
being accessed and triggering another MCE, which is, I suppose more
sever than triggering a page fault.
>
> Second, you are know supressing -EHWPOISON with whatever action_result()
> will gives us, which judging from the actual code would be -EBUSY?
> I do not think that that is right, and we should be returning
> -EHWPOISON. Or whatever error code kill_accessing_process() gives us.
>
>
>> @@ -2231,6 +2248,7 @@ int memory_failure(unsigned long pfn, int flags)
>> res = kill_accessing_process(current, pfn, flags);
>> if (flags & MF_COUNT_INCREASED)
>> put_page(p);
>> + action_result(pfn, MF_MSG_ALREADY_POISONED, MF_FAILED);
> This is not coherent with what you did in try_memory_failure_hugetlb
> for MF_MSG_ALREADY_POISONED, I __think__ that in there we should be
> doing the same as we do here.
>
>
Good catch, thanks. In both cases, 'res' from kill_accessing_process()
should be returned.
Thanks!
-jane
>
next prev parent reply other threads:[~2024-05-20 18:30 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-05-10 6:25 [PATCH v2 0/5] Enhance soft hwpoison handling and injection Jane Chu
2024-05-10 6:25 ` [PATCH v2 1/5] mm/memory-failure: try to send SIGBUS even if unmap failed Jane Chu
2024-05-11 7:01 ` Miaohe Lin
2024-05-10 6:25 ` [PATCH v2 2/5] mm/madvise: Add MF_ACTION_REQUIRED to madvise(MADV_HWPOISON) Jane Chu
2024-05-15 11:44 ` Oscar Salvador
2024-05-10 6:26 ` [PATCH v2 3/5] mm/memory-failure: improve memory failure action_result messages Jane Chu
2024-05-16 9:46 ` Oscar Salvador
2024-05-20 18:30 ` Jane Chu [this message]
2024-05-10 6:26 ` [PATCH v2 4/5] mm/memory-failure: move hwpoison_filter() higher up Jane Chu
2024-05-11 8:29 ` Miaohe Lin
2024-05-20 18:15 ` Jane Chu
2024-05-16 10:11 ` Oscar Salvador
2024-05-10 6:26 ` [PATCH v2 5/5] mm/memory-failure: send SIGBUS in the event of thp split fail Jane Chu
2024-05-16 12:47 ` Oscar Salvador
2024-05-20 18:48 ` Jane Chu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=c886c628-551a-4bda-a68f-87549dfc6831@oracle.com \
--to=jane.chu@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=linmiaohe@huawei.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=nao.horiguchi@gmail.com \
--cc=osalvador@suse.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).