From: Aili Yao <yaoaili@kingsoft.com>
To: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: <linux-mm@kvack.org>, Tony Luck <tony.luck@intel.com>,
Andrew Morton <akpm@linux-foundation.org>,
Oscar Salvador <osalvador@suse.de>,
"David Hildenbrand" <david@redhat.com>,
Borislav Petkov <bp@alien8.de>,
"Andy Lutomirski" <luto@kernel.org>,
Naoya Horiguchi <naoya.horiguchi@nec.com>,
<linux-kernel@vger.kernel.org>, <yaoaili@kingsoft.com>
Subject: Re: [PATCH v1 0/3] mm,hwpoison: fix sending SIGBUS for Action Required MCE
Date: Sat, 17 Apr 2021 13:47:51 +0800
Message-ID: <20210417134751.0bee9e73@alex-virtual-machine>
In-Reply-To: <20210412224320.1747638-1-nao.horiguchi@gmail.com>
On Tue, 13 Apr 2021 07:43:17 +0900
Naoya Horiguchi <nao.horiguchi@gmail.com> wrote:
> Hi,
>
> I wrote this patchset to materialize what I think is the current
> allowable solution mentioned by the previous discussion [1].
> I simply borrowed Tony's mutex patch and Aili's return code patch,
> then I queued another one to find error virtual address in the best
> effort manner. I know that this is not a perfect solution, but
> should work for some typical case.
>
> My simple testing showed this patchset seems to work as intended,
> but if you have the related testcases, could you please test and
> let me have some feedback?
>
> Thanks,
> Naoya Horiguchi
>
> [1]: https://lore.kernel.org/linux-mm/20210331192540.2141052f@alex-virtual-machine/
> ---
> Summary:
>
> Aili Yao (1):
> mm,hwpoison: return -EHWPOISON when page already
>
> Naoya Horiguchi (1):
> mm,hwpoison: add kill_accessing_process() to find error virtual address
>
> Tony Luck (1):
> mm/memory-failure: Use a mutex to avoid memory_failure() races
>
> arch/x86/kernel/cpu/mce/core.c | 13 +++-
> include/linux/swapops.h | 5 ++
> mm/memory-failure.c | 166 ++++++++++++++++++++++++++++++++++++++++-
> 3 files changed, 178 insertions(+), 6 deletions(-)
Hi Naoya,

Thanks for your patches and for the complete fix for this race issue.

I tested your patches. They mostly worked as expected, but in some cases they failed. I looked into it
and found a few places I have doubts about; could you help confirm them?
1. There is a compile warning:

static int hwpoison_pte_range(pmd_t *pmdp, unsigned long addr,
			      unsigned long end, struct mm_walk *walk)
{
	struct hwp_walk *hwp = (struct hwp_walk *)walk->private;
	int ret;	---- here

It seems ret may be used uninitialized, so sometimes a wrong value may be returned?

And for this:

static int check_hwpoisoned_entry(pte_t pte, unsigned long addr, short shift,
				  unsigned long poisoned_pfn, struct to_kill *tk)
{
	unsigned long pfn;

I think it would be better to initialize pfn too.
2. In the function hwpoison_pte_range():

	if (pfn <= hwp->pfn && hwp->pfn < pfn + PMD_SIZE)

For this check, it seems we should use PMD_SIZE/PAGE_SIZE or a similar macro, since a pfn counts pages, not bytes?

3. unsigned long hwpoison_vaddr = addr + (hwp->pfn << PAGE_SHIFT & ~PMD_MASK);

This does not seem exactly accurate?
4. static int set_to_kill(struct to_kill *tk, unsigned long addr, short shift)
{
	if (tk->addr) {		--- I am not sure about this check, or whether it can lead to a failure.
		return 1;
	}

In my test this branch was sometimes hit; I don't know whether it is a multiple-entry issue or a multiple-poison issue.
When I hit this failure there was not enough logging to tell, and I could not reproduce it afterwards.
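
The concern in points 2 and 3 can be sketched in userspace C. This is only an illustration under assumptions, not the patch's code: the shift values are the usual x86-64 ones (4 KiB pages, 2 MiB PMDs) redefined locally, and PMD_NR_PAGES and the four helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed x86-64 values, redefined here so this builds outside the kernel. */
#define PAGE_SHIFT	12
#define PMD_SHIFT	21
#define PMD_SIZE	(1UL << PMD_SHIFT)
#define PMD_MASK	(~(PMD_SIZE - 1))
/* Hypothetical name: number of base pages covered by one PMD entry. */
#define PMD_NR_PAGES	(1UL << (PMD_SHIFT - PAGE_SHIFT))

/* Range check as posted: the upper bound adds a byte count to a pfn. */
static bool in_range_bytes(unsigned long head_pfn, unsigned long poisoned_pfn)
{
	return head_pfn <= poisoned_pfn && poisoned_pfn < head_pfn + PMD_SIZE;
}

/* Range check with the upper bound expressed in pages. */
static bool in_range_pages(unsigned long head_pfn, unsigned long poisoned_pfn)
{
	return head_pfn <= poisoned_pfn &&
	       poisoned_pfn < head_pfn + PMD_NR_PAGES;
}

/* Address as posted: offset of the poisoned physical address within a PMD. */
static unsigned long vaddr_masked(unsigned long addr,
				  unsigned long poisoned_pfn)
{
	return addr + ((poisoned_pfn << PAGE_SHIFT) & ~PMD_MASK);
}

/* Address computed from the pfn distance to the head of the mapping. */
static unsigned long vaddr_relative(unsigned long addr,
				    unsigned long head_pfn,
				    unsigned long poisoned_pfn)
{
	assert(poisoned_pfn >= head_pfn);
	return addr + ((poisoned_pfn - head_pfn) << PAGE_SHIFT);
}
```

With head pfn 0x1000 and poisoned pfn 0x1000 + 600, in_range_bytes() accepts the pfn although it lies outside the 512-page mapping, while in_range_pages() rejects it. For that same pfn, vaddr_masked(0x40000000, 0x1258) wraps to 0x40058000, an address inside the mapping that does not correspond to the poisoned page. For a pfn that really is inside the mapping (0x1005) and a PMD-aligned head, both address helpers give 0x40005000.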
Would you help confirm these? If anything changes, please post again and I will run the test again.

Thanks,
Aili Yao
Thread overview: 12+ messages
2021-04-12 22:43 [PATCH v1 0/3] mm,hwpoison: fix sending SIGBUS for Action Required MCE Naoya Horiguchi
2021-04-12 22:43 ` [PATCH v1 1/3] mm/memory-failure: Use a mutex to avoid memory_failure() races Naoya Horiguchi
2021-04-19 17:05 ` Borislav Petkov
2021-04-20 7:46 ` HORIGUCHI NAOYA(堀口 直也)
2021-04-20 10:16 ` Borislav Petkov
2021-04-21 0:57 ` HORIGUCHI NAOYA(堀口 直也)
2021-04-12 22:43 ` [PATCH v1 2/3] mm,hwpoison: return -EHWPOISON when page already Naoya Horiguchi
2021-04-12 22:43 ` [PATCH v1 3/3] mm,hwpoison: add kill_accessing_process() to find error virtual address Naoya Horiguchi
2021-04-17 5:47 ` Aili Yao [this message]
2021-04-19 1:09 ` [PATCH v1 0/3] mm,hwpoison: fix sending SIGBUS for Action Required MCE HORIGUCHI NAOYA(堀口 直也)
2021-04-19 2:36 ` [PATCH v2 3/3] mm,hwpoison: add kill_accessing_process() to find error virtual address Naoya Horiguchi
2021-04-19 3:43 ` Aili Yao