From: Yazen Ghannam <yazen.ghannam@amd.com>
To: Tony Luck <tony.luck@intel.com>
Cc: yazen.ghannam@amd.com, Borislav Petkov <bp@alien8.de>,
x86@kernel.org, linux-kernel@vger.kernel.org,
linux-edac@vger.kernel.org, patches@lists.linux.dev
Subject: Re: [PATCH] x86/mce: Check that memory address is usable for recovery
Date: Tue, 18 Apr 2023 13:51:37 -0400 [thread overview]
Message-ID: <5f833d37-e961-73c4-3629-74884e8b6c59@amd.com> (raw)
In-Reply-To: <ZD7TBZex278dSYmc@agluck-desk3.sc.intel.com>
On 4/18/23 13:27, Tony Luck wrote:
> On Tue, Apr 18, 2023 at 12:41:17PM -0400, Yazen Ghannam wrote:
>> On 3/21/23 20:51, Tony Luck wrote:
>>> uc_decode_notifier() includes a check that "struct mce"
>>> contains a valid address for recovery. But the machine
>>> check recovery code does not include a similar check.
>>>
>>> Use mce_usable_address() to check that there is a valid
>>> address.
>>>
>>> Signed-off-by: Tony Luck <tony.luck@intel.com>
>>> ---
>>> arch/x86/kernel/cpu/mce/core.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
>>> index 2eec60f50057..fa28b3f7d945 100644
>>> --- a/arch/x86/kernel/cpu/mce/core.c
>>> +++ b/arch/x86/kernel/cpu/mce/core.c
>>> @@ -1533,7 +1533,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
>>> /* If this triggers there is no way to recover. Die hard. */
>>> BUG_ON(!on_thread_stack() || !user_mode(regs));
>>>
>>> - if (kill_current_task)
>>> + if (kill_current_task || !mce_usable_address(&m))
>>> queue_task_work(&m, msg, kill_me_now);
>>> else
>>> queue_task_work(&m, msg, kill_me_maybe);
>>
>> I think it should be like this:
>>
>> if (mce_usable_address(&m))
>> queue_task_work(&m, msg, kill_me_maybe);
>> else
>> queue_task_work(&m, msg, kill_me_now);
>>
>> A usable address should always go through memory_failure() so that the page is
>> marked as poison. If !RIPV, then memory_failure() will get the MF_MUST_KILL
>> flag and try to kill all processes after the page is poisoned.
>>
>> I had a similar patch a while back:
>> https://lore.kernel.org/linux-edac/20210504174712.27675-3-Yazen.Ghannam@amd.com/
>>
>> We could also get rid of kill_me_now() like you had suggested.
>
> Can we also get rid of "kill_current_task"? It is only set when there is
> no valid return address:
>
> if (!(m.mcgstatus & MCG_STATUS_RIPV))
> kill_current_task = 1;
>
> kill_me_maybe() does an equivalent check:
>
> if (!p->mce_ripv)
> flags |= MF_MUST_KILL;
>
> Which only leaves this check to resolve in some way:
>
> if (worst != MCE_AR_SEVERITY && !kill_current_task)
> goto out;
>
I agree. And I think all these checks should be baked into the severity.
We'll need additional, fine-grained severity levels though.
The "m.cs" and "m.kflags" checks could also be baked in.
Instead of just one AR severity:
...
MCE_AR_SEVERITY,
MCE_PANIC_SEVERITY,
replace it with specific cases:
...
MCE_AR_USER_RECOV,
MCE_AR_USER_KILL,
MCE_AR_KERNEL_COPYIN,
MCE_AR_KERNEL_RECOV,
MCE_PANIC_SEVERITY,
Then the #MC handler can look like this:
if (worst < MCE_AR_USER_RECOV)
goto out;
if (severity == MCE_AR_USER_RECOV)
queue_task_work(&m, msg, kill_me_maybe);
if (severity == MCE_AR_USER_KILL)
force_sig(SIGBUS);
if (severity == MCE_AR_KERNEL_COPYIN)
queue_task_work(&m, msg, kill_me_never);
if (severity == MCE_AR_KERNEL_RECOV) {
if (!fixup_exception(regs, X86_TRAP_MC, 0, 0))
mce_panic("Failed kernel mode recovery");
}
I can take a shot at this if it seems reasonable.
What do you think?
Thanks,
Yazen
next prev parent reply other threads:[~2023-04-18 17:51 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-03-22 0:51 [PATCH] x86/mce: Check that memory address is usable for recovery Tony Luck
2023-04-18 16:41 ` Yazen Ghannam
2023-04-18 17:27 ` Tony Luck
2023-04-18 17:51 ` Yazen Ghannam [this message]
2023-04-18 18:23 ` Tony Luck
2023-04-18 18:03 ` [PATCH v2] x86/mce: Always call memory_failure() when there is a valid address Tony Luck
2023-04-18 18:17 ` Yazen Ghannam
2023-04-18 19:34 ` Luck, Tony
2023-04-19 13:08 ` Yazen Ghannam
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5f833d37-e961-73c4-3629-74884e8b6c59@amd.com \
--to=yazen.ghannam@amd.com \
--cc=bp@alien8.de \
--cc=linux-edac@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=patches@lists.linux.dev \
--cc=tony.luck@intel.com \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox