linux-arm-kernel.lists.infradead.org archive mirror
From: james.morse@arm.com (James Morse)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH RESEND] arm64: fault: avoid send SIGBUS two times
Date: Mon, 11 Dec 2017 13:52:44 +0000
Message-ID: <5A2E8DAC.1010400@arm.com>
In-Reply-To: <d2b0d31b-1831-3c76-1c57-96101abf4b6e@huawei.com>

Hi gengdongjiu,

On 08/12/17 04:43, gengdongjiu wrote:
> By the way, I think changing info.si_code to "BUS_MCEERR_AR" would also be better, as shown in [1].
> BUS_MCEERR_AR tells user space "hardware memory error consumed on a machine check: action required".

Today it's also used as a last resort. This signal tells user-space the page
can't be re-read from disk/swap, and it's been unmapped from all affected processes.

I think using it like this (tempting as it is) changes the meaning.


> So it is better than "0". On x86, "BUS_MCEERR_AR" is also used for si_code [2] in "arch/x86/mm/fault.c".
> What do you think?

This is heading into kernel-first territory. I'd prefer we do that all at once
so we know everything is covered.


> [2]:
> arch/x86/mm/fault.c:
> 
> static void
> do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long address,
>       u32 *pkey, unsigned int fault)
> {
>   ......
> #ifdef CONFIG_MEMORY_FAILURE
>     if (fault & (VM_FAULT_HWPOISON|VM_FAULT_HWPOISON_LARGE)) {

These VM_FAULT flags indicate memory_failure() has run, tried to re-read the
memory from disk/swap, failed, and unmapped the page from all affected processes.


>         printk(KERN_ERR
>     "MCE: Killing %s:%d due to hardware memory corruption fault at %lx\n",
>             tsk->comm, tsk->pid, address);
>         code = BUS_MCEERR_AR;
>     }
> #endif
>     force_sig_info_fault(SIGBUS, code, address, tsk, pkey, fault);
> }

This is x86's page fault handler, not its Machine-Check-Exception handler.

arm64's page fault handler does this too, from do_page_fault():
>	} else if (fault & (VM_FAULT_HWPOISON | VM_FAULT_HWPOISON_LARGE)) {
>		sig = SIGBUS;
>		code = BUS_MCEERR_AR;


If you're seeing this, it's likely due to the race Xie XiuQi spotted where the
recovery action has been queued, then we return to user-space before it's done.

I had a go at tackling this, adding helpers to kick the assorted queues, which
we can do if we took the exception from user-space. Where I got stuck is whether
we should still force a signal, and how signals get merged. I'll try and spend
some more time on that this week.
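
On the merging question, one thing that can be checked from user-space is that
standard (non-realtime) signals don't queue: a second instance raised while the
first is still pending is dropped. The sketch below shows that for SIGUSR1.
count_merged_deliveries() is a hypothetical helper, and this is only the
user-visible behaviour, not a statement about how force_sig_info() behaves
inside the kernel.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t deliveries;

static void on_usr1(int signo)
{
	(void)signo;
	deliveries++;
}

/* Raise SIGUSR1 twice while it is blocked, then unblock it.
 * Returns how many times the handler ran, or -1 on setup failure. */
int count_merged_deliveries(void)
{
	struct sigaction sa, old_sa;
	sigset_t block;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_usr1;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGUSR1, &sa, &old_sa) != 0)
		return -1;

	sigemptyset(&block);
	sigaddset(&block, SIGUSR1);
	if (sigprocmask(SIG_BLOCK, &block, NULL) != 0)
		return -1;

	deliveries = 0;
	raise(SIGUSR1);	/* becomes pending */
	raise(SIGUSR1);	/* already pending, so this instance is merged */

	/* The pending signal is delivered before sigprocmask() returns. */
	sigprocmask(SIG_UNBLOCK, &block, NULL);

	sigaction(SIGUSR1, &old_sa, NULL);
	return (int)deliveries;
}
```

So a second SIGBUS sent while the first is still pending would not be seen
separately by a user-space handler, and only one siginfo would survive.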



Thanks,

James

Thread overview: 7+ messages
2017-12-05 15:02 [PATCH RESEND] arm64: fault: avoid send SIGBUS two times Dongjiu Geng
2017-12-06 16:15 ` Will Deacon
2017-12-07  5:55   ` gengdongjiu
2017-12-07 14:32     ` James Morse
2017-12-08  4:43       ` gengdongjiu
2017-12-11 13:52         ` James Morse [this message]
2017-12-11 15:38           ` Re: " gengdongjiu
