[PATCH 0/2] Fix some machine check application recovery cases

* [PATCH 0/2] Fix some machine check application recovery cases
@ 2014-05-20 17:35 Tony Luck
  2014-05-20 16:28 ` [PATCH 1/2] memory-failure: Send right signal code to correct thread Tony Luck
  2014-05-20 16:46 ` [PATCH 2/2] memory-failure: Don't let collect_procs() skip over processes for MF_ACTION_REQUIRED Tony Luck
  0 siblings, 2 replies; 31+ messages in thread
From: Tony Luck @ 2014-05-20 17:35 UTC (permalink / raw)
  To: linux-kernel, linux-mm; +Cc: Andi Kleen, Borislav Petkov, Chen Gong

Testing recovery in multi-threaded applications showed a couple
of issues in our code.

Tony Luck (2):
  memory-failure: Send right signal code to correct thread
  memory-failure: Don't let collect_procs() skip over processes for
    MF_ACTION_REQUIRED

 mm/memory-failure.c | 25 ++++++++++++++-----------
 1 file changed, 14 insertions(+), 11 deletions(-)

-- 
1.8.4.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* [PATCH 1/2] memory-failure: Send right signal code to correct thread
  2014-05-20 17:35 [PATCH 0/2] Fix some machine check application recovery cases Tony Luck
@ 2014-05-20 16:28 ` Tony Luck
  2014-05-20 17:54   ` Naoya Horiguchi
                     ` (2 more replies)
  2014-05-20 16:46 ` [PATCH 2/2] memory-failure: Don't let collect_procs() skip over processes for MF_ACTION_REQUIRED Tony Luck
  1 sibling, 3 replies; 31+ messages in thread
From: Tony Luck @ 2014-05-20 16:28 UTC (permalink / raw)
  To: linux-kernel, linux-mm; +Cc: Andi Kleen, Borislav Petkov, Chen Gong

When a thread in a multi-threaded application hits a machine
check because of an uncorrectable error in memory - we want to
send the SIGBUS with si.si_code = BUS_MCEERR_AR to that thread.
Currently we fail to do that if the active thread is not the
primary thread in the process. collect_procs() just finds primary
threads and this test:
	if ((flags & MF_ACTION_REQUIRED) && t == current) {
will see that the thread we found isn't the current thread
and so send a si.si_code = BUS_MCEERR_AO to the primary
(and nothing to the active thread at this time).

We can fix this by checking whether "current" shares the same
mm with the process that collect_procs() said owned the page.
If so, we send the SIGBUS to current (with code BUS_MCEERR_AR).
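
To see the effect of the changed test in isolation, here is a minimal
user-space sketch. It is not kernel code: struct task, struct mm,
decide_old() and decide_new() are hypothetical stand-ins, and the value
given to MF_ACTION_REQUIRED is only illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's mm_struct and task_struct. */
struct mm { int id; };
struct task { const char *comm; struct mm *mm; };

/* Illustrative flag value; only the bit test matters here. */
#define MF_ACTION_REQUIRED (1 << 1)

enum sig_code { CODE_AO, CODE_AR };

struct decision {
	enum sig_code code;         /* which si_code would be used */
	const struct task *target;  /* which task would be signalled */
};

/* Old test: AR only when collect_procs() returned the faulting thread. */
static struct decision decide_old(const struct task *t,
				  const struct task *cur, int flags)
{
	assert(t != NULL && cur != NULL);
	if ((flags & MF_ACTION_REQUIRED) && t == cur)
		return (struct decision){ CODE_AR, t };
	return (struct decision){ CODE_AO, t };
}

/* Patched test: AR when the faulting thread shares t's mm; signal cur. */
static struct decision decide_new(const struct task *t,
				  const struct task *cur, int flags)
{
	assert(t != NULL && cur != NULL);
	if ((flags & MF_ACTION_REQUIRED) && t->mm == cur->mm)
		return (struct decision){ CODE_AR, cur };
	return (struct decision){ CODE_AO, t };
}
```

With a primary thread and a worker that share one mm, and the worker as
"current", decide_old() picks AO for the primary while decide_new()
picks AR for the worker.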

Reported-by: Otto Bruggeman <otto.g.bruggeman@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 mm/memory-failure.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 35ef28acf137..642c8434b166 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -204,9 +204,9 @@ static int kill_proc(struct task_struct *t, unsigned long addr, int trapno,
 #endif
 	si.si_addr_lsb = compound_order(compound_head(page)) + PAGE_SHIFT;
 
-	if ((flags & MF_ACTION_REQUIRED) && t == current) {
+	if ((flags & MF_ACTION_REQUIRED) && t->mm == current->mm) {
 		si.si_code = BUS_MCEERR_AR;
-		ret = force_sig_info(SIGBUS, &si, t);
+		ret = force_sig_info(SIGBUS, &si, current);
 	} else {
 		/*
 		 * Don't use force here, it's convenient if the signal
-- 
1.8.4.1

* Re: [PATCH 1/2] memory-failure: Send right signal code to correct thread
  2014-05-20 16:28 ` [PATCH 1/2] memory-failure: Send right signal code to correct thread Tony Luck
@ 2014-05-20 17:54   ` Naoya Horiguchi
       [not found]   ` <1400608486-alyqz521@n-horiguchi@ah.jp.nec.com>
  2014-05-23  3:34   ` Chen, Gong
  2 siblings, 0 replies; 31+ messages in thread
From: Naoya Horiguchi @ 2014-05-20 17:54 UTC (permalink / raw)
  To: Tony Luck; +Cc: linux-kernel, linux-mm, Andi Kleen, bp, gong.chen

On Tue, May 20, 2014 at 09:28:00AM -0700, Tony Luck wrote:
> When a thread in a multi-threaded application hits a machine
> check because of an uncorrectable error in memory - we want to
> send the SIGBUS with si.si_code = BUS_MCEERR_AR to that thread.
> Currently we fail to do that if the active thread is not the
> primary thread in the process. collect_procs() just finds primary
> threads and this test:
> 	if ((flags & MF_ACTION_REQUIRED) && t == current) {
> will see that the thread we found isn't the current thread
> and so send a si.si_code = BUS_MCEERR_AO to the primary
> (and nothing to the active thread at this time).
> 
> We can fix this by checking whether "current" shares the same
> mm with the process that collect_procs() said owned the page.
> If so, we send the SIGBUS to current (with code BUS_MCEERR_AR).
> 
> Reported-by: Otto Bruggeman <otto.g.bruggeman@intel.com>
> Signed-off-by: Tony Luck <tony.luck@intel.com>

Looks good to me, thank you.
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

and I think this is worth going into stable trees.

Naoya

> ---
>  mm/memory-failure.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 35ef28acf137..642c8434b166 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -204,9 +204,9 @@ static int kill_proc(struct task_struct *t, unsigned long addr, int trapno,
>  #endif
>  	si.si_addr_lsb = compound_order(compound_head(page)) + PAGE_SHIFT;
>  
> -	if ((flags & MF_ACTION_REQUIRED) && t == current) {
> +	if ((flags & MF_ACTION_REQUIRED) && t->mm == current->mm) {
>  		si.si_code = BUS_MCEERR_AR;
> -		ret = force_sig_info(SIGBUS, &si, t);
> +		ret = force_sig_info(SIGBUS, &si, current);
>  	} else {
>  		/*
>  		 * Don't use force here, it's convenient if the signal
> -- 
> 1.8.4.1
> 

[parent not found: <1400608486-alyqz521@n-horiguchi@ah.jp.nec.com>]

* RE: [PATCH 1/2] memory-failure: Send right signal code to correct thread
       [not found]   ` <1400608486-alyqz521@n-horiguchi@ah.jp.nec.com>
@ 2014-05-20 20:56     ` Luck, Tony
  0 siblings, 0 replies; 31+ messages in thread
From: Luck, Tony @ 2014-05-20 20:56 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Andi Kleen,
	bp@suse.de, gong.chen@linux.jf.intel.com

> Looks good to me, thank you.
> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

Thanks for your time reviewing this.

> and I think this is worth going into stable trees.

Good point. I should dig in the git history and make one of those
fancy "Fixes: sha1 title" tags too.

-Tony
 

* Re: [PATCH 1/2] memory-failure: Send right signal code to correct thread
  2014-05-20 16:28 ` [PATCH 1/2] memory-failure: Send right signal code to correct thread Tony Luck
  2014-05-20 17:54   ` Naoya Horiguchi
       [not found]   ` <1400608486-alyqz521@n-horiguchi@ah.jp.nec.com>
@ 2014-05-23  3:34   ` Chen, Gong
  2014-05-23 16:48     ` Tony Luck
  2 siblings, 1 reply; 31+ messages in thread
From: Chen, Gong @ 2014-05-23  3:34 UTC (permalink / raw)
  To: Tony Luck; +Cc: linux-kernel, linux-mm, Andi Kleen, Borislav Petkov, Chen Gong

On Tue, May 20, 2014 at 09:28:00AM -0700, Luck, Tony wrote:
> When a thread in a multi-threaded application hits a machine
> check because of an uncorrectable error in memory - we want to
> send the SIGBUS with si.si_code = BUS_MCEERR_AR to that thread.
> Currently we fail to do that if the active thread is not the
> primary thread in the process. collect_procs() just finds primary
> threads and this test:
> 	if ((flags & MF_ACTION_REQUIRED) && t == current) {
> will see that the thread we found isn't the current thread
> and so send a si.si_code = BUS_MCEERR_AO to the primary
> (and nothing to the active thread at this time).
> 
> We can fix this by checking whether "current" shares the same
> mm with the process that collect_procs() said owned the page.
> If so, we send the SIGBUS to current (with code BUS_MCEERR_AR).
> 
> Reported-by: Otto Bruggeman <otto.g.bruggeman@intel.com>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>  mm/memory-failure.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 35ef28acf137..642c8434b166 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -204,9 +204,9 @@ static int kill_proc(struct task_struct *t, unsigned long addr, int trapno,
>  #endif
>  	si.si_addr_lsb = compound_order(compound_head(page)) + PAGE_SHIFT;
>  
> -	if ((flags & MF_ACTION_REQUIRED) && t == current) {
> +	if ((flags & MF_ACTION_REQUIRED) && t->mm == current->mm) {
>  		si.si_code = BUS_MCEERR_AR;
> -		ret = force_sig_info(SIGBUS, &si, t);
> +		ret = force_sig_info(SIGBUS, &si, current);
>  	} else {
>  		/*
>  		 * Don't use force here, it's convenient if the signal
> -- 
> 1.8.4.1

Very interesting. I remember there was a thread about AO errors. Here is
the link: http://www.spinics.net/lists/linux-mm/msg66653.html.
Based on that thread, I have two concerns:

1) How do we handle a scenario like the one in that thread? I mean the
case where the main thread doesn't handle the AR error but another
thread does, if the SIGBUS can't be handled at once.
2) Why wasn't that patch merged? In that thread, Naoya seemed to
"acknowledge" it :-).

* Re: [PATCH 1/2] memory-failure: Send right signal code to correct thread
  2014-05-23  3:34   ` Chen, Gong
@ 2014-05-23 16:48     ` Tony Luck
  2014-05-27 16:16       ` Kamil Iskra
  0 siblings, 1 reply; 31+ messages in thread
From: Tony Luck @ 2014-05-23 16:48 UTC (permalink / raw)
  To: Tony Luck, Linux Kernel Mailing List, linux-mm@kvack.org,
	Andi Kleen, Borislav Petkov, Chen Gong, iskra

Added Kamil (hope I got the right one - the spinics.net archive obfuscates
the e-mail addresses).

>> -     if ((flags & MF_ACTION_REQUIRED) && t == current) {
>> +     if ((flags & MF_ACTION_REQUIRED) && t->mm == current->mm) {
>>               si.si_code = BUS_MCEERR_AR;
>> -             ret = force_sig_info(SIGBUS, &si, t);
>> +             ret = force_sig_info(SIGBUS, &si, current);
>>       } else {
>>               /*
>>                * Don't use force here, it's convenient if the signal
>> --
>> 1.8.4.1
> Very interesting. I remember there was a thread about AO errors. Here is
> the link: http://www.spinics.net/lists/linux-mm/msg66653.html.
> Based on that thread, I have two concerns:
>
> 1) How do we handle a scenario like the one in that thread? I mean the
> case where the main thread doesn't handle the AR error but another
> thread does, if the SIGBUS can't be handled at once.
> 2) Why wasn't that patch merged? In that thread, Naoya seemed to
> "acknowledge" it :-).

That's an interesting thread ... and looks like it helps out in a case
where there are only AO signals.

But the "AR" case complicates things. Kamil points out at the start
of the thread:
> Also, do I understand it correctly that "action required" faults *must* be
> handled by the thread that triggered the error?  I guess it makes sense for
> it to be that way, even if it circumvents the "dedicated handling thread"
> idea...
this is absolutely true ... in the BUS_MCEERR_AR case the current
thread is executing an instruction that is attempting to consume poison
data ... and we cannot let that instruction retire, so we have to signal that
thread - if it can fix the problem by mapping a new page to the location
that was lost, and refilling it with the right data - the handler can return
to resume - otherwise it can longjmp() somewhere or exit.
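
The "map a new page and refill it" step could look roughly like the
sketch below. It assumes the application keeps a known-good copy of the
data (backup) and that exactly one base page is affected;
replace_poisoned_page() is a hypothetical helper, not an existing API.
A real handler would also have to look at si_addr_lsb for the size of
the affected range and think about async-signal safety.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* for MAP_ANONYMOUS under strict -std modes */
#endif
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Replace the page containing addr with a fresh anonymous page and
 * refill it from backup, which must hold one page of good data.
 * Returns 0 on success, -1 on failure.
 */
static int replace_poisoned_page(void *addr, const void *backup)
{
	long psz = sysconf(_SC_PAGESIZE);
	uintptr_t base;
	void *p;

	assert(addr != NULL && backup != NULL);
	if (psz <= 0)
		return -1;

	/* Round the faulting address down to its page boundary. */
	base = (uintptr_t)addr & ~((uintptr_t)psz - 1);

	/* MAP_FIXED drops the old mapping at base and installs a new one. */
	p = mmap((void *)base, (size_t)psz, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	memcpy(p, backup, (size_t)psz);
	return 0;
}
```

If the helper fails, or no good copy of the data exists, the handler is
back to the other two options: longjmp() somewhere or exit.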

This means that the idea of having a multi-threaded application where
just one thread has a SIGBUS handler and we gently steer the
BUS_MCEERR_AO signals to that thread to be handled is flawed.
Every thread needs to have a SIGBUS handler - so that we can handle
the "AR" case. [Digression: what does happen to a process with a thread
with no SIGBUS handler if we in fact send it a SIGBUS? Does just that
thread die (default action for SIGBUS)? Or does the whole process get
killed?  If just one thread is terminated ... then perhaps someone could
write a recovery aware application that worked like this - though it sounds
like that would be working blindfold with one hand tied behind your back.
How would the remaining threads know why their buddy just died? The
siginfo_t describing the problem isn't available]

If we want steerable AO signals to a dedicated thread - we'd have to
use different signals for AO & AR. So every thread can have an AR
handler, but just one have the AO handler.  Or something more exotic
with prctl to designate the preferred target for AO signals?

Or just live with the fact that every thread needs a handler for AR ...
and have the application internally pass AO activity from the
thread that originally got the SIGBUS to some worker thread.
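
That last option, passing AO events on inside the application, might be
sketched with a pipe, since write() is async-signal-safe and can be
called from the SIGBUS handler. struct ao_event, ao_forward() and
ao_receive() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical record passed from the signalled thread to the worker. */
struct ao_event {
	uintptr_t addr;  /* si_addr from the siginfo_t */
	short lsb;       /* si_addr_lsb: log2 of the affected range */
};

/* Called from the SIGBUS handler of whichever thread got the signal. */
static int ao_forward(int wfd, void *addr, short lsb)
{
	struct ao_event ev = { (uintptr_t)addr, lsb };
	/* A record this small is written to the pipe in one piece. */
	ssize_t n = write(wfd, &ev, sizeof ev);

	return n == (ssize_t)sizeof ev ? 0 : -1;
}

/* Called from the worker thread; blocks until an event arrives. */
static int ao_receive(int rfd, struct ao_event *ev)
{
	ssize_t n;

	assert(ev != NULL);
	n = read(rfd, ev, sizeof *ev);
	return n == (ssize_t)sizeof *ev ? 0 : -1;
}
```

The worker would create the pipe with pipe() before any handler is
installed and then loop on ao_receive(), so the signalled thread only
pays for one write() before it returns from the handler.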

-Tony

* Re: [PATCH 1/2] memory-failure: Send right signal code to correct thread
  2014-05-23 16:48     ` Tony Luck
@ 2014-05-27 16:16       ` Kamil Iskra
  2014-05-27 17:50         ` Naoya Horiguchi
       [not found]         ` <5384d07e.4504e00a.2680.ffff8c31SMTPIN_ADDED_BROKEN@mx.google.com>
  0 siblings, 2 replies; 31+ messages in thread
From: Kamil Iskra @ 2014-05-27 16:16 UTC (permalink / raw)
  To: Tony Luck
  Cc: Tony Luck, Linux Kernel Mailing List, linux-mm@kvack.org,
	Andi Kleen, Borislav Petkov, Chen Gong

On Fri, May 23, 2014 at 09:48:42 -0700, Tony Luck wrote:

Tony,

> Added Kamil (hope I got the right one - the spinics.net archive obfuscates
> the e-mail addresses).

Yes, you got the right address :-).

> >> -     if ((flags & MF_ACTION_REQUIRED) && t == current) {
> >> +     if ((flags & MF_ACTION_REQUIRED) && t->mm == current->mm) {
> >>               si.si_code = BUS_MCEERR_AR;
> >> -             ret = force_sig_info(SIGBUS, &si, t);
> >> +             ret = force_sig_info(SIGBUS, &si, current);
> >>       } else {
> >>               /*
> >>                * Don't use force here, it's convenient if the signal
> >> --
> >> 1.8.4.1
> > Very interesting. I remember there was a thread about AO errors. Here is
> > the link: http://www.spinics.net/lists/linux-mm/msg66653.html.
> > Based on that thread, I have two concerns:
> >
> > 1) How do we handle a scenario like the one in that thread? I mean the
> > case where the main thread doesn't handle the AR error but another
> > thread does, if the SIGBUS can't be handled at once.
> > 2) Why wasn't that patch merged? In that thread, Naoya seemed to
> > "acknowledge" it :-).
> That's an interesting thread ... and looks like it helps out in a case
> where there are only AO signals.

Unfortunately, I got distracted by other pressing work at the time and
didn't follow up on my patch/didn't follow the correct kernel workflow on
patch submission procedures.  I haven't checked any developments in that
area so I don't even know if my patch is still applicable -- do you think
it makes sense for me to revisit the issue at this time, or will the patch
that you are working on make my old patch redundant?

> But the "AR" case complicates things. Kamil points out at the start
> of the thread:
> > Also, do I understand it correctly that "action required" faults *must* be
> > handled by the thread that triggered the error?  I guess it makes sense for
> > it to be that way, even if it circumvents the "dedicated handling thread"
> > idea...
> this is absolutely true ... in the BUS_MCEERR_AR case the current
> thread is executing an instruction that is attempting to consume poison
> data ... and we cannot let that instruction retire, so we have to signal that
> thread - if it can fix the problem by mapping a new page to the location
> that was lost, and refilling it with the right data - the handler can return
> to resume - otherwise it can longjmp() somewhere or exit.

Exactly.

> This means that the idea of having a multi-threaded application where
> just one thread has a SIGBUS handler and we gently steer the
> BUS_MCEERR_AO signals to that thread to be handled is flawed.
> Every thread needs to have a SIGBUS handler - so that we can handle
> the "AR" case. [Digression: what does happen to a process with a thread
> with no SIGBUS handler if we in fact send it a SIGBUS? Does just that
> thread die (default action for SIGBUS)? Or does the whole process get
> killed?  If just one thread is terminated ... then perhaps someone could
> write a recovery aware application that worked like this - though it sounds
> like that would be working blindfold with one hand tied behind your back.
> How would the remaining threads know why their buddy just died? The
> siginfo_t describing the problem isn't available]

I believe I experimented with this and the whole process would get killed.

> If we want steerable AO signals to a dedicated thread - we'd have to
> use different signals for AO & AR. So every thread can have an AR
> handler, but just one have the AO handler.  Or something more exotic
> with prctl to designate the preferred target for AO signals?
> 
> Or just live with the fact that every thread needs a handler for AR ...
> and have the application internally pass AO activity from the
> thread that originally got the SIGBUS to some worker thread.

Yes, you make a very valid point that my patch was not complete... but
then, neither was what was there before it.  So my patch was only an
incremental improvement, enough to play with when artificially injecting
fault events, but not enough to *really* solve the problem.  If you have a
complete solution in mind instead, that would be great.

Kamil

-- 
Kamil Iskra, PhD
Argonne National Laboratory, Mathematics and Computer Science Division
9700 South Cass Avenue, Building 240, Argonne, IL 60439, USA
phone: +1-630-252-7197  fax: +1-630-252-5986

* Re: [PATCH 1/2] memory-failure: Send right signal code to correct thread
  2014-05-27 16:16       ` Kamil Iskra
@ 2014-05-27 17:50         ` Naoya Horiguchi
       [not found]         ` <5384d07e.4504e00a.2680.ffff8c31SMTPIN_ADDED_BROKEN@mx.google.com>
  1 sibling, 0 replies; 31+ messages in thread
From: Naoya Horiguchi @ 2014-05-27 17:50 UTC (permalink / raw)
  To: iskra
  Cc: tony.luck, Tony Luck, linux-kernel, linux-mm, Andi Kleen,
	Borislav Petkov, gong.chen

On Tue, May 27, 2014 at 11:16:13AM -0500, Kamil Iskra wrote:
> On Fri, May 23, 2014 at 09:48:42 -0700, Tony Luck wrote:
> 
> Tony,
> 
> > Added Kamil (hope I got the right one - the spinics.net archive obfuscates
> > the e-mail addresses).
> 
> Yes, you got the right address :-).
> 
> > >> -     if ((flags & MF_ACTION_REQUIRED) && t == current) {
> > >> +     if ((flags & MF_ACTION_REQUIRED) && t->mm == current->mm) {
> > >>               si.si_code = BUS_MCEERR_AR;
> > >> -             ret = force_sig_info(SIGBUS, &si, t);
> > >> +             ret = force_sig_info(SIGBUS, &si, current);
> > >>       } else {
> > >>               /*
> > >>                * Don't use force here, it's convenient if the signal
> > >> --
> > >> 1.8.4.1
> > > Very interesting. I remember there was a thread about AO errors. Here is
> > > the link: http://www.spinics.net/lists/linux-mm/msg66653.html.
> > > Based on that thread, I have two concerns:
> > >
> > > 1) How do we handle a scenario like the one in that thread? I mean the
> > > case where the main thread doesn't handle the AR error but another
> > > thread does, if the SIGBUS can't be handled at once.
> > > 2) Why wasn't that patch merged? In that thread, Naoya seemed to
> > > "acknowledge" it :-).
> > That's an interesting thread ... and looks like it helps out in a case
> > where there are only AO signals.
> 
> Unfortunately, I got distracted by other pressing work at the time and
> didn't follow up on my patch/didn't follow the correct kernel workflow on
> patch submission procedures.  I haven't checked any developments in that
> area so I don't even know if my patch is still applicable -- do you think
> it makes sense for me to revisit the issue at this time, or will the patch
> that you are working on make my old patch redundant?
> 
> > But the "AR" case complicates things. Kamil points out at the start
> > of the thread:
> > > Also, do I understand it correctly that "action required" faults *must* be
> > > handled by the thread that triggered the error?  I guess it makes sense for
> > > it to be that way, even if it circumvents the "dedicated handling thread"
> > > idea...
> > this is absolutely true ... in the BUS_MCEERR_AR case the current
> > thread is executing an instruction that is attempting to consume poison
> > data ... and we cannot let that instruction retire, so we have to signal that
> > thread - if it can fix the problem by mapping a new page to the location
> > that was lost, and refilling it with the right data - the handler can return
> > to resume - otherwise it can longjmp() somewhere or exit.
> 
> Exactly.
> 
> > This means that the idea of having a multi-threaded application where
> > just one thread has a SIGBUS handler and we gently steer the
> > BUS_MCEERR_AO signals to that thread to be handled is flawed.
> > Every thread needs to have a SIGBUS handler - so that we can handle
> > the "AR" case. [Digression: what does happen to a process with a thread
> > with no SIGBUS handler if we in fact send it a SIGBUS? Does just that
> > thread die (default action for SIGBUS)? Or does the whole process get
> > killed?  If just one thread is terminated ... then perhaps someone could
> > write a recovery aware application that worked like this - though it sounds
> > like that would be working blindfold with one hand tied behind your back.
> > How would the remaining threads know why their buddy just died? The
> > siginfo_t describing the problem isn't available]
> 
> I believe I experimented with this and the whole process would get killed.
> 
> > If we want steerable AO signals to a dedicated thread - we'd have to
> > use different signals for AO & AR.

I think a user process can tell which kind of signal it got from
si_code in the siginfo_t passed to its handler, so we don't need
different signals. If that's right, would the following solve Kamil's
problem?
 - apply Kamil's patch
 - make sure that every thread in a recovery aware application should have
   a SIGBUS handler, inside which
   * code for SIGBUS(BUS_MCEERR_AR) is enabled for every thread
   * code for SIGBUS(BUS_MCEERR_AO) is enabled only for a dedicated thread
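
As a rough illustration of the scheme in the list above: signal
dispositions are shared by all threads of a process, so one sigaction()
call installs the handler everywhere, and the per-thread difference can
live in a thread-local flag. mce_dispatch(), ao_dedicated and the enum
are hypothetical names. The handler here only records its decision; a
real one would have to repair the page or bail out in the AR case
before returning.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1
#endif
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Fallbacks with the Linux values, in case the libc hides them. */
#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR 4
#endif
#ifndef BUS_MCEERR_AO
#define BUS_MCEERR_AO 5
#endif

enum mce_action { MCE_NOT_MCE, MCE_RECOVER_NOW, MCE_HANDLE_LATER, MCE_IGNORE };

/* Hypothetical role flag; the dedicated thread sets it to 1 for itself. */
static _Thread_local int ao_dedicated;

/* AR is acted on by every thread, AO only by the dedicated one. */
static enum mce_action mce_dispatch(int si_code, int dedicated)
{
	switch (si_code) {
	case BUS_MCEERR_AR:
		return MCE_RECOVER_NOW;
	case BUS_MCEERR_AO:
		return dedicated ? MCE_HANDLE_LATER : MCE_IGNORE;
	default:
		return MCE_NOT_MCE;  /* some other kind of SIGBUS */
	}
}

static volatile sig_atomic_t last_action;

static void sigbus_handler(int sig, siginfo_t *si, void *uctx)
{
	(void)sig;
	(void)uctx;
	last_action = mce_dispatch(si->si_code, ao_dedicated);
}

/* One call covers every thread in the process. */
static int install_sigbus_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof sa);
	sa.sa_sigaction = sigbus_handler;
	sa.sa_flags = SA_SIGINFO;
	assert(sigemptyset(&sa.sa_mask) == 0);
	return sigaction(SIGBUS, &sa, NULL);
}
```

This only covers the user-space side; it says nothing about which
thread the kernel picks as the target of the AO signal.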

One concern is that with Kamil's patch, existing users who expect that only
the main thread of an "early kill" process receives SIGBUS(BUS_MCEERR_AO)
could be surprised by this change, because other threads would start to
receive SIGBUS, and if those threads are not prepared for it, they are
simply killed (IOW, the behavior of these threads could change).
A good example is qemu: is it safe from Kamil's change?

Thanks,
Naoya Horiguchi

> > So every thread can have an AR
> > handler, but just one have the AO handler.  Or something more exotic
> > with prctl to designate the preferred target for AO signals?
> > 
> > Or just live with the fact that every thread needs a handler for AR ...
> > and have the application internally pass AO activity from the
> > thread that originally got the SIGBUS to some worker thread.
> 
> Yes, you make a very valid point that my patch was not complete... but
> then, neither was what was there before it.  So my patch was only an
> incremental improvement, enough to play with when artificially injecting
> fault events, but not enough to *really* solve the problem.  If you have a
> complete solution in mind instead, that would be great.

[parent not found: <5384d07e.4504e00a.2680.ffff8c31SMTPIN_ADDED_BROKEN@mx.google.com>]

* Re: [PATCH 1/2] memory-failure: Send right signal code to correct thread
       [not found]         ` <5384d07e.4504e00a.2680.ffff8c31SMTPIN_ADDED_BROKEN@mx.google.com>
@ 2014-05-27 22:53           ` Tony Luck
  2014-05-28  0:15             ` Naoya Horiguchi
       [not found]             ` <53852abb.867ce00a.3cef.3c7eSMTPIN_ADDED_BROKEN@mx.google.com>
  0 siblings, 2 replies; 31+ messages in thread
From: Tony Luck @ 2014-05-27 22:53 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Kamil Iskra, Linux Kernel Mailing List, linux-mm@kvack.org,
	Andi Kleen, Borislav Petkov, Chen Gong

>  - make sure that every thread in a recovery aware application should have
>    a SIGBUS handler, inside which
>    * code for SIGBUS(BUS_MCEERR_AR) is enabled for every thread
>    * code for SIGBUS(BUS_MCEERR_AO) is enabled only for a dedicated thread

But how does the kernel know which is the special thread that
should see the "AO" signal?  Broadcasting the signal to all
threads seems to be just as likely to cause problems to
an application as the h/w broadcasting MCE to all processors.

-Tony

* Re: [PATCH 1/2] memory-failure: Send right signal code to correct thread
  2014-05-27 22:53           ` Tony Luck
@ 2014-05-28  0:15             ` Naoya Horiguchi
       [not found]             ` <53852abb.867ce00a.3cef.3c7eSMTPIN_ADDED_BROKEN@mx.google.com>
  1 sibling, 0 replies; 31+ messages in thread
From: Naoya Horiguchi @ 2014-05-28  0:15 UTC (permalink / raw)
  To: tony.luck
  Cc: iskra, linux-kernel, linux-mm, Andi Kleen, Borislav Petkov,
	gong.chen

On Tue, May 27, 2014 at 03:53:55PM -0700, Tony Luck wrote:
> >  - make sure that every thread in a recovery aware application should have
> >    a SIGBUS handler, inside which
> >    * code for SIGBUS(BUS_MCEERR_AR) is enabled for every thread
> >    * code for SIGBUS(BUS_MCEERR_AO) is enabled only for a dedicated thread
> 
> But how does the kernel know which is the special thread that
> should see the "AO" signal?  Broadcasting the signal to all
> threads seems to be just as likely to cause problems to
> an application as the h/w broadcasting MCE to all processors.

I thought the kernel wouldn't have to know which thread is the
special one if the AO signal is broadcast to all threads, because
in that case the special thread always gets the AO signal.

The reported problem happens only when the application sets the PF_MCE_EARLY
flag, and such an application is surely recovery aware, so we can assume that
its authors implement a SIGBUS handler for all threads. Then all threads
other than the special one can intentionally ignore the AO signal. This avoids
the default behavior for SIGBUS ("kill all threads", as Kamil said in the
previous email).

And I hope that the downside of signal broadcasting is smaller than that of
MCE broadcasting, because the range of broadcasting is limited to a process
group, not to the whole system.

# I don't intend to rule out other possibilities like adding another prctl
# flag, so if you have a patch, that would be great.

Thanks,
Naoya Horiguchi

[parent not found: <53852abb.867ce00a.3cef.3c7eSMTPIN_ADDED_BROKEN@mx.google.com>]

* Re: [PATCH 1/2] memory-failure: Send right signal code to correct thread
       [not found]             ` <53852abb.867ce00a.3cef.3c7eSMTPIN_ADDED_BROKEN@mx.google.com>
@ 2014-05-28  5:09               ` Tony Luck
  2014-05-28 18:47                 ` [PATCH] mm/memory-failure.c: support dedicated thread to handle SIGBUS(BUS_MCEERR_AO) thread Naoya Horiguchi
       [not found]                 ` <53862f6c.91148c0a.5fb0.2d0cSMTPIN_ADDED_BROKEN@mx.google.com>
  0 siblings, 2 replies; 31+ messages in thread
From: Tony Luck @ 2014-05-28  5:09 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: iskra@mcs.anl.gov, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, Andi Kleen, Borislav Petkov,
	gong.chen@linux.jf.intel.com

I'm exploring options to see what writers of threaded applications might want/need. I'm very doubtful that they would really want "broadcast to all threads". What if there are hundreds or thousands of threads? We send the signals from the context of the thread that hit the error. But that might take a while. Meanwhile any of those threads that were already scheduled on other CPUs are back running again. So there are big races even if we broadcast.

Sent from my iPhone

> On May 27, 2014, at 17:15, Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> wrote:
> 
> On Tue, May 27, 2014 at 03:53:55PM -0700, Tony Luck wrote:
>>> - make sure that every thread in a recovery aware application should have
>>>   a SIGBUS handler, inside which
>>>   * code for SIGBUS(BUS_MCEERR_AR) is enabled for every thread
>>>   * code for SIGBUS(BUS_MCEERR_AO) is enabled only for a dedicated thread
>> 
>> But how does the kernel know which is the special thread that
>> should see the "AO" signal?  Broadcasting the signal to all
>> threads seems to be just as likely to cause problems to
>> an application as the h/w broadcasting MCE to all processors.
> 
> I thought the kernel wouldn't have to know which thread is the
> special one if the AO signal is broadcast to all threads, because
> in that case the special thread always gets the AO signal.
> 
> The reported problem happens only when the application sets the PF_MCE_EARLY
> flag, and such an application is surely recovery aware, so we can assume that
> its authors implement a SIGBUS handler for all threads. Then all threads
> other than the special one can intentionally ignore the AO signal. This avoids
> the default behavior for SIGBUS ("kill all threads", as Kamil said in the
> previous email).
> 
> And I hope that the downside of signal broadcasting is smaller than that of
> MCE broadcasting, because the range of broadcasting is limited to a process
> group, not to the whole system.
> 
> # I don't intend to rule out other possibilities like adding another prctl
> # flag, so if you have a patch, that would be great.
> 
> Thanks,
> Naoya Horiguchi

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH] mm/memory-failure.c: support dedicated thread to handle SIGBUS(BUS_MCEERR_AO) thread
  2014-05-28  5:09               ` Tony Luck
@ 2014-05-28 18:47                 ` Naoya Horiguchi
       [not found]                 ` <53862f6c.91148c0a.5fb0.2d0cSMTPIN_ADDED_BROKEN@mx.google.com>
  1 sibling, 0 replies; 31+ messages in thread
From: Naoya Horiguchi @ 2014-05-28 18:47 UTC (permalink / raw)
  To: tony.luck
  Cc: iskra, linux-kernel, linux-mm, Andi Kleen, Borislav Petkov,
	gong.chen

On Tue, May 27, 2014 at 10:09:54PM -0700, Tony Luck wrote:
> I'm exploring options to see what writers of threaded applications might
> want/need. I'm very doubtful that they would really want "broadcast to
> all threads". What if there are hundreds or thousands of threads? We send
> the signals from the context of the thread that hit the error. But that
> might take a while. Meanwhile any of those threads that were already
> scheduled on other CPUs are back running again. So there are big races
> even if we broadcast.

I see, so this approach is not good. I studied another approach and found
that we have a PF_MCE_EARLY flag on each thread, so we can implement
a dedicated thread by setting the flag on that thread. IOW, the current code
assumes that PF_MCE_EARLY is always set on the main thread (it is ignored
on any other thread), so we can change this behavior.

The following patch makes the kernel aware of the PF_MCE_EARLY flag on threads.
Could you take a look?

Thanks,
Naoya Horiguchi
---
Date: Wed, 28 May 2014 03:38:33 -0400
Subject: [PATCH] mm/memory-failure.c: support dedicated thread to handle
 SIGBUS(BUS_MCEERR_AO)

Currently the memory error handler handles action optional errors in a
deferred manner by default. If a recovery aware application wants to handle
them immediately, it can do so by setting the PF_MCE_EARLY flag. However, such
a signal can be sent only to the main thread, so it's problematic if the
application wants to have a dedicated thread to handle such signals.

So this patch adds dedicated thread support to the memory error handler. We
have a PF_MCE_EARLY flag for each thread separately, so with this patch the AO
signal is sent to the thread with the PF_MCE_EARLY flag set, not to the main
thread. If you want to implement a dedicated thread, you call prctl() to set
PF_MCE_EARLY on that thread.

The memory error handler collects processes to be killed, so this patch lets
it check the PF_MCE_EARLY flag on each thread in the collecting routines.

No behavioral change for all non-early kill cases.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 Documentation/vm/hwpoison.txt |  5 ++++
 mm/memory-failure.c           | 68 ++++++++++++++++++++++++++++++-------------
 2 files changed, 53 insertions(+), 20 deletions(-)

diff --git a/Documentation/vm/hwpoison.txt b/Documentation/vm/hwpoison.txt
index 550068466605..1906fd3bea0e 100644
--- a/Documentation/vm/hwpoison.txt
+++ b/Documentation/vm/hwpoison.txt
@@ -84,6 +84,11 @@ PR_MCE_KILL
 		PR_MCE_KILL_EARLY: Early kill
 		PR_MCE_KILL_LATE:  Late kill
 		PR_MCE_KILL_DEFAULT: Use system global default
+	Note that if you want to have a dedicated thread which handles
+	the SIGBUS(BUS_MCEERR_AO) on behalf of the process, you should
+	call prctl() on the thread. Otherwise, the SIGBUS is sent to
+	the main thread.
+
 PR_MCE_KILL_GET
 	return current mode
 
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index a18007ada3cb..3bd0428b2534 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -294,6 +294,46 @@ struct to_kill {
  */
 
 /*
+ * Find a dedicated thread which is supposed to handle SIGBUS(BUS_MCEERR_AO)
+ * on behalf of the thread group. Return task_struct of the (first found)
+ * dedicated thread if found, and return NULL otherwise.
+ */
+static struct task_struct *find_early_kill_thread(struct task_struct *tsk)
+{
+	struct task_struct *t;
+	rcu_read_lock();
+	for_each_thread(tsk, t)
+		if (t->flags & PF_MCE_PROCESS && t->flags & PF_MCE_EARLY)
+			goto found;
+	t = NULL;
+found:
+	rcu_read_unlock();
+	return t;
+}
+
+/*
+ * Determine whether a given process is "early kill" process which expects
+ * to be signaled when some page under the process is hwpoisoned.
+ * Return task_struct of the dedicated thread (main thread unless explicitly
+ * specified) if the process is "early kill," and otherwise returns NULL.
+ */
+static struct task_struct *task_early_kill(struct task_struct *tsk,
+					   int force_early)
+{
+	struct task_struct *t;
+	if (!tsk->mm)
+		return NULL;
+	if (force_early)
+		return tsk;
+	t = find_early_kill_thread(tsk);
+	if (t)
+		return t;
+	if (sysctl_memory_failure_early_kill)
+		return tsk;
+	return NULL;
+}
+
+/*
  * Schedule a process for later kill.
  * Uses GFP_ATOMIC allocations to avoid potential recursions in the VM.
  * TBD would GFP_NOIO be enough?
@@ -380,17 +420,6 @@ static void kill_procs(struct list_head *to_kill, int forcekill, int trapno,
 	}
 }
 
-static int task_early_kill(struct task_struct *tsk, int force_early)
-{
-	if (!tsk->mm)
-		return 0;
-	if (force_early)
-		return 1;
-	if (tsk->flags & PF_MCE_PROCESS)
-		return !!(tsk->flags & PF_MCE_EARLY);
-	return sysctl_memory_failure_early_kill;
-}
-
 /*
  * Collect processes when the error hit an anonymous page.
  */
@@ -410,16 +439,16 @@ static void collect_procs_anon(struct page *page, struct list_head *to_kill,
 	read_lock(&tasklist_lock);
 	for_each_process (tsk) {
 		struct anon_vma_chain *vmac;
-
-		if (!task_early_kill(tsk, force_early))
+		struct task_struct *t = task_early_kill(tsk, force_early);
+		if (!t)
 			continue;
 		anon_vma_interval_tree_foreach(vmac, &av->rb_root,
 					       pgoff, pgoff) {
 			vma = vmac->vma;
 			if (!page_mapped_in_vma(page, vma))
 				continue;
-			if (vma->vm_mm == tsk->mm)
-				add_to_kill(tsk, page, vma, to_kill, tkc);
+			if (vma->vm_mm == t->mm)
+				add_to_kill(t, page, vma, to_kill, tkc);
 		}
 	}
 	read_unlock(&tasklist_lock);
@@ -440,10 +469,9 @@ static void collect_procs_file(struct page *page, struct list_head *to_kill,
 	read_lock(&tasklist_lock);
 	for_each_process(tsk) {
 		pgoff_t pgoff = page_pgoff(page);
-
-		if (!task_early_kill(tsk, force_early))
+		struct task_struct *t = task_early_kill(tsk, force_early);
+		if (!t)
 			continue;
-
 		vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff,
 				      pgoff) {
 			/*
@@ -453,8 +481,8 @@ static void collect_procs_file(struct page *page, struct list_head *to_kill,
 			 * Assume applications who requested early kill want
 			 * to be informed of all such data corruptions.
 			 */
-			if (vma->vm_mm == tsk->mm)
-				add_to_kill(tsk, page, vma, to_kill, tkc);
+			if (vma->vm_mm == t->mm)
+				add_to_kill(t, page, vma, to_kill, tkc);
 		}
 	}
 	read_unlock(&tasklist_lock);
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 31+ messages in thread

[parent not found: <53862f6c.91148c0a.5fb0.2d0cSMTPIN_ADDED_BROKEN@mx.google.com>]

* Re: [PATCH] mm/memory-failure.c: support dedicated thread to handle SIGBUS(BUS_MCEERR_AO) thread
       [not found]                 ` <53862f6c.91148c0a.5fb0.2d0cSMTPIN_ADDED_BROKEN@mx.google.com>
@ 2014-05-28 22:00                   ` Tony Luck
  2014-05-29  1:45                     ` Naoya Horiguchi
                                       ` (2 more replies)
  0 siblings, 3 replies; 31+ messages in thread
From: Tony Luck @ 2014-05-28 22:00 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Kamil Iskra, Linux Kernel Mailing List, linux-mm@kvack.org,
	Andi Kleen, Borislav Petkov, Chen Gong

On Wed, May 28, 2014 at 11:47 AM, Naoya Horiguchi
<n-horiguchi@ah.jp.nec.com> wrote:
> Could you take a look?

It looks good - and should be a workable API for
application writers to use.

> @@ -84,6 +84,11 @@ PR_MCE_KILL
>                 PR_MCE_KILL_EARLY: Early kill
>                 PR_MCE_KILL_LATE:  Late kill
>                 PR_MCE_KILL_DEFAULT: Use system global default
> +       Note that if you want to have a dedicated thread which handles
> +       the SIGBUS(BUS_MCEERR_AO) on behalf of the process, you should
> +       call prctl() on the thread. Otherwise, the SIGBUS is sent to
> +       the main thread.

Perhaps be more explicit here that the user should call
prctl(PR_MCE_KILL_EARLY) on the designated thread
to get this behavior?  The user could also mark more than
one thread in this way - in which case the kernel will pick
the first one it sees (is that oldest, or newest?) that is marked.
Not sure if this would ever be useful unless you want to pass
responsibility around in an application that is dynamically
creating and removing threads.
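
For application writers, a minimal user-space sketch of that usage might
look like the following. It is an illustration only, not something posted
in this thread: the helper names mce_mark_current_thread_early() and
mce_classify_sigbus() are hypothetical, and the numeric fallbacks for the
si_code values are assumptions for libc headers that do not expose them.
The first helper has to be called from the designated thread itself,
because the policy is recorded on the calling thread.

```c
#include <signal.h>
#include <sys/prctl.h>

/* Fallbacks in case the libc headers do not expose these si_code values. */
#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR 4
#endif
#ifndef BUS_MCEERR_AO
#define BUS_MCEERR_AO 5
#endif

enum mce_kind { MCE_NONE, MCE_ACTION_REQUIRED, MCE_ACTION_OPTIONAL };

/*
 * Hypothetical helper: mark the calling thread as "early kill".
 * Returns 0 on success and -1 (with errno set by prctl) on failure.
 */
int mce_mark_current_thread_early(void)
{
	return prctl(PR_MCE_KILL, (unsigned long)PR_MCE_KILL_SET,
		     (unsigned long)PR_MCE_KILL_EARLY, 0UL, 0UL);
}

/*
 * Hypothetical helper: map the si_code of a SIGBUS to the kind of
 * memory error it reports, so a handler can branch on it.
 */
enum mce_kind mce_classify_sigbus(int si_code)
{
	if (si_code == BUS_MCEERR_AR)
		return MCE_ACTION_REQUIRED;
	if (si_code == BUS_MCEERR_AO)
		return MCE_ACTION_OPTIONAL;
	return MCE_NONE;
}
```

A SIGBUS handler installed with SA_SIGINFO could pass info->si_code to
mce_classify_sigbus() and start recovery only for the kinds it expects.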

> +               if (t->flags & PF_MCE_PROCESS && t->flags & PF_MCE_EARLY)

This is correct - but made me twitch to add extra brackets:

                  if ((t->flags & PF_MCE_PROCESS) && (t->flags & PF_MCE_EARLY))

or
                  if ((t->flags & (PF_MCE_PROCESS|PF_MCE_EARLY)) ==
                      (PF_MCE_PROCESS|PF_MCE_EARLY))

[oops, no ... that's too long and no clearer]
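
A small stand-alone sketch of the two forms, for comparison. The flag
values are illustrative stand-ins (the kernel defines the real ones in
<linux/sched.h>) and the function names are hypothetical. One detail worth
noting in the single-comparison form: in C, == binds more tightly than |,
so the mask on the right-hand side needs its own parentheses.

```c
#include <stdbool.h>

/* Illustrative stand-ins for the kernel's per-task flag bits. */
#define DEMO_PF_MCE_PROCESS 0x00000080u
#define DEMO_PF_MCE_EARLY   0x08000000u

/* Two separate bit tests joined with &&, as in the bracketed form. */
bool both_set_two_tests(unsigned int flags)
{
	return (flags & DEMO_PF_MCE_PROCESS) && (flags & DEMO_PF_MCE_EARLY);
}

/* One masked comparison; the mask is built once and reused. */
bool both_set_masked(unsigned int flags)
{
	const unsigned int mask = DEMO_PF_MCE_PROCESS | DEMO_PF_MCE_EARLY;

	return (flags & mask) == mask;
}
```

Both functions return true only when both bits are set.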

-Tony

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] mm/memory-failure.c: support dedicated thread to handle SIGBUS(BUS_MCEERR_AO) thread
  2014-05-28 22:00                   ` Tony Luck
@ 2014-05-29  1:45                     ` Naoya Horiguchi
       [not found]                     ` <5386915f.4772e50a.0657.ffffcda4SMTPIN_ADDED_BROKEN@mx.google.com>
       [not found]                     ` <1401327939-cvm7qh0m@n-horiguchi@ah.jp.nec.com>
  2 siblings, 0 replies; 31+ messages in thread
From: Naoya Horiguchi @ 2014-05-29  1:45 UTC (permalink / raw)
  To: tony.luck
  Cc: iskra, linux-kernel, linux-mm, Andi Kleen, Borislav Petkov,
	gong.chen

On Wed, May 28, 2014 at 03:00:11PM -0700, Tony Luck wrote:
> On Wed, May 28, 2014 at 11:47 AM, Naoya Horiguchi
> <n-horiguchi@ah.jp.nec.com> wrote:
> > Could you take a look?
> 
> It looks good - and should be a workable API for
> application writers to use.
> 
> > @@ -84,6 +84,11 @@ PR_MCE_KILL
> >                 PR_MCE_KILL_EARLY: Early kill
> >                 PR_MCE_KILL_LATE:  Late kill
> >                 PR_MCE_KILL_DEFAULT: Use system global default
> > +       Note that if you want to have a dedicated thread which handles
> > +       the SIGBUS(BUS_MCEERR_AO) on behalf of the process, you should
> > +       call prctl() on the thread. Otherwise, the SIGBUS is sent to
> > +       the main thread.
> 
> Perhaps be more explicit here that the user should call
> prctl(PR_MCE_KILL_EARLY) on the designated thread
> to get this behavior?

OK.

>  The user could also mark more than
> one thread in this way - in which case the kernel will pick
> the first one it sees (is that oldest, or newest?) that is marked.
> Not sure if this would ever be useful unless you want to pass
> responsibility around in an application that is dynamically
> creating and removing threads.

I'm not sure which is better: sending the signal to the first-found marked
thread or to all marked threads. If we have a good reason to do the latter,
I'm OK with it. Any ideas?

> 
> > +               if (t->flags & PF_MCE_PROCESS && t->flags & PF_MCE_EARLY)
> 
> This is correct - but made me twitch to add extra brackets:
> 
>                   if ((t->flags & PF_MCE_PROCESS) && (t->flags & PF_MCE_EARLY))

OK, I'll take this.

Thanks,
Naoya Horiguchi

^ permalink raw reply	[flat|nested] 31+ messages in thread

[parent not found: <5386915f.4772e50a.0657.ffffcda4SMTPIN_ADDED_BROKEN@mx.google.com>]

* Re: [PATCH] mm/memory-failure.c: support dedicated thread to handle SIGBUS(BUS_MCEERR_AO) thread
       [not found]                     ` <5386915f.4772e50a.0657.ffffcda4SMTPIN_ADDED_BROKEN@mx.google.com>
@ 2014-05-29 17:03                       ` Tony Luck
  2014-05-29 18:38                         ` Naoya Horiguchi
  0 siblings, 1 reply; 31+ messages in thread
From: Tony Luck @ 2014-05-29 17:03 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Kamil Iskra, Linux Kernel Mailing List, linux-mm@kvack.org,
	Andi Kleen, Borislav Petkov, Chen Gong

> OK, I'll take this.

If you didn't already apply it, then add a "Reviewed-by: Tony Luck
<tony.luck@intel.com>"

I see that this patch is on top of my earlier ones (includes the
"force_early" argument).
That means you have both of those queued too?

Thanks

-Tony

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] mm/memory-failure.c: support dedicated thread to handle SIGBUS(BUS_MCEERR_AO) thread
  2014-05-29 17:03                       ` Tony Luck
@ 2014-05-29 18:38                         ` Naoya Horiguchi
  2014-05-30  6:51                           ` [PATCH 0/3] HWPOISON: improve memory error handling for multithread process Naoya Horiguchi
  0 siblings, 1 reply; 31+ messages in thread
From: Naoya Horiguchi @ 2014-05-29 18:38 UTC (permalink / raw)
  To: tony.luck
  Cc: iskra, linux-kernel, linux-mm, Andi Kleen, Borislav Petkov,
	gong.chen

On Thu, May 29, 2014 at 10:03:17AM -0700, Tony Luck wrote:
> > OK, I'll take this.
> 
> If you didn't already apply it, then add a "Reviewed-by: Tony Luck
> <tony.luck@intel.com>"

Thank you.

> I see that this patch is on top of my earlier ones (includes the
> "force_early" argument).

Right.

> That means you have both of those queued too?

Yes, so I'll publish my tree and ask Andrew to pull it later.

Thanks,
Naoya Horiguchi

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH 0/3] HWPOISON: improve memory error handling for multithread process
  2014-05-29 18:38                         ` Naoya Horiguchi
@ 2014-05-30  6:51                           ` Naoya Horiguchi
  2014-05-30  6:51                             ` [PATCH 1/3] memory-failure: Send right signal code to correct thread Naoya Horiguchi
                                               ` (3 more replies)
  0 siblings, 4 replies; 31+ messages in thread
From: Naoya Horiguchi @ 2014-05-30  6:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Tony Luck, Andi Kleen, Kamil Iskra, Borislav Petkov, Chen Gong,
	linux-kernel, linux-mm

This patchset summarizes the recent discussion about memory error handling
in multithreaded applications. Patches 1 and 2 are for action required errors,
and patch 3 is for action optional errors.

This patchset is based on mmotm-2014-05-21-16-57.

Patches are also available on the following tree/branch.
  git@github.com:Naoya-Horiguchi/linux.git hwpoison/master

Thanks,
Naoya Horiguchi
---
Summary:

Naoya Horiguchi (1):
      mm/memory-failure.c: support dedicated thread to handle SIGBUS(BUS_MCEERR_AO)

Tony Luck (2):
      memory-failure: Send right signal code to correct thread
      memory-failure: Don't let collect_procs() skip over processes for MF_ACTION_REQUIRED

 Documentation/vm/hwpoison.txt |  5 +++
 mm/memory-failure.c           | 75 ++++++++++++++++++++++++++++++-------------
 2 files changed, 58 insertions(+), 22 deletions(-)

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH 1/3] memory-failure: Send right signal code to correct thread
  2014-05-30  6:51                           ` [PATCH 0/3] HWPOISON: improve memory error handling for multithread process Naoya Horiguchi
@ 2014-05-30  6:51                             ` Naoya Horiguchi
  2014-06-02 22:44                               ` Andrew Morton
  2014-05-30  6:51                             ` [PATCH 2/3] memory-failure: Don't let collect_procs() skip over processes for MF_ACTION_REQUIRED Naoya Horiguchi
                                               ` (2 subsequent siblings)
  3 siblings, 1 reply; 31+ messages in thread
From: Naoya Horiguchi @ 2014-05-30  6:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Tony Luck, Andi Kleen, Kamil Iskra, Borislav Petkov, Chen Gong,
	linux-kernel, linux-mm

From: Tony Luck <tony.luck@intel.com>

When a thread in a multi-threaded application hits a machine
check because of an uncorrectable error in memory - we want to
send the SIGBUS with si.si_code = BUS_MCEERR_AR to that thread.
Currently we fail to do that if the active thread is not the
primary thread in the process. collect_procs() just finds primary
threads and this test:
	if ((flags & MF_ACTION_REQUIRED) && t == current) {
will see that the thread we found isn't the current thread
and so send a si.si_code = BUS_MCEERR_AO to the primary
(and nothing to the active thread at this time).

We can fix this by checking whether "current" shares the same
mm with the process that collect_procs() said owned the page.
If so, we send the SIGBUS to current (with code BUS_MCEERR_AR).
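
As a rough user-space illustration of the change (not kernel code), the
old and new tests can be modelled as below. The demo_* types and the
pick_signal_*() functions are hypothetical stand-ins that keep only the
fields the test looks at.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures; only the fields used here. */
struct demo_mm { int id; };
struct demo_task { struct demo_mm *mm; };

enum demo_code { DEMO_MCEERR_AR, DEMO_MCEERR_AO };

struct demo_signal {
	enum demo_code code;        /* si_code that would be sent */
	const struct demo_task *to; /* thread that would receive it */
};

/* Old test: AR only if the collected task is exactly the faulting thread. */
struct demo_signal pick_signal_old(bool action_required,
				   const struct demo_task *t,
				   const struct demo_task *current)
{
	if (action_required && t == current)
		return (struct demo_signal){ DEMO_MCEERR_AR, t };
	return (struct demo_signal){ DEMO_MCEERR_AO, t };
}

/* Patched test: AR goes to the faulting thread when it shares t's mm. */
struct demo_signal pick_signal_new(bool action_required,
				   const struct demo_task *t,
				   const struct demo_task *current)
{
	if (action_required && t->mm == current->mm)
		return (struct demo_signal){ DEMO_MCEERR_AR, current };
	return (struct demo_signal){ DEMO_MCEERR_AO, t };
}
```

With a primary thread and a worker sharing one demo_mm, and the worker as
"current", the old function picks AO for the primary while the new one
picks AR for the worker.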

Reported-by: Otto Bruggeman <otto.g.bruggeman@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Chen Gong <gong.chen@linux.jf.intel.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 mm/memory-failure.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git mmotm-2014-05-21-16-57.orig/mm/memory-failure.c mmotm-2014-05-21-16-57/mm/memory-failure.c
index e3154d99b87f..b73098ee91e6 100644
--- mmotm-2014-05-21-16-57.orig/mm/memory-failure.c
+++ mmotm-2014-05-21-16-57/mm/memory-failure.c
@@ -204,9 +204,9 @@ static int kill_proc(struct task_struct *t, unsigned long addr, int trapno,
 #endif
 	si.si_addr_lsb = page_size_order(page) + PAGE_SHIFT;
 
-	if ((flags & MF_ACTION_REQUIRED) && t == current) {
+	if ((flags & MF_ACTION_REQUIRED) && t->mm == current->mm) {
 		si.si_code = BUS_MCEERR_AR;
-		ret = force_sig_info(SIGBUS, &si, t);
+		ret = force_sig_info(SIGBUS, &si, current);
 	} else {
 		/*
 		 * Don't use force here, it's convenient if the signal
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/3] memory-failure: Send right signal code to correct thread
  2014-05-30  6:51                             ` [PATCH 1/3] memory-failure: Send right signal code to correct thread Naoya Horiguchi
@ 2014-06-02 22:44                               ` Andrew Morton
  2014-06-03  1:12                                 ` Naoya Horiguchi
  0 siblings, 1 reply; 31+ messages in thread
From: Andrew Morton @ 2014-06-02 22:44 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Tony Luck, Andi Kleen, Kamil Iskra, Borislav Petkov, Chen Gong,
	linux-kernel, linux-mm

On Fri, 30 May 2014 02:51:08 -0400 Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> wrote:

> From: Tony Luck <tony.luck@intel.com>
> 
> When a thread in a multi-threaded application hits a machine
> check because of an uncorrectable error in memory - we want to
> send the SIGBUS with si.si_code = BUS_MCEERR_AR to that thread.
> Currently we fail to do that if the active thread is not the
> primary thread in the process. collect_procs() just finds primary
> threads and this test:
> 	if ((flags & MF_ACTION_REQUIRED) && t == current) {
> will see that the thread we found isn't the current thread
> and so send a si.si_code = BUS_MCEERR_AO to the primary
> (and nothing to the active thread at this time).
> 
> We can fix this by checking whether "current" shares the same
> mm with the process that collect_procs() said owned the page.
> If so, we send the SIGBUS to current (with code BUS_MCEERR_AR).
> 
> Reported-by: Otto Bruggeman <otto.g.bruggeman@intel.com>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> Cc: Andi Kleen <andi@firstfloor.org>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Chen Gong <gong.chen@linux.jf.intel.com>
> Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

You were on the patch delivery path, so it should have included your
signed-off-by.  Documentation/SubmittingPatches section 12 has the
details.

I have made that change to my copies of patches 1 and 2.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/3] memory-failure: Send right signal code to correct thread
  2014-06-02 22:44                               ` Andrew Morton
@ 2014-06-03  1:12                                 ` Naoya Horiguchi
  0 siblings, 0 replies; 31+ messages in thread
From: Naoya Horiguchi @ 2014-06-03  1:12 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Tony Luck, Andi Kleen, Kamil Iskra, Borislav Petkov, Chen Gong,
	linux-kernel, linux-mm

On Mon, Jun 02, 2014 at 03:44:31PM -0700, Andrew Morton wrote:
> On Fri, 30 May 2014 02:51:08 -0400 Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> wrote:
> 
> > From: Tony Luck <tony.luck@intel.com>
> > 
> > When a thread in a multi-threaded application hits a machine
> > check because of an uncorrectable error in memory - we want to
> > send the SIGBUS with si.si_code = BUS_MCEERR_AR to that thread.
> > Currently we fail to do that if the active thread is not the
> > primary thread in the process. collect_procs() just finds primary
> > threads and this test:
> > 	if ((flags & MF_ACTION_REQUIRED) && t == current) {
> > will see that the thread we found isn't the current thread
> > and so send a si.si_code = BUS_MCEERR_AO to the primary
> > (and nothing to the active thread at this time).
> > 
> > We can fix this by checking whether "current" shares the same
> > mm with the process that collect_procs() said owned the page.
> > If so, we send the SIGBUS to current (with code BUS_MCEERR_AR).
> > 
> > Reported-by: Otto Bruggeman <otto.g.bruggeman@intel.com>
> > Signed-off-by: Tony Luck <tony.luck@intel.com>
> > Cc: Andi Kleen <andi@firstfloor.org>
> > Cc: Borislav Petkov <bp@suse.de>
> > Cc: Chen Gong <gong.chen@linux.jf.intel.com>
> > Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> 
> You were on the patch delivery path, so it should have included your
> signed-off-by.  Documentation/SubmittingPatches section 12 has the
> details.

Sorry, I didn't know that.

> I have made that change to my copies of patches 1 and 2.

Thank you.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH 2/3] memory-failure: Don't let collect_procs() skip over processes for MF_ACTION_REQUIRED
  2014-05-30  6:51                           ` [PATCH 0/3] HWPOISON: improve memory error handling for multithread process Naoya Horiguchi
  2014-05-30  6:51                             ` [PATCH 1/3] memory-failure: Send right signal code to correct thread Naoya Horiguchi
@ 2014-05-30  6:51                             ` Naoya Horiguchi
  2014-05-30  6:51                             ` [PATCH 3/3] mm/memory-failure.c: support dedicated thread to handle SIGBUS(BUS_MCEERR_AO) Naoya Horiguchi
  2014-05-30 17:25                             ` [PATCH 0/3] HWPOISON: improve memory error handling for multithread process Luck, Tony
  3 siblings, 0 replies; 31+ messages in thread
From: Naoya Horiguchi @ 2014-05-30  6:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Tony Luck, Andi Kleen, Kamil Iskra, Borislav Petkov, Chen Gong,
	linux-kernel, linux-mm

From: Tony Luck <tony.luck@intel.com>

When Linux sees an "action optional" machine check (where h/w has
reported an error that is not in the current execution path) we
generally do not want to signal a process, since most processes
do not have a SIGBUS handler - we'd just prematurely terminate the
process for a problem that they might never actually see.

task_early_kill() decides whether to consider a process - and it
checks whether this specific process has been marked for early signals
with "prctl", or if the system administrator has requested early
signals for all processes using /proc/sys/vm/memory_failure_early_kill.

But for the MF_ACTION_REQUIRED case we must not defer. The error is in
the execution path of the current thread so we must send the SIGBUS
immediately.

Fix by passing a flag argument through collect_procs*() to
task_early_kill() so it knows whether we can defer or must
take action.
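
The resulting decision order can be sketched in plain C as follows. This
is a user-space model under stated assumptions, not the kernel function:
struct ek_proc, the EK_* flag values and the sysctl_early_kill parameter
are hypothetical stand-ins for the task_struct fields, the PF_MCE_* bits
and the memory_failure_early_kill sysctl.

```c
#include <stdbool.h>

/* Illustrative flag bits; not the kernel's real values. */
#define EK_PF_MCE_PROCESS 0x1u /* process set its own policy via prctl */
#define EK_PF_MCE_EARLY   0x2u /* ... and that policy is "early kill" */

struct ek_proc {
	bool has_mm;        /* false for kernel threads */
	unsigned int flags; /* EK_PF_* bits */
};

/* Model of the patched check: should this process be considered now? */
bool ek_task_early_kill(const struct ek_proc *p, bool force_early,
			bool sysctl_early_kill)
{
	if (!p->has_mm)
		return false;          /* nothing mapped, nothing to signal */
	if (force_early)
		return true;           /* action required: never defer */
	if (p->flags & EK_PF_MCE_PROCESS)
		return (p->flags & EK_PF_MCE_EARLY) != 0;
	return sysctl_early_kill;      /* fall back to the global default */
}
```

The force_early test sits before the per-process and global policy checks,
so an action required error overrides both.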

Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Chen Gong <gong.chen@linux.jf.intel.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 mm/memory-failure.c | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git mmotm-2014-05-21-16-57.orig/mm/memory-failure.c mmotm-2014-05-21-16-57/mm/memory-failure.c
index b73098ee91e6..fbcdb1d54c55 100644
--- mmotm-2014-05-21-16-57.orig/mm/memory-failure.c
+++ mmotm-2014-05-21-16-57/mm/memory-failure.c
@@ -380,10 +380,12 @@ static void kill_procs(struct list_head *to_kill, int forcekill, int trapno,
 	}
 }
 
-static int task_early_kill(struct task_struct *tsk)
+static int task_early_kill(struct task_struct *tsk, int force_early)
 {
 	if (!tsk->mm)
 		return 0;
+	if (force_early)
+		return 1;
 	if (tsk->flags & PF_MCE_PROCESS)
 		return !!(tsk->flags & PF_MCE_EARLY);
 	return sysctl_memory_failure_early_kill;
@@ -393,7 +395,7 @@ static int task_early_kill(struct task_struct *tsk)
  * Collect processes when the error hit an anonymous page.
  */
 static void collect_procs_anon(struct page *page, struct list_head *to_kill,
-			      struct to_kill **tkc)
+			      struct to_kill **tkc, int force_early)
 {
 	struct vm_area_struct *vma;
 	struct task_struct *tsk;
@@ -409,7 +411,7 @@ static void collect_procs_anon(struct page *page, struct list_head *to_kill,
 	for_each_process (tsk) {
 		struct anon_vma_chain *vmac;
 
-		if (!task_early_kill(tsk))
+		if (!task_early_kill(tsk, force_early))
 			continue;
 		anon_vma_interval_tree_foreach(vmac, &av->rb_root,
 					       pgoff, pgoff) {
@@ -428,7 +430,7 @@ static void collect_procs_anon(struct page *page, struct list_head *to_kill,
  * Collect processes when the error hit a file mapped page.
  */
 static void collect_procs_file(struct page *page, struct list_head *to_kill,
-			      struct to_kill **tkc)
+			      struct to_kill **tkc, int force_early)
 {
 	struct vm_area_struct *vma;
 	struct task_struct *tsk;
@@ -439,7 +441,7 @@ static void collect_procs_file(struct page *page, struct list_head *to_kill,
 	for_each_process(tsk) {
 		pgoff_t pgoff = page_pgoff(page);
 
-		if (!task_early_kill(tsk))
+		if (!task_early_kill(tsk, force_early))
 			continue;
 
 		vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff,
@@ -465,7 +467,8 @@ static void collect_procs_file(struct page *page, struct list_head *to_kill,
  * First preallocate one tokill structure outside the spin locks,
  * so that we can kill at least one process reasonably reliable.
  */
-static void collect_procs(struct page *page, struct list_head *tokill)
+static void collect_procs(struct page *page, struct list_head *tokill,
+				int force_early)
 {
 	struct to_kill *tk;
 
@@ -476,9 +479,9 @@ static void collect_procs(struct page *page, struct list_head *tokill)
 	if (!tk)
 		return;
 	if (PageAnon(page))
-		collect_procs_anon(page, tokill, &tk);
+		collect_procs_anon(page, tokill, &tk, force_early);
 	else
-		collect_procs_file(page, tokill, &tk);
+		collect_procs_file(page, tokill, &tk, force_early);
 	kfree(tk);
 }
 
@@ -963,7 +966,7 @@ static int hwpoison_user_mappings(struct page *p, unsigned long pfn,
 	 * there's nothing that can be done.
 	 */
 	if (kill)
-		collect_procs(ppage, &tokill);
+		collect_procs(ppage, &tokill, flags & MF_ACTION_REQUIRED);
 
 	ret = try_to_unmap(ppage, ttu);
 	if (ret != SWAP_SUCCESS)
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 3/3] mm/memory-failure.c: support dedicated thread to handle SIGBUS(BUS_MCEERR_AO)
  2014-05-30  6:51                           ` [PATCH 0/3] HWPOISON: improve memory error handling for multithread process Naoya Horiguchi
  2014-05-30  6:51                             ` [PATCH 1/3] memory-failure: Send right signal code to correct thread Naoya Horiguchi
  2014-05-30  6:51                             ` [PATCH 2/3] memory-failure: Don't let collect_procs() skip over processes for MF_ACTION_REQUIRED Naoya Horiguchi
@ 2014-05-30  6:51                             ` Naoya Horiguchi
  2014-06-02 22:42                               ` Andrew Morton
  2014-05-30 17:25                             ` [PATCH 0/3] HWPOISON: improve memory error handling for multithread process Luck, Tony
  3 siblings, 1 reply; 31+ messages in thread
From: Naoya Horiguchi @ 2014-05-30  6:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Tony Luck, Andi Kleen, Kamil Iskra, Borislav Petkov, Chen Gong,
	linux-kernel, linux-mm

Currently the memory error handler handles action optional errors in a
deferred manner by default. If a recovery aware application wants to handle
them immediately, it can do so by setting the PF_MCE_EARLY flag. However, such
a signal can be sent only to the main thread, so it's problematic if the
application wants to have a dedicated thread to handle such signals.

So this patch adds dedicated thread support to the memory error handler. We
have a PF_MCE_EARLY flag for each thread separately, so with this patch the AO
signal is sent to the thread with the PF_MCE_EARLY flag set, not to the main
thread. If you want to implement a dedicated thread, you call prctl() to set
PF_MCE_EARLY on that thread.

The memory error handler collects processes to be killed, so this patch lets
it check the PF_MCE_EARLY flag on each thread in the collecting routines.

No behavioral change for all non-early kill cases.
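
The selection described above can be modelled in user space roughly as
follows. This is a sketch under stated assumptions, not the kernel code:
struct dt_thread, struct dt_group, the DT_* flag values and both function
names are hypothetical, and threads[0] is assumed to be the main thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag bits; not the kernel's real values. */
#define DT_PF_MCE_PROCESS 0x1u
#define DT_PF_MCE_EARLY   0x2u

struct dt_thread { unsigned int flags; };

struct dt_group {
	bool has_mm;                /* false for kernel threads */
	struct dt_thread *threads;  /* threads[0] is the main thread */
	size_t nr_threads;          /* assumed to be at least 1 */
};

/* Return the first thread marked for early kill, or NULL if none is. */
struct dt_thread *dt_find_early_kill_thread(struct dt_group *g)
{
	for (size_t i = 0; i < g->nr_threads; i++) {
		unsigned int f = g->threads[i].flags;

		if ((f & DT_PF_MCE_PROCESS) && (f & DT_PF_MCE_EARLY))
			return &g->threads[i];
	}
	return NULL;
}

/* Pick the thread to signal, or NULL if the group should be skipped. */
struct dt_thread *dt_pick_target(struct dt_group *g, bool force_early,
				 bool sysctl_early_kill)
{
	struct dt_thread *t;

	if (!g->has_mm)
		return NULL;
	if (force_early)
		return &g->threads[0];  /* action required case */
	t = dt_find_early_kill_thread(g);
	if (t)
		return t;               /* dedicated thread wins */
	if (sysctl_early_kill)
		return &g->threads[0];  /* global default: main thread */
	return NULL;
}
```

When several threads are marked, this model returns the first one in array
order; what order the kernel walks its thread list in is not modelled here.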

ChangeLog:
- document more specifically
- add parentheses in find_early_kill_thread()
- move position of find_early_kill_thread() and task_early_kill()

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: Kamil Iskra <iskra@mcs.anl.gov>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Chen Gong <gong.chen@linux.jf.intel.com>
---
 Documentation/vm/hwpoison.txt |  5 ++++
 mm/memory-failure.c           | 58 ++++++++++++++++++++++++++++++++-----------
 2 files changed, 48 insertions(+), 15 deletions(-)

diff --git mmotm-2014-05-21-16-57.orig/Documentation/vm/hwpoison.txt mmotm-2014-05-21-16-57/Documentation/vm/hwpoison.txt
index 550068466605..6ae89a9edf2a 100644
--- mmotm-2014-05-21-16-57.orig/Documentation/vm/hwpoison.txt
+++ mmotm-2014-05-21-16-57/Documentation/vm/hwpoison.txt
@@ -84,6 +84,11 @@ PR_MCE_KILL
 		PR_MCE_KILL_EARLY: Early kill
 		PR_MCE_KILL_LATE:  Late kill
 		PR_MCE_KILL_DEFAULT: Use system global default
+	Note that if you want to have a dedicated thread which handles
+	the SIGBUS(BUS_MCEERR_AO) on behalf of the process, you should
+	call prctl(PR_MCE_KILL_EARLY) on the designated thread. Otherwise,
+	the SIGBUS is sent to the main thread.
+
 PR_MCE_KILL_GET
 	return current mode
 
diff --git mmotm-2014-05-21-16-57.orig/mm/memory-failure.c mmotm-2014-05-21-16-57/mm/memory-failure.c
index fbcdb1d54c55..9751e19ab13b 100644
--- mmotm-2014-05-21-16-57.orig/mm/memory-failure.c
+++ mmotm-2014-05-21-16-57/mm/memory-failure.c
@@ -380,15 +380,44 @@ static void kill_procs(struct list_head *to_kill, int forcekill, int trapno,
 	}
 }
 
-static int task_early_kill(struct task_struct *tsk, int force_early)
+/*
+ * Find a dedicated thread which is supposed to handle SIGBUS(BUS_MCEERR_AO)
+ * on behalf of the thread group. Return task_struct of the (first found)
+ * dedicated thread if found, and return NULL otherwise.
+ */
+static struct task_struct *find_early_kill_thread(struct task_struct *tsk)
+{
+	struct task_struct *t;
+	rcu_read_lock();
+	for_each_thread(tsk, t)
+		if ((t->flags & PF_MCE_PROCESS) && (t->flags & PF_MCE_EARLY))
+			goto found;
+	t = NULL;
+found:
+	rcu_read_unlock();
+	return t;
+}
+
+/*
+ * Determine whether a given process is "early kill" process which expects
+ * to be signaled when some page under the process is hwpoisoned.
+ * Return task_struct of the dedicated thread (main thread unless explicitly
+ * specified) if the process is "early kill," and otherwise returns NULL.
+ */
+static struct task_struct *task_early_kill(struct task_struct *tsk,
+					   int force_early)
 {
+	struct task_struct *t;
 	if (!tsk->mm)
-		return 0;
+		return NULL;
 	if (force_early)
-		return 1;
-	if (tsk->flags & PF_MCE_PROCESS)
-		return !!(tsk->flags & PF_MCE_EARLY);
-	return sysctl_memory_failure_early_kill;
+		return tsk;
+	t = find_early_kill_thread(tsk);
+	if (t)
+		return t;
+	if (sysctl_memory_failure_early_kill)
+		return tsk;
+	return NULL;
 }
 
 /*
@@ -410,16 +439,16 @@ static void collect_procs_anon(struct page *page, struct list_head *to_kill,
 	read_lock(&tasklist_lock);
 	for_each_process (tsk) {
 		struct anon_vma_chain *vmac;
-
-		if (!task_early_kill(tsk, force_early))
+		struct task_struct *t = task_early_kill(tsk, force_early);
+		if (!t)
 			continue;
 		anon_vma_interval_tree_foreach(vmac, &av->rb_root,
 					       pgoff, pgoff) {
 			vma = vmac->vma;
 			if (!page_mapped_in_vma(page, vma))
 				continue;
-			if (vma->vm_mm == tsk->mm)
-				add_to_kill(tsk, page, vma, to_kill, tkc);
+			if (vma->vm_mm == t->mm)
+				add_to_kill(t, page, vma, to_kill, tkc);
 		}
 	}
 	read_unlock(&tasklist_lock);
@@ -440,10 +469,9 @@ static void collect_procs_file(struct page *page, struct list_head *to_kill,
 	read_lock(&tasklist_lock);
 	for_each_process(tsk) {
 		pgoff_t pgoff = page_pgoff(page);
-
-		if (!task_early_kill(tsk, force_early))
+		struct task_struct *t = task_early_kill(tsk, force_early);
+		if (!t)
 			continue;
-
 		vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff,
 				      pgoff) {
 			/*
@@ -453,8 +481,8 @@ static void collect_procs_file(struct page *page, struct list_head *to_kill,
 			 * Assume applications who requested early kill want
 			 * to be informed of all such data corruptions.
 			 */
-			if (vma->vm_mm == tsk->mm)
-				add_to_kill(tsk, page, vma, to_kill, tkc);
+			if (vma->vm_mm == t->mm)
+				add_to_kill(t, page, vma, to_kill, tkc);
 		}
 	}
 	read_unlock(&tasklist_lock);
-- 
1.9.3
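
For illustration only, here is a userspace sketch of the dedicated-thread
usage that the commit message and the hwpoison.txt note describe. It assumes
the prctl(2) interface PR_MCE_KILL / PR_MCE_KILL_SET / PR_MCE_KILL_EARLY; the
function names is_action_optional_report(), sigbus_handler() and
mce_thread_main() are hypothetical and do not come from the patch.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/prctl.h>
#include <unistd.h>

/* Return 1 if the siginfo describes an "action optional" memory error. */
int is_action_optional_report(const siginfo_t *si)
{
	return si->si_signo == SIGBUS && si->si_code == BUS_MCEERR_AO;
}

/* SIGBUS handler: only async-signal-safe calls are used here. */
void sigbus_handler(int sig, siginfo_t *si, void *ucontext)
{
	static const char msg[] = "AO memory error reported\n";

	(void)sig;
	(void)ucontext;
	if (is_action_optional_report(si)) {
		/* si->si_addr holds the address of the poisoned page. */
		if (write(STDERR_FILENO, msg, sizeof(msg) - 1) < 0)
			return;
	}
}

/*
 * Entry point for the dedicated thread (pthread start-routine signature).
 * Returns NULL only if setup fails; otherwise it waits for signals forever.
 */
void *mce_thread_main(void *unused)
{
	struct sigaction sa;

	(void)unused;
	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = sigbus_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGBUS, &sa, NULL) != 0)
		return NULL;

	/* Sets PF_MCE_PROCESS and PF_MCE_EARLY on the calling thread only. */
	if (prctl(PR_MCE_KILL, PR_MCE_KILL_SET, PR_MCE_KILL_EARLY, 0, 0) != 0)
		return NULL;

	for (;;)
		pause();	/* returns each time a handler has run */
}
```

A program would pass mce_thread_main to pthread_create() early in startup.
Threads that never make the prctl() call keep the default policy, so with
this patch applied the AO signal should go to the opted-in thread.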


* Re: [PATCH 3/3] mm/memory-failure.c: support dedicated thread to handle SIGBUS(BUS_MCEERR_AO)
  2014-05-30  6:51                             ` [PATCH 3/3] mm/memory-failure.c: support dedicated thread to handle SIGBUS(BUS_MCEERR_AO) Naoya Horiguchi
@ 2014-06-02 22:42                               ` Andrew Morton
  2014-06-03  1:03                                 ` Naoya Horiguchi
  0 siblings, 1 reply; 31+ messages in thread
From: Andrew Morton @ 2014-06-02 22:42 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Tony Luck, Andi Kleen, Kamil Iskra, Borislav Petkov, Chen Gong,
	linux-kernel, linux-mm

On Fri, 30 May 2014 02:51:10 -0400 Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> wrote:

> Currently the memory error handler handles action optional errors in a
> deferred manner by default. If a recovery-aware application wants to
> handle them immediately, it can do so by setting the PF_MCE_EARLY flag.
> However, such a signal can be sent only to the main thread, which is
> problematic if the application wants a dedicated thread to handle such
> signals.
> 
> So this patch adds dedicated thread support to the memory error handler.
> Each thread has its own PF_MCE_EARLY flag, so with this patch the AO
> signal is sent to the thread with PF_MCE_EARLY set, not to the main
> thread. To implement a dedicated thread, call prctl() on that thread to
> set PF_MCE_EARLY.
> 
> The memory error handler collects the processes to be killed, so this
> patch makes the collecting routines check the PF_MCE_EARLY flag on each
> thread.
> 
> No behavioral change for any non-early kill case.
> 
> ...
>
> --- mmotm-2014-05-21-16-57.orig/mm/memory-failure.c
> +++ mmotm-2014-05-21-16-57/mm/memory-failure.c
> @@ -380,15 +380,44 @@ static void kill_procs(struct list_head *to_kill, int forcekill, int trapno,
>  	}
>  }
>  
> -static int task_early_kill(struct task_struct *tsk, int force_early)
> +/*
> + * Find a dedicated thread which is supposed to handle SIGBUS(BUS_MCEERR_AO)
> + * on behalf of the thread group. Return task_struct of the (first found)
> + * dedicated thread if found, and return NULL otherwise.
> + */
> +static struct task_struct *find_early_kill_thread(struct task_struct *tsk)
> +{
> +	struct task_struct *t;
> +	rcu_read_lock();
> +	for_each_thread(tsk, t)
> +		if ((t->flags & PF_MCE_PROCESS) && (t->flags & PF_MCE_EARLY))
> +			goto found;
> +	t = NULL;
> +found:
> +	rcu_read_unlock();
> +	return t;
> +}
> +
> +/*
> + * Determine whether a given process is "early kill" process which expects
> + * to be signaled when some page under the process is hwpoisoned.
> + * Return task_struct of the dedicated thread (main thread unless explicitly
> + * specified) if the process is "early kill," and otherwise returns NULL.
> + */
> +static struct task_struct *task_early_kill(struct task_struct *tsk,
> +					   int force_early)
>  {
> +	struct task_struct *t;
>  	if (!tsk->mm)
> -		return 0;
> +		return NULL;
>  	if (force_early)
> -		return 1;
> -	if (tsk->flags & PF_MCE_PROCESS)
> -		return !!(tsk->flags & PF_MCE_EARLY);
> -	return sysctl_memory_failure_early_kill;
> +		return tsk;
> +	t = find_early_kill_thread(tsk);
> +	if (t)
> +		return t;
> +	if (sysctl_memory_failure_early_kill)
> +		return tsk;
> +	return NULL;
>  }

The above two functions are to be called under
read_lock(tasklist_lock), which is rather important...

Given this requirement, did find_early_kill_thread() need rcu_read_lock()?



* Re: [PATCH 3/3] mm/memory-failure.c: support dedicated thread to handle SIGBUS(BUS_MCEERR_AO)
  2014-06-02 22:42                               ` Andrew Morton
@ 2014-06-03  1:03                                 ` Naoya Horiguchi
  0 siblings, 0 replies; 31+ messages in thread
From: Naoya Horiguchi @ 2014-06-03  1:03 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Tony Luck, Andi Kleen, Kamil Iskra, Borislav Petkov, Chen Gong,
	linux-kernel, linux-mm

On Mon, Jun 02, 2014 at 03:42:07PM -0700, Andrew Morton wrote:
> On Fri, 30 May 2014 02:51:10 -0400 Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> wrote:
> 
> > Currently the memory error handler handles action optional errors in a
> > deferred manner by default. If a recovery-aware application wants to
> > handle them immediately, it can do so by setting the PF_MCE_EARLY flag.
> > However, such a signal can be sent only to the main thread, which is
> > problematic if the application wants a dedicated thread to handle such
> > signals.
> > 
> > So this patch adds dedicated thread support to the memory error handler.
> > Each thread has its own PF_MCE_EARLY flag, so with this patch the AO
> > signal is sent to the thread with PF_MCE_EARLY set, not to the main
> > thread. To implement a dedicated thread, call prctl() on that thread to
> > set PF_MCE_EARLY.
> > 
> > The memory error handler collects the processes to be killed, so this
> > patch makes the collecting routines check the PF_MCE_EARLY flag on each
> > thread.
> > 
> > No behavioral change for any non-early kill case.
> > 
> > ...
> >
> > --- mmotm-2014-05-21-16-57.orig/mm/memory-failure.c
> > +++ mmotm-2014-05-21-16-57/mm/memory-failure.c
> > @@ -380,15 +380,44 @@ static void kill_procs(struct list_head *to_kill, int forcekill, int trapno,
> >  	}
> >  }
> >  
> > -static int task_early_kill(struct task_struct *tsk, int force_early)
> > +/*
> > + * Find a dedicated thread which is supposed to handle SIGBUS(BUS_MCEERR_AO)
> > + * on behalf of the thread group. Return task_struct of the (first found)
> > + * dedicated thread if found, and return NULL otherwise.
> > + */
> > +static struct task_struct *find_early_kill_thread(struct task_struct *tsk)
> > +{
> > +	struct task_struct *t;
> > +	rcu_read_lock();
> > +	for_each_thread(tsk, t)
> > +		if ((t->flags & PF_MCE_PROCESS) && (t->flags & PF_MCE_EARLY))
> > +			goto found;
> > +	t = NULL;
> > +found:
> > +	rcu_read_unlock();
> > +	return t;
> > +}
> > +
> > +/*
> > + * Determine whether a given process is "early kill" process which expects
> > + * to be signaled when some page under the process is hwpoisoned.
> > + * Return task_struct of the dedicated thread (main thread unless explicitly
> > + * specified) if the process is "early kill," and otherwise returns NULL.
> > + */
> > +static struct task_struct *task_early_kill(struct task_struct *tsk,
> > +					   int force_early)
> >  {
> > +	struct task_struct *t;
> >  	if (!tsk->mm)
> > -		return 0;
> > +		return NULL;
> >  	if (force_early)
> > -		return 1;
> > -	if (tsk->flags & PF_MCE_PROCESS)
> > -		return !!(tsk->flags & PF_MCE_EARLY);
> > -	return sysctl_memory_failure_early_kill;
> > +		return tsk;
> > +	t = find_early_kill_thread(tsk);
> > +	if (t)
> > +		return t;
> > +	if (sysctl_memory_failure_early_kill)
> > +		return tsk;
> > +	return NULL;
> >  }
> 
> The above two functions are to be called under
> read_lock(tasklist_lock), which is rather important...
> 
> Given this requirement, did find_early_kill_thread() need rcu_read_lock()?

Right, we don't need this rcu_read_lock(). The following hunk should fix it.

Thanks,
Naoya Horiguchi

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index b0f48e34dec5..6fdc9a2eeb2f 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -297,18 +297,17 @@ struct to_kill {
  * Find a dedicated thread which is supposed to handle SIGBUS(BUS_MCEERR_AO)
  * on behalf of the thread group. Return task_struct of the (first found)
  * dedicated thread if found, and return NULL otherwise.
+ *
+ * We already hold read_lock(&tasklist_lock) in the caller, so we don't
+ * have to call rcu_read_lock/unlock() in this function.
  */
 static struct task_struct *find_early_kill_thread(struct task_struct *tsk)
 {
 	struct task_struct *t;
-	rcu_read_lock();
 	for_each_thread(tsk, t)
 		if ((t->flags & PF_MCE_PROCESS) && (t->flags & PF_MCE_EARLY))
-			goto found;
-	t = NULL;
-found:
-	rcu_read_unlock();
-	return t;
+			return t;
+	return NULL;
 }
 
 /*


* RE: [PATCH 0/3] HWPOISON: improve memory error handling for multithread process
  2014-05-30  6:51                           ` [PATCH 0/3] HWPOISON: improve memory error handling for multithread process Naoya Horiguchi
                                               ` (2 preceding siblings ...)
  2014-05-30  6:51                             ` [PATCH 3/3] mm/memory-failure.c: support dedicated thread to handle SIGBUS(BUS_MCEERR_AO) Naoya Horiguchi
@ 2014-05-30 17:25                             ` Luck, Tony
  2014-05-30 18:24                               ` Naoya Horiguchi
       [not found]                               ` <5388cd0e.463edd0a.755d.6f61SMTPIN_ADDED_BROKEN@mx.google.com>
  3 siblings, 2 replies; 31+ messages in thread
From: Luck, Tony @ 2014-05-30 17:25 UTC (permalink / raw)
  To: Naoya Horiguchi, Andrew Morton
  Cc: Andi Kleen, Kamil Iskra, Borislav Petkov, Chen Gong,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org

> This patchset is the summary of recent discussion about memory error handling
> in multithreaded applications. Patches 1 and 2 are for action required errors,
> and patch 3 is for action optional errors.

Naoya,

You suggested early in the discussion (when there were just two patches) that
they deserved a "Cc: stable@vger.kernel.org".  I agreed, and still think the same
way.

-Tony


* Re: [PATCH 0/3] HWPOISON: improve memory error handling for multithread process
  2014-05-30 17:25                             ` [PATCH 0/3] HWPOISON: improve memory error handling for multithread process Luck, Tony
@ 2014-05-30 18:24                               ` Naoya Horiguchi
       [not found]                               ` <5388cd0e.463edd0a.755d.6f61SMTPIN_ADDED_BROKEN@mx.google.com>
  1 sibling, 0 replies; 31+ messages in thread
From: Naoya Horiguchi @ 2014-05-30 18:24 UTC (permalink / raw)
  To: Tony Luck
  Cc: Andrew Morton, Andi Kleen, Kamil Iskra, Borislav Petkov,
	Chen Gong, linux-kernel, linux-mm

On Fri, May 30, 2014 at 05:25:39PM +0000, Luck, Tony wrote:
> > This patchset is the summary of recent discussion about memory error handling
> > in multithreaded applications. Patches 1 and 2 are for action required errors,
> > and patch 3 is for action optional errors.
> 
> Naoya,
> 
> You suggested early in the discussion (when there were just two patches) that
> they deserved a "Cc: stable@vger.kernel.org".  I agreed, and still think the same
> way.

Correct. AR error handling was added in v3.2-rc5, so adding
"Cc: stable@vger.kernel.org # v3.2+" is fine.

Thanks,
Naoya Horiguchi


[parent not found: <5388cd0e.463edd0a.755d.6f61SMTPIN_ADDED_BROKEN@mx.google.com>]

* Re: [PATCH 0/3] HWPOISON: improve memory error handling for multithread process
       [not found]                               ` <5388cd0e.463edd0a.755d.6f61SMTPIN_ADDED_BROKEN@mx.google.com>
@ 2014-06-02 22:43                                 ` Andrew Morton
  2014-06-02 23:37                                   ` Luck, Tony
  0 siblings, 1 reply; 31+ messages in thread
From: Andrew Morton @ 2014-06-02 22:43 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Tony Luck, Andi Kleen, Kamil Iskra, Borislav Petkov, Chen Gong,
	linux-kernel, linux-mm

On Fri, 30 May 2014 14:24:52 -0400 Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> wrote:

> On Fri, May 30, 2014 at 05:25:39PM +0000, Luck, Tony wrote:
> > > This patchset is the summary of recent discussion about memory error handling
> > > in multithreaded applications. Patches 1 and 2 are for action required errors,
> > > and patch 3 is for action optional errors.
> > 
> > Naoya,
> > 
> > You suggested early in the discussion (when there were just two patches) that
> > they deserved a "Cc: stable@vger.kernel.org".  I agreed, and still think the same
> > way.
> 
> Correct. AR error handling was added in v3.2-rc5, so adding
> "Cc: stable@vger.kernel.org # v3.2+" is fine.

I'm not sure that "[PATCH 3/3] mm/memory-failure.c: support dedicated
thread to handle SIGBUS(BUS_MCEERR_AO)" is a -stable thing?  That's a
feature addition more than a bugfix?


* RE: [PATCH 0/3] HWPOISON: improve memory error handling for multithread process
  2014-06-02 22:43                                 ` Andrew Morton
@ 2014-06-02 23:37                                   ` Luck, Tony
  0 siblings, 0 replies; 31+ messages in thread
From: Luck, Tony @ 2014-06-02 23:37 UTC (permalink / raw)
  To: Andrew Morton, Naoya Horiguchi
  Cc: Andi Kleen, Kamil Iskra, Borislav Petkov, Chen Gong,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org

> I'm not sure that "[PATCH 3/3] mm/memory-failure.c: support dedicated
> thread to handle SIGBUS(BUS_MCEERR_AO)" is a -stable thing?  That's a
> feature addition more than a bugfix?

No - the old behavior was crazy - someone with a multithreaded process might
well expect that if they call prctl() to set PF_MCE_EARLY in just one thread,
then that thread would see the SIGBUS with si_code = BUS_MCEERR_AO - even if
that thread wasn't the main thread for the process.

Perhaps the description for the commit should better reflect that?

-Tony




[parent not found: <1401327939-cvm7qh0m@n-horiguchi@ah.jp.nec.com>]

* Re: [PATCH] mm/memory-failure.c: support dedicated thread to handle SIGBUS(BUS_MCEERR_AO) thread
       [not found]                     ` <1401327939-cvm7qh0m@n-horiguchi@ah.jp.nec.com>
@ 2014-05-30 19:52                       ` Kamil Iskra
  0 siblings, 0 replies; 31+ messages in thread
From: Kamil Iskra @ 2014-05-30 19:52 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: tony.luck, linux-kernel, linux-mm, Andi Kleen, Borislav Petkov,
	gong.chen

On Wed, May 28, 2014 at 21:45:41 -0400, Naoya Horiguchi wrote:

> >  The user could also mark more than
> > one thread in this way - in which case the kernel will pick
> > the first one it sees (is that oldest, or newest?) that is marked.
> > Not sure if this would ever be useful unless you want to pass
> > responsibility around in an application that is dynamically
> > creating and removing threads.
> 
> I'm not sure which is better: sending the signal to the first-found marked
> thread or to all marked threads. If we have a good reason to do the latter,
> I'm OK with it. Any ideas?

Well, it would be more flexible if the signal were sent to all marked
threads, but I don't know if that constitutes a good enough reason to add
the extra complexity involved.  Sometimes better is the enemy of good, and
in this case the patch you proposed should be good enough for any practical
case I can think of.
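
The first-found rule discussed here can be modeled in plain C. This is a
userspace sketch, not kernel code: struct thread_model and both function
names are hypothetical, threads[0] stands in for the main thread, the array
order stands in for whatever order the kernel walks the thread group, and the
!tsk->mm check from the patch is left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-thread flags the patch inspects. */
struct thread_model {
	int tid;
	bool policy_set;	/* models PF_MCE_PROCESS */
	bool policy_early;	/* models PF_MCE_EARLY */
};

/* Return the first thread marked for early kill, or NULL if none is. */
const struct thread_model *
model_find_early_kill_thread(const struct thread_model *threads, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (threads[i].policy_set && threads[i].policy_early)
			return &threads[i];
	return NULL;
}

/*
 * Pick the thread that would receive the signal, or NULL to defer.
 * Order of checks: forced early kill, then a marked thread, then the
 * system-wide default.
 */
const struct thread_model *
model_pick_signal_target(const struct thread_model *threads, size_t n,
			 bool force_early, bool sysctl_early_kill)
{
	const struct thread_model *t;

	if (n == 0)
		return NULL;
	if (force_early)
		return &threads[0];
	t = model_find_early_kill_thread(threads, n);
	if (t)
		return t;
	return sysctl_early_kill ? &threads[0] : NULL;
}
```

With two marked threads, only the earlier one in the array is returned, which
is the "good enough" behavior described above rather than delivery to every
marked thread.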

Naoya, Tony, thank you for taking the leadership on this issue and seeing
it through, and for the courtesy of keeping me in the loop!

Kamil

-- 
Kamil Iskra, PhD
Argonne National Laboratory, Mathematics and Computer Science Division
9700 South Cass Avenue, Building 240, Argonne, IL 60439, USA
phone: +1-630-252-7197  fax: +1-630-252-5986


* [PATCH 2/2] memory-failure: Don't let collect_procs() skip over processes for MF_ACTION_REQUIRED
  2014-05-20 17:35 [PATCH 0/2] Fix some machine check application recovery cases Tony Luck
  2014-05-20 16:28 ` [PATCH 1/2] memory-failure: Send right signal code to correct thread Tony Luck
@ 2014-05-20 16:46 ` Tony Luck
  2014-05-20 17:59   ` Naoya Horiguchi
  1 sibling, 1 reply; 31+ messages in thread
From: Tony Luck @ 2014-05-20 16:46 UTC (permalink / raw)
  To: linux-kernel, linux-mm; +Cc: Andi Kleen, Borislav Petkov, Chen Gong

When Linux sees an "action optional" machine check (where h/w has
reported an error that is not in the current execution path) we
generally do not want to signal a process, since most processes
do not have a SIGBUS handler - we'd just prematurely terminate the
process for a problem that they might never actually see.

task_early_kill() decides whether to consider a process - and it
checks whether this specific process has been marked for early signals
with "prctl", or if the system administrator has requested early
signals for all processes using /proc/sys/vm/memory_failure_early_kill.

But for the MF_ACTION_REQUIRED case we must not defer. The error is in
the execution path of the current thread so we must send the SIGBUS
immediately.

Fix by passing a flag argument through collect_procs*() to
task_early_kill() so it knows whether we can defer or must
take action.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 mm/memory-failure.c | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 642c8434b166..f0967f72991c 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -380,10 +380,12 @@ static void kill_procs(struct list_head *to_kill, int forcekill, int trapno,
 	}
 }
 
-static int task_early_kill(struct task_struct *tsk)
+static int task_early_kill(struct task_struct *tsk, int force_early)
 {
 	if (!tsk->mm)
 		return 0;
+	if (force_early)
+		return 1;
 	if (tsk->flags & PF_MCE_PROCESS)
 		return !!(tsk->flags & PF_MCE_EARLY);
 	return sysctl_memory_failure_early_kill;
@@ -393,7 +395,7 @@ static int task_early_kill(struct task_struct *tsk)
  * Collect processes when the error hit an anonymous page.
  */
 static void collect_procs_anon(struct page *page, struct list_head *to_kill,
-			      struct to_kill **tkc)
+			      struct to_kill **tkc, int force_early)
 {
 	struct vm_area_struct *vma;
 	struct task_struct *tsk;
@@ -409,7 +411,7 @@ static void collect_procs_anon(struct page *page, struct list_head *to_kill,
 	for_each_process (tsk) {
 		struct anon_vma_chain *vmac;
 
-		if (!task_early_kill(tsk))
+		if (!task_early_kill(tsk, force_early))
 			continue;
 		anon_vma_interval_tree_foreach(vmac, &av->rb_root,
 					       pgoff, pgoff) {
@@ -428,7 +430,7 @@ static void collect_procs_anon(struct page *page, struct list_head *to_kill,
  * Collect processes when the error hit a file mapped page.
  */
 static void collect_procs_file(struct page *page, struct list_head *to_kill,
-			      struct to_kill **tkc)
+			      struct to_kill **tkc, int force_early)
 {
 	struct vm_area_struct *vma;
 	struct task_struct *tsk;
@@ -439,7 +441,7 @@ static void collect_procs_file(struct page *page, struct list_head *to_kill,
 	for_each_process(tsk) {
 		pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
 
-		if (!task_early_kill(tsk))
+		if (!task_early_kill(tsk, force_early))
 			continue;
 
 		vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff,
@@ -465,7 +467,8 @@ static void collect_procs_file(struct page *page, struct list_head *to_kill,
  * First preallocate one tokill structure outside the spin locks,
  * so that we can kill at least one process reasonably reliable.
  */
-static void collect_procs(struct page *page, struct list_head *tokill)
+static void collect_procs(struct page *page, struct list_head *tokill,
+				int force_early)
 {
 	struct to_kill *tk;
 
@@ -476,9 +479,9 @@ static void collect_procs(struct page *page, struct list_head *tokill)
 	if (!tk)
 		return;
 	if (PageAnon(page))
-		collect_procs_anon(page, tokill, &tk);
+		collect_procs_anon(page, tokill, &tk, force_early);
 	else
-		collect_procs_file(page, tokill, &tk);
+		collect_procs_file(page, tokill, &tk, force_early);
 	kfree(tk);
 }
 
@@ -963,7 +966,7 @@ static int hwpoison_user_mappings(struct page *p, unsigned long pfn,
 	 * there's nothing that can be done.
 	 */
 	if (kill)
-		collect_procs(ppage, &tokill);
+		collect_procs(ppage, &tokill, flags & MF_ACTION_REQUIRED);
 
 	ret = try_to_unmap(ppage, ttu);
 	if (ret != SWAP_SUCCESS)
-- 
1.8.4.1
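
For illustration only, the decision order that task_early_kill() follows
after this patch can be modeled in userspace C. This is a sketch under stated
assumptions: struct task_model and model_task_early_kill() are hypothetical
names, and the booleans stand in for tsk->mm, PF_MCE_PROCESS, PF_MCE_EARLY and
the memory_failure_early_kill sysctl.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the fields task_early_kill() looks at. */
struct task_model {
	bool has_mm;		/* false for kernel threads (tsk->mm == NULL) */
	bool policy_set;	/* models PF_MCE_PROCESS: prctl policy in force */
	bool policy_early;	/* models PF_MCE_EARLY: early kill requested */
};

/* Return true if the task should be collected for signalling now. */
bool model_task_early_kill(const struct task_model *t, bool force_early,
			   bool sysctl_early_kill)
{
	if (!t->has_mm)
		return false;		/* no user address space */
	if (force_early)
		return true;		/* MF_ACTION_REQUIRED: never defer */
	if (t->policy_set)
		return t->policy_early;	/* per-process prctl policy wins */
	return sysctl_early_kill;	/* fall back to the global default */
}
```

The point of the patch is the second check: when force_early is set, a task
with a late-kill policy and the sysctl off is still collected.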


* Re: [PATCH 2/2] memory-failure: Don't let collect_procs() skip over processes for MF_ACTION_REQUIRED
  2014-05-20 16:46 ` [PATCH 2/2] memory-failure: Don't let collect_procs() skip over processes for MF_ACTION_REQUIRED Tony Luck
@ 2014-05-20 17:59   ` Naoya Horiguchi
  0 siblings, 0 replies; 31+ messages in thread
From: Naoya Horiguchi @ 2014-05-20 17:59 UTC (permalink / raw)
  To: Tony Luck; +Cc: linux-kernel, linux-mm, Andi Kleen, bp, gong.chen

On Tue, May 20, 2014 at 09:46:43AM -0700, Tony Luck wrote:
> When Linux sees an "action optional" machine check (where h/w has
> reported an error that is not in the current execution path) we
> generally do not want to signal a process, since most processes
> do not have a SIGBUS handler - we'd just prematurely terminate the
> process for a problem that they might never actually see.
> 
> task_early_kill() decides whether to consider a process - and it
> checks whether this specific process has been marked for early signals
> with "prctl", or if the system administrator has requested early
> signals for all processes using /proc/sys/vm/memory_failure_early_kill.
> 
> But for the MF_ACTION_REQUIRED case we must not defer. The error is in
> the execution path of the current thread so we must send the SIGBUS
> immediately.
> 
> Fix by passing a flag argument through collect_procs*() to
> task_early_kill() so it knows whether we can defer or must
> take action.
> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>

Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

Thanks,
Naoya Horiguchi

> ---
>  mm/memory-failure.c | 21 ++++++++++++---------
>  1 file changed, 12 insertions(+), 9 deletions(-)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 642c8434b166..f0967f72991c 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -380,10 +380,12 @@ static void kill_procs(struct list_head *to_kill, int forcekill, int trapno,
>  	}
>  }
>  
> -static int task_early_kill(struct task_struct *tsk)
> +static int task_early_kill(struct task_struct *tsk, int force_early)
>  {
>  	if (!tsk->mm)
>  		return 0;
> +	if (force_early)
> +		return 1;
>  	if (tsk->flags & PF_MCE_PROCESS)
>  		return !!(tsk->flags & PF_MCE_EARLY);
>  	return sysctl_memory_failure_early_kill;
> @@ -393,7 +395,7 @@ static int task_early_kill(struct task_struct *tsk)
>   * Collect processes when the error hit an anonymous page.
>   */
>  static void collect_procs_anon(struct page *page, struct list_head *to_kill,
> -			      struct to_kill **tkc)
> +			      struct to_kill **tkc, int force_early)
>  {
>  	struct vm_area_struct *vma;
>  	struct task_struct *tsk;
> @@ -409,7 +411,7 @@ static void collect_procs_anon(struct page *page, struct list_head *to_kill,
>  	for_each_process (tsk) {
>  		struct anon_vma_chain *vmac;
>  
> -		if (!task_early_kill(tsk))
> +		if (!task_early_kill(tsk, force_early))
>  			continue;
>  		anon_vma_interval_tree_foreach(vmac, &av->rb_root,
>  					       pgoff, pgoff) {
> @@ -428,7 +430,7 @@ static void collect_procs_anon(struct page *page, struct list_head *to_kill,
>   * Collect processes when the error hit a file mapped page.
>   */
>  static void collect_procs_file(struct page *page, struct list_head *to_kill,
> -			      struct to_kill **tkc)
> +			      struct to_kill **tkc, int force_early)
>  {
>  	struct vm_area_struct *vma;
>  	struct task_struct *tsk;
> @@ -439,7 +441,7 @@ static void collect_procs_file(struct page *page, struct list_head *to_kill,
>  	for_each_process(tsk) {
>  		pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
>  
> -		if (!task_early_kill(tsk))
> +		if (!task_early_kill(tsk, force_early))
>  			continue;
>  
>  		vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff,
> @@ -465,7 +467,8 @@ static void collect_procs_file(struct page *page, struct list_head *to_kill,
>   * First preallocate one tokill structure outside the spin locks,
>   * so that we can kill at least one process reasonably reliable.
>   */
> -static void collect_procs(struct page *page, struct list_head *tokill)
> +static void collect_procs(struct page *page, struct list_head *tokill,
> +				int force_early)
>  {
>  	struct to_kill *tk;
>  
> @@ -476,9 +479,9 @@ static void collect_procs(struct page *page, struct list_head *tokill)
>  	if (!tk)
>  		return;
>  	if (PageAnon(page))
> -		collect_procs_anon(page, tokill, &tk);
> +		collect_procs_anon(page, tokill, &tk, force_early);
>  	else
> -		collect_procs_file(page, tokill, &tk);
> +		collect_procs_file(page, tokill, &tk, force_early);
>  	kfree(tk);
>  }
>  
> @@ -963,7 +966,7 @@ static int hwpoison_user_mappings(struct page *p, unsigned long pfn,
>  	 * there's nothing that can be done.
>  	 */
>  	if (kill)
> -		collect_procs(ppage, &tokill);
> +		collect_procs(ppage, &tokill, flags & MF_ACTION_REQUIRED);
>  
>  	ret = try_to_unmap(ppage, ttu);
>  	if (ret != SWAP_SUCCESS)
> -- 
> 1.8.4.1
> 


Thread overview: 31+ messages
2014-05-20 17:35 [PATCH 0/2] Fix some machine check application recovery cases Tony Luck
2014-05-20 16:28 ` [PATCH 1/2] memory-failure: Send right signal code to correct thread Tony Luck
2014-05-20 17:54   ` Naoya Horiguchi
     [not found]   ` <1400608486-alyqz521@n-horiguchi@ah.jp.nec.com>
2014-05-20 20:56     ` Luck, Tony
2014-05-23  3:34   ` Chen, Gong
2014-05-23 16:48     ` Tony Luck
2014-05-27 16:16       ` Kamil Iskra
2014-05-27 17:50         ` Naoya Horiguchi
     [not found]         ` <5384d07e.4504e00a.2680.ffff8c31SMTPIN_ADDED_BROKEN@mx.google.com>
2014-05-27 22:53           ` Tony Luck
2014-05-28  0:15             ` Naoya Horiguchi
     [not found]             ` <53852abb.867ce00a.3cef.3c7eSMTPIN_ADDED_BROKEN@mx.google.com>
2014-05-28  5:09               ` Tony Luck
2014-05-28 18:47                 ` [PATCH] mm/memory-failure.c: support dedicated thread to handle SIGBUS(BUS_MCEERR_AO) thread Naoya Horiguchi
     [not found]                 ` <53862f6c.91148c0a.5fb0.2d0cSMTPIN_ADDED_BROKEN@mx.google.com>
2014-05-28 22:00                   ` Tony Luck
2014-05-29  1:45                     ` Naoya Horiguchi
     [not found]                     ` <5386915f.4772e50a.0657.ffffcda4SMTPIN_ADDED_BROKEN@mx.google.com>
2014-05-29 17:03                       ` Tony Luck
2014-05-29 18:38                         ` Naoya Horiguchi
2014-05-30  6:51                           ` [PATCH 0/3] HWPOISON: improve memory error handling for multithread process Naoya Horiguchi
2014-05-30  6:51                             ` [PATCH 1/3] memory-failure: Send right signal code to correct thread Naoya Horiguchi
2014-06-02 22:44                               ` Andrew Morton
2014-06-03  1:12                                 ` Naoya Horiguchi
2014-05-30  6:51                             ` [PATCH 2/3] memory-failure: Don't let collect_procs() skip over processes for MF_ACTION_REQUIRED Naoya Horiguchi
2014-05-30  6:51                             ` [PATCH 3/3] mm/memory-failure.c: support dedicated thread to handle SIGBUS(BUS_MCEERR_AO) Naoya Horiguchi
2014-06-02 22:42                               ` Andrew Morton
2014-06-03  1:03                                 ` Naoya Horiguchi
2014-05-30 17:25                             ` [PATCH 0/3] HWPOISON: improve memory error handling for multithread process Luck, Tony
2014-05-30 18:24                               ` Naoya Horiguchi
     [not found]                               ` <5388cd0e.463edd0a.755d.6f61SMTPIN_ADDED_BROKEN@mx.google.com>
2014-06-02 22:43                                 ` Andrew Morton
2014-06-02 23:37                                   ` Luck, Tony
     [not found]                     ` <1401327939-cvm7qh0m@n-horiguchi@ah.jp.nec.com>
2014-05-30 19:52                       ` [PATCH] mm/memory-failure.c: support dedicated thread to handle SIGBUS(BUS_MCEERR_AO) thread Kamil Iskra
2014-05-20 16:46 ` [PATCH 2/2] memory-failure: Don't let collect_procs() skip over processes for MF_ACTION_REQUIRED Tony Luck
2014-05-20 17:59   ` Naoya Horiguchi
