From: Borislav Petkov <bp@amd64.org>
To: Avi Kivity <avi@redhat.com>
Cc: Tony Luck <tony.luck@intel.com>, Borislav Petkov <bp@amd64.org>,
Ingo Molnar <mingo@elte.hu>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"Huang, Ying" <ying.huang@intel.com>,
Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Subject: Re: [PATCH 08/10] NOTIFIER: Take over TIF_MCE_NOTIFY and implement task return notifier
Date: Mon, 13 Jun 2011 11:55:21 +0200
Message-ID: <20110613095521.GA26316@aftab>
In-Reply-To: <4DF5C36A.1040707@redhat.com>

On Mon, Jun 13, 2011 at 03:59:38AM -0400, Avi Kivity wrote:
> On 06/13/2011 08:31 AM, Tony Luck wrote:
> > On Sun, Jun 12, 2011 at 3:38 PM, Borislav Petkov<bp@amd64.org> wrote:
> > > On Thu, Jun 09, 2011 at 05:36:42PM -0400, Luck, Tony wrote:
> > >> From: Tony Luck<tony.luck@intel.com>
> > >>
> > >> Existing user return notifier mechanism is designed to catch a specific
> > >> cpu just as it returns to run any task in user mode. We also need a
> > >> mechanism to catch a specific task.
> > >
> > > Why do we need that? I mean, in the remaining patches we end up either
> > > running memory_failure() or sending signals to a task. Can't we do it
> > > all in the user return notifier and not have a different notifier for
> > > each policy?
> >
> > Unless I'm mis-reading the user-return-notifier code, it is possible that
> > we'll context switch before we get to the notifier. At that point the
> > user-return-notifier TIF bit is passed on from our task to the newly
> > run-able task. But our task is still viable, so another cpu could grab
> > it and start running it ... then we have a race ... will the new task
> > that inherited the notifier unmap the page fast enough, or will there
> > be a loud BANG as the original task runs right into the machine
> > check again.
>
> Right. user-return-notifiers are really a per-cpu notifier, unrelated
> to any specific task. The use of per-task flags was an optimization.
>
> If running into the MCE again is really bad, then you need something
> more, since other threads (or other processes) could run into the same
> page as well.

Well, the #MC handler runs on all CPUs on Intel, so what we could do is
set the current task to TASK_STOPPED or TASK_UNINTERRUPTIBLE or some
other state that makes it non-viable for scheduling.

Then we can take our time running the notifier, since the "problematic"
task won't get scheduled until we're done. When we finish analyzing the
MCE, we either kill the task, so that it dies on SIGKILL the next time
it gets scheduled, or we unmap the page with the error in it, so that
the task #PFs on its next run.
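
To make the ordering above concrete, here is a rough user-space toy model
of it. This is only a sketch under loud assumptions: none of the toy_*
names exist in the kernel, the "scheduler" is a single predicate, and the
real thing would have to use the kernel's task states and the
memory_failure() machinery rather than these stand-ins.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: every name here is hypothetical, not a kernel API. */
enum toy_state   { TOY_RUNNABLE, TOY_PARKED, TOY_KILLED };
enum toy_verdict { TOY_RECOVERABLE, TOY_FATAL };
enum toy_outcome { TOY_NOT_RUN, TOY_SIGKILL, TOY_PAGE_FAULT, TOY_MCE_AGAIN };

struct toy_task {
	int pid;
	enum toy_state state;
	bool page_mapped;	/* does the task still map the bad page? */
};

/* A scheduler on any CPU would only pick up runnable tasks. */
bool toy_can_schedule(const struct toy_task *t)
{
	return t->state == TOY_RUNNABLE;
}

/* Step 1, in the "#MC handler": make the task non-viable for scheduling. */
void toy_mce_park(struct toy_task *t)
{
	t->state = TOY_PARKED;
}

/* Step 2, later and without time pressure: analyze, then kill or unmap. */
void toy_mce_finish(struct toy_task *t, enum toy_verdict v)
{
	assert(t->state == TOY_PARKED);	/* nobody may have run it meanwhile */

	if (v == TOY_FATAL) {
		t->state = TOY_KILLED;
		return;
	}
	t->page_mapped = false;		/* next access faults instead of #MC */
	t->state = TOY_RUNNABLE;
}

/* What the task would see if it tried to touch the bad address now. */
enum toy_outcome toy_touch_page(const struct toy_task *t)
{
	if (t->state == TOY_KILLED)
		return TOY_SIGKILL;
	if (!toy_can_schedule(t))
		return TOY_NOT_RUN;
	return t->page_mapped ? TOY_MCE_AGAIN : TOY_PAGE_FAULT;
}
```

The only point of the model is that between toy_mce_park() and
toy_mce_finish() the task cannot reach the bad page at all, so there is
no window in which another CPU picks it up and runs into the same error.
It says nothing about other tasks mapping the same page.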

But no, I don't think we can catch all possible situations where a page
is mapped by multiple tasks ...

> If not, do we care? Let it hit the MCE again, as long as
> we'll catch it eventually.

... and in that case we are going to have to let it hit again. Or is
there a way to get, in atomic context, to the list of all tasks mapping
a page, stop them from being scheduled, and then run the notifier work
in process context?

Hmmm..

--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551