From: Borislav Petkov <bp@amd64.org>
To: Tony Luck <tony.luck@intel.com>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>,
Ingo Molnar <mingo@elte.hu>, Borislav Petkov <bp@amd64.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"Huang, Ying" <ying.huang@intel.com>, Avi Kivity <avi@redhat.com>,
Peter Zijlstra <a.p.zijlstra@chello.nl>
Subject: Re: [PATCH 07/10] MCE: replace mce.c use of TIF_MCE_NOTIFY with user_return_notifier
Date: Sat, 11 Jun 2011 12:24:51 +0200 [thread overview]
Message-ID: <20110611102451.GD9573@aftab> (raw)
In-Reply-To: <BANLkTikkYevvkUeQz4YYGauF52YcDRdokQ@mail.gmail.com>
Just a couple of notes:
On Fri, Jun 10, 2011 at 04:42:33PM -0400, Tony Luck wrote:
> 2011/6/10 Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>:
> > Now I'm reconsidering the MCE event notification mechanism.
> > One of something nervous is whether it is really required to process
> > "_AO" memory poisoning (i.e. mce_process_ring()) here in a process
> > context that unfortunately interrupted by MCE (or preempted after that).
> > I'm uncertain how long walking though the task_list for unmap will takes,
> > and not sure it is acceptable if the unlucky thread is a kind of latency
> > sensitive...
> >
> > If we can move mce_process_ring() to worker thread completely, what
> > we have to do will be:
> > 1) from NMI context, request non-NMI context by irq_work()
> > 2) from (irq) context, wake up loggers and schedule work if required
> > 3) from worker thread, process "_AO" memory poisoning etc.
> >
> > So now question is why user_return_notifier is needed here.
> > Is it just an alternative of irq_work() for !LOCAL_APIC ?
I think that an x86 CPU with hw poisoning features support makes
LOCAL_APIC's presence almost axiomatic ...
> I switched this notification over from old TIF_MCE_NOTIFY to
> using user_return_notifier to preserve existing semantics, and
> free up that TIF bit for the task notifier for the action required case.
>
> You are right that we should have some better method - the
> shot-gun approach of hitting every task on every cpu in the hope
> that one of them will run our code soon sounds like overkill.
> Using some NMI-saf method to wake a single worker thread
> sounds a good idea for a subsequent clean up.
and we should go ahead and do it correctly from the get-go, methinks:
for the AR case, we definitely need to go to userspace _but_ the
worker thread that does the recovery for us has to be the _next_ thing
scheduled seamlessly following the process that was running when the MCE
happened.
If schedule_work() cannot guarantee us that then we need to do something
else like yield_to() to the worker thread, thus not penalizing the
unlucky thread. Maybe PeterZ, Ingo could comment on whether that is a ok
thing to do. Botomline is, we absolutely want to execute the AR handling
thread _before_ we return to normal process scheduling and thus going to
the risk of executing the thread that caused the AR MCE.
Btw, Tony, you could do away without atomic NMI lists if we change the
user return notifier a bit:
diff --git a/include/linux/user-return-notifier.h b/include/linux/user-return-notifier.h
index 9c4a445..eb3c977 100644
--- a/include/linux/user-return-notifier.h
+++ b/include/linux/user-return-notifier.h
@@ -26,6 +26,11 @@ static inline void propagate_user_return_notify(struct task_struct *prev,
void fire_user_return_notifiers(void);
+static inline void void set_user_return_notifier(struct task_struct *p)
+{
+ set_tsk_thread_flag(p, TIF_USER_RETURN_NOTIFY);
+}
+
static inline void clear_user_return_notifier(struct task_struct *p)
{
clear_tsk_thread_flag(p, TIF_USER_RETURN_NOTIFY);
diff --git a/kernel/user-return-notifier.c b/kernel/user-return-notifier.c
index 92cb706..c54c5d2 100644
--- a/kernel/user-return-notifier.c
+++ b/kernel/user-return-notifier.c
@@ -1,4 +1,3 @@
-
#include <linux/user-return-notifier.h>
#include <linux/percpu.h>
#include <linux/sched.h>
@@ -13,12 +12,23 @@ static DEFINE_PER_CPU(struct hlist_head, return_notifier_list);
*/
void user_return_notifier_register(struct user_return_notifier *urn)
{
- set_tsk_thread_flag(current, TIF_USER_RETURN_NOTIFY);
hlist_add_head(&urn->link, &__get_cpu_var(return_notifier_list));
}
EXPORT_SYMBOL_GPL(user_return_notifier_register);
/*
+ * Request a constant notification when the current cpu returns to userspace.
+ * Must be called in atomic context. The notifier will also be called in atomic
+ * context.
+ */
+void user_return_notifier_register_and_enable(struct user_return_notifier *urn)
+{
+ set_tsk_thread_flag(current, TIF_USER_RETURN_NOTIFY);
+ hlist_add_head(&urn->link, &__get_cpu_var(return_notifier_list));
+}
+EXPORT_SYMBOL_GPL(user_return_notifier_register_and_enable);
+
+/*
* Removes a registered user return notifier. Must be called from atomic
* context, and from the same cpu registration occurred in.
*/
--
and then in the #MC handler you do user_return_notifier_set(),
having registered it during MCE core init. memory_failure() does
clear_user_return_notifier(current) after it finishes. The _enable
version is what others like KVM would still use. Or something to that
effect.
Hmm...
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551
next prev parent reply other threads:[~2011-06-11 10:25 UTC|newest]
Thread overview: 65+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-06-09 21:25 [RFC] reworked machine check recovery patches Luck, Tony
2011-06-09 21:29 ` [PATCH 01/10] MCE: fixes for mce severity table Luck, Tony
2011-06-09 21:30 ` [PATCH 02/10] MCE: save most severe error information Luck, Tony
2011-06-10 8:06 ` Hidetoshi Seto
2011-06-10 18:08 ` Tony Luck
2011-06-09 21:31 ` [PATCH 03/10] MCE: introduce mce_gather_info() Luck, Tony
2011-06-09 21:32 ` [PATCH 04/10] MCE: Move ADDR/MISC reading code into common function Luck, Tony
2011-06-10 9:33 ` Borislav Petkov
2011-06-10 18:17 ` Tony Luck
2011-06-09 21:33 ` [PATCH 05/10] MCE: Mask out address mask bits below address granuality Luck, Tony
2011-06-10 8:07 ` Hidetoshi Seto
2011-06-10 9:46 ` Borislav Petkov
2011-06-10 19:06 ` Tony Luck
2011-06-11 0:12 ` Andi Kleen
2011-06-10 9:42 ` Borislav Petkov
2011-06-10 19:09 ` Tony Luck
2011-06-09 21:34 ` [PATCH 06/10] HWPOISON: Handle hwpoison in current process Luck, Tony
2011-06-10 8:07 ` Hidetoshi Seto
2011-06-10 20:36 ` Tony Luck
2011-06-09 21:35 ` [PATCH 07/10] MCE: replace mce.c use of TIF_MCE_NOTIFY with user_return_notifier Luck, Tony
2011-06-10 8:08 ` Hidetoshi Seto
2011-06-10 20:42 ` Tony Luck
2011-06-11 10:24 ` Borislav Petkov [this message]
2011-06-12 8:31 ` Avi Kivity
2011-06-12 8:29 ` Avi Kivity
2011-06-12 10:24 ` Borislav Petkov
2011-06-12 10:30 ` Avi Kivity
2011-06-12 13:53 ` Borislav Petkov
2011-06-09 21:36 ` [PATCH 08/10] NOTIFIER: Take over TIF_MCE_NOTIFY and implement task return notifier Luck, Tony
2011-06-12 22:38 ` Borislav Petkov
2011-06-13 5:31 ` Tony Luck
2011-06-13 7:59 ` Avi Kivity
2011-06-13 9:55 ` Borislav Petkov
2011-06-13 11:40 ` Avi Kivity
2011-06-13 12:40 ` Borislav Petkov
2011-06-13 12:47 ` Avi Kivity
2011-06-13 15:12 ` Borislav Petkov
2011-06-13 16:31 ` Avi Kivity
2011-06-13 17:13 ` Tony Luck
2011-06-14 2:50 ` Hidetoshi Seto
2011-06-14 2:51 ` [PATCH 1/2] x86, mce: introduce mce_memory_failure_process Hidetoshi Seto
2011-06-14 2:53 ` [PATCH 2/2] x86, mce: rework use of TIF_MCE_NOTIFY Hidetoshi Seto
2011-06-14 18:02 ` Tony Luck
2011-06-14 18:28 ` Tony Luck
2011-06-15 1:29 ` Hidetoshi Seto
2011-06-15 2:10 ` Tony Luck
2011-06-15 3:17 ` Hidetoshi Seto
2011-06-14 3:09 ` [PATCH 08/10] NOTIFIER: Take over TIF_MCE_NOTIFY and implement task return notifier Tony Luck
2011-06-14 11:40 ` Avi Kivity
2011-06-14 13:33 ` Borislav Petkov
2011-06-14 13:43 ` Avi Kivity
2011-06-14 17:13 ` Luck, Tony
2011-06-15 8:51 ` Avi Kivity
2011-06-14 16:59 ` Luck, Tony
2011-06-15 8:52 ` Avi Kivity
2011-06-13 16:43 ` Tony Luck
2011-06-09 21:37 ` [PATCH 09/10] MCE: run through processors with more severe problems first Luck, Tony
2011-06-10 8:09 ` Hidetoshi Seto
2011-06-10 20:49 ` Tony Luck
2011-06-13 22:03 ` Tony Luck
2011-06-14 1:27 ` Hidetoshi Seto
2011-06-14 3:04 ` Tony Luck
2011-06-09 21:38 ` [PATCH 10/10] MCE: Add Action-Required support Luck, Tony
2011-06-10 8:06 ` Hidetoshi Seto
2011-06-10 21:00 ` Tony Luck
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110611102451.GD9573@aftab \
--to=bp@amd64.org \
--cc=a.p.zijlstra@chello.nl \
--cc=avi@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@elte.hu \
--cc=seto.hidetoshi@jp.fujitsu.com \
--cc=tony.luck@intel.com \
--cc=ying.huang@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox