All of lore.kernel.org
 help / color / mirror / Atom feed
From: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
To: "Luck, Tony" <tony.luck@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>, Borislav Petkov <bp@amd64.org>,
	linux-kernel@vger.kernel.org, "Huang,
	Ying" <ying.huang@intel.com>, Avi Kivity <avi@redhat.com>
Subject: Re: [PATCH 07/10] MCE: replace mce.c use of TIF_MCE_NOTIFY with user_return_notifier
Date: Fri, 10 Jun 2011 17:08:15 +0900	[thread overview]
Message-ID: <4DF1D0EF.7090109@jp.fujitsu.com> (raw)
In-Reply-To: <4df13cae2729896ba5@agluck-desktop.sc.intel.com>

(2011/06/10 6:35), Luck, Tony wrote:
> From: Tony Luck <tony.luck@intel.com>
> 
> Ingo wrote:
>> We already have a generic facility to do such things at
>> return-to-userspace: _TIF_USER_RETURN_NOTIFY.
> 
> This just a proof of concept patch ... before this can become
> real the user-return-notifier code would have to be made NMI
> safe (currently it uses hlist_add_head/hlist_del, which would
> need to be changed to Ying's NMI-safe single threaded lists).
> 
> Reviewed-by: Avi Kivity <avi@redhat.com>
> NOT-Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>  arch/x86/Kconfig                 |    1 +
>  arch/x86/kernel/cpu/mcheck/mce.c |   47 +++++++++++++++++++++++++++----------
>  arch/x86/kernel/signal.c         |    6 -----
>  3 files changed, 35 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index cc6c53a..054e127 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -845,6 +845,7 @@ config X86_REROUTE_FOR_BROKEN_BOOT_IRQS
>  
>  config X86_MCE
>  	bool "Machine Check / overheating reporting"
> +	select USER_RETURN_NOTIFIER
>  	---help---
>  	  Machine Check support allows the processor to notify the
>  	  kernel if it detects a problem (e.g. overheating, data corruption).
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index ffc8d11..28d223e 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -38,6 +38,7 @@
>  #include <linux/mm.h>
>  #include <linux/debugfs.h>
>  #include <linux/edac_mce.h>
> +#include <linux/user-return-notifier.h>
>  
>  #include <asm/processor.h>
>  #include <asm/hw_irq.h>
> @@ -69,6 +70,15 @@ atomic_t mce_entry;
>  
>  DEFINE_PER_CPU(unsigned, mce_exception_count);
>  
> +struct mce_notify {
> +	struct user_return_notifier urn;
> +	bool registered;
> +};
> +
> +static void mce_do_notify(struct user_return_notifier *urn);
> +
> +static DEFINE_PER_CPU(struct mce_notify, mce_notify);
> +
>  /*
>   * Tolerant levels:
>   *   0: always panic on uncorrected errors, log corrected errors
> @@ -947,6 +957,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
>  	int i;
>  	int worst = 0;
>  	int severity;
> +	struct mce_notify *np;
>  	/*
>  	 * Establish sequential order between the CPUs entering the machine
>  	 * check handler.
> @@ -1099,7 +1110,12 @@ void do_machine_check(struct pt_regs *regs, long error_code)
>  		force_sig(SIGBUS, current);
>  
>  	/* notify userspace ASAP */
> -	set_thread_flag(TIF_MCE_NOTIFY);
> +	np = &__get_cpu_var(mce_notify);
> +	if (np->registered == 0) {
> +		np->urn.on_user_return = mce_do_notify;
> +		user_return_notifier_register(&np->urn);
> +		np->registered = 1;
> +	}
>  
>  	if (worst > 0)
>  		mce_report_event(regs);
> @@ -1116,28 +1132,35 @@ void __attribute__((weak)) memory_failure(unsigned long pfn, int vector)
>  	printk(KERN_ERR "Action optional memory failure at %lx ignored\n", pfn);
>  }
>  
> +static void mce_process_ring(void)
> +{
> +	unsigned long pfn;
> +
> +	mce_notify_irq();
> +	while (mce_ring_get(&pfn))
> +		memory_failure(pfn, MCE_VECTOR);
> +}
> +
>  /*
>   * Called after mce notification in process context. This code
>   * is allowed to sleep. Call the high level VM handler to process
>   * any corrupted pages.
>   * Assume that the work queue code only calls this one at a time
>   * per CPU.
> - * Note we don't disable preemption, so this code might run on the wrong
> - * CPU. In this case the event is picked up by the scheduled work queue.
> - * This is merely a fast path to expedite processing in some common
> - * cases.
>   */
> -void mce_notify_process(void)
> +static void mce_do_notify(struct user_return_notifier *urn)
>  {
> -	unsigned long pfn;
> -	mce_notify_irq();
> -	while (mce_ring_get(&pfn))
> -		memory_failure(pfn, MCE_VECTOR);
> +	struct mce_notify *np = container_of(urn, struct mce_notify, urn);
> +
> +	user_return_notifier_unregister(urn);
> +	np->registered = 0;
> +
> +	mce_process_ring();
>  }

Now I'm reconsidering the MCE event notification mechanism.
One of something nervous is whether it is really required to process
"_AO" memory poisoning (i.e. mce_process_ring()) here in a process
context that unfortunately interrupted by MCE (or preempted after that).
I'm uncertain how long walking though the task_list for unmap will takes,
and not sure it is acceptable if the unlucky thread is a kind of latency
sensitive...

If we can move mce_process_ring() to worker thread completely, what
we have to do will be:
 1) from NMI context, request non-NMI context by irq_work()
 2) from (irq) context, wake up loggers and schedule work if required
 3) from worker thread, process "_AO" memory poisoning etc.

So now question is why user_return_notifier is needed here.
Is it just an alternative of irq_work() for !LOCAL_APIC ?


Thanks,
H.Seto

>  
>  static void mce_process_work(struct work_struct *dummy)
>  {
> -	mce_notify_process();
> +	mce_process_ring();
>  }
>  
>  #ifdef CONFIG_X86_MCE_INTEL
> @@ -1218,8 +1241,6 @@ int mce_notify_irq(void)
>  	/* Not more than two messages every minute */
>  	static DEFINE_RATELIMIT_STATE(ratelimit, 60*HZ, 2);
>  
> -	clear_thread_flag(TIF_MCE_NOTIFY);
> -
>  	if (test_and_clear_bit(0, &mce_need_notify)) {
>  		wake_up_interruptible(&mce_wait);
>  
> diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
> index 4fd173c..44efc22 100644
> --- a/arch/x86/kernel/signal.c
> +++ b/arch/x86/kernel/signal.c
> @@ -838,12 +838,6 @@ static void do_signal(struct pt_regs *regs)
>  void
>  do_notify_resume(struct pt_regs *regs, void *unused, __u32 thread_info_flags)
>  {
> -#ifdef CONFIG_X86_MCE
> -	/* notify userspace of pending MCEs */
> -	if (thread_info_flags & _TIF_MCE_NOTIFY)
> -		mce_notify_process();
> -#endif /* CONFIG_X86_64 && CONFIG_X86_MCE */
> -
>  	/* deal with pending signal delivery */
>  	if (thread_info_flags & _TIF_SIGPENDING)
>  		do_signal(regs);


  reply	other threads:[~2011-06-10  8:08 UTC|newest]

Thread overview: 65+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-06-09 21:25 [RFC] reworked machine check recovery patches Luck, Tony
2011-06-09 21:29 ` [PATCH 01/10] MCE: fixes for mce severity table Luck, Tony
2011-06-09 21:30 ` [PATCH 02/10] MCE: save most severe error information Luck, Tony
2011-06-10  8:06   ` Hidetoshi Seto
2011-06-10 18:08     ` Tony Luck
2011-06-09 21:31 ` [PATCH 03/10] MCE: introduce mce_gather_info() Luck, Tony
2011-06-09 21:32 ` [PATCH 04/10] MCE: Move ADDR/MISC reading code into common function Luck, Tony
2011-06-10  9:33   ` Borislav Petkov
2011-06-10 18:17     ` Tony Luck
2011-06-09 21:33 ` [PATCH 05/10] MCE: Mask out address mask bits below address granuality Luck, Tony
2011-06-10  8:07   ` Hidetoshi Seto
2011-06-10  9:46     ` Borislav Petkov
2011-06-10 19:06     ` Tony Luck
2011-06-11  0:12       ` Andi Kleen
2011-06-10  9:42   ` Borislav Petkov
2011-06-10 19:09     ` Tony Luck
2011-06-09 21:34 ` [PATCH 06/10] HWPOISON: Handle hwpoison in current process Luck, Tony
2011-06-10  8:07   ` Hidetoshi Seto
2011-06-10 20:36     ` Tony Luck
2011-06-09 21:35 ` [PATCH 07/10] MCE: replace mce.c use of TIF_MCE_NOTIFY with user_return_notifier Luck, Tony
2011-06-10  8:08   ` Hidetoshi Seto [this message]
2011-06-10 20:42     ` Tony Luck
2011-06-11 10:24       ` Borislav Petkov
2011-06-12  8:31       ` Avi Kivity
2011-06-12  8:29   ` Avi Kivity
2011-06-12 10:24     ` Borislav Petkov
2011-06-12 10:30       ` Avi Kivity
2011-06-12 13:53         ` Borislav Petkov
2011-06-09 21:36 ` [PATCH 08/10] NOTIFIER: Take over TIF_MCE_NOTIFY and implement task return notifier Luck, Tony
2011-06-12 22:38   ` Borislav Petkov
2011-06-13  5:31     ` Tony Luck
2011-06-13  7:59       ` Avi Kivity
2011-06-13  9:55         ` Borislav Petkov
2011-06-13 11:40           ` Avi Kivity
2011-06-13 12:40             ` Borislav Petkov
2011-06-13 12:47               ` Avi Kivity
2011-06-13 15:12                 ` Borislav Petkov
2011-06-13 16:31                   ` Avi Kivity
2011-06-13 17:13                     ` Tony Luck
2011-06-14  2:50                       ` Hidetoshi Seto
2011-06-14  2:51                         ` [PATCH 1/2] x86, mce: introduce mce_memory_failure_process Hidetoshi Seto
2011-06-14  2:53                         ` [PATCH 2/2] x86, mce: rework use of TIF_MCE_NOTIFY Hidetoshi Seto
2011-06-14 18:02                           ` Tony Luck
2011-06-14 18:28                             ` Tony Luck
2011-06-15  1:29                               ` Hidetoshi Seto
2011-06-15  2:10                                 ` Tony Luck
2011-06-15  3:17                                   ` Hidetoshi Seto
2011-06-14  3:09                         ` [PATCH 08/10] NOTIFIER: Take over TIF_MCE_NOTIFY and implement task return notifier Tony Luck
2011-06-14 11:40                       ` Avi Kivity
2011-06-14 13:33                         ` Borislav Petkov
2011-06-14 13:43                           ` Avi Kivity
2011-06-14 17:13                             ` Luck, Tony
2011-06-15  8:51                               ` Avi Kivity
2011-06-14 16:59                         ` Luck, Tony
2011-06-15  8:52                           ` Avi Kivity
2011-06-13 16:43               ` Tony Luck
2011-06-09 21:37 ` [PATCH 09/10] MCE: run through processors with more severe problems first Luck, Tony
2011-06-10  8:09   ` Hidetoshi Seto
2011-06-10 20:49     ` Tony Luck
2011-06-13 22:03       ` Tony Luck
2011-06-14  1:27         ` Hidetoshi Seto
2011-06-14  3:04           ` Tony Luck
2011-06-09 21:38 ` [PATCH 10/10] MCE: Add Action-Required support Luck, Tony
2011-06-10  8:06   ` Hidetoshi Seto
2011-06-10 21:00     ` Tony Luck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4DF1D0EF.7090109@jp.fujitsu.com \
    --to=seto.hidetoshi@jp.fujitsu.com \
    --cc=avi@redhat.com \
    --cc=bp@amd64.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=tony.luck@intel.com \
    --cc=ying.huang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.