From: Borislav Petkov <bp@amd64.org>
To: Tony Luck <tony.luck@intel.com>
Cc: linux-kernel@vger.kernel.org, Ingo Molnar <mingo@elte.hu>,
	"Huang, Ying" <ying.huang@intel.com>,
	Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Subject: Re: [PATCH 5/6] x86, mce: handle "action required" errors
Date: Wed, 14 Dec 2011 17:04:10 +0100
Message-ID: <20111214160409.GG23589@aftab>
In-Reply-To: <80cbf65ae6e4bd610523cc8568b0c2dcb8c629b6.1323803130.git.tony.luck@intel.com>

On Mon, Dec 12, 2011 at 01:47:45PM -0800, Tony Luck wrote:
[..]

> - * Called after mce notification in process context. This code
> - * is allowed to sleep. Call the high level VM handler to process
> - * any corrupted pages.
> - * Assume that the work queue code only calls this one at a time
> - * per CPU.
> - * Note we don't disable preemption, so this code might run on the wrong
> - * CPU. In this case the event is picked up by the scheduled work queue.
> - * This is merely a fast path to expedite processing in some common
> - * cases.
> + * Called in process context that interrupted by MCE and marked with
> + * TIF_MCE_NOTFY, just before returning to errorneous userland.
> + * This code is allowed to sleep.
> + * Attempt possible recovery such as calling the high level VM handler to
> + * process any corrupted pages, and kill/signal current process if required.
>   */
>  void mce_notify_process(void)
>  {
> +	__u64	paddr = paddr;
>  	unsigned long pfn;
> -	mce_notify_irq();
> -	while (mce_ring_get(&pfn))
> -		memory_failure(pfn, MCE_VECTOR, 0);
> +
> +	if (!mce_find_info(&paddr))
> +		mce_panic("Lost address", NULL, NULL);

Wouldn't it be good to return struct mce_info *mi here in addition to
&paddr...

> +	pfn = paddr >> PAGE_SHIFT;
> +
> +	clear_thread_flag(TIF_MCE_NOTIFY);
> +
> +	pr_err("Uncorrected hardware memory error in user-access at %llx",
> +		 paddr);
> +	if (memory_failure(pfn, MCE_VECTOR, MF_ACTION_REQUIRED) < 0) {
> +		pr_err("Memory error not recovered");
> +		force_sig(SIGBUS, current);
> +	} else {
> +		pr_err("Memory error recovered");
> +		mce_clear_info();

so that you don't need to iterate again over the mce_info array but do:

	mce_clear_info(mi);

?

This assumes, of course, that you have only one AR MCE per task, per
return to userspace. I guess this is fine for now.
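
For illustration, a minimal userspace sketch of the suggestion above, not
the code from the patch: mce_find_info() hands back the matching slot, so
mce_clear_info() can release it without a second walk over the mce_info
array. The slot layout (inuse, task, paddr), the integer task id standing
in for current, MCE_INFO_MAX and mce_save_info() are all assumptions made
so the example builds on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MCE_INFO_MAX 16	/* assumed table size, illustrative only */

/* Assumed slot layout; the kernel version would track a task pointer. */
struct mce_info {
	int		inuse;
	int		task;	/* hypothetical task id standing in for current */
	uint64_t	paddr;
};

static struct mce_info mce_info[MCE_INFO_MAX];

/* Hypothetical helper: record the faulting address for a task. */
static int mce_save_info(int task, uint64_t paddr)
{
	struct mce_info *mi;

	for (mi = mce_info; mi < &mce_info[MCE_INFO_MAX]; mi++) {
		if (!mi->inuse) {
			mi->inuse = 1;
			mi->task = task;
			mi->paddr = paddr;
			return 0;
		}
	}
	return -1;	/* table full */
}

/* Return the slot itself so the caller gets both paddr and a handle. */
static struct mce_info *mce_find_info(int task)
{
	struct mce_info *mi;

	for (mi = mce_info; mi < &mce_info[MCE_INFO_MAX]; mi++)
		if (mi->inuse && mi->task == task)
			return mi;
	return NULL;
}

/* Release the slot directly; no second search of the array. */
static void mce_clear_info(struct mce_info *mi)
{
	assert(mi != NULL);
	mi->inuse = 0;
}
```

With this shape the caller would read mi->paddr from the returned slot and
pass the same pointer to mce_clear_info(mi) on the recovered path. Like the
original, it still assumes at most one saved entry per task.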

> +	}
>  }
>  
>  static void mce_process_work(struct work_struct *dummy)
>  {
> -	mce_notify_process();
> +	unsigned long pfn;
> +
> +	while (mce_ring_get(&pfn))
> +		memory_failure(pfn, MCE_VECTOR, 0);
>  }
>  
>  #ifdef CONFIG_X86_MCE_INTEL
> @@ -1232,8 +1246,6 @@ int mce_notify_irq(void)
>  	/* Not more than two messages every minute */
>  	static DEFINE_RATELIMIT_STATE(ratelimit, 60*HZ, 2);
>  
> -	clear_thread_flag(TIF_MCE_NOTIFY);
> -
>  	if (test_and_clear_bit(0, &mce_need_notify)) {
>  		/* wake processes polling /dev/mcelog */
>  		wake_up_interruptible(&mce_chrdev_wait);
> -- 
> 1.7.3.1
> 

Thanks.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551
