public inbox for linux-kernel@vger.kernel.org
From: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
To: Andi Kleen <andi@firstfloor.org>
Cc: hpa@zytor.com, linux-kernel@vger.kernel.org, mingo@elte.hu,
	tglx@linutronix.de
Subject: Re: [PATCH] [20/28] x86: MCE: Switch x86 machine check handler to Monarch election.
Date: Fri, 17 Apr 2009 20:24:03 +0900
Message-ID: <49E866D3.1020003@jp.fujitsu.com>
In-Reply-To: <20090407150803.1AF0C1D046E@basil.firstfloor.org>

Andi Kleen wrote:
> +/*
> + * Check if a timeout waiting for other CPUs happened.
> + */
> +static int mce_timed_out(u64 *t)
> +{
> +	/*
> +	 * The others already did panic for some reason.
> +	 * Bail out like in a timeout.
> +	 * rmb() to tell the compiler that system_state
> +	 * might have been modified by someone else.
> +	 */
> +	rmb();
> +	if (atomic_read(&mce_paniced))
> +		wait_for_panic();
> +	if (!monarch_timeout)
> +		goto out;
> +	if ((s64)*t < SPINUNIT) {
> +		/* CHECKME: Make panic default for 1 too? */
> +		if (tolerant < 1)
> +			mce_panic("Timeout synchronizing machine check over CPUs",
> +				  NULL, NULL);

If we get here from mce_start() and panic, I suppose no MCE log will appear
on the console, since no CPU has called mce_log(&m) yet.
Is that the expected behavior?

> +		cpu_missing = 1;
> +		return 1;
> +	}
> +	*t -= SPINUNIT;
> +out:
> +	touch_nmi_watchdog();
> +	return 0;
> +}
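
To illustrate the countdown that mce_timed_out() performs on *t, here is a
minimal userspace sketch. It only models the budget arithmetic: the panic,
tolerant and monarch_timeout == 0 paths are left out. The SPINUNIT value and
the helper names budget_expired() and steps_allowed() are assumptions made
for this example, not part of the patch.

```c
#include <stdint.h>

#define SPINUNIT 100 /* ns per polling step; value assumed for illustration */

/*
 * Hypothetical stand-in for the countdown part of mce_timed_out().
 * Returns 1 once the remaining budget cannot cover another step,
 * otherwise charges one step against the budget and returns 0.
 */
static int budget_expired(uint64_t *t)
{
	if ((int64_t)*t < SPINUNIT)
		return 1;
	*t -= SPINUNIT;
	return 0;
}

/* Count how many polling steps a given budget (in ns) allows. */
static unsigned int steps_allowed(uint64_t budget_ns)
{
	unsigned int steps = 0;

	while (!budget_expired(&budget_ns))
		steps++;
	return steps;
}
```

With these assumptions, a waiter polls floor(budget / SPINUNIT) times before
giving up, so a budget smaller than one SPINUNIT times out on the first check.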
(snip)
> +/*
> + * Start of Monarch synchronization. This waits until all CPUs have
> + * entered the ecception handler and then determines if any of them
                  ^^^^^^^^^
                  exception

> + * saw a fatal event that requires panic. Then it executes them
> + * in the entry order.
> + * TBD double check parallel CPU hotunplug
> + */
> +static int mce_start(int no_way_out, int *order)
> +{
> +	int nwo;
> +	int cpus = num_online_cpus();
> +	static atomic_t global_nwo;
> +	u64 timeout = (u64)monarch_timeout * NSEC_PER_USEC;
> +
> +	if (!timeout) {
> +		*order = -1;
> +		return no_way_out;
> +	}
> +
> +	atomic_add(no_way_out, &global_nwo);
> +
> +	/*
> +	 * Wait for everyone.
> +	 */
> +	while (atomic_read(&mce_callin) != cpus) {
> +		if (mce_timed_out(&timeout)) {
> +			atomic_set(&global_nwo, 0);
> +			*order = -1;
> +			return no_way_out;
> +		}
> +		ndelay(SPINUNIT);
> +	}
> +
> +	/*
> +	 * Cache the global no_way_out state.
> +	 */
> +	nwo = atomic_read(&global_nwo);
> +
> +	/*
> +	 * Monarch starts executing now, the others wait.
> +	 */
> +	if (*order == 1) {
> +		atomic_set(&global_nwo, 0);

The Monarch should clear global_nwo only after all Subjects have read it.
Alternatively, the last Subject to read it could clear it instead.

> +		atomic_set(&mce_executing, 1);
> +		return nwo;
> +	}
> +
> +	/*
> +	 * Now start the scanning loop one by one
> +	 * in the original callin order.
> +	 * This way when there are any shared banks it will
> +	 * be only seen by one CPU before cleared, avoiding duplicates.
> +	 */
> +	while (atomic_read(&mce_executing) < *order) {
> +		if (mce_timed_out(&timeout)) {
> +			atomic_set(&global_nwo, 0);
> +			*order = -1;
> +			return no_way_out;
> +		}
> +		ndelay(SPINUNIT);
> +	}
> +	return nwo;
> +}
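
Regarding the global_nwo comment above, the "last reader clears it" variant
could look roughly like the userspace sketch below. It uses C11 atomics
instead of the kernel's atomic_t, and the counter nwo_readers as well as the
helpers nwo_vote() and nwo_read_and_release() are hypothetical names that do
not exist in the patch. It assumes every CPU has voted before any CPU reads,
which is what the wait on mce_callin is meant to guarantee.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical userspace stand-ins for the patch's global state. */
static atomic_int global_nwo;  /* accumulated no_way_out votes */
static atomic_int nwo_readers; /* CPUs that still have to read global_nwo */

/* Each CPU adds its local vote once, before the rendezvous. */
static void nwo_vote(int no_way_out)
{
	atomic_fetch_add(&global_nwo, no_way_out);
	atomic_fetch_add(&nwo_readers, 1);
}

/*
 * Called by every CPU after the rendezvous. Returns the cached global
 * state; only the last reader resets it for the next machine check,
 * so no earlier reader can observe an already cleared value.
 */
static int nwo_read_and_release(void)
{
	int nwo = atomic_load(&global_nwo);
	int prev = atomic_fetch_sub(&nwo_readers, 1);

	/* There must never be more readers than voters. */
	assert(prev >= 1);

	/* atomic_fetch_sub() returns the old value: 1 means we are last. */
	if (prev == 1)
		atomic_store(&global_nwo, 0);
	return nwo;
}
```

In this sketch neither the Monarch nor any Subject clears the value early;
whichever CPU happens to read last performs the reset.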

Thanks,
H.Seto



Thread overview: 46+ messages
2009-04-07 15:07 [PATCH] [0/28] x86: MCE: Feature series for 2.6.31 Andi Kleen
2009-04-07 15:07 ` [PATCH] [1/28] x86: Fix panic with interrupts off (needed for MCE) Andi Kleen
2009-04-20  0:26   ` Hidetoshi Seto
2009-04-20  5:36     ` Andi Kleen
2009-04-07 15:07 ` [PATCH] [2/28] x86: MCE: Synchronize core after machine check handling Andi Kleen
2009-04-07 15:07 ` [PATCH] [3/28] x86: MCE: Remove assumption that RIP MSR is exact Andi Kleen
2009-04-07 15:07 ` [PATCH] [4/28] x86: MCE: Use symbolic macros to access MCG_CAP register Andi Kleen
2009-04-07 15:07 ` [PATCH] [5/28] x86: MCE: Use extended sysattrs for the check_interval attribute Andi Kleen
2009-04-07 15:07 ` [PATCH] [6/28] x86: MCE: Add machine check exception count in /proc/interrupts Andi Kleen
2009-04-08  5:00   ` Hidetoshi Seto
2009-04-08  9:56     ` Andi Kleen
2009-04-07 15:07 ` [PATCH] [7/28] x86: MCE: Log corrected errors when panicing Andi Kleen
2009-04-07 15:07 ` [PATCH] [8/28] x86: MCE: Remove unused mce_events variable Andi Kleen
2009-04-07 15:07 ` [PATCH] [9/28] x86: MCE: Remove machine check handler idle notify on 64bit Andi Kleen
2009-04-07 15:07 ` [PATCH] [10/28] x86: MCE: Remove oops_begin() use in 64bit machine check Andi Kleen
2009-04-07 15:07 ` [PATCH] [11/28] x86: MCE: Remove mce_init unused argument Andi Kleen
2009-04-07 15:07 ` [PATCH] [12/28] x86: MCE: Rename and align out2 label Andi Kleen
2009-04-07 15:07 ` [PATCH] [13/28] x86: MCE: Implement bootstrapping for machine check wakeups Andi Kleen
2009-04-07 15:07 ` [PATCH] [14/28] x86: MCE: Add MSR read wrappers for easier error injection Andi Kleen
2009-04-17 11:23   ` Hidetoshi Seto
2009-04-17 13:00     ` Andi Kleen
2009-04-17 23:55       ` H. Peter Anvin
2009-04-07 15:07 ` [PATCH] [15/28] x86: MCE: Remove TSC print heuristic Andi Kleen
2009-04-07 15:07 ` [PATCH] [16/28] x86: MCE: Drop BKL in mce_open Andi Kleen
2009-04-07 15:07 ` [PATCH] [17/28] x86: MCE: Add table driven machine check grading Andi Kleen
2009-04-07 15:08 ` [PATCH] [18/28] x86: MCE: Check early in exception handler if panic is needed Andi Kleen
2009-04-07 15:08 ` [PATCH] [19/28] x86: MCE: Implement panic synchronization Andi Kleen
2009-04-07 15:08 ` [PATCH] [20/28] x86: MCE: Switch x86 machine check handler to Monarch election Andi Kleen
2009-04-17 11:24   ` Hidetoshi Seto [this message]
2009-04-17 13:09     ` Andi Kleen
2009-04-17 13:53       ` [PATCH] [20/28] x86: MCE: Switch x86 machine check handler to Monarch election. II Andi Kleen
2009-04-07 15:08 ` [PATCH] [21/28] x86: MCE: Store record length into memory struct mce anchor Andi Kleen
2009-04-07 15:08 ` [PATCH] [22/28] x86: MCE: Default to panic timeout for machine checks Andi Kleen
2009-04-17 11:24   ` Hidetoshi Seto
2009-04-17 13:12     ` Andi Kleen
2009-04-07 15:08 ` [PATCH] [23/28] x86: MCE: Improve documentation Andi Kleen
2009-04-08  5:12   ` Hidetoshi Seto
2009-04-07 15:08 ` [PATCH] [24/28] x86: MCE: Support more than 256 CPUs in struct mce Andi Kleen
2009-04-07 15:08 ` [PATCH] [25/28] x86: MCE: Extend struct mce user interface with more information Andi Kleen
2009-04-07 15:08 ` [PATCH] [26/28] Export add_timer_on for modules Andi Kleen
2009-04-07 15:08 ` [PATCH] [27/28] MCE: Add basic error injection infrastructure Andi Kleen
2009-04-07 15:08 ` [PATCH] [28/28] x86: MCE: Implement new status bits Andi Kleen
2009-04-17 11:24   ` Hidetoshi Seto
2009-04-17 13:17     ` Andi Kleen
2009-04-17 11:24 ` [PATCH] [0/28] x86: MCE: Feature series for 2.6.31 Hidetoshi Seto
2009-04-17 13:28   ` Andi Kleen
