stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org,
	"Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	Jonghwan Choi <jhbird.choi@samsung.com>
Subject: [ 36/74] xen/smp: initialize IPI vectors before marking CPU online
Date: Mon, 26 Aug 2013 18:08:34 -0700	[thread overview]
Message-ID: <20130827010432.247202762@linuxfoundation.org> (raw)
In-Reply-To: <20130827010424.535365435@linuxfoundation.org>

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Chuck Anderson <chuck.anderson@oracle.com>

commit fc78d343fa74514f6fd117b5ef4cd27e4ac30236 upstream.

An older PVHVM guest (v3.0 based) crashed during vCPU hot-plug with:

	kernel BUG at drivers/xen/events.c:1328!

RCU has detected that a CPU has not entered a quiescent state within the
grace period.  It needs to send the CPU a reschedule IPI if it is not
offline.  rcu_implicit_offline_qs() does this check:

	/*
	 * If the CPU is offline, it is in a quiescent state.  We can
	 * trust its state not to change because interrupts are disabled.
	 */
	if (cpu_is_offline(rdp->cpu)) {
		rdp->offline_fqs++;
		return 1;
	}

	Else the CPU is online.  Send it a reschedule IPI.

The CPU is in the middle of being hot-plugged and has been marked online
(!cpu_is_offline()).  See start_secondary():

	set_cpu_online(smp_processor_id(), true);
	...
	per_cpu(cpu_state, smp_processor_id()) = CPU_ONLINE;

start_secondary() then waits for the CPU bringing up the hot-plugged CPU to
mark it as active:

	/*
	 * Wait until the cpu which brought this one up marked it
	 * online before enabling interrupts. If we don't do that then
	 * we can end up waking up the softirq thread before this cpu
	 * reached the active state, which makes the scheduler unhappy
	 * and schedule the softirq thread on the wrong cpu. This is
	 * only observable with forced threaded interrupts, but in
	 * theory it could also happen w/o them. It's just way harder
	 * to achieve.
	 */
	while (!cpumask_test_cpu(smp_processor_id(), cpu_active_mask))
		cpu_relax();

	/* enable local interrupts */
	local_irq_enable();

The CPU being hot-plugged will be marked active after it has been fully
initialized by the CPU managing the hot-plug.  In the Xen PVHVM case
xen_smp_intr_init() is called to set up the hot-plugged vCPU's
XEN_RESCHEDULE_VECTOR.

The hot-plugging CPU is marked online, not marked active and does not have
its IPI vectors set up.  rcu_implicit_offline_qs() sees the hot-plugging
cpu is !cpu_is_offline() and tries to send it a reschedule IPI:
This will lead to:

	kernel BUG at drivers/xen/events.c:1328!

	xen_send_IPI_one()
	xen_smp_send_reschedule()
	rcu_implicit_offline_qs()
	rcu_implicit_dynticks_qs()
	force_qs_rnp()
	force_quiescent_state()
	__rcu_process_callbacks()
	rcu_process_callbacks()
	__do_softirq()
	call_softirq()
	do_softirq()
	irq_exit()
	xen_evtchn_do_upcall()

because xen_send_IPI_one() will attempt to use an uninitialized IRQ for
the XEN_RESCHEDULE_VECTOR.

There is at least one other place that has caused the same crash:

	xen_smp_send_reschedule()
	wake_up_idle_cpu()
	add_timer_on()
	clocksource_watchdog()
	call_timer_fn()
	run_timer_softirq()
	__do_softirq()
	call_softirq()
	do_softirq()
	irq_exit()
	xen_evtchn_do_upcall()
	xen_hvm_callback_vector()

clocksource_watchdog() uses cpu_online_mask to pick the next CPU to handle
a watchdog timer:

	/*
	 * Cycle through CPUs to check if the CPUs stay synchronized
	 * to each other.
	 */
	next_cpu = cpumask_next(raw_smp_processor_id(), cpu_online_mask);
	if (next_cpu >= nr_cpu_ids)
		next_cpu = cpumask_first(cpu_online_mask);
	watchdog_timer.expires += WATCHDOG_INTERVAL;
	add_timer_on(&watchdog_timer, next_cpu);

This resulted in an attempt to send an IPI to a hot-plugging CPU that
had not initialized its reschedule vector. One option would be to make
the RCU code check to not check for CPU offline but for CPU active.
As becoming active is done after a CPU is online (in older kernels).

But Srivatsa pointed out that "the cpu_active vs cpu_online ordering has been
completely reworked - in the online path, cpu_active is set *before* cpu_online,
and also, in the cpu offline path, the cpu_active bit is reset in the CPU_DYING
notification instead of CPU_DOWN_PREPARE." Drilling in this the bring-up
path: "[brought up CPU].. send out a CPU_STARTING notification, and in response
to that, the scheduler sets the CPU in the cpu_active_mask. Again, this mask
is better left to the scheduler alone, since it has the intelligence to use it
judiciously."

The conclusion was that:
"
1. At the IPI sender side:

   It is incorrect to send an IPI to an offline CPU (cpu not present in
   the cpu_online_mask). There are numerous places where we check this
   and warn/complain.

2. At the IPI receiver side:

   It is incorrect to let the world know of our presence (by setting
   ourselves in global bitmasks) until our initialization steps are complete
   to such an extent that we can handle the consequences (such as
   receiving interrupts without crashing the sender etc.)
" (from Srivatsa)

As the native code enables the interrupts at some point we need to be
able to service them. In other words a CPU must have valid IPI vectors
if it has been marked online.

It doesn't need to handle the IPI (interrupts may be disabled) but needs
to have valid IPI vectors because another CPU may find it in cpu_online_mask
and attempt to send it an IPI.

This patch will change the order of the Xen vCPU bring-up functions so that
Xen vectors have been set up before start_secondary() is called.
It also will not continue to bring up a Xen vCPU if xen_smp_intr_init() fails
to initialize it.

Orabug 13823853
Signed-off-by Chuck Anderson <chuck.anderson@oracle.com>
Acked-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Jonghwan Choi <jhbird.choi@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/x86/xen/smp.c |   11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -667,8 +667,15 @@ static void __init xen_hvm_smp_prepare_c
 static int __cpuinit xen_hvm_cpu_up(unsigned int cpu, struct task_struct *tidle)
 {
 	int rc;
-	rc = native_cpu_up(cpu, tidle);
-	WARN_ON (xen_smp_intr_init(cpu));
+	/*
+	 * xen_smp_intr_init() needs to run before native_cpu_up()
+	 * so that IPI vectors are set up on the booting CPU before
+	 * it is marked online in native_cpu_up().
+	*/
+	rc = xen_smp_intr_init(cpu);
+	WARN_ON(rc);
+	if (!rc)
+		rc =  native_cpu_up(cpu, tidle);
 	return rc;
 }
 



  parent reply	other threads:[~2013-08-27  1:08 UTC|newest]

Thread overview: 79+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-08-27  1:07 [ 00/74] 3.10.10-stable review Greg Kroah-Hartman
2013-08-27  1:07 ` [ 01/74] KVM: s390: move kvm_guest_enter,exit closer to sie Greg Kroah-Hartman
2013-08-27  1:08 ` [ 02/74] mac80211: dont wait for TX status forever Greg Kroah-Hartman
2013-08-27  1:08 ` [ 03/74] ACPI: add _STA evaluation at do_acpi_find_child() Greg Kroah-Hartman
2013-08-27  1:08 ` [ 04/74] ACPI: Try harder to resolve _ADR collisions for bridges Greg Kroah-Hartman
2013-08-27  1:08 ` [ 05/74] ARC: gdbserver breakage in Big-Endian configuration #1 Greg Kroah-Hartman
2013-08-27  1:08 ` [ 06/74] ARC: gdbserver breakage in Big-Endian configuration #2 Greg Kroah-Hartman
2013-08-27  1:08 ` [ 07/74] ARM: at91: at91sam9x5 RTC is not compatible with at91rm9200 one Greg Kroah-Hartman
2013-08-27  1:08 ` [ 08/74] NFC: llcp: Fix non blocking sockets connections Greg Kroah-Hartman
2013-08-27  1:08 ` [ 09/74] iwlwifi: mvm: correctly configure MCAST in AP mode Greg Kroah-Hartman
2013-08-27  1:08 ` [ 10/74] iwlwifi: mvm: fix " Greg Kroah-Hartman
2013-08-27  1:08 ` [ 11/74] iwlwifi: mvm: properly tell the fw that a STA is awake Greg Kroah-Hartman
2013-08-27  1:08 ` [ 12/74] iwlwifi: mvm: dont set the MCAST queue in STAs queue list Greg Kroah-Hartman
2013-08-27  1:08 ` [ 13/74] iwlwifi: mvm: take the seqno from packet if transmit failed Greg Kroah-Hartman
2013-08-27  1:08 ` [ 14/74] iwlwifi: mvm: unregister leds when registration failed Greg Kroah-Hartman
2013-08-27  1:08 ` [ 15/74] iwlwifi: bump required firmware API version for 3160/7260 Greg Kroah-Hartman
2013-08-27  1:08 ` [ 16/74] iwlwifi: mvm: adjust firmware D3 configuration API Greg Kroah-Hartman
2013-08-27  1:08 ` [ 17/74] tracing: Do not call kmem_cache_free() on allocation failure Greg Kroah-Hartman
2013-08-27  1:08 ` [ 18/74] tracing/kprobe: Wait for disabling all running kprobe handlers Greg Kroah-Hartman
2013-08-27  1:08 ` [ 19/74] tracing: Introduce trace_create_cpu_file() and tracing_get_cpu() Greg Kroah-Hartman
2013-08-27  1:08 ` [ 20/74] tracing: Change tracing_pipe_fops() to rely on tracing_get_cpu() Greg Kroah-Hartman
2013-08-27  1:08 ` [ 21/74] tracing: Change tracing_buffers_fops " Greg Kroah-Hartman
2013-08-27  1:08 ` [ 22/74] tracing: Change tracing_stats_fops " Greg Kroah-Hartman
2013-08-27  1:08 ` [ 23/74] tracing: Change tracing_entries_fops " Greg Kroah-Hartman
2013-08-27  1:08 ` [ 24/74] tracing: Change tracing_fops/snapshot_fops " Greg Kroah-Hartman
2013-08-27  1:08 ` [ 25/74] ftrace: Add check for NULL regs if ops has SAVE_REGS set Greg Kroah-Hartman
2013-08-27  1:08 ` [ 26/74] tracing: Turn event/id->i_private into call->event.type Greg Kroah-Hartman
2013-08-27  1:08 ` [ 27/74] tracing: Change event_enable/disable_read() to verify i_private != NULL Greg Kroah-Hartman
2013-08-27  1:08 ` [ 28/74] tracing: Change event_filter_read/write " Greg Kroah-Hartman
2013-08-27  1:08 ` [ 29/74] tracing: Change f_start() to take event_mutex and " Greg Kroah-Hartman
2013-08-27  1:08 ` [ 30/74] tracing: Introduce remove_event_file_dir() Greg Kroah-Hartman
2013-08-27  1:08 ` [ 31/74] tracing: Change remove_event_file_dir() to clear "d_subdirs"->i_private Greg Kroah-Hartman
2013-08-27  1:08 ` [ 32/74] tracing: trace_remove_event_call() should fail if call/file is in use Greg Kroah-Hartman
2013-08-27  1:08 ` [ 33/74] tracing/kprobes: Fail to unregister if probe event files are " Greg Kroah-Hartman
2013-08-27  1:08 ` [ 34/74] tracing/uprobes: " Greg Kroah-Hartman
2013-08-27  1:08 ` [ 35/74] ftrace: Check module functions being traced on reload Greg Kroah-Hartman
2013-08-27  1:08 ` Greg Kroah-Hartman [this message]
2013-08-27  1:08 ` [ 37/74] ARC: [lib] strchr breakage in Big-endian configuration Greg Kroah-Hartman
2013-08-27  1:08 ` [ 38/74] zd1201: do not use stack as URB transfer_buffer Greg Kroah-Hartman
2013-08-27  1:08 ` [ 39/74] VFS: collect_mounts() should return an ERR_PTR Greg Kroah-Hartman
2013-08-27  1:08 ` [ 40/74] x86: Dont clear olpc_ofw_header when sentinel is detected Greg Kroah-Hartman
2013-08-27  1:08 ` [ 41/74] xen/events: initialize local per-cpu mask for all possible events Greg Kroah-Hartman
2013-08-27  1:08 ` [ 42/74] xen/events: mask events when changing their VCPU binding Greg Kroah-Hartman
2013-08-27  1:08 ` [ 43/74] ARM: davinci: nand: specify ecc strength Greg Kroah-Hartman
2013-08-27  1:08 ` [ 44/74] ARM: at91/DT: fix at91sam9n12ek memory node Greg Kroah-Hartman
2013-08-27  1:08 ` [ 45/74] arm64: perf: fix array out of bounds access in armpmu_map_hw_event() Greg Kroah-Hartman
2013-08-27  1:08 ` [ 46/74] arm64: perf: fix event validation for software group leaders Greg Kroah-Hartman
2013-08-27  1:08 ` [ 47/74] ARM: 7816/1: CONFIG_KUSER_HELPERS: fix help text Greg Kroah-Hartman
2013-08-27  1:08 ` [ 48/74] staging: comedi: bug-fix NULL pointer dereference on failed attach Greg Kroah-Hartman
2013-08-27  1:08 ` [ 49/74] drm/radeon/r7xx: fix copy paste typo in golden register setup Greg Kroah-Hartman
2013-08-27  1:08 ` [ 50/74] drm/radeon: fix UVD message buffer validation Greg Kroah-Hartman
2013-08-27  1:08 ` [ 51/74] drm/radeon: fix WREG32_OR macro setting bits in a register Greg Kroah-Hartman
2013-08-27  1:08 ` [ 52/74] drm/i915: Invalidate TLBs for the rings after a reset Greg Kroah-Hartman
2013-08-27  1:08 ` [ 53/74] of: fdt: fix memory initialization for expanded DT Greg Kroah-Hartman
2013-08-27  1:08 ` [ 54/74] nilfs2: remove double bio_put() in nilfs_end_bio_write() for BIO_EOPNOTSUPP error Greg Kroah-Hartman
2013-08-27  1:08 ` [ 55/74] nilfs2: fix issue with counting number of bio requests for BIO_EOPNOTSUPP error detection Greg Kroah-Hartman
2013-08-27  1:08 ` [ 56/74] drivers/platform/olpc/olpc-ec.c: initialise earlier Greg Kroah-Hartman
2013-08-27  1:08 ` [ 57/74] usb: phy: fix build breakage Greg Kroah-Hartman
2013-08-27  1:08 ` [ 58/74] sata_fsl: save irqs while coalescing Greg Kroah-Hartman
2013-08-27  1:08 ` [ 59/74] Hostap: copying wrong data prism2_ioctl_giwaplist() Greg Kroah-Hartman
2013-08-27  1:08 ` [ 60/74] libata: apply behavioral quirks to sil3826 PMP Greg Kroah-Hartman
2013-08-27  1:08 ` [ 61/74] iwlwifi: dvm: fix calling ieee80211_chswitch_done() with NULL Greg Kroah-Hartman
2013-08-27  1:09 ` [ 62/74] iwlwifi: pcie: disable L1 Active after pci_enable_device Greg Kroah-Hartman
2013-08-27  1:09 ` [ 63/74] SCSI: zfcp: fix lock imbalance by reworking request queue locking Greg Kroah-Hartman
2013-08-27  1:09 ` [ 64/74] SCSI: zfcp: fix schedule-inside-lock in scsi_device list loops Greg Kroah-Hartman
2013-08-27  1:09 ` [ 65/74] SCSI: lpfc: Dont force CONFIG_GENERIC_CSUM on Greg Kroah-Hartman
2013-08-27  1:09 ` [ 66/74] SCSI: sg: Fix user memory corruption when SG_IO is interrupted by a signal Greg Kroah-Hartman
2013-08-27  1:09 ` [ 67/74] Revert "x86 get_unmapped_area(): use proper mmap base for bottom-up direction" Greg Kroah-Hartman
2013-08-27  1:09 ` [ 68/74] x86 get_unmapped_area: Access mmap_legacy_base through mm_struct member Greg Kroah-Hartman
2013-08-27  1:09 ` [ 69/74] x86/xen: do not identity map UNUSABLE regions in the machine E820 Greg Kroah-Hartman
2013-08-27  1:09 ` [ 70/74] mei: me: fix reset state machine Greg Kroah-Hartman
2013-08-27  1:09 ` [ 71/74] mei: dont have to clean the state on power up Greg Kroah-Hartman
2013-08-27  1:09 ` [ 72/74] mei: me: fix waiting for hw ready Greg Kroah-Hartman
2013-08-27  1:09 ` [ 73/74] md: bcache: io.c: fix a potential NULL pointer dereference Greg Kroah-Hartman
2013-08-27  1:09 ` [ 74/74] bcache: FUA fixes Greg Kroah-Hartman
2013-08-27  4:31 ` [ 00/74] 3.10.10-stable review Guenter Roeck
2013-08-27 22:27   ` Greg Kroah-Hartman
2013-08-27 20:41 ` Shuah Khan
2013-08-27 22:27   ` Greg Kroah-Hartman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130827010432.247202762@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=jhbird.choi@samsung.com \
    --cc=konrad.wilk@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=srivatsa.bhat@linux.vnet.ibm.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).