From: Greg KH <gregkh@suse.de>
To: linux-kernel@vger.kernel.org, stable@kernel.org
Cc: stable-review@kernel.org, torvalds@linux-foundation.org,
	akpm@linux-foundation.org, alan@lxorguk.ukuu.org.uk,
	Suresh Siddha <suresh.b.siddha@intel.com>,
	Prarit Bhargava <prarit@redhat.com>,
	"H. Peter Anvin" <hpa@linux.intel.com>
Subject: [17/67] x86, mtrr: Use stop machine context to rendezvous all the cpus
Date: Wed, 11 Aug 2010 17:05:32 -0700
Message-ID: <20100812000614.210307305@clark.site>
In-Reply-To: <20100812000641.GA6348@kroah.com>

2.6.35-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Suresh Siddha <suresh.b.siddha@intel.com>

commit 68f202e4e87cfab4439568bf397fcc5c7cf8d729 upstream.

Use the stop machine context rather than IPIs to rendezvous all the cpus for
MTRR initialization that happens during cpu bringup or for MTRR modifications
during runtime.

This avoids a deadlock scenario (reported by Prarit) such as:

cpu A holds a read_lock (tasklist_lock for example) with irqs enabled
cpu B waits for the same lock with irqs disabled using write_lock_irq
cpu C is doing set_mtrr() (during AP bringup for example), which will try to
rendezvous all the cpus using IPIs

This results in C and A reaching the rendezvous point and waiting for B.
B is stuck forever waiting for the lock (which A cannot release while it
spins in the rendezvous handler) and thus never reaches the rendezvous
point.

Doing this rendezvous from the stop cpu context (run in the process context
of per cpu based keventd) avoids this deadlock scenario.

Also make sure all the cpus are in the rendezvous handler before we proceed
with the local_irq_save() on each cpu. Disabling irqs in lock step on all
the cpus avoids other deadlock scenarios (for example, those involving
blocking smp_call_function() calls).

   [ This problem is very old. Marking -stable only for 2.6.35 as the
     stop_one_cpu_nowait() API is present only in 2.6.35. Any older
     kernel that wants this fix needs some more work to backport
     this patch. ]

Reported-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <1280515602.2682.10.camel@sbsiddha-MOBL3.sc.intel.com>
Acked-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 arch/x86/kernel/cpu/mtrr/main.c |   56 ++++++++++++++++++++++++++++++----------
 arch/x86/kernel/smpboot.c       |    7 +++++
 2 files changed, 50 insertions(+), 13 deletions(-)
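
For readers who want to step through the count/gate handshake outside the
kernel, here is a rough userspace sketch of it. It is not part of the patch
and is only an illustration under stated assumptions: pthreads stand in for
the per-cpu stop work, sched_yield() stands in for cpu_relax(), nothing
disables interrupts, and every name in it (struct rendezvous,
check_in_and_wait(), release_phase(), run_rendezvous(), MAX_WORKERS) is
hypothetical. C11 sequentially consistent atomics take the place of the
atomic_set()/smp_wmb() pairs.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define MAX_WORKERS 16	/* hypothetical cap; workers model the "other" cpus */

struct rendezvous {
	atomic_int count;	/* workers that have not checked in for this phase */
	atomic_int gate;	/* toggled by the master to release each phase */
	atomic_int updates;	/* stand-in for the per-cpu MTRR update */
};

/* Worker side: announce arrival, then spin until the master flips the gate. */
static void check_in_and_wait(struct rendezvous *r, int gate_value)
{
	atomic_fetch_sub(&r->count, 1);
	while (atomic_load(&r->gate) != gate_value)
		sched_yield();	/* the kernel spins with cpu_relax() instead */
}

static void *worker(void *arg)
{
	struct rendezvous *r = arg;

	check_in_and_wait(r, 1);		/* handler entered */
	/* kernel: local_irq_save(flags) happens here */
	check_in_and_wait(r, 0);		/* irqs are off */
	atomic_fetch_add(&r->updates, 1);	/* kernel: the MTRR update */
	check_in_and_wait(r, 1);		/* update done */
	atomic_fetch_sub(&r->count, 1);		/* final ack */
	return NULL;
}

/* Master side: wait for every worker to check in, re-arm count, flip gate. */
static void release_phase(struct rendezvous *r, int nworkers, int gate_value)
{
	while (atomic_load(&r->count) != 0)
		sched_yield();
	atomic_store(&r->count, nworkers);	/* re-arm before opening the gate */
	atomic_store(&r->gate, gate_value);
}

/* Returns the number of updates performed (workers + master), or -1. */
int run_rendezvous(int nworkers)
{
	pthread_t tids[MAX_WORKERS];
	struct rendezvous r;
	int i, rc;

	if (nworkers < 0 || nworkers > MAX_WORKERS)
		return -1;

	atomic_init(&r.count, nworkers);
	atomic_init(&r.gate, 0);
	atomic_init(&r.updates, 0);

	for (i = 0; i < nworkers; i++) {
		rc = pthread_create(&tids[i], NULL, worker, &r);
		assert(rc == 0);
		(void)rc;
	}

	release_phase(&r, nworkers, 1);		/* all handlers entered */
	/* kernel: local_irq_save(flags) on the master happens here */
	release_phase(&r, nworkers, 0);		/* all irqs off, start updating */
	atomic_fetch_add(&r.updates, 1);	/* master's own update */
	release_phase(&r, nworkers, 1);		/* everyone has updated */
	while (atomic_load(&r.count) != 0)	/* wait for the final acks */
		sched_yield();

	for (i = 0; i < nworkers; i++)
		pthread_join(tids[i], NULL);

	return atomic_load(&r.updates);
}
```

Calling run_rendezvous(3) returns 4: three workers plus the master each
perform one update. The extra first phase (gate going 0 -> 1 before anyone
would disable irqs) is the step this patch adds; the remaining phases mirror
the pre-existing handshake with the gate polarity inverted.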

--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -35,6 +35,7 @@
 
 #include <linux/types.h> /* FIXME: kvm_para.h needs this */
 
+#include <linux/stop_machine.h>
 #include <linux/kvm_para.h>
 #include <linux/uaccess.h>
 #include <linux/module.h>
@@ -143,22 +144,28 @@ struct set_mtrr_data {
 	mtrr_type	smp_type;
 };
 
+static DEFINE_PER_CPU(struct cpu_stop_work, mtrr_work);
+
 /**
- * ipi_handler - Synchronisation handler. Executed by "other" CPUs.
+ * mtrr_work_handler - Synchronisation handler. Executed by "other" CPUs.
  * @info: pointer to mtrr configuration data
  *
  * Returns nothing.
  */
-static void ipi_handler(void *info)
+static int mtrr_work_handler(void *info)
 {
 #ifdef CONFIG_SMP
 	struct set_mtrr_data *data = info;
 	unsigned long flags;
 
+	atomic_dec(&data->count);
+	while (!atomic_read(&data->gate))
+		cpu_relax();
+
 	local_irq_save(flags);
 
 	atomic_dec(&data->count);
-	while (!atomic_read(&data->gate))
+	while (atomic_read(&data->gate))
 		cpu_relax();
 
 	/*  The master has cleared me to execute  */
@@ -173,12 +180,13 @@ static void ipi_handler(void *info)
 	}
 
 	atomic_dec(&data->count);
-	while (atomic_read(&data->gate))
+	while (!atomic_read(&data->gate))
 		cpu_relax();
 
 	atomic_dec(&data->count);
 	local_irq_restore(flags);
 #endif
+	return 0;
 }
 
 static inline int types_compatible(mtrr_type type1, mtrr_type type2)
@@ -198,7 +206,7 @@ static inline int types_compatible(mtrr_
  *
  * This is kinda tricky, but fortunately, Intel spelled it out for us cleanly:
  *
- * 1. Send IPI to do the following:
+ * 1. Queue work to do the following on all processors:
  * 2. Disable Interrupts
  * 3. Wait for all procs to do so
  * 4. Enter no-fill cache mode
@@ -215,14 +223,17 @@ static inline int types_compatible(mtrr_
  * 15. Enable interrupts.
  *
  * What does that mean for us? Well, first we set data.count to the number
- * of CPUs. As each CPU disables interrupts, it'll decrement it once. We wait
- * until it hits 0 and proceed. We set the data.gate flag and reset data.count.
- * Meanwhile, they are waiting for that flag to be set. Once it's set, each
+ * of CPUs. As each CPU announces that it started the rendezvous handler by
+ * decrementing the count, We reset data.count and set the data.gate flag
+ * allowing all the cpu's to proceed with the work. As each cpu disables
+ * interrupts, it'll decrement data.count once. We wait until it hits 0 and
+ * proceed. We clear the data.gate flag and reset data.count. Meanwhile, they
+ * are waiting for that flag to be cleared. Once it's cleared, each
  * CPU goes through the transition of updating MTRRs.
  * The CPU vendors may each do it differently,
  * so we call mtrr_if->set() callback and let them take care of it.
  * When they're done, they again decrement data->count and wait for data.gate
- * to be reset.
+ * to be set.
  * When we finish, we wait for data.count to hit 0 and toggle the data.gate flag
  * Everyone then enables interrupts and we all continue on.
  *
@@ -234,6 +245,9 @@ set_mtrr(unsigned int reg, unsigned long
 {
 	struct set_mtrr_data data;
 	unsigned long flags;
+	int cpu;
+
+	preempt_disable();
 
 	data.smp_reg = reg;
 	data.smp_base = base;
@@ -246,10 +260,15 @@ set_mtrr(unsigned int reg, unsigned long
 	atomic_set(&data.gate, 0);
 
 	/* Start the ball rolling on other CPUs */
-	if (smp_call_function(ipi_handler, &data, 0) != 0)
-		panic("mtrr: timed out waiting for other CPUs\n");
+	for_each_online_cpu(cpu) {
+		struct cpu_stop_work *work = &per_cpu(mtrr_work, cpu);
+
+		if (cpu == smp_processor_id())
+			continue;
+
+		stop_one_cpu_nowait(cpu, mtrr_work_handler, &data, work);
+	}
 
-	local_irq_save(flags);
 
 	while (atomic_read(&data.count))
 		cpu_relax();
@@ -259,6 +278,16 @@ set_mtrr(unsigned int reg, unsigned long
 	smp_wmb();
 	atomic_set(&data.gate, 1);
 
+	local_irq_save(flags);
+
+	while (atomic_read(&data.count))
+		cpu_relax();
+
+	/* Ok, reset count and toggle gate */
+	atomic_set(&data.count, num_booting_cpus() - 1);
+	smp_wmb();
+	atomic_set(&data.gate, 0);
+
 	/* Do our MTRR business */
 
 	/*
@@ -279,7 +308,7 @@ set_mtrr(unsigned int reg, unsigned long
 
 	atomic_set(&data.count, num_booting_cpus() - 1);
 	smp_wmb();
-	atomic_set(&data.gate, 0);
+	atomic_set(&data.gate, 1);
 
 	/*
 	 * Wait here for everyone to have seen the gate change
@@ -289,6 +318,7 @@ set_mtrr(unsigned int reg, unsigned long
 		cpu_relax();
 
 	local_irq_restore(flags);
+	preempt_enable();
 }
 
 /**
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -816,6 +816,13 @@ do_rest:
 			if (cpumask_test_cpu(cpu, cpu_callin_mask))
 				break;	/* It has booted */
 			udelay(100);
+			/*
+			 * Allow other tasks to run while we wait for the
+			 * AP to come online. This also gives a chance
+			 * for the MTRR work(triggered by the AP coming online)
+			 * to be completed in the stop machine context.
+			 */
+			schedule();
 		}
 
 		if (cpumask_test_cpu(cpu, cpu_callin_mask))


