All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Don Zickus <dzickus@redhat.com>,
	Joe Mario <jmario@redhat.com>, Rik van Riel <riel@redhat.com>,
	Paul Turner <pjt@google.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Ingo Molnar <mingo@kernel.org>
Subject: [ 052/110] sched/x86: Optimize switch_mm() for multi-threaded workloads
Date: Tue, 24 Sep 2013 17:14:50 -0700	[thread overview]
Message-ID: <20130925001329.046577656@linuxfoundation.org> (raw)
In-Reply-To: <20130925001323.387158698@linuxfoundation.org>

3.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Rik van Riel <riel@redhat.com>

commit 8f898fbbe5ee5e20a77c4074472a1fd088dc47d1 upstream.

Dick Fowles, Don Zickus and Joe Mario have been working on
improvements to perf, and noticed heavy cache line contention
on the mm_cpumask, running linpack on a 60 core / 120 thread
system.

The cause turned out to be unnecessary atomic accesses to the
mm_cpumask. When in lazy TLB mode, the CPU is only removed from
the mm_cpumask if there is a TLB flush event.

Most of the time, no such TLB flush happens, and the kernel
skips the TLB reload. It can also skip the atomic memory
set & test.

Here is a summary of Joe's test results:

 * The __schedule function dropped from 24% of all program cycles down
   to 5.5%.

 * The cacheline contention/hotness for accesses to that bitmask went
   from being the 1st/2nd hottest - down to the 84th hottest (0.3% of
   all shared misses which is now quite cold)

 * The average load latency for the bit-test-n-set instruction in
   __schedule dropped from 10k-15k cycles down to an average of 600 cycles.

 * The linpack program results improved from 133 GFlops to 144 GFlops.
   Peak GFlops rose from 133 to 153.

Reported-by: Don Zickus <dzickus@redhat.com>
Reported-by: Joe Mario <jmario@redhat.com>
Tested-by: Joe Mario <jmario@redhat.com>
Signed-off-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Paul Turner <pjt@google.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20130731221421.616d3d20@annuminas.surriel.com
[ Made the comments consistent around the modified code. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/x86/include/asm/mmu_context.h |   20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -45,22 +45,28 @@ static inline void switch_mm(struct mm_s
 		/* Re-load page tables */
 		load_cr3(next->pgd);
 
-		/* stop flush ipis for the previous mm */
+		/* Stop flush ipis for the previous mm */
 		cpumask_clear_cpu(cpu, mm_cpumask(prev));
 
-		/*
-		 * load the LDT, if the LDT is different:
-		 */
+		/* Load the LDT, if the LDT is different: */
 		if (unlikely(prev->context.ldt != next->context.ldt))
 			load_LDT_nolock(&next->context);
 	}
 #ifdef CONFIG_SMP
-	else {
+	  else {
 		this_cpu_write(cpu_tlbstate.state, TLBSTATE_OK);
 		BUG_ON(this_cpu_read(cpu_tlbstate.active_mm) != next);
 
-		if (!cpumask_test_and_set_cpu(cpu, mm_cpumask(next))) {
-			/* We were in lazy tlb mode and leave_mm disabled
+		if (!cpumask_test_cpu(cpu, mm_cpumask(next))) {
+			/*
+			 * On established mms, the mm_cpumask is only changed
+			 * from irq context, from ptep_clear_flush() while in
+			 * lazy tlb mode, and here. Irqs are blocked during
+			 * schedule, protecting us from simultaneous changes.
+			 */
+			cpumask_set_cpu(cpu, mm_cpumask(next));
+			/*
+			 * We were in lazy tlb mode and leave_mm disabled
 			 * tlb flush IPI delivery. We must reload CR3
 			 * to make sure to use no freed page tables.
 			 */



  parent reply	other threads:[~2013-09-25  1:08 UTC|newest]

Thread overview: 116+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-09-25  0:13 [ 000/110] 3.10.13-stable review Greg Kroah-Hartman
2013-09-25  0:13 ` [ 001/110] SCSI: Allow MPT Fusion SAS 3.0 driver to be built into the kernel Greg Kroah-Hartman
2013-09-25  0:14 ` [ 002/110] UBI: Fix PEB leak in wear_leveling_worker() Greg Kroah-Hartman
2013-09-25  0:14 ` [ 003/110] SCSI: sd: Fix potential out-of-bounds access Greg Kroah-Hartman
2013-09-25  0:14 ` [ 004/110] crypto: api - Fix race condition in larval lookup Greg Kroah-Hartman
2013-09-25  0:14   ` Greg Kroah-Hartman
2013-09-25  0:14 ` [ 005/110] powerpc: Handle unaligned ldbrx/stdbrx Greg Kroah-Hartman
2013-09-25  0:14 ` [ 006/110] powerpc: Default arch idle could cede processor on pseries Greg Kroah-Hartman
2013-09-25  0:14 ` [ 007/110] xen-gnt: prevent adding duplicate gnt callbacks Greg Kroah-Hartman
2013-09-25  0:14 ` [ 008/110] ARM: xen: only set pm function ptrs for Xen guests Greg Kroah-Hartman
2013-09-25  0:14 ` [ 009/110] cpuidle: coupled: abort idle if pokes are pending Greg Kroah-Hartman
2013-09-25  0:14 ` [ 010/110] cpuidle: coupled: fix race condition between pokes and safe state Greg Kroah-Hartman
2013-09-25  0:14 ` [ 011/110] ARM: dts: at91: cpus/cpu node dts updates Greg Kroah-Hartman
2013-09-25  0:14 ` [ 012/110] ARM: dts: sunxi: cpus/cpu nodes " Greg Kroah-Hartman
2013-09-25  0:14 ` [ 013/110] ARM: dts: add missing cpu #address-cell values Greg Kroah-Hartman
2013-09-25  0:14 ` [ 014/110] ARM: KVM: Fix 64-bit coprocessor handling Greg Kroah-Hartman
2013-09-25  0:14 ` [ 015/110] arm64: perf: fix group validation when using enable_on_exec Greg Kroah-Hartman
2013-09-25  0:14 ` [ 016/110] arm64: perf: fix ARMv8 EVTYPE_MASK to include NSH bit Greg Kroah-Hartman
2013-09-25  0:14 ` [ 017/110] ARM: PCI: versatile: Fix map_irq function to match hardware Greg Kroah-Hartman
2013-09-25  0:14 ` [ 018/110] ARM: PCI: versatile: Fix PCI I/O Greg Kroah-Hartman
2013-09-25  0:14 ` [ 019/110] ARM: PCI: versatile: Fix SMAP register offsets Greg Kroah-Hartman
2013-09-25  0:14 ` [ 020/110] KVM: PPC: Book3S: Fix compile error in XICS emulation Greg Kroah-Hartman
2013-09-25  0:14 ` [ 021/110] xhci-plat: Dont enable legacy PCI interrupts Greg Kroah-Hartman
2013-09-25  0:14 ` [ 022/110] usb: xhci: Disable runtime PM suspend for quirky controllers Greg Kroah-Hartman
2013-09-25  0:14 ` [ 023/110] usb: dwc3: gadget: dont request IRQs in atomic Greg Kroah-Hartman
2013-09-25  0:14 ` [ 024/110] tty: disassociate_ctty() sends the extra SIGCONT Greg Kroah-Hartman
2013-09-25  0:14 ` [ 025/110] cifs: ensure that srv_mutex is held when dealing with ssocket pointer Greg Kroah-Hartman
2013-09-25  0:14 ` [ 026/110] CIFS: Fix a memory leak when a lease break comes Greg Kroah-Hartman
2013-09-25  0:14 ` [ 027/110] CIFS: Fix missing lease break Greg Kroah-Hartman
2013-09-25  0:14 ` [ 028/110] USB: OHCI: Allow runtime PM without system sleep Greg Kroah-Hartman
2013-09-25  0:14 ` [ 029/110] net: Check the correct namespace when spoofing pid over SCM_RIGHTS Greg Kroah-Hartman
2013-09-25  0:14 ` [ 030/110] staging: comedi: dt282x: dt282x_ai_insn_read() always fails Greg Kroah-Hartman
2013-09-25  0:14 ` [ 031/110] iio: mxs-lradc: Fix misuse of iio->trig Greg Kroah-Hartman
2013-09-25  0:14 ` [ 032/110] iio: mxs-lradc: Remove useless check in read_raw Greg Kroah-Hartman
2013-09-25  0:14 ` [ 033/110] ACPI / LPSS: dont crash if a device has no MMIO resources Greg Kroah-Hartman
2013-09-25  0:14 ` [ 034/110] USB: mos7720: use GFP_ATOMIC under spinlock Greg Kroah-Hartman
2013-09-25  0:14 ` [ 035/110] USB: mos7720: fix big-endian control requests Greg Kroah-Hartman
2013-09-25  0:14 ` [ 036/110] usb: ehci-mxc: check for pdata before dereferencing Greg Kroah-Hartman
2013-09-25  0:14 ` [ 037/110] USB: cdc-wdm: fix race between interrupt handler and tasklet Greg Kroah-Hartman
2013-09-25  0:14 ` [ 038/110] usb: gadget: uvc: Fix error handling in uvc_queue_buffer() Greg Kroah-Hartman
2013-09-25  0:14 ` [ 039/110] usb: Dont fail port power resume on device disconnect Greg Kroah-Hartman
2013-09-25  0:14 ` [ 040/110] USB: fix build error when CONFIG_PM_SLEEP isnt enabled Greg Kroah-Hartman
2013-09-25  0:14 ` [ 041/110] usb: config->desc.bLength may not exceed amount of data returned by the device Greg Kroah-Hartman
2013-09-25  0:14 ` [ 042/110] USB: handle LPM errors during device suspend correctly Greg Kroah-Hartman
2013-09-25  0:14 ` [ 043/110] usb: dont check pm qos NO_POWER_OFF flag in usb_port_suspend() Greg Kroah-Hartman
2013-09-25  0:14 ` [ 044/110] rculist: list_first_or_null_rcu() should use list_entry_rcu() Greg Kroah-Hartman
2013-09-25  0:14 ` [ 045/110] ASoC: wm8960: Fix PLL register writes Greg Kroah-Hartman
2013-09-25  0:14 ` [ 046/110] ASoC: mc13783: add spi errata fix Greg Kroah-Hartman
2013-09-25  0:14 ` [ 047/110] x86, smap: Handle csum_partial_copy_*_user() Greg Kroah-Hartman
2013-09-25  0:14 ` [ 048/110] Introduce [compat_]save_altstack_ex() to unbreak x86 SMAP Greg Kroah-Hartman
2013-09-25  0:14 ` [ 049/110] pci_ids: Add PCI device ID functions 3 and 4 for newer F15h models Greg Kroah-Hartman
2013-09-25  0:14 ` [ 050/110] x86, amd_nb: Clarify F15h, model 30h GART and L3 support Greg Kroah-Hartman
2013-09-25  0:14 ` [ 051/110] x86/mce: Pay no attention to F bit in MCACOD when parsing UC errors Greg Kroah-Hartman
2013-09-25  0:14 ` Greg Kroah-Hartman [this message]
2013-09-25  0:14 ` [ 053/110] ALSA: hda - Re-setup HDMI pin and audio infoframe on stream switches Greg Kroah-Hartman
2013-09-25  0:14 ` [ 054/110] ALSA: hda - hdmi: Fallback to ALSA allocation when selecting CA Greg Kroah-Hartman
2013-09-25  0:14 ` [ 055/110] ALSA: hda - Add Toshiba Satellite C870 to MSI blacklist Greg Kroah-Hartman
2013-09-25  0:14 ` [ 056/110] pinctrl: at91: fix get_pullup/down function return Greg Kroah-Hartman
2013-09-25  0:14 ` [ 057/110] ext4: simplify truncation code in ext4_setattr() Greg Kroah-Hartman
2013-09-25  0:14 ` [ 058/110] brcmsmac: Fix WARNING caused by lack of calls to dma_mapping_error() Greg Kroah-Hartman
2013-09-25  0:14 ` [ 059/110] ath9k: always clear ps filter bit on new assoc Greg Kroah-Hartman
2013-09-25  0:14 ` [ 060/110] ath9k: fix rx descriptor related race condition Greg Kroah-Hartman
2013-09-25  0:14 ` [ 061/110] ath9k: avoid accessing MRC registers on single-chain devices Greg Kroah-Hartman
2013-09-25  0:15 ` [ 062/110] HID: Correct the USB IDs for the new Macbook Air 6 Greg Kroah-Hartman
2013-09-25  0:15 ` [ 063/110] HID: pantherlord: validate output report details Greg Kroah-Hartman
2013-09-25  0:15 ` [ 064/110] HID: Fix Speedlink VAD Cezanne support for some devices Greg Kroah-Hartman
2013-09-25  0:15 ` [ 065/110] HID: sensor-hub: validate feature report details Greg Kroah-Hartman
2013-09-25  0:15 ` [ 066/110] HID: validate HID report id size Greg Kroah-Hartman
2013-09-25  0:15 ` [ 067/110] HID: picolcd_core: validate output report details Greg Kroah-Hartman
2013-09-25  0:15 ` [ 068/110] HID: ntrig: validate feature " Greg Kroah-Hartman
2013-09-25  0:15 ` [ 069/110] HID: picolcd: Prevent NULL pointer dereference on _remove() Greg Kroah-Hartman
2013-09-25  0:15 ` [ 070/110] HID: battery: dont do DMA from stack Greg Kroah-Hartman
2013-09-25  0:15 ` [ 071/110] HID: hidraw: correctly deallocate memory on device disconnect Greg Kroah-Hartman
2013-09-25  0:15 ` [ 072/110] HID: check for NULL field when setting values Greg Kroah-Hartman
2013-09-25  0:15 ` [ 073/110] HID: usbhid: quirk for N-Trig DuoSense Touch Screen Greg Kroah-Hartman
2013-09-25  0:15 ` [ 074/110] media: exynos-gsc: Register v4l2 device Greg Kroah-Hartman
2013-09-25  0:15 ` [ 075/110] media: exynos4-is: Fix entity unregistration on error path Greg Kroah-Hartman
2013-09-25  0:15 ` [ 076/110] media: s5p-g2d: Fix registration failure Greg Kroah-Hartman
2013-09-25  0:15 ` [ 077/110] media: DocBook: upgrade media_api DocBook version to 4.2 Greg Kroah-Hartman
2013-09-25  0:15 ` [ 078/110] media: hdpvr: fix iteration over uninitialized lists in hdpvr_probe() Greg Kroah-Hartman
2013-09-25  0:15 ` [ 079/110] media: v4l2: added missing mutex.h include to v4l2-ctrls.h Greg Kroah-Hartman
2013-09-25  0:15 ` [ 080/110] media: media: coda: Fix DT driver data pointer for i.MX27 Greg Kroah-Hartman
2013-09-25  0:15 ` [ 081/110] media: mb86a20s: Fix TS parallel mode Greg Kroah-Hartman
2013-09-25  0:15 ` [ 082/110] media: siano: fix divide error on 0 counters Greg Kroah-Hartman
2013-09-25  0:15   ` Greg Kroah-Hartman
2013-09-25  0:15 ` [ 083/110] Btrfs: dont allow the replace procedure on read only filesystems Greg Kroah-Hartman
2013-09-25  0:15 ` [ 084/110] uprobes: Fix utask->depth accounting in handle_trampoline() Greg Kroah-Hartman
2013-09-25  0:15 ` [ 085/110] leds: wm831x-status: Request a REG resource Greg Kroah-Hartman
2013-09-25  0:15 ` [ 086/110] MIPS: ath79: Fix ar933x watchdog clock Greg Kroah-Hartman
2013-09-25  0:15 ` [ 087/110] target: Fix >= v3.9+ regression in PR APTPL + ALUA metadata write-out Greg Kroah-Hartman
2013-09-25  0:15 ` [ 088/110] intel-iommu: Fix leaks in pagetable freeing Greg Kroah-Hartman
2013-09-25  0:15 ` [ 089/110] pidns: Fix hang in zap_pid_ns_processes by sending a potentially extra wakeup Greg Kroah-Hartman
2013-09-25  0:15 ` [ 090/110] pidns: fix vfork() after unshare(CLONE_NEWPID) Greg Kroah-Hartman
2013-09-25  0:15 ` [ 091/110] ocfs2: fix the end cluster offset of FIEMAP Greg Kroah-Hartman
2013-09-25  0:15 ` [ 092/110] memcg: fix multiple large threshold notifications Greg Kroah-Hartman
2013-09-25  0:15 ` [ 093/110] mm/huge_memory.c: fix potential NULL pointer dereference Greg Kroah-Hartman
2013-09-25  0:15 ` [ 094/110] proc: Restrict mounting the proc filesystem Greg Kroah-Hartman
2013-09-25  0:15 ` [ 095/110] isofs: Refuse RW mount of the filesystem instead of making it RO Greg Kroah-Hartman
2013-09-25  0:15 ` [ 096/110] amd64_edac: Fix single-channel setups Greg Kroah-Hartman
2013-09-25  0:15 ` [ 097/110] drm/edid: add quirk for Medion MD30217PG Greg Kroah-Hartman
2013-09-25  0:15 ` [ 098/110] um: Implement probe_kernel_read() Greg Kroah-Hartman
2013-09-25  0:15 ` [ 099/110] libceph: unregister request in __map_request failed and nofail == false Greg Kroah-Hartman
2013-09-25  0:15 ` [ 100/110] libceph: use pg_num_mask instead of pgp_num_mask for pg.seed calc Greg Kroah-Hartman
2013-09-25  0:15 ` [ 101/110] ceph: Dont forget the up_read(&osdc->map_sem) if met error Greg Kroah-Hartman
2013-09-25  0:15 ` [ 102/110] rbd: fix I/O error propagation for reads Greg Kroah-Hartman
2013-09-25  0:15 ` [ 103/110] mmc: tmio_mmc_dma: fix PIO fallback on SDHI Greg Kroah-Hartman
2013-09-25  0:15 ` [ 104/110] of: Fix missing memory initialization on FDT unflattening Greg Kroah-Hartman
2013-09-25  0:15 ` [ 105/110] mtd: nand: fix NAND_BUSWIDTH_AUTO for x16 devices Greg Kroah-Hartman
2013-09-25  0:15 ` [ 106/110] clk: wm831x: Initialise wm831x pointer on init Greg Kroah-Hartman
2013-09-25  0:15 ` [ 107/110] fuse: postpone end_page_writeback() in fuse_writepage_locked() Greg Kroah-Hartman
2013-09-25  0:15 ` [ 108/110] fuse: invalidate inode attributes on xattr modification Greg Kroah-Hartman
2013-09-25  0:15 ` [ 109/110] fuse: hotfix truncate_pagecache() issue Greg Kroah-Hartman
2013-09-25  0:15 ` [ 110/110] fuse: readdir: check for slash in names Greg Kroah-Hartman
2013-09-25  4:15 ` [ 000/110] 3.10.13-stable review Guenter Roeck
2013-09-26  1:09   ` Greg Kroah-Hartman
2013-09-26  2:25 ` Shuah Khan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130925001329.046577656@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=dzickus@redhat.com \
    --cc=jmario@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=pjt@google.com \
    --cc=riel@redhat.com \
    --cc=stable@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.