stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org,
	Pavel Tatashin <pasha.tatashin@oracle.com>,
	Bob Picco <bob.picco@oracle.com>,
	Steven Sistare <steven.sistare@oracle.com>,
	"David S. Miller" <davem@davemloft.net>
Subject: [PATCH 4.4 16/90] sparc64: new context wrap
Date: Mon, 12 Jun 2017 17:25:22 +0200	[thread overview]
Message-ID: <20170612152557.062895228@linuxfoundation.org> (raw)
In-Reply-To: <20170612152556.133240249@linuxfoundation.org>

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Pavel Tatashin <pasha.tatashin@oracle.com>


[ Upstream commit a0582f26ec9dfd5360ea2f35dd9a1b026f8adda0 ]

The current wrap implementation has a race issue: it is called outside of
the ctx_alloc_lock, and also does not wait for all CPUs to complete the
wrap.  This means that a thread can get a new context with a new version
and another thread might still be running with the same context. The
problem is especially severe on CPUs with shared TLBs, like sun4v. I used
the following test to very quickly reproduce the problem:
- start over 8K processes (must be more than context IDs)
- write and read values at a  memory location in every process.

Very quickly memory corruptions start happening, and what we read back
does not equal what we wrote.

Several approaches were explored before settling on this one:

Approach 1:
Move smp_new_mmu_context_version() inside ctx_alloc_lock, and wait for
every process to complete the wrap. (Note: every CPU must WAIT before
leaving smp_new_mmu_context_version_client() until every one arrives).

This approach ends up with deadlocks, as some threads own locks which other
threads are waiting for, and they never receive softint until these threads
exit smp_new_mmu_context_version_client(). Since we do not allow the exit,
deadlock happens.

Approach 2:
Handle wrap right during mondo interrupt. Use etrap/rtrap to enter into
into C code, and issue new versions to every CPU.
This approach adds some overhead to runtime: in switch_mm() we must add
some checks to make sure that versions have not changed due to wrap while
we were loading the new secondary context. (could be protected by PSTATE_IE
but that degrades performance as on M7 and older CPUs as it takes 50 cycles
for each access). Also, we still need a global per-cpu array of MMs to know
where we need to load new contexts, otherwise we can change context to a
thread that is going way (if we received mondo between switch_mm() and
switch_to() time). Finally, there are some issues with window registers in
rtrap() when context IDs are changed during CPU mondo time.

The approach in this patch is the simplest and has almost no impact on
runtime.  We use the array with mm's where last secondary contexts were
loaded onto CPUs and bump their versions to the new generation without
changing context IDs. If a new process comes in to get a context ID, it
will go through get_new_mmu_context() because of version mismatch. But the
running processes do not need to be interrupted. And wrap is quicker as we
do not need to xcall and wait for everyone to receive and complete wrap.

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/sparc/mm/init_64.c |   81 ++++++++++++++++++++++++++++++++----------------
 1 file changed, 54 insertions(+), 27 deletions(-)

--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -662,6 +662,53 @@ unsigned long tlb_context_cache = CTX_FI
 DECLARE_BITMAP(mmu_context_bmap, MAX_CTX_NR);
 DEFINE_PER_CPU(struct mm_struct *, per_cpu_secondary_mm) = {0};
 
+static void mmu_context_wrap(void)
+{
+	unsigned long old_ver = tlb_context_cache & CTX_VERSION_MASK;
+	unsigned long new_ver, new_ctx, old_ctx;
+	struct mm_struct *mm;
+	int cpu;
+
+	bitmap_zero(mmu_context_bmap, 1 << CTX_NR_BITS);
+
+	/* Reserve kernel context */
+	set_bit(0, mmu_context_bmap);
+
+	new_ver = (tlb_context_cache & CTX_VERSION_MASK) + CTX_FIRST_VERSION;
+	if (unlikely(new_ver == 0))
+		new_ver = CTX_FIRST_VERSION;
+	tlb_context_cache = new_ver;
+
+	/*
+	 * Make sure that any new mm that are added into per_cpu_secondary_mm,
+	 * are going to go through get_new_mmu_context() path.
+	 */
+	mb();
+
+	/*
+	 * Updated versions to current on those CPUs that had valid secondary
+	 * contexts
+	 */
+	for_each_online_cpu(cpu) {
+		/*
+		 * If a new mm is stored after we took this mm from the array,
+		 * it will go into get_new_mmu_context() path, because we
+		 * already bumped the version in tlb_context_cache.
+		 */
+		mm = per_cpu(per_cpu_secondary_mm, cpu);
+
+		if (unlikely(!mm || mm == &init_mm))
+			continue;
+
+		old_ctx = mm->context.sparc64_ctx_val;
+		if (likely((old_ctx & CTX_VERSION_MASK) == old_ver)) {
+			new_ctx = (old_ctx & ~CTX_VERSION_MASK) | new_ver;
+			set_bit(new_ctx & CTX_NR_MASK, mmu_context_bmap);
+			mm->context.sparc64_ctx_val = new_ctx;
+		}
+	}
+}
+
 /* Caller does TLB context flushing on local CPU if necessary.
  * The caller also ensures that CTX_VALID(mm->context) is false.
  *
@@ -676,50 +723,30 @@ void get_new_mmu_context(struct mm_struc
 {
 	unsigned long ctx, new_ctx;
 	unsigned long orig_pgsz_bits;
-	int new_version;
 
 	spin_lock(&ctx_alloc_lock);
+retry:
+	/* wrap might have happened, test again if our context became valid */
+	if (unlikely(CTX_VALID(mm->context)))
+		goto out;
 	orig_pgsz_bits = (mm->context.sparc64_ctx_val & CTX_PGSZ_MASK);
 	ctx = (tlb_context_cache + 1) & CTX_NR_MASK;
 	new_ctx = find_next_zero_bit(mmu_context_bmap, 1 << CTX_NR_BITS, ctx);
-	new_version = 0;
 	if (new_ctx >= (1 << CTX_NR_BITS)) {
 		new_ctx = find_next_zero_bit(mmu_context_bmap, ctx, 1);
 		if (new_ctx >= ctx) {
-			int i;
-			new_ctx = (tlb_context_cache & CTX_VERSION_MASK) +
-				CTX_FIRST_VERSION + 1;
-			if (new_ctx == 1)
-				new_ctx = CTX_FIRST_VERSION + 1;
-
-			/* Don't call memset, for 16 entries that's just
-			 * plain silly...
-			 */
-			mmu_context_bmap[0] = 3;
-			mmu_context_bmap[1] = 0;
-			mmu_context_bmap[2] = 0;
-			mmu_context_bmap[3] = 0;
-			for (i = 4; i < CTX_BMAP_SLOTS; i += 4) {
-				mmu_context_bmap[i + 0] = 0;
-				mmu_context_bmap[i + 1] = 0;
-				mmu_context_bmap[i + 2] = 0;
-				mmu_context_bmap[i + 3] = 0;
-			}
-			new_version = 1;
-			goto out;
+			mmu_context_wrap();
+			goto retry;
 		}
 	}
 	if (mm->context.sparc64_ctx_val)
 		cpumask_clear(mm_cpumask(mm));
 	mmu_context_bmap[new_ctx>>6] |= (1UL << (new_ctx & 63));
 	new_ctx |= (tlb_context_cache & CTX_VERSION_MASK);
-out:
 	tlb_context_cache = new_ctx;
 	mm->context.sparc64_ctx_val = new_ctx | orig_pgsz_bits;
+out:
 	spin_unlock(&ctx_alloc_lock);
-
-	if (unlikely(new_version))
-		smp_new_mmu_context_version();
 }
 
 static int numa_enabled = 1;

  parent reply	other threads:[~2017-06-12 15:38 UTC|newest]

Thread overview: 95+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-06-12 15:25 [PATCH 4.4 00/90] 4.4.72-stable review Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 01/90] bnx2x: Fix Multi-Cos Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 02/90] ipv6: xfrm: Handle errors reported by xfrm6_find_1stfragopt() Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 03/90] cxgb4: avoid enabling napi twice to the same queue Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 04/90] tcp: disallow cwnd undo when switching congestion control Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 05/90] vxlan: fix use-after-free on deletion Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 06/90] ipv6: Fix leak in ipv6_gso_segment() Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 07/90] net: ping: do not abuse udp_poll() Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 08/90] net: ethoc: enable NAPI before poll may be scheduled Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 09/90] net: bridge: start hello timer only if device is up Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 10/90] sparc64: mm: fix copy_tsb to correctly copy huge page TSBs Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 11/90] sparc: Machine description indices can vary Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 12/90] sparc64: reset mm cpumask after wrap Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 13/90] sparc64: combine activate_mm and switch_mm Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 14/90] sparc64: redefine first version Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 15/90] sparc64: add per-cpu mm of secondary contexts Greg Kroah-Hartman
2017-06-12 15:25 ` Greg Kroah-Hartman [this message]
2017-06-12 15:25 ` [PATCH 4.4 17/90] sparc64: delete old wrap code Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 18/90] arch/sparc: support NR_CPUS = 4096 Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 19/90] serial: ifx6x60: fix use-after-free on module unload Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 20/90] ptrace: Properly initialize ptracer_cred on fork Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 21/90] KEYS: fix dereferencing NULL payload with nonzero length Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 22/90] KEYS: fix freeing uninitialized memory in key_update() Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 23/90] crypto: gcm - wait for crypto op not signal safe Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 25/90] nfsd4: fix null dereference on replay Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 26/90] nfsd: Fix up the "supattr_exclcreat" attributes Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 29/90] arm: KVM: Allow unaligned accesses at HYP Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 31/90] dmaengine: usb-dmac: Fix DMAOR AE bit definition Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 32/90] dmaengine: ep93xx: Always start from BASE0 Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 33/90] xen/privcmd: Support correctly 64KB page granularity when mapping memory Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 34/90] xen-netfront: do not cast grant table reference to signed short Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 35/90] xen-netfront: cast grant table reference first to type int Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 36/90] ext4: fix SEEK_HOLE Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 37/90] ext4: keep existing extra fields when inode expands Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 38/90] ext4: fix fdatasync(2) after extent manipulation operations Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 39/90] usb: gadget: f_mass_storage: Serialize wake and sleep execution Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 40/90] usb: chipidea: udc: fix NULL pointer dereference if udc_start failed Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 41/90] usb: chipidea: debug: check before accessing ci_role Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 42/90] staging/lustre/lov: remove set_fs() call from lov_getstripe() Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 43/90] iio: light: ltr501 Fix interchanged als/ps register field Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 44/90] iio: proximity: as3935: fix AS3935_INT mask Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 45/90] drivers: char: random: add get_random_long() Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 46/90] random: properly align get_random_int_hash Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 47/90] stackprotector: Increase the per-task stack canarys random range from 32 bits to 64 bits on 64-bit platforms Greg Kroah-Hartman
2017-06-12 15:41   ` [kernel-hardening] " Jann Horn
2017-06-12 15:45     ` Jann Horn
2017-06-12 15:25 ` [PATCH 4.4 48/90] cpufreq: cpufreq_register_driver() should return -ENODEV if init fails Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 49/90] target: Re-add check to reject control WRITEs with overflow data Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 50/90] drm/msm: Expose our reservation object when exporting a dmabuf Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 51/90] Input: elantech - add Fujitsu Lifebook E546/E557 to force crc_enabled Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 52/90] cpuset: consider dying css as offline Greg Kroah-Hartman
2017-06-12 15:25 ` [PATCH 4.4 53/90] fs: add i_blocksize() Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 54/90] ufs: restore proper tail allocation Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 55/90] fix ufs_isblockset() Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 56/90] ufs: restore maintaining ->i_blocks Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 57/90] ufs: set correct ->s_maxsize Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 58/90] ufs_extend_tail(): fix the braino in calling conventions of ufs_new_fragments() Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 59/90] ufs_getfrag_block(): we only grab ->truncate_mutex on block creation path Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 60/90] cxl: Fix error path on bad ioctl Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 61/90] btrfs: use correct types for page indices in btrfs_page_exists_in_range Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 62/90] btrfs: fix memory leak in update_space_info failure path Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 63/90] KVM: arm/arm64: Handle possible NULL stage2 pud when ageing pages Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 64/90] scsi: qla2xxx: dont disable a not previously enabled PCI device Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 65/90] powerpc/eeh: Avoid use after free in eeh_handle_special_event() Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 66/90] powerpc/numa: Fix percpu allocations to be NUMA aware Greg Kroah-Hartman
2017-07-28 13:53   ` Michal Hocko
2017-07-28 22:41     ` Greg Kroah-Hartman
2017-07-31  6:41       ` Michal Hocko
2017-08-03 19:29         ` Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 67/90] powerpc/hotplug-mem: Fix missing endian conversion of aa_index Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 68/90] perf/core: Drop kernel samples even though :u is specified Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 69/90] drm/vmwgfx: Handle vmalloc() failure in vmw_local_fifo_reserve() Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 70/90] drm/vmwgfx: limit the number of mip levels in vmw_gb_surface_define_ioctl() Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 71/90] drm/vmwgfx: Make sure backup_handle is always valid Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 72/90] drm/nouveau/tmr: fully separate alarm execution/pending lists Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 73/90] ALSA: timer: Fix race between read and ioctl Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 74/90] ALSA: timer: Fix missing queue indices reset at SNDRV_TIMER_IOCTL_SELECT Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 75/90] ASoC: Fix use-after-free at card unregistration Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 76/90] drivers: char: mem: Fix wraparound check to allow mappings up to the end Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 77/90] tty: Drop krefs for interrupted tty lock Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 78/90] serial: sh-sci: Fix panic when serial console and DMA are enabled Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 79/90] net: better skb->sender_cpu and skb->napi_id cohabitation Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 80/90] mm: consider memblock reservations for deferred memory initialization sizing Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 81/90] NFS: Ensure we revalidate attributes before using execute_ok() Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 82/90] NFSv4: Dont perform cached access checks before weve OPENed the file Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 83/90] Make __xfs_xattr_put_listen preperly report errors Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 84/90] arm64: hw_breakpoint: fix watchpoint matching for tagged pointers Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 85/90] arm64: entry: improve data abort handling of " Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 86/90] RDMA/qib,hfi1: Fix MR reference count leak on write with immediate Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 87/90] tracing: Use strlcpy() instead of strcpy() in __trace_find_cmdline() Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 88/90] usercopy: Adjust tests to deal with SMAP/PAN Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 89/90] arm64: armv8_deprecated: ensure extension of addr Greg Kroah-Hartman
2017-06-12 15:26 ` [PATCH 4.4 90/90] arm64: ensure extension of smp_store_release value Greg Kroah-Hartman
2017-06-12 21:53 ` [PATCH 4.4 00/90] 4.4.72-stable review Guenter Roeck
2017-06-13  0:45 ` Shuah Khan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170612152557.062895228@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=bob.picco@oracle.com \
    --cc=davem@davemloft.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pasha.tatashin@oracle.com \
    --cc=stable@vger.kernel.org \
    --cc=steven.sistare@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).